Feature #18033
closedTime.new to parse a string
Description
Make Time.new
parse Time#inspect
and ISO-8601 like strings.
-
Time.iso8601
andTime.parse
need an extension library,date
. -
Time.iso8601
can't parseTime#inspect
string. -
Time.parse
often results in unintentional/surprising results. -
Time.new
also about 1.9 times faster thanTime.iso8601
.$ ./ruby -rtime -rbenchmark -e ' n = 1000 s = Time.now.iso8601 Benchmark.bm(12) do |x| x.report("Time.iso8601") {n.times{Time.iso8601(s)}} x.report("Time.parse") {n.times{Time.parse(s)}} x.report("Time.new") {n.times{Time.new(s)}} end' user system total real Time.iso8601 0.006919 0.000185 0.007104 ( 0.007091) Time.parse 0.018338 0.000207 0.018545 ( 0.018590) Time.new 0.003671 0.000069 0.003740 ( 0.003741)
Updated by ioquatix (Samuel Williams) over 3 years ago
It looks like good improvement.
But does any other #new
implementation work this way? Can we fix Time#parse
? I wonder if user will be confused, should they use Time.parse
or Time.new
?
Why is Time.parse
so slow?
This seems like a good performance improvement. But rather than fixing the existing #parse
, we introduce new one. Now we move the complexity to the user.
Updated by nobu (Nobuyoshi Nakada) over 3 years ago
ioquatix (Samuel Williams) wrote in #note-2:
But does any other
#new
implementation work this way? Can we fixTime#parse
? I wonder if user will be confused, should they useTime.parse
orTime.new
?
The reason not to "fix Time#parse
" is
Time.parse
often results in unintentional/surprising results.
Updated by ioquatix (Samuel Williams) over 3 years ago
Time.parse often results in unintentional/surprising results.
Can we change it so that it doesn't return unintentional/surprising results? I realise this might be impossible without extending the interface... e.g. Time.parse(..., surprising: false)
:p
I agree clean room implementation is nice. But if Time.parse
can do unexpected things, a new interface does not fix existing usage so the problem still exists, and now we have two interfaces for doing largely the same thing - one that "works" and one that does "unintentional/surprising" things.
Updated by nobu (Nobuyoshi Nakada) over 3 years ago
ioquatix (Samuel Williams) wrote in #note-4:
Time.parse often results in unintentional/surprising results.
Can we change it so that it doesn't return unintentional/surprising results? I realise this might be impossible without extending the interface... e.g.
Time.parse(..., surprising: false)
:p
Yes, it seems by design, and changing the behavior will just break something.
I thought that the same name but different behavior method would be more confusing.
I agree clean room implementation is nice. But if
Time.parse
can do unexpected things, a new interface does not fix existing usage so the problem still exists, and now we have two interfaces for doing largely the same thing - one that "works" and one that does "unintentional/surprising" things.
A wrapper version based on the new Time.new
would be possible, I think.
Updated by byroot (Jean Boussier) over 3 years ago
user system total real
Time.iso8601 0.006919 0.000185 0.007104 ( 0.007091)
Time.parse 0.018338 0.000207 0.018545 ( 0.018590)
Time.new 0.003671 0.000069 0.003740 ( 0.003741)
This new interface being faster than Time.iso8601
at parsing iso8601
seems weird to me. When I have IS0-8601 dates (pretty much the only format I really use), I'd expect that by using the dedicated method to parse them, I get the best possible performance.
Can we optimize Time.iso8601
in the same way? Otherwise I can already see people moving to the new API and losing strictness by doing so.
Updated by mame (Yusuke Endoh) over 3 years ago
- Related to Feature #16005: A variation of Time.iso8601 that can parse yyyy-MM-dd HH:mm:ss added
Updated by ioquatix (Samuel Williams) over 3 years ago
Yes, it seems by design, and changing the behavior will just break something.
I think there is value in the following interface:
Time.parse(..., strict: true)
where we have some very clearly documented interpretation of strict
.
If we want to support multiple formatS:
Time.parse(..., format: :loose/:strict)
or Time.parse(...,format: :iso8601)
etc.
Updated by ioquatix (Samuel Williams) about 3 years ago
I checked the latest update.
I'm still not sure about Time.new(string)
parsing a string.
What about Time.parse_strict
or Time.parse_iso8601
etc?
Updated by Eregon (Benoit Daloze) about 3 years ago
Like @ioquatix (Samuel Williams) and @byroot (Jean Boussier) I think Time.new
taking a iso-like String is not a good idea.
Also it's technically incompatible (the first argument is always the year for Time.new
and it even accepts strings):
> Time.new "2234-01-01T00:00:00+07:00"
=> 2234-01-01 00:00:00 +0100
I understand Time.iso8601 and Time.parse need an extension library, date.
is annoying and not intuitive.
How about simply making Time.iso8601
a core method?
Time.iso8601 can't parse Time#inspect string.
One shouldn't parse inspect
so this should be no problem in practice.
We could add Time#iso8601
though (in core), I think that's much better than time.to_datetime.iso8601
.
Updated by byroot (Jean Boussier) about 3 years ago
If I might add a nitpick, the actual format is RFC 3339, which is pretty much a subset of ISO 8601.
But yes, +1 to making Time.iso8601
a core method.
Updated by timcraft (Tim Craft) about 3 years ago
Nice performance improvement!
In terms of the interface I think it would be confusing to make Time.new
parse a string:
- Using a
.parse
method for parsing is more conventional / less surprising / more intention revealing - Having both
.new
and.parse
would be confusing because it's not clear when to use one or the other, what the trade-offs are etc
It feels very unintiutive and not "least surprise". Similarly Time.parse(...,format: :iso8601)
and Time.parse_iso8601
feel a bit clunky.
Is this awkwardness stemming from trying to fit much functionality into the Time class? It would be more work, but if there was a dedicated ISO8601 module we could shift some responsibility away from the Time class. For example:
ISO8601::Time.parse # strictly ISO8601 parsing, returns a Time object
This could be extended to encapsulate ISO8601 parsing for other classes:
ISO8601::Date.parse
ISO8601::Duration.parse
ISO8601::TimeInterval.parse
The same general pattern could be followed for other formats:
RFC3339::Time.parse
RFC2822::Time.parse
SQL92::Time.parse
Each of those methods would have the same signature, so they could be easily interchanged to get varying degrees of strictness or performance.
From a naming and readability perspective the names are less surprising, more intention revealing, and very greppable.
The higher level Time.parse
method could potentially delegate to these methods to support parsing a wider range of formats.
Each module could also be extended to encapsulate string formatting as an alternative to adding instance methods like #iso8601, #rfc3339 etc.
An alternative to using the formats as the top level modules would be to invert and nest the modules under the Time classes, for example:
Time::ISO8601.parse # instead of ISO8601::Time.parse
Same benefits—I'm not sure which would be considered more Ruby-ish?
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
- Related to Feature #18331: Kernel.#Time added
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
I remember #18331 won't work with subclasses of Time
.
Does Time.at("2021-12-04 02:00:00")
make sense?
Updated by sawa (Tsuyoshi Sawada) about 3 years ago
nobu (Nobuyoshi Nakada) wrote in #note-15:
I remember #18331 won't work with subclasses of
Time
.
Can you elaborate on that? Is your concern about certain frameworks using time-like classes other than Time
? I don't see any problem with that. If you create a Time
object using Kernel.#Time
proposed in #18331, that can easily be converted (often without noticing) to such class in case of Ruby on Rails' ActiveSupport::TimeWithZone
.
Updated by Eregon (Benoit Daloze) about 3 years ago
Yes, Kernel#Time(str)
seems better than Time.new(str)
.
However I think being precise for parsing is key for times.
Hence, why not make Time.iso8601
fast and a core method?
Then code already using the right method would just become faster, and there is no need migrate code to use a new API.
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
sawa (Tsuyoshi Sawada) wrote in #note-16:
nobu (Nobuyoshi Nakada) wrote in #note-15:
I remember #18331 won't work with subclasses of
Time
.Can you elaborate on that? Is your concern about certain frameworks using time-like classes other than
Time
? I don't see any problem with that. If you create aTime
object usingKernel.#Time
proposed in #18331, that can easily be converted (often without noticing) to such class in case of Ruby on Rails'ActiveSupport::TimeWithZone
.
The timezone name will be converted by find_timezone
method on the class of a Time
instance, so that subclasses can override it.
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
Eregon (Benoit Daloze) wrote in #note-17:
Hence, why not make
Time.iso8601
fast and a core method?
Then code already using the right method would just become faster, and there is no need migrate code to use a new API.
First, the primary target is the result of Time#inspect
, and it is not fully compliant with ISO-8601.
Second, ISO-8601 allows many variants, but I'm not going to implement them all.
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
Eregon (Benoit Daloze) wrote in #note-10:
Also it's technically incompatible (the first argument is always the year for
Time.new
and it even accepts strings):> Time.new "2234-01-01T00:00:00+07:00" => 2234-01-01 00:00:00 +0100
Actually this code raises an ArgumentError
in the master, since https://bugs.ruby-lang.org/issues/17485#change-89871.
Updated by Eregon (Benoit Daloze) about 3 years ago
nobu (Nobuyoshi Nakada) wrote in #note-19:
First, the primary target is the result of
Time#inspect
, and it is not fully compliant with ISO-8601.
Thanks for making that clear, I wasn't sure why not just improve Time.iso8601
.
Right, they differ:
irb(main):006:0> t.inspect
=> "2021-12-06 14:10:15.334531885 +0100"
irb(main):007:0> t.iso8601
=> "2021-12-06T14:10:15+01:00"
Why we do we need to parse Time#inspect
output?
It seems in general bad to rely on inspect
for serialization, since it only works for some classes.
Maybe to preserve subseconds?
#inspect
does not feel like a proper way to serialize a Time instance, so if we want to add that I think we should have a new method for it too (maybe #dump
?).
In any case, Time.new
accepts individual time components, and I think making it parse strings is just confusing.
A new method Time.try_convert
(or Time.undump
) feels more appropriate for such parsing.
Second, ISO-8601 allows many variants, but I'm not going to implement them all.
Is that what makes it faster?
The PR still uses a Regexp so I guess the main difference is the Regexp is a little bit simpler?
They look fairly similar:
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
Eregon (Benoit Daloze) wrote in #note-21:
Why we do we need to parse
Time#inspect
output?
It seems in general bad to rely oninspect
for serialization, since it only works for some classes.
Maybe to preserve subseconds?
I wrote Time#inspect
, but the "ISO 8601-like" format is not only used by it, e.g., --date=iso
of git.
#inspect
does not feel like a proper way to serialize a Time instance, so if we want to add that I think we should have a new method for it too (maybe#dump
?).
In any case,
Time.new
accepts individual time components, and I think making it parse strings is just confusing.
How about Time.at
?
A new method
Time.try_convert
(orTime.undump
) feels more appropriate for such parsing.
Time.try_convert
feels considerable, but passing the timezone option may not fit.
Second, ISO-8601 allows many variants, but I'm not going to implement them all.
Is that what makes it faster?
Not to be faster.
ISO-8601 is a large specification and allows many omissions, which makes parsing very complex and sometimes conflicts with the spec of Time
.
The PR still uses a Regexp so I guess the main difference is the Regexp is a little bit simpler?
They look fairly similar:
That PR is abandoned, please see Time.new-string or Time.at-string.
Updated by Eregon (Benoit Daloze) about 3 years ago
nobu (Nobuyoshi Nakada) wrote in #note-22:
I wrote
Time#inspect
, but the "ISO 8601-like" format is not only used by it, e.g.,--date=iso
of git.
Interesting that git
has this too:
--date=iso (or --date=iso8601) shows timestamps in a ISO 8601-like format. The differences to the strict ISO 8601 format are:
• a space instead of the T date/time delimiter
• a space between time and time zone
• no colon between hours and minutes of the time zone
--date=iso-strict (or --date=iso8601-strict) shows timestamps in strict ISO 8601 format.
Maybe we really need a name for this. Like Time#iso
and Time.iso
.
Or it could be a keyword argument: time.iso8601(format: :git)
and Time.iso8601(str, format: :git)
.
:git
is just an example, could be textual
, relaxed
or anything.
Another idea would be to actually use the strict variant of iso8601 if that is well defined, like what git log --date=iso-strict
shows.
Then we'd have Time#iso8601_strict
/Time.iso8601_strict
, or time.iso8601(format: :strict)
and Time.iso8601(str, format: :strict)
.
That feels proper to me because then we have a clear way to serialize and deserialize times and finding the dumping/parsing method is trivial (they have the same name, no need to guess).
I feel relying on #inspect
for efficient serializing is in general bad, because that is not the main purpose of #inspect
(rather it's to show a clear/debug-like representation of an object).
How about
Time.at
?
To me Time.at
sounds like "give me a Time at epoch", so I think it's unexpected it does String parsing.
Time.try_convert
feels considerable, but passing the timezone option may not fit.
I think an extra optional timezone argument would be fine.
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
Eregon (Benoit Daloze) wrote in #note-23:
Time.try_convert
feels considerable, but passing the timezone option may not fit.I think an extra optional timezone argument would be fine.
Ah, my bad.
The try_convert
methods are the implicit conversions for duck-typing, i.e, to_int
to Integer
, to_str
to String
and so on.
As parsing like Kernel#Integer
is not an implicit conversion, Time.try_convert
can't be valid.
Updated by mame (Yusuke Endoh) about 2 years ago
Updated by nobu (Nobuyoshi Nakada) about 2 years ago
- Description updated (diff)
Updated by matz (Yukihiro Matsumoto) about 2 years ago
Accepted.
Performance improvement is a great benefit.
Both Time.iso8859
and Time.parse
requires loading date
gem, and providing those methods in the core may cause overwriting.
Currently, Time.new
only takes numeric arguments, so adding extra argument pattern add only a small complexity.
Matz.
Updated by nobu (Nobuyoshi Nakada) about 2 years ago
- Status changed from Open to Closed
Applied in changeset git|8c272f44816f098c1e057c72a47451efc8cd1739.
[Feature #18033] Make Time.new parse time strings
Time.new
now parses strings such as the result of Time#inspect
and restricted ISO-8601 formats.
Updated by Eregon (Benoit Daloze) 2 months ago
matz (Yukihiro Matsumoto) wrote in #note-27:
Currently,
Time.new
only takes numeric arguments, so adding extra argument pattern add only a small complexity.
But older versions like 3.0 accept year as a String: https://bugs.ruby-lang.org/issues/18033#note-10
So we get very weird behavior between versions:
$ ruby -ve 'p Time.new "2234-10-12T01:02:03"'
ruby 3.0.6p216 (2023-03-30 revision 23a532679b) [x86_64-linux]
2234-01-01 00:00:00 +0100
$ ruby -ve 'p Time.new "2234-10-12T01:02:03"'
ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [x86_64-linux]
<internal:timev>:310:in `initialize': invalid value for Integer(): "2234-10-12T01:02:03" (ArgumentError)
from -e:1:in `new'
from -e:1:in `<main>'
$ ruby -ve 'p Time.new "2234-10-12T01:02:03"'
ruby 3.2.4 (2024-04-23 revision af471c0e01) [x86_64-linux]
2234-10-12 01:02:03 +0200
$ ruby -ve 'p Time.new "2234-10-12T01:02:03"'
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-linux]
2234-10-12 01:02:03 +0200
Updated by Eregon (Benoit Daloze) 2 months ago
- Related to Bug #19293: The new Time.new(String) API is nice... but we still need a stricter version of this added
Updated by Eregon (Benoit Daloze) 2 months ago
- Related to Bug #20797: UTC offset seconds part is not checked added
Updated by Eregon (Benoit Daloze) 2 months ago
- Related to Bug #17504: Allow UTC offset without colons per ISO-8601 added
Updated by Eregon (Benoit Daloze) 2 months ago
- Related to Bug #19296: Time.new's argument check is incomplete added