Project

General

Profile

Actions

Feature #18033

closed

Time.new to parse a string

Added by nobu (Nobuyoshi Nakada) over 3 years ago. Updated 2 months ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:104552]

Description

Make Time.new parse Time#inspect and ISO-8601 like strings.

  • Time.iso8601 and Time.parse need an extension library, date.

  • Time.iso8601 can't parse Time#inspect string.

  • Time.parse often results in unintentional/surprising results.

  • Time.new also about 1.9 times faster than Time.iso8601.

    $ ./ruby -rtime -rbenchmark -e '
    n = 1000
    s = Time.now.iso8601
    Benchmark.bm(12) do |x|
      x.report("Time.iso8601") {n.times{Time.iso8601(s)}}
      x.report("Time.parse") {n.times{Time.parse(s)}}
      x.report("Time.new") {n.times{Time.new(s)}}
    end'
                       user     system      total        real
    Time.iso8601   0.006919   0.000185   0.007104 (  0.007091)
    Time.parse     0.018338   0.000207   0.018545 (  0.018590)
    Time.new       0.003671   0.000069   0.003740 (  0.003741)
    

https://github.com/ruby/ruby/pull/4825


Related issues 6 (2 open4 closed)

Related to Ruby master - Feature #16005: A variation of Time.iso8601 that can parse yyyy-MM-dd HH:mm:ssOpenActions
Related to Ruby master - Feature #18331: Kernel.#TimeOpenActions
Related to Ruby master - Bug #19293: The new Time.new(String) API is nice... but we still need a stricter version of thisClosedActions
Related to Ruby master - Bug #20797: UTC offset seconds part is not checkedClosedActions
Related to Ruby master - Bug #17504: Allow UTC offset without colons per ISO-8601ClosedActions
Related to Ruby master - Bug #19296: Time.new's argument check is incompleteClosedActions
Actions #1

Updated by nobu (Nobuyoshi Nakada) over 3 years ago

  • Description updated (diff)

Updated by ioquatix (Samuel Williams) over 3 years ago

It looks like good improvement.

But does any other #new implementation work this way? Can we fix Time#parse? I wonder if user will be confused, should they use Time.parse or Time.new?

Why is Time.parse so slow?

This seems like a good performance improvement. But rather than fixing the existing #parse, we introduce new one. Now we move the complexity to the user.

Updated by nobu (Nobuyoshi Nakada) over 3 years ago

ioquatix (Samuel Williams) wrote in #note-2:

But does any other #new implementation work this way? Can we fix Time#parse? I wonder if user will be confused, should they use Time.parse or Time.new?

The reason not to "fix Time#parse" is

Time.parse often results in unintentional/surprising results.

Updated by ioquatix (Samuel Williams) over 3 years ago

Time.parse often results in unintentional/surprising results.

Can we change it so that it doesn't return unintentional/surprising results? I realise this might be impossible without extending the interface... e.g. Time.parse(..., surprising: false) :p

I agree clean room implementation is nice. But if Time.parse can do unexpected things, a new interface does not fix existing usage so the problem still exists, and now we have two interfaces for doing largely the same thing - one that "works" and one that does "unintentional/surprising" things.

Updated by nobu (Nobuyoshi Nakada) over 3 years ago

ioquatix (Samuel Williams) wrote in #note-4:

Time.parse often results in unintentional/surprising results.

Can we change it so that it doesn't return unintentional/surprising results? I realise this might be impossible without extending the interface... e.g. Time.parse(..., surprising: false) :p

Yes, it seems by design, and changing the behavior will just break something.
I thought that the same name but different behavior method would be more confusing.

I agree clean room implementation is nice. But if Time.parse can do unexpected things, a new interface does not fix existing usage so the problem still exists, and now we have two interfaces for doing largely the same thing - one that "works" and one that does "unintentional/surprising" things.

A wrapper version based on the new Time.new would be possible, I think.

Updated by byroot (Jean Boussier) over 3 years ago

                   user     system      total        real
Time.iso8601   0.006919   0.000185   0.007104 (  0.007091)
Time.parse     0.018338   0.000207   0.018545 (  0.018590)
Time.new       0.003671   0.000069   0.003740 (  0.003741)

This new interface being faster than Time.iso8601 at parsing iso8601 seems weird to me. When I have IS0-8601 dates (pretty much the only format I really use), I'd expect that by using the dedicated method to parse them, I get the best possible performance.

Can we optimize Time.iso8601 in the same way? Otherwise I can already see people moving to the new API and losing strictness by doing so.

Actions #7

Updated by mame (Yusuke Endoh) over 3 years ago

  • Related to Feature #16005: A variation of Time.iso8601 that can parse yyyy-MM-dd HH:mm:ss added

Updated by ioquatix (Samuel Williams) over 3 years ago

Yes, it seems by design, and changing the behavior will just break something.

I think there is value in the following interface:

Time.parse(..., strict: true) where we have some very clearly documented interpretation of strict.

If we want to support multiple formatS:

Time.parse(..., format: :loose/:strict) or Time.parse(...,format: :iso8601) etc.

Updated by ioquatix (Samuel Williams) about 3 years ago

I checked the latest update.

I'm still not sure about Time.new(string) parsing a string.

What about Time.parse_strict or Time.parse_iso8601 etc?

Updated by Eregon (Benoit Daloze) about 3 years ago

Like @ioquatix (Samuel Williams) and @byroot (Jean Boussier) I think Time.new taking a iso-like String is not a good idea.

Also it's technically incompatible (the first argument is always the year for Time.new and it even accepts strings):

> Time.new "2234-01-01T00:00:00+07:00"
=> 2234-01-01 00:00:00 +0100

I understand Time.iso8601 and Time.parse need an extension library, date. is annoying and not intuitive.
How about simply making Time.iso8601 a core method?

Time.iso8601 can't parse Time#inspect string.

One shouldn't parse inspect so this should be no problem in practice.
We could add Time#iso8601 though (in core), I think that's much better than time.to_datetime.iso8601.

Updated by byroot (Jean Boussier) about 3 years ago

If I might add a nitpick, the actual format is RFC 3339, which is pretty much a subset of ISO 8601.

But yes, +1 to making Time.iso8601 a core method.

Updated by timcraft (Tim Craft) about 3 years ago

Nice performance improvement!

In terms of the interface I think it would be confusing to make Time.new parse a string:

  • Using a .parse method for parsing is more conventional / less surprising / more intention revealing
  • Having both .new and .parse would be confusing because it's not clear when to use one or the other, what the trade-offs are etc

It feels very unintiutive and not "least surprise". Similarly Time.parse(...,format: :iso8601) and Time.parse_iso8601 feel a bit clunky.

Is this awkwardness stemming from trying to fit much functionality into the Time class? It would be more work, but if there was a dedicated ISO8601 module we could shift some responsibility away from the Time class. For example:

ISO8601::Time.parse  # strictly ISO8601 parsing, returns a Time object

This could be extended to encapsulate ISO8601 parsing for other classes:

ISO8601::Date.parse
ISO8601::Duration.parse
ISO8601::TimeInterval.parse

The same general pattern could be followed for other formats:

RFC3339::Time.parse
RFC2822::Time.parse
SQL92::Time.parse

Each of those methods would have the same signature, so they could be easily interchanged to get varying degrees of strictness or performance.

From a naming and readability perspective the names are less surprising, more intention revealing, and very greppable.

The higher level Time.parse method could potentially delegate to these methods to support parsing a wider range of formats.

Each module could also be extended to encapsulate string formatting as an alternative to adding instance methods like #iso8601, #rfc3339 etc.


An alternative to using the formats as the top level modules would be to invert and nest the modules under the Time classes, for example:

Time::ISO8601.parse  # instead of ISO8601::Time.parse

Same benefits—I'm not sure which would be considered more Ruby-ish?

Actions #13

Updated by nobu (Nobuyoshi Nakada) about 3 years ago

Updated by nobu (Nobuyoshi Nakada) about 3 years ago

Eregon (Benoit Daloze) wrote in #note-10:

Also it's technically incompatible (the first argument is always the year for Time.new and it even accepts strings):

> Time.new "2234-01-01T00:00:00+07:00"
=> 2234-01-01 00:00:00 +0100

I overlooked it.
May #18331 be better?

Updated by nobu (Nobuyoshi Nakada) about 3 years ago

I remember #18331 won't work with subclasses of Time.
Does Time.at("2021-12-04 02:00:00") make sense?

Updated by sawa (Tsuyoshi Sawada) about 3 years ago

nobu (Nobuyoshi Nakada) wrote in #note-15:

I remember #18331 won't work with subclasses of Time.

Can you elaborate on that? Is your concern about certain frameworks using time-like classes other than Time? I don't see any problem with that. If you create a Time object using Kernel.#Time proposed in #18331, that can easily be converted (often without noticing) to such class in case of Ruby on Rails' ActiveSupport::TimeWithZone.

Updated by Eregon (Benoit Daloze) about 3 years ago

Yes, Kernel#Time(str) seems better than Time.new(str).

However I think being precise for parsing is key for times.
Hence, why not make Time.iso8601 fast and a core method?
Then code already using the right method would just become faster, and there is no need migrate code to use a new API.

Updated by nobu (Nobuyoshi Nakada) about 3 years ago

sawa (Tsuyoshi Sawada) wrote in #note-16:

nobu (Nobuyoshi Nakada) wrote in #note-15:

I remember #18331 won't work with subclasses of Time.

Can you elaborate on that? Is your concern about certain frameworks using time-like classes other than Time? I don't see any problem with that. If you create a Time object using Kernel.#Time proposed in #18331, that can easily be converted (often without noticing) to such class in case of Ruby on Rails' ActiveSupport::TimeWithZone.

The timezone name will be converted by find_timezone method on the class of a Time instance, so that subclasses can override it.

Updated by nobu (Nobuyoshi Nakada) about 3 years ago

Eregon (Benoit Daloze) wrote in #note-17:

Hence, why not make Time.iso8601 fast and a core method?
Then code already using the right method would just become faster, and there is no need migrate code to use a new API.

First, the primary target is the result of Time#inspect, and it is not fully compliant with ISO-8601.
Second, ISO-8601 allows many variants, but I'm not going to implement them all.

Updated by nobu (Nobuyoshi Nakada) about 3 years ago

Eregon (Benoit Daloze) wrote in #note-10:

Also it's technically incompatible (the first argument is always the year for Time.new and it even accepts strings):

> Time.new "2234-01-01T00:00:00+07:00"
=> 2234-01-01 00:00:00 +0100

Actually this code raises an ArgumentError in the master, since https://bugs.ruby-lang.org/issues/17485#change-89871.

Updated by Eregon (Benoit Daloze) about 3 years ago

nobu (Nobuyoshi Nakada) wrote in #note-19:

First, the primary target is the result of Time#inspect, and it is not fully compliant with ISO-8601.

Thanks for making that clear, I wasn't sure why not just improve Time.iso8601.

Right, they differ:

irb(main):006:0> t.inspect
=> "2021-12-06 14:10:15.334531885 +0100"
irb(main):007:0> t.iso8601
=> "2021-12-06T14:10:15+01:00"

Why we do we need to parse Time#inspect output?
It seems in general bad to rely on inspect for serialization, since it only works for some classes.
Maybe to preserve subseconds?

#inspect does not feel like a proper way to serialize a Time instance, so if we want to add that I think we should have a new method for it too (maybe #dump?).

In any case, Time.new accepts individual time components, and I think making it parse strings is just confusing.
A new method Time.try_convert (or Time.undump) feels more appropriate for such parsing.

Second, ISO-8601 allows many variants, but I'm not going to implement them all.

Is that what makes it faster?
The PR still uses a Regexp so I guess the main difference is the Regexp is a little bit simpler?
They look fairly similar:

Updated by nobu (Nobuyoshi Nakada) about 3 years ago

Eregon (Benoit Daloze) wrote in #note-21:

Why we do we need to parse Time#inspect output?
It seems in general bad to rely on inspect for serialization, since it only works for some classes.
Maybe to preserve subseconds?

I wrote Time#inspect, but the "ISO 8601-like" format is not only used by it, e.g., --date=iso of git.

#inspect does not feel like a proper way to serialize a Time instance, so if we want to add that I think we should have a new method for it too (maybe #dump?).

In any case, Time.new accepts individual time components, and I think making it parse strings is just confusing.

How about Time.at?

A new method Time.try_convert (or Time.undump) feels more appropriate for such parsing.

Time.try_convert feels considerable, but passing the timezone option may not fit.

Second, ISO-8601 allows many variants, but I'm not going to implement them all.

Is that what makes it faster?

Not to be faster.
ISO-8601 is a large specification and allows many omissions, which makes parsing very complex and sometimes conflicts with the spec of Time.

The PR still uses a Regexp so I guess the main difference is the Regexp is a little bit simpler?
They look fairly similar:

That PR is abandoned, please see Time.new-string or Time.at-string.

Updated by Eregon (Benoit Daloze) about 3 years ago

nobu (Nobuyoshi Nakada) wrote in #note-22:

I wrote Time#inspect, but the "ISO 8601-like" format is not only used by it, e.g., --date=iso of git.

Interesting that git has this too:

           --date=iso (or --date=iso8601) shows timestamps in a ISO 8601-like format. The differences to the strict ISO 8601 format are:
                        
           •   a space instead of the T date/time delimiter
           
           •   a space between time and time zone

           •   no colon between hours and minutes of the time zone

           --date=iso-strict (or --date=iso8601-strict) shows timestamps in strict ISO 8601 format.

Maybe we really need a name for this. Like Time#iso and Time.iso.

Or it could be a keyword argument: time.iso8601(format: :git) and Time.iso8601(str, format: :git).
:git is just an example, could be textual, relaxed or anything.

Another idea would be to actually use the strict variant of iso8601 if that is well defined, like what git log --date=iso-strict shows.
Then we'd have Time#iso8601_strict/Time.iso8601_strict, or time.iso8601(format: :strict) and Time.iso8601(str, format: :strict).
That feels proper to me because then we have a clear way to serialize and deserialize times and finding the dumping/parsing method is trivial (they have the same name, no need to guess).
I feel relying on #inspect for efficient serializing is in general bad, because that is not the main purpose of #inspect (rather it's to show a clear/debug-like representation of an object).

How about Time.at?

To me Time.at sounds like "give me a Time at epoch", so I think it's unexpected it does String parsing.

Time.try_convert feels considerable, but passing the timezone option may not fit.

I think an extra optional timezone argument would be fine.

Updated by nobu (Nobuyoshi Nakada) about 3 years ago

Eregon (Benoit Daloze) wrote in #note-23:

Time.try_convert feels considerable, but passing the timezone option may not fit.

I think an extra optional timezone argument would be fine.

Ah, my bad.
The try_convert methods are the implicit conversions for duck-typing, i.e, to_int to Integer, to_str to String and so on.
As parsing like Kernel#Integer is not an implicit conversion, Time.try_convert can't be valid.

Actions #26

Updated by nobu (Nobuyoshi Nakada) about 2 years ago

  • Description updated (diff)

Updated by matz (Yukihiro Matsumoto) about 2 years ago

Accepted.

Performance improvement is a great benefit.
Both Time.iso8859 and Time.parse requires loading date gem, and providing those methods in the core may cause overwriting.
Currently, Time.new only takes numeric arguments, so adding extra argument pattern add only a small complexity.

Matz.

Actions #28

Updated by nobu (Nobuyoshi Nakada) about 2 years ago

  • Status changed from Open to Closed

Applied in changeset git|8c272f44816f098c1e057c72a47451efc8cd1739.


[Feature #18033] Make Time.new parse time strings

Time.new now parses strings such as the result of Time#inspect
and restricted ISO-8601 formats.

Updated by Eregon (Benoit Daloze) 2 months ago

matz (Yukihiro Matsumoto) wrote in #note-27:

Currently, Time.new only takes numeric arguments, so adding extra argument pattern add only a small complexity.

But older versions like 3.0 accept year as a String: https://bugs.ruby-lang.org/issues/18033#note-10
So we get very weird behavior between versions:

$ ruby -ve 'p Time.new "2234-10-12T01:02:03"'
ruby 3.0.6p216 (2023-03-30 revision 23a532679b) [x86_64-linux]
2234-01-01 00:00:00 +0100

$ ruby -ve 'p Time.new "2234-10-12T01:02:03"'
ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [x86_64-linux]
<internal:timev>:310:in `initialize': invalid value for Integer(): "2234-10-12T01:02:03" (ArgumentError)
	from -e:1:in `new'
	from -e:1:in `<main>'

$ ruby -ve 'p Time.new "2234-10-12T01:02:03"'
ruby 3.2.4 (2024-04-23 revision af471c0e01) [x86_64-linux]
2234-10-12 01:02:03 +0200

$ ruby -ve 'p Time.new "2234-10-12T01:02:03"'
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-linux]
2234-10-12 01:02:03 +0200
Actions #30

Updated by Eregon (Benoit Daloze) 2 months ago

  • Related to Bug #19293: The new Time.new(String) API is nice... but we still need a stricter version of this added
Actions #31

Updated by Eregon (Benoit Daloze) 2 months ago

  • Related to Bug #20797: UTC offset seconds part is not checked added
Actions #32

Updated by Eregon (Benoit Daloze) 2 months ago

  • Related to Bug #17504: Allow UTC offset without colons per ISO-8601 added
Actions #33

Updated by Eregon (Benoit Daloze) 2 months ago

  • Related to Bug #19296: Time.new's argument check is incomplete added
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0