Bug #10733

Time.httpdate raises ArgumentError when DateTime.now.httpdate is provided as input

Added by mcls (Maarten Claes) about 5 years ago. Updated about 5 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-darwin14]
[ruby-core:67538]

Description

An irb session demonstrating the bug:

irb(main):001:0> require 'time'
=> true
irb(main):002:0> Time.httpdate("Mon, 12 Jan 2015 12:04:15 GMT")
=> 2015-01-12 12:04:15 UTC
irb(main):003:0> DateTime.now.httpdate.to_s
=> "Mon, 12 Jan 2015 12:04:56 GMT"
irb(main):004:0> Time.httpdate(DateTime.now.httpdate.to_s)
ArgumentError: not RFC 2616 compliant date: "Mon, 12 Jan 2015 12:05:08 GMT"
        from /Users/mcls/.rbenv/versions/2.2.0/lib/ruby/2.2.0/time.rb:544:in `httpdate'
        from (irb):4
        from /Users/mcls/.rbenv/versions/2.2.0/bin/irb:11:in `<main>'

To reproduce:

require 'time'
Time.httpdate("Mon, 12 Jan 2015 12:04:15 GMT") # works
Time.httpdate(DateTime.now.httpdate.to_s) # => ArgumentError

This seems to occur only on 2.2.0 (tested on 2.1.5, where it works fine).


Related issues

Is duplicate of Ruby master - Bug #10670: char-class matching same character with different encodings raises exception (Closed, 12/29/2014)

History

Updated by zzip (Dale Hofkens) about 5 years ago

Another example for Time.httpdate failing.

The third call to Time.httpdate fails, even though it is given the same argument as the first call.

require 'time'
puts RUBY_VERSION

datetime_in_httpdate = DateTime.now.httpdate.to_s
datetime_as_string =  "Mon, 12 Jan 2015 12:09:19 GMT"

# works
puts Time.httpdate(datetime_in_httpdate)
# works
puts Time.httpdate(datetime_as_string)
# fails with ruby 2.2.0
puts Time.httpdate(datetime_in_httpdate)

Updated by mcls (Maarten Claes) about 5 years ago

After some experimentation, it looks like this has something to do with the fact that DateTime#httpdate returns a US-ASCII encoded string.

require 'time'

from_httpdate = DateTime.now.httpdate
as_string     = "Mon, 12 Jan 2015 12:09:19 GMT"

puts from_httpdate.encoding # => US-ASCII
puts Time.httpdate(from_httpdate) # Works
puts Time.httpdate(as_string) # Works

begin
  puts(Time.httpdate(from_httpdate)) # => ArgumentError
rescue => e
  p e
end

puts from_httpdate.encode!(Encoding.find('UTF-8'))

begin
  puts Time.httpdate(from_httpdate) # Works
rescue => e
  p e
end
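The experiment above suggests a possible stopgap on an affected Ruby: transcode the string to UTF-8 before parsing it. The following is only a sketch of that idea. The helper name httpdate_via_utf8 is made up for illustration and is not part of the standard library, and nothing in this thread confirms that it avoids the failure in every call order.

```ruby
require 'date'
require 'time'

# Hypothetical helper (not a standard library method): transcode the input
# to UTF-8 without mutating it, then hand it to Time.httpdate.
def httpdate_via_utf8(str)
  Time.httpdate(str.encode(Encoding::UTF_8))
end

stamp = DateTime.now.httpdate    # reported above as US-ASCII encoded
puts stamp.encoding
puts httpdate_via_utf8(stamp)
puts httpdate_via_utf8(stamp)    # repeated call with the same input
```

Unlike encode!, String#encode returns a new string, so the caller's original string keeps its encoding.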

Updated by leriksen (Leif Eriksen) about 5 years ago

This is a tricky one. It is a spooky interaction when calling Time.httpdate with a US-ASCII encoded string after calling Time.httpdate with a UTF-8 encoded string.

If you just pass the result of DateTime.now.httpdate (which has a US-ASCII encoding), by itself, it works fine.

The space chars in the US-ASCII and UTF-8 strings are identical as \x20 chars, which is what the regex wants.

I can 'fix' the issue by replacing the \x20's in the regexen with \s, but \s has different semantics. It matches any whitespace character:

  • space character
  • tab character
  • carriage return character
  • new line character
  • vertical tab character
  • form feed character

Accepting any of these other than the plain space means the regex would no longer limit itself to matching RFC 2616-compliant timestamps.
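To make the difference concrete, here is a small sketch that checks which of the characters listed above each pattern accepts. The names CANDIDATES and matched_names are made up for this illustration.

```ruby
# The six characters from the list above, keyed by a readable name.
CANDIDATES = {
  "space"           => " ",
  "tab"             => "\t",
  "carriage return" => "\r",
  "new line"        => "\n",
  "vertical tab"    => "\v",
  "form feed"       => "\f",
}

# Return the names of the candidates that the pattern matches in full.
def matched_names(pattern)
  CANDIDATES.select { |_name, char| pattern.match?(char) }.keys
end

puts "\\x20 matches: #{matched_names(/\A\x20\z/).join(', ')}"
puts "\\s   matches: #{matched_names(/\A\s\z/).join(', ')}"
```

Only the \x20 pattern restricts the separator to a literal space, which is why swapping in \s hides the symptom at the cost of accepting input the original regex rejects.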

I suspect something is getting upset in the Regexp library, but how to trace it I don't know... yet. I will try to move out of the Time/DateTime libraries and just work with a regex that will be fed different encodings.

Updated by leriksen (Leif Eriksen) about 5 years ago

Definitely a regex issue, not related to DateTime or Time at all (other than both being affected in the same way).

def local_httpdate(date)
  if /\A\s*
      (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20
      (\d{2})\x20
      (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\x20
      (\d{4})\x20
      (\d{2}):(\d{2}):(\d{2})\x20
      GMT
      \s*\z/ix =~ date
    puts "regexp ok"
  else
    raise ArgumentError, "not RFC 2616 compliant date: #{date.inspect}"
  end
end

date_UTF      = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))

puts local_httpdate(date_UTF) # Works

begin
  puts local_httpdate(date_US_ASCII) # => ArgumentError
rescue => e
  p e
end

Updated by leriksen (Leif Eriksen) about 5 years ago

OK, the US-ASCII string is not having its space consumed when the regex matches! I've extracted the part of the regex from Time.httpdate that does match, and I'm printing the post-match part for each string. They should be the same.

def local_httpdate(date)
  if /
      \A
      \s*
      (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20
    /ix =~ date
    puts $'
  end
end

date_UTF      = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))

local_httpdate(date_UTF)      # post-match has no leading space
local_httpdate(date_US_ASCII) # post-match keeps a leading space (the bug)

~/src/bugs/10733 bundle exec ruby 10733.rb
19 Jan 2015 08:43:19 GMT
 19 Jan 2015 08:43:19 GMT

What should I do? Raise a separate issue? I know from dumping the strings and looking at the ordinal values that the spaces in both encodings are \x20 chars.
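For anyone who wants to repeat that check, the following sketch dumps both strings and compares them byte by byte. The variable names are chosen for this example only.

```ruby
date_utf8  = "Mon, 19 Jan 2015 08:43:19 GMT".encode("UTF-8")
date_ascii = "Mon, 19 Jan 2015 08:43:19 GMT".encode("US-ASCII")

# The encodings differ...
puts date_utf8.encoding
puts date_ascii.encoding

# ...but the underlying bytes can be compared directly.
puts date_utf8.bytes == date_ascii.bytes

# Index 4 is the separator right after "Mon,"; print it in hex for each string.
puts date_utf8.getbyte(4).to_s(16)
puts date_ascii.getbyte(4).to_s(16)
```

Since both strings contain only ASCII characters, any difference in matching has to come from how the regex engine handles the encoding tag rather than from the data itself.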

Updated by leriksen (Leif Eriksen) about 5 years ago

Oh and if you comment out the local_httpdate(date_UTF) line, the space is consumed.

~/src/bugs/10733 bundle exec ruby 10733.rb
19 Jan 2015 08:43:19 GMT

So definitely spooky action in the regex engine - which is written in C and will obfuscate the cause a lot.

I guess I'll have to try to trace it in gdb, unless someone knows a good way to debug the internals of the regexp engine...

Updated by duerst (Martin Dürst) about 5 years ago

Maybe it's worth trying with some other encodings (e.g. 'iso-8859-1' or so), or changing the order. I'd also suggest finding out what encoding the regexp has, and trying to create the regexp from a string with a different encoding. In theory, because both the regexp and the date have only ASCII data, it shouldn't matter, but there's some chance that the /i throws some spanners in the works, especially if the encoding of the regexp is UTF-8.

Leif Eriksen wrote:

date_UTF      = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))

Probably unrelated, but this can be written much more simply, e.g. the first line as

date_UTF      = "Mon, 19 Jan 2015 08:43:19 GMT".encode 'UTF-8'
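One way to follow up on these suggestions is sketched below: it prints the encoding of the regexp and runs the same prefix match against the date in several encodings, in a chosen order. The constant PREFIX and the helper post_match_by_encoding are names invented for this sketch. On a Ruby without the bug, every post-match should come out identical; on 2.2.0 the earlier comments report a leftover leading space for US-ASCII when it follows UTF-8.

```ruby
# Leading part of the Time.httpdate pattern, as extracted earlier in the thread.
PREFIX = /\A\s*(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20/ix

# Match the date in each named encoding, in the order given, and collect
# the post-match text (nil when the pattern does not match).
def post_match_by_encoding(date, encoding_names)
  encoding_names.each_with_object({}) do |name, results|
    match = PREFIX.match(date.encode(name))
    results[name] = match && match.post_match
  end
end

puts PREFIX.encoding
puts PREFIX.fixed_encoding?

results = post_match_by_encoding("Mon, 19 Jan 2015 08:43:19 GMT",
                                 %w[UTF-8 US-ASCII ISO-8859-1])
results.each { |name, rest| puts "#{name}: #{rest.inspect}" }
```

Reordering the list of encoding names is an easy way to test whether the result depends on which encoding the regexp sees first.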

Updated by nobu (Nobuyoshi Nakada) about 5 years ago

  • Is duplicate of Bug #10670: char-class matching same character with different encodings raises exception added

Updated by nobu (Nobuyoshi Nakada) about 5 years ago

  • Status changed from Open to Closed
  • Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN to 2.0.0: DONTNEED, 2.1: DONTNEED, 2.2: DONE
