Bug #10733

closed

Time.httpdate raises ArgumentError when DateTime.now.httpdate is provided as input

Added by mcls (Maarten Claes) almost 10 years ago. Updated almost 10 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-darwin14]
[ruby-core:67538]

Description

An irb session demonstrating the bug:

irb(main):001:0> require 'time'
=> true
irb(main):002:0> Time.httpdate("Mon, 12 Jan 2015 12:04:15 GMT")
=> 2015-01-12 12:04:15 UTC
irb(main):003:0> DateTime.now.httpdate.to_s
=> "Mon, 12 Jan 2015 12:04:56 GMT"
irb(main):004:0> Time.httpdate(DateTime.now.httpdate.to_s)
ArgumentError: not RFC 2616 compliant date: "Mon, 12 Jan 2015 12:05:08 GMT"
        from /Users/mcls/.rbenv/versions/2.2.0/lib/ruby/2.2.0/time.rb:544:in `httpdate'
        from (irb):4
        from /Users/mcls/.rbenv/versions/2.2.0/bin/irb:11:in `<main>'

To reproduce:

require 'time'
Time.httpdate("Mon, 12 Jan 2015 12:04:15 GMT") # works
Time.httpdate(DateTime.now.httpdate.to_s) # => ArgumentError

It seems that this only occurs on 2.2.0. (Tested on 2.1.5 and it works fine)


Related issues 1 (0 open, 1 closed)

Is duplicate of Ruby master - Bug #10670: char-class matching same character with different encodings raises exception (Closed, 12/29/2014)

Updated by zzip (Dale Hofkens) almost 10 years ago

Another example for Time.httpdate failing.

The third call to Time.httpdate fails, even though it receives exactly the same argument as the first call.

require 'time'
puts RUBY_VERSION

datetime_in_httpdate = DateTime.now.httpdate.to_s
datetime_as_string =  "Mon, 12 Jan 2015 12:09:19 GMT"

#works
puts Time.httpdate(datetime_in_httpdate)
#works
puts Time.httpdate(datetime_as_string)
#fails with ruby 2.2.0
puts Time.httpdate(datetime_in_httpdate)

Updated by mcls (Maarten Claes) almost 10 years ago

After some experimentation it looks like this has something to do with the fact that DateTime#httpdate returns a US-ASCII encoded string.

require 'time'

from_httpdate = DateTime.now.httpdate
as_string     = "Mon, 12 Jan 2015 12:09:19 GMT"

puts from_httpdate.encoding # => US-ASCII
puts Time.httpdate(from_httpdate) # Works
puts Time.httpdate(as_string) # Works

begin
  puts(Time.httpdate(from_httpdate)) # => ArgumentError
rescue => e
  p e
end

puts from_httpdate.encode!(Encoding.find('UTF-8'))

begin
  puts Time.httpdate(from_httpdate) # Works
rescue => e
  p e
end
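
Following from the observation above that re-encoding to UTF-8 makes the call succeed, one possible workaround is to normalise the encoding before parsing. This is only a sketch: `httpdate_utf8` is a hypothetical helper name, not part of the time library, and it has not been checked against every call order on 2.2.0.

```ruby
require 'time'

# Hypothetical helper (not part of the time library): re-encode the input
# as UTF-8 before handing it to Time.httpdate, so that every call sees a
# string with the same encoding.
def httpdate_utf8(str)
  Time.httpdate(str.encode(Encoding::UTF_8))
end

from_httpdate = DateTime.now.httpdate   # US-ASCII encoded string
parsed = httpdate_utf8(from_httpdate)
puts parsed.class
```

String#encode returns a new string, so the caller's original string keeps its US-ASCII encoding.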

Updated by leriksen (Leif Eriksen) almost 10 years ago

This is a tricky one. It is a spooky interaction when calling Time.httpdate with a US-ASCII encoded string after calling Time.httpdate with a UTF-8 encoded string.

If you just pass the result of DateTime.now.httpdate (which has a US-ASCII encoding), by itself, it works fine.

The space chars in the US-ASCII and UTF-8 strings are identical as \x20 chars, which is what the regex wants.
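
A quick way to check the claim that both strings carry the same bytes is to compare them directly. This is a minimal sketch using the date string from the earlier comments; the variable names are only illustrative.

```ruby
s     = "Mon, 12 Jan 2015 12:09:19 GMT"
utf8  = s.encode('UTF-8')
ascii = s.encode('US-ASCII')

puts utf8.encoding                 # UTF-8
puts ascii.encoding                # US-ASCII
puts utf8.bytes == ascii.bytes     # true: identical byte sequences

# The character after "Mon," is byte 0x20 in both strings.
printf("0x%02x 0x%02x\n", utf8.bytes[4], ascii.bytes[4])
```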

I can 'fix' the issue by replacing the \x20's in the regexen with \s, but \s has different semantics: it matches any space-like char:

  • space character
  • tab character
  • carriage return character
  • new line character
  • vertical tab character
  • form feed character

And most of those mean the regex will no longer limit itself to matching RFC 2616-compliant timestamps.
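
The difference between the two character classes can be sketched as follows. The shortened `loose` and `strict` patterns are illustrative only, not the regex used in time.rb.

```ruby
whitespace = {
  "space"           => " ",
  "tab"             => "\t",
  "carriage return" => "\r",
  "new line"        => "\n",
  "vertical tab"    => "\v",
  "form feed"       => "\f"
}

# Report, for each character, whether \s and \x20 match it.
whitespace.each do |name, ch|
  puts format("%-16s \\s: %-5s \\x20: %s", name, !!(ch =~ /\s/), !!(ch =~ /\x20/))
end

# Shortened, illustrative patterns (not the ones from time.rb).
loose  = /\AMon,\s12\sJan\s2015\z/
strict = /\AMon,\x2012\x20Jan\x202015\z/

tabbed = "Mon,\t12 Jan 2015"   # a tab where RFC 2616 requires a space
puts !!(loose  =~ tabbed)      # true: \s accepts the tab
puts !!(strict =~ tabbed)      # false: \x20 only accepts a space
```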

I suspect something is getting upset in the Regexp library, but how to trace it I don't know... yet. I will try to move out of the Time/DateTime libraries and just work with a regex that will be fed different encodings.

Updated by leriksen (Leif Eriksen) almost 10 years ago

Definitely a regex issue, not related to DateTime or Time at all (other than both being affected in the same way).

def local_httpdate(date)
  if /\A\s*
      (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20
      (\d{2})\x20
      (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\x20
      (\d{4})\x20
      (\d{2}):(\d{2}):(\d{2})\x20
      GMT
      \s*\z/ix =~ date
    puts "regexp ok"
  else
    raise ArgumentError, "not RFC 2616 compliant date: #{date.inspect}"
  end
end

date_UTF      = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))

puts local_httpdate(date_UTF) # Works

begin
  puts local_httpdate(date_US_ASCII) # => ArgumentError
rescue => e
  p e
end

Updated by leriksen (Leif Eriksen) almost 10 years ago

OK, the space in the US-ASCII string is not being consumed when the regex matches! I've extracted the leading part of the regex from Time.httpdate, which still matches, and printed the post-match for each string. The two should be the same.

def local_httpdate(date)
  if /
      \A
      \s*
      (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20
    /ix =~ date
    puts $'
  end
end

date_UTF      = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))

local_httpdate(date_UTF)      # post-match has no leading space (expected)
local_httpdate(date_US_ASCII) # post-match keeps a leading space (the bug)
~/src/bugs/10733 bundle exec ruby 10733.rb
19 Jan 2015 08:43:19 GMT
 19 Jan 2015 08:43:19 GMT

What should I do? Raise a separate issue? I know from dumping the strings and looking at the ordinal values that the spaces in both encodings are \x20 chars.

Updated by leriksen (Leif Eriksen) almost 10 years ago

Oh and if you comment out the local_httpdate(date_UTF) line, the space is consumed.

~/src/bugs/10733 bundle exec ruby 10733.rb
19 Jan 2015 08:43:19 GMT

So definitely spooky action in the regex engine - which is written in C and will obfuscate the cause a lot.

I guess I'll have to try to trace it in gdb, unless someone knows a good way to debug the internals of the regexp engine...

Updated by duerst (Martin Dürst) almost 10 years ago

Maybe it's worth trying with some other encodings (e.g. 'iso-8859-1' or so). Or change the order. I'd also suggest to find out what encoding the regexp has, and try and create the regexp from a string with a different encoding. In theory, because both the regexp and the date have only ASCII data, it shouldn't matter, but there's some chance that the /i throws some spanners in the works, especially if the encoding of the regexp is UTF-8.
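
The experiment suggested above could be sketched like this: match the same date in several encodings, in every order, and print the post-match each time. It assumes that building a new Regexp object for each ordering is enough to keep the runs independent, which has not been verified for the affected version. The variable names are illustrative.

```ruby
date      = "Mon, 19 Jan 2015 08:43:19 GMT"
encodings = %w[UTF-8 US-ASCII ISO-8859-1]
source    = '\A\s*(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20'

results = {}
encodings.permutation.each do |order|
  # A new Regexp object for each ordering (assumed to isolate the runs).
  regexp = Regexp.new(source, Regexp::IGNORECASE | Regexp::EXTENDED)
  results[order] = order.map do |name|
    m = regexp.match(date.encode(name))
    m && m.post_match            # nil if the regexp did not match at all
  end
end

puts "regexp encoding: #{Regexp.new(source).encoding}"
results.each do |order, rests|
  puts "#{order.join(' -> ')}: #{rests.inspect}"
end
```

On an unaffected Ruby every post-match should be the date without its weekday prefix; a leading space in any entry would point at the problem described in this report.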

Leif Eriksen wrote:

date_UTF      = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))

Probably unrelated, but this can be written much more simply, e.g. the first line as

date_UTF      = "Mon, 19 Jan 2015 08:43:19 GMT".encode 'UTF-8'

Updated by nobu (Nobuyoshi Nakada) almost 10 years ago

  • Is duplicate of Bug #10670: char-class matching same character with different encodings raises exception added

Updated by nobu (Nobuyoshi Nakada) almost 10 years ago

  • Status changed from Open to Closed
  • Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN to 2.0.0: DONTNEED, 2.1: DONTNEED, 2.2: DONE