Bug #10733
Closed
Time.httpdate raises ArgumentError when DateTime.now.httpdate is provided as input
Description
An irb session demonstrating the bug:
irb(main):001:0> require 'time'
=> true
irb(main):002:0> Time.httpdate("Mon, 12 Jan 2015 12:04:15 GMT")
=> 2015-01-12 12:04:15 UTC
irb(main):003:0> DateTime.now.httpdate.to_s
=> "Mon, 12 Jan 2015 12:04:56 GMT"
irb(main):004:0> Time.httpdate(DateTime.now.httpdate.to_s)
ArgumentError: not RFC 2616 compliant date: "Mon, 12 Jan 2015 12:05:08 GMT"
from /Users/mcls/.rbenv/versions/2.2.0/lib/ruby/2.2.0/time.rb:544:in `httpdate'
from (irb):4
from /Users/mcls/.rbenv/versions/2.2.0/bin/irb:11:in `<main>'
To reproduce:
require 'time'
Time.httpdate("Mon, 12 Jan 2015 12:04:15 GMT") # works
Time.httpdate(DateTime.now.httpdate.to_s) # => ArgumentError
It seems that this only occurs on 2.2.0 (tested on 2.1.5, where it works fine).
Updated by zzip (Dale Hofkens) over 9 years ago
Another example of Time.httpdate failing.
The third call of Time.httpdate fails, even though it receives the same argument as the first call.
require 'time'
puts RUBY_VERSION
datetime_in_httpdate = DateTime.now.httpdate.to_s
datetime_as_string = "Mon, 12 Jan 2015 12:09:19 GMT"
#works
puts Time.httpdate(datetime_in_httpdate)
#works
puts Time.httpdate(datetime_as_string)
#fails with ruby 2.2.0
puts Time.httpdate(datetime_in_httpdate)
Updated by mcls (Maarten Claes) over 9 years ago
After some experimentation, it looks like this has something to do with the fact that DateTime#httpdate returns a US-ASCII encoded string.
require 'time'
from_httpdate = DateTime.now.httpdate
as_string = "Mon, 12 Jan 2015 12:09:19 GMT"
puts from_httpdate.encoding # => US-ASCII
puts Time.httpdate(from_httpdate) # Works
puts Time.httpdate(as_string) # Works
begin
puts(Time.httpdate(from_httpdate)) # => ArgumentError
rescue => e
p e
end
puts from_httpdate.encode!(Encoding.find('UTF-8'))
begin
puts Time.httpdate(from_httpdate) # Works
rescue => e
p e
end
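The re-encoding step above suggests a possible stopgap on affected versions: copy the string into UTF-8 before handing it to Time.httpdate. A minimal sketch, assuming a hypothetical helper name `httpdate_via_utf8` (it is not part of Ruby's time library). It only sidesteps the symptom and says nothing about the underlying cause:

```ruby
require 'time'
require 'date'

# Hypothetical helper (not part of the standard library): parse an HTTP date
# after copying the input into UTF-8. String#encode returns a new string,
# so the caller's string is left untouched.
def httpdate_via_utf8(str)
  Time.httpdate(str.encode(Encoding::UTF_8))
end

from_httpdate = DateTime.now.httpdate   # US-ASCII encoded, per this report
as_string     = "Mon, 12 Jan 2015 12:09:19 GMT"

puts httpdate_via_utf8(from_httpdate)
puts httpdate_via_utf8(as_string)
puts httpdate_via_utf8(from_httpdate)   # the call order that failed above
```

Using the non-mutating `encode` rather than `encode!` keeps the workaround local to the parsing call.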
Updated by leriksen (Leif Eriksen) over 9 years ago
This is a tricky one. It is a spooky interaction that occurs when calling Time.httpdate with a US-ASCII encoded string after calling Time.httpdate with a UTF-8 encoded string.
If you just pass the result of DateTime.now.httpdate (which has a US-ASCII encoding) by itself, it works fine.
The space chars in the US-ASCII and UTF-8 strings are identical \x20 chars, which is what the regex wants.
I can 'fix' the issue by replacing the \x20's in the regexen with \s, but \s has different semantics: it matches any space-like char
- space character
- tab character
- carriage return character
- new line character
- vertical tab character
- form feed character
Accepting most of those means the regex would no longer limit itself to matching RFC 2616-compliant timestamps.
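To illustrate the difference, here is a small sketch using reduced patterns (assumptions for illustration only, not the full patterns from time.rb). It shows the loosened `\s` version accepting separators that the strict `\x20` version rejects:

```ruby
# Reduced patterns for illustration only; these are not the patterns
# actually used by Time.httpdate.
strict = /\AMon,\x20\d{2}\z/   # separator must be a literal space
loose  = /\AMon,\s\d{2}\z/     # separator may be any whitespace character

["Mon, 12", "Mon,\t12", "Mon,\n12"].each do |s|
  strict_ok = !(strict =~ s).nil?
  loose_ok  = !(loose =~ s).nil?
  puts "#{s.inspect}: strict=#{strict_ok} loose=#{loose_ok}"
end
```

Only the first input uses a plain space, so it is the only one the strict pattern should accept.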
I suspect something is getting upset in the Regexp library, but I don't know how to trace it... yet. I will try to move out of the Time/DateTime libraries and just work with a regex that will be fed different encodings.
Updated by leriksen (Leif Eriksen) over 9 years ago
Definitely a regex issue, not related to DateTime or Time at all (other than both being affected in the same way).
def local_httpdate(date)
if /\A\s*
(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20
(\d{2})\x20
(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\x20
(\d{4})\x20
(\d{2}):(\d{2}):(\d{2})\x20
GMT
\s*\z/ix =~ date
puts "regexp ok"
else
raise ArgumentError, "not RFC 2616 compliant date: #{date.inspect}"
end
end
date_UTF = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))
puts local_httpdate(date_UTF) # Works
begin
puts local_httpdate(date_US_ASCII) # => ArgumentError
rescue => e
p e
end
Updated by leriksen (Leif Eriksen) over 9 years ago
OK, the US-ASCII encoded string is not having its space consumed when the regex matches! I've extracted the part of the regex from Time.httpdate that still matches, and I'm printing the post-match parts, which should be the same for both strings.
def local_httpdate(date)
if /
\A
\s*
(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20
/ix =~ date
puts $'
end
end
date_UTF = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))
local_httpdate(date_UTF) # Works
local_httpdate(date_US_ASCII) # => trailing \x20 is not consumed
~/src/bugs/10733 bundle exec ruby 10733.rb
19 Jan 2015 08:43:19 GMT
19 Jan 2015 08:43:19 GMT
What should I do? Raise a separate issue? I know from dumping the strings and looking at the ordinal values that the spaces in both encodings are \x20 chars.
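For anyone wanting to repeat that check, a minimal sketch of comparing the raw bytes of the two strings (the variable names are just for illustration):

```ruby
date_utf   = "Mon, 19 Jan 2015 08:43:19 GMT".encode('UTF-8')
date_ascii = "Mon, 19 Jan 2015 08:43:19 GMT".encode('US-ASCII')

# The encodings differ, but the underlying bytes should be the same.
puts date_utf.encoding
puts date_ascii.encoding
puts date_utf.bytes == date_ascii.bytes

# Index 4 is the character right after "Mon,"; print its byte in hex.
puts date_utf.bytes[4].to_s(16)
puts date_ascii.bytes[4].to_s(16)
```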
Updated by leriksen (Leif Eriksen) over 9 years ago
Oh and if you comment out the local_httpdate(date_UTF) line, the space is consumed.
~/src/bugs/10733 bundle exec ruby 10733.rb
19 Jan 2015 08:43:19 GMT
So definitely spooky action in the regex engine, which is written in C, and that will make the cause a lot harder to find.
I guess I'll have to try to trace it in gdb, unless someone knows a good way to debug the internals of the regexp engine...
Updated by duerst (Martin Dürst) over 9 years ago
Maybe it's worth trying with some other encodings (e.g. 'iso-8859-1' or so), or changing the order. I'd also suggest finding out what encoding the regexp has, and trying to create the regexp from a string with a different encoding. In theory, because both the regexp and the date have only ASCII data, it shouldn't matter, but there's some chance that the /i throws a spanner in the works, especially if the encoding of the regexp is UTF-8.
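A rough sketch of that experiment, assuming a reduced pattern (weekday prefix only) and a hypothetical helper name `post_matches`. It reports the encoding of the regexp literal, then builds the regexp from strings in several encodings and matches date strings in several encodings, recording what is left after the match. On an interpreter without the bug, every combination should leave the same remainder:

```ruby
# Reduced pattern for illustration: only the weekday prefix of the
# pattern discussed in this thread.
source  = '\A\s*(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20'
literal = /\A\s*(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20/i

puts "literal regexp encoding: #{literal.encoding} " \
     "(fixed: #{literal.fixed_encoding?})"

date = "Mon, 19 Jan 2015 08:43:19 GMT"

# Hypothetical helper: for each (regexp encoding, string encoding) pair,
# record the text remaining after the match (nil if there is no match).
def post_matches(source, date, encodings)
  results = {}
  encodings.each do |re_enc|
    regexp = Regexp.new(source.encode(re_enc), Regexp::IGNORECASE)
    encodings.each do |str_enc|
      m = regexp.match(date.encode(str_enc))
      results[[re_enc, str_enc]] = m && m.post_match
    end
  end
  results
end

encodings = %w[UTF-8 US-ASCII ISO-8859-1]
post_matches(source, date, encodings).each do |(re_enc, str_enc), rest|
  puts "regexp #{re_enc} / string #{str_enc}: #{rest.inspect}"
end
```

Reordering the `encodings` array is a cheap way to change the order in which the encodings are tried.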
Leif Eriksen wrote:
date_UTF = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))
Probably unrelated, but this can be written much more simply, e.g. the first line as
date_UTF = "Mon, 19 Jan 2015 08:43:19 GMT".encode 'UTF-8'
Updated by nobu (Nobuyoshi Nakada) over 9 years ago
- Is duplicate of Bug #10670: char-class matching same character with different encodings raises exception added
Updated by nobu (Nobuyoshi Nakada) over 9 years ago
- Status changed from Open to Closed
- Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN to 2.0.0: DONTNEED, 2.1: DONTNEED, 2.2: DONE