Bug #10733
closed
Time.httpdate raises ArgumentError when DateTime.now.httpdate is provided as input
Added by mcls (Maarten Claes) almost 10 years ago.
Updated almost 10 years ago.
Description
An irb session demonstrating the bug:
irb(main):001:0> require 'time'
=> true
irb(main):002:0> Time.httpdate("Mon, 12 Jan 2015 12:04:15 GMT")
=> 2015-01-12 12:04:15 UTC
irb(main):003:0> DateTime.now.httpdate.to_s
=> "Mon, 12 Jan 2015 12:04:56 GMT"
irb(main):004:0> Time.httpdate(DateTime.now.httpdate.to_s)
ArgumentError: not RFC 2616 compliant date: "Mon, 12 Jan 2015 12:05:08 GMT"
from /Users/mcls/.rbenv/versions/2.2.0/lib/ruby/2.2.0/time.rb:544:in `httpdate'
from (irb):4
from /Users/mcls/.rbenv/versions/2.2.0/bin/irb:11:in `<main>'
To reproduce:
require 'time'
Time.httpdate("Mon, 12 Jan 2015 12:04:15 GMT") # works
Time.httpdate(DateTime.now.httpdate.to_s) # => ArgumentError
It seems that this only occurs on 2.2.0. (Tested on 2.1.5 and it works fine)
Another example for Time.httpdate failing.
The third call of Time.httpdate fails, which should be the same as the first call of Time.httpdate.
require 'time'
puts RUBY_VERSION
datetime_in_httpdate = DateTime.now.httpdate.to_s
datetime_as_string = "Mon, 12 Jan 2015 12:09:19 GMT"
#works
puts Time.httpdate(datetime_in_httpdate)
#works
puts Time.httpdate(datetime_as_string)
#fails with ruby 2.2.0
puts Time.httpdate(datetime_in_httpdate)
After some experimentation it looks like this has something to do with the fact that DateTime#httpdate returns a US-ASCII encoded string.
require 'time'
from_httpdate = DateTime.now.httpdate
as_string = "Mon, 12 Jan 2015 12:09:19 GMT"
puts from_httpdate.encoding # => US-ASCII
puts Time.httpdate(from_httpdate) # Works
puts Time.httpdate(as_string) # Works
begin
puts(Time.httpdate(from_httpdate)) # => ArgumentError
rescue => e
p e
end
puts from_httpdate.encode!(Encoding.find('UTF-8'))
begin
puts Time.httpdate(from_httpdate) # Works
rescue => e
p e
end
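Based on the observation above that re-encoding to UTF-8 avoids the error, a possible workaround on affected versions might look like the sketch below. `httpdate_workaround` is a hypothetical helper name, not part of the standard library, and the sketch assumes the input can always be transcoded to UTF-8.

```ruby
require 'time'
require 'date'

# Hypothetical helper (not part of Ruby's stdlib): transcode the input to
# UTF-8 before parsing, so Time.httpdate always sees the same encoding.
def httpdate_workaround(str)
  Time.httpdate(str.encode('UTF-8'))
end

from_datetime = DateTime.now.httpdate            # US-ASCII in the report above
literal       = "Mon, 12 Jan 2015 12:04:15 GMT"  # source-encoded literal

puts httpdate_workaround(from_datetime)
puts httpdate_workaround(literal)
puts httpdate_workaround(from_datetime)  # the call order that failed on 2.2.0
```

This only sidesteps the symptom by keeping the input encoding constant; it does not address whatever is going wrong underneath.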
This is a tricky one. It is a spooky interaction that shows up when Time.httpdate is called with a US-ASCII encoded string after it has been called with a UTF-8 encoded string.
If you just pass the result of DateTime.now.httpdate (which has a US-ASCII encoding) by itself, it works fine.
The space chars in the US-ASCII and UTF-8 strings are identical \x20 chars, which is what the regex wants.
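One way to check that claim is to compare the raw bytes of the two strings. This is a minimal sketch; the variable names are made up for illustration.

```ruby
utf8  = "Mon, 19 Jan 2015 08:43:19 GMT".encode('UTF-8')
ascii = "Mon, 19 Jan 2015 08:43:19 GMT".encode('US-ASCII')

puts utf8.encoding
puts ascii.encoding

# Both strings contain only ASCII characters, so their byte sequences
# should be identical even though the encoding tags differ.
puts utf8.bytes == ascii.bytes

# Count the 0x20 (space) bytes in each string.
puts utf8.bytes.count(0x20)
puts ascii.bytes.count(0x20)
```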
I can 'fix' the issue by replacing the \x20's in the regexen with \s, but \s has different semantics: it matches any space-like char, i.e.
- space character
- tab character
- carriage return character
- new line character
- vertical tab character
- form feed character
And most of those mean the regex would no longer limit itself to matching RFC 2616-compliant timestamps.
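To illustrate the difference, the sketch below checks each of the characters listed above against \x20 and \s. The hash and variable names are only for illustration.

```ruby
candidates = {
  'space'           => " ",
  'tab'             => "\t",
  'carriage return' => "\r",
  'new line'        => "\n",
  'vertical tab'    => "\v",
  'form feed'       => "\f"
}

# For each character, record whether \x20 and \s match it.
matches = candidates.map do |name, ch|
  strict = !(/\A\x20\z/ =~ ch).nil?
  loose  = !(/\A\s\z/ =~ ch).nil?
  puts "#{name}: \\x20 matches=#{strict}, \\s matches=#{loose}"
  [name, strict, loose]
end

# A date with a tab after the comma is not a valid RFC 2616 date,
# but a \s-based pattern would let it through.
tabbed = "Mon,\t19 Jan 2015 08:43:19 GMT"
strict_ok = !(/\AMon,\x20/ =~ tabbed).nil?
loose_ok  = !(/\AMon,\s/ =~ tabbed).nil?
puts "tabbed date: \\x20 matches=#{strict_ok}, \\s matches=#{loose_ok}"
```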
I suspect something is getting upset in the Regexp library, but how to trace it I don't know... yet. I will try to move out of the Time/DateTime libraries and just work with a regex that will be fed different encodings.
Definitely a regex issue, not related to DateTime or Time at all (other than them being affected the same way).
def local_httpdate(date)
if /\A\s*
(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20
(\d{2})\x20
(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\x20
(\d{4})\x20
(\d{2}):(\d{2}):(\d{2})\x20
GMT
\s*\z/ix =~ date
puts "regexp ok"
else
raise ArgumentError, "not RFC 2616 compliant date: #{date.inspect}"
end
end
date_UTF = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))
puts local_httpdate(date_UTF) # Works
begin
puts local_httpdate(date_US_ASCII) # => ArgumentError
rescue => e
p e
end
OK, the US-ASCII encoded string is not having its space consumed when the regex matches! I've extracted the part of the regex from Time.httpdate that still matches and printed the post-match part for each string; the two should be the same.
def local_httpdate(date)
if /
\A
\s*
(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20
/ix =~ date
puts $'
end
end
date_UTF = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))
local_httpdate(date_UTF) # Works
local_httpdate(date_US_ASCII) # matches, but the space after the comma is not consumed
~/src/bugs/10733 bundle exec ruby 10733.rb
19 Jan 2015 08:43:19 GMT
19 Jan 2015 08:43:19 GMT
What should I do? Raise a separate issue? I know from dumping the strings and looking at the ordinal values that the spaces in both encodings are \x20 chars.
Oh and if you comment out the local_httpdate(date_UTF) line, the space is consumed.
~/src/bugs/10733 bundle exec ruby 10733.rb
19 Jan 2015 08:43:19 GMT
So definitely spooky action in the regex engine - which is written in C and will obfuscate the cause a lot.
I guess I'll have to try to trace it in gdb, unless someone knows a good way to debug the internals of the regexp engine...
Maybe it's worth trying with some other encodings (e.g. 'iso-8859-1' or so). Or change the order. I'd also suggest to find out what encoding the regexp has, and try and create the regexp from a string with a different encoding. In theory, because both the regexp and the date have only ASCII data, it shouldn't matter, but there's some chance that the /i throws some spanners in the works, especially if the encoding of the regexp is UTF-8.
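A rough sketch of that experiment could look like the following. The constant name and the particular list of encodings are assumptions chosen for illustration; reorder or extend the list to try other sequences. On a Ruby version without the bug, every line should show the same post-match text.

```ruby
# The leading fragment of the pattern used in the test script above.
PREFIX = /\A\s*(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20/ix

date = "Mon, 19 Jan 2015 08:43:19 GMT"

# Order matters for this bug, so some encodings are repeated.
encodings = %w[UTF-8 US-ASCII ISO-8859-1 US-ASCII UTF-8]

puts "regexp encoding: #{PREFIX.encoding}"

results = encodings.map do |name|
  str  = date.encode(name)
  m    = PREFIX.match(str)
  rest = m && m.post_match
  puts "#{name}: post-match = #{rest.inspect}"
  [name, rest]
end
```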
Leif Eriksen wrote:
date_UTF = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))
Probably unrelated, but this can be written much more simply, e.g. the first line as
date_UTF = "Mon, 19 Jan 2015 08:43:19 GMT".encode 'UTF-8'
- Is duplicate of Bug #10670: char-class matching same character with different encodings raises exception added
- Status changed from Open to Closed
- Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN to 2.0.0: DONTNEED, 2.1: DONTNEED, 2.2: DONE