The third call to Time.httpdate fails, even though it is given the same argument as the first call, which succeeds.
require 'time'
puts RUBY_VERSION
datetime_in_httpdate = DateTime.now.httpdate.to_s
datetime_as_string = "Mon, 12 Jan 2015 12:09:19 GMT"
# works
puts Time.httpdate(datetime_in_httpdate)
# works
puts Time.httpdate(datetime_as_string)
# fails with ruby 2.2.0
puts Time.httpdate(datetime_in_httpdate)
This is a tricky one. It is a spooky interaction that occurs when calling Time.httpdate with a US-ASCII encoded string after calling Time.httpdate with a UTF-8 encoded string.
If you just pass the result of DateTime.now.httpdate (which has a US-ASCII encoding), by itself, it works fine.
The space chars in the US-ASCII and UTF-8 strings are identical as \x20 chars, which is what the regex wants.
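A quick way to check that claim is to compare the raw bytes of the same text in both encodings. This is only a minimal sketch; space_positions is a hypothetical helper name, not something from the time library.

```ruby
# Hypothetical helper: positions of every byte equal to 0x20 in a string.
def space_positions(str)
  bytes = str.bytes
  bytes.each_index.select { |i| bytes[i] == 0x20 }
end

text = "Mon, 19 Jan 2015 08:43:19 GMT"
utf8_date  = text.encode('UTF-8')
ascii_date = text.encode('US-ASCII')

puts utf8_date.encoding
puts ascii_date.encoding
# Both strings should consist of exactly the same bytes.
puts utf8_date.bytes == ascii_date.bytes
# And the 0x20 bytes should sit at the same positions in each.
puts space_positions(utf8_date) == space_positions(ascii_date)
```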
I can 'fix' the issue by replacing the \x20's in the regexen with \s, but \s has different semantics: it matches any whitespace character, namely:
space character
tab character
carriage return character
new line character
vertical tab character
form feed character
And most of those mean the regex will no longer limit itself to matching RFC 2616-compliant timestamps.
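To illustrate why \s is too permissive, here is a rough sketch that tries each of the characters listed above as the separator after the day name. The helper names (whitespace_samples, day_prefix_matches?) are made up for this example, and the regex is a cut-down stand-in for the one in Time.httpdate, not the real thing.

```ruby
# The whitespace characters listed above, keyed by name.
def whitespace_samples
  {
    "space"           => " ",
    "tab"             => "\t",
    "carriage return" => "\r",
    "new line"        => "\n",
    "vertical tab"    => "\v",
    "form feed"       => "\f"
  }
end

# True if "Mon," followed by the given separator matches a prefix regex
# that uses separator_pattern (either '\x20' or '\s') after the comma.
def day_prefix_matches?(separator_pattern, separator)
  regex = Regexp.new('\AMon,' + separator_pattern, Regexp::IGNORECASE)
  !!(regex =~ "Mon,#{separator}19 Jan 2015 08:43:19 GMT")
end

whitespace_samples.each do |name, char|
  strict = day_prefix_matches?('\x20', char)
  loose  = day_prefix_matches?('\s', char)
  puts format("%-16s \\x20: %-5s \\s: %s", name, strict, loose)
end
```

Only the plain space should be accepted by the \x20 variant, whereas the \s variant accepts every entry.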
I suspect something is getting upset in the Regexp library, but how to trace it I don't know... yet. I will try to move out of the Time/DateTime libraries and just work with a regex that is fed strings in different encodings.
OK, the US-ASCII string is not having its space consumed when the regex matches! I've extracted the part of the regex from Time.httpdate that I'm working with, and I'm looking at the post-match parts; they should be the same.
def local_httpdate(date)
  if /
    \A
    \s*
    (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20
    /ix =~ date
    puts $'
  end
end

date_UTF = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))
local_httpdate(date_UTF) # Works
local_httpdate(date_US_ASCII) # => ArgumentError
~/src/bugs/10733 bundle exec ruby 10733.rb
19 Jan 2015 08:43:19 GMT
19 Jan 2015 08:43:19 GMT
What should I do? Raise a separate issue? I know from dumping the strings and looking at the ordinal values that the spaces in both encodings are \x20 chars.
Maybe it's worth trying with some other encodings (e.g. 'iso-8859-1' or so). Or change the order. I'd also suggest to find out what encoding the regexp has, and try and create the regexp from a string with a different encoding. In theory, because both the regexp and the date have only ASCII data, it shouldn't matter, but there's some chance that the /i throws some spanners in the works, especially if the encoding of the regexp is UTF-8.
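Something along these lines could cover those suggestions in one go. It is only a sketch: day_prefix_regexp and post_match are hypothetical helper names, the regex is the extracted day-name prefix rather than the full Time.httpdate pattern, and Regexp.new is used so that the source string can be given a chosen encoding.

```ruby
# Hypothetical helper: build the day-name prefix regex from a source
# string that has been encoded in the given encoding.
def day_prefix_regexp(encoding_name)
  source = '\A\s*(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\x20'.encode(encoding_name)
  Regexp.new(source, Regexp::IGNORECASE)
end

# Hypothetical helper: the text left over after the match, or nil.
def post_match(regexp, date)
  match = regexp.match(date)
  match && match.post_match
end

encodings = %w[UTF-8 US-ASCII ISO-8859-1]
sample = "Mon, 19 Jan 2015 08:43:19 GMT"

encodings.each do |regexp_encoding|
  regexp = day_prefix_regexp(regexp_encoding)
  puts "regexp from #{regexp_encoding} source reports encoding #{regexp.encoding}"
  # Try every ordering of the date encodings, reusing the same regexp,
  # since the failure seems to depend on call order.
  encodings.permutation.each do |order|
    results = order.map { |enc| post_match(regexp, sample.encode(enc)).inspect }
    puts "  #{order.join(' -> ')}: #{results.join(' | ')}"
  end
end
```

If the space is being consumed properly, every post-match printed should be identical, with no leading space.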
Leif Eriksen wrote:
date_UTF = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('UTF-8'))
date_US_ASCII = "Mon, 19 Jan 2015 08:43:19 GMT".encode(Encoding.find('US-ASCII'))
Probably unrelated, but this can be written much more simply, e.g. the first line as
date_UTF = "Mon, 19 Jan 2015 08:43:19 GMT".encode 'UTF-8'
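For what it's worth, the different spellings can be checked against each other. A small sketch, where same_string? is just an illustrative helper name:

```ruby
# Illustrative helper: true if two strings have the same encoding and bytes.
def same_string?(a, b)
  a.encoding == b.encoding && a.bytes == b.bytes
end

text = "Mon, 19 Jan 2015 08:43:19 GMT"

verbose  = text.encode(Encoding.find('US-ASCII')) # form used in the report
short    = text.encode('US-ASCII')                # encoding given by name
constant = text.encode(Encoding::US_ASCII)        # encoding given as a constant

puts same_string?(verbose, short)
puts same_string?(short, constant)
```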