Bug #6799

Digest::*.hexdigest returns an ASCII-8BIT String

Added by Benoit Daloze over 2 years ago. Updated over 2 years ago.

[ruby-core:46792]
Status:Closed
Priority:Normal
Assignee:Benoit Daloze
ruby -v:ruby 2.0.0dev (2012-07-15 trunk 36395) [x86_64-darwin10.8.0] Backport:

Description

$ ruby -rdigest/sha1 -e 'p Digest::SHA1.hexdigest("a").encoding'

#Encoding:ASCII-8BIT

I'm happy to provide a patch.

0001-ext-digest-digest.c-hexencode_str_new-return-an-ASCI.patch Magnifier (2.59 KB) Benoit Daloze, 07/27/2012 09:01 PM

noname (500 Bytes) Anonymous, 07/29/2012 10:53 AM

secure_random.patch Magnifier (863 Bytes) Aaron Patterson, 09/06/2012 08:58 AM

noname (500 Bytes) Anonymous, 09/15/2012 03:53 PM

Associated revisions

Revision 36588
Added by Benoit Daloze over 2 years ago

ext/digest/digest.c (hexencode_str_new): return an ASCII string

  • test/digest: tests for all kind of digests encodings [Bug #6799]

Revision 36588
Added by Benoit Daloze over 2 years ago

ext/digest/digest.c (hexencode_str_new): return an ASCII string

  • test/digest: tests for all kind of digests encodings [Bug #6799]

Revision 37269
Added by Benoit Daloze over 2 years ago

set encoding to ASCII for appropriate String#unpack modifiers

  • pack.c (pack_unpack): set encoding of the 'H','h','B' and 'B' modifiers to US-ASCII.
  • test/ruby/test_pack.rb: tests for the above. [Bug #7050]
  • test/test_securerandom.rb: tests for SecureRandom.hex from tenderlove. [Bug #6799]

Revision 37269
Added by Benoit Daloze over 2 years ago

set encoding to ASCII for appropriate String#unpack modifiers

  • pack.c (pack_unpack): set encoding of the 'H','h','B' and 'B' modifiers to US-ASCII.
  • test/ruby/test_pack.rb: tests for the above. [Bug #7050]
  • test/test_securerandom.rb: tests for SecureRandom.hex from tenderlove. [Bug #6799]

History

#1 Updated by Martin Dürst over 2 years ago

So what should it return? [If you create a bug, saying what the expected behavior is (and also why you expect it) is quite important.]

#2 Updated by Shyouhei Urabe over 2 years ago

  • Status changed from Open to Feedback

#3 Updated by Benoit Daloze over 2 years ago

duerst (Martin Dürst) wrote:

So what should it return? [If you create a bug, saying what the expected behavior is (and also why you expect it) is quite important.]

I thought it was clear enough, but indeed I should have mentioned it.
An hexadecimal String like this should have the US-ASCII encoding, because, by definition, it only has ASCII characters ([a-f0-9]).

This for example, makes YAML dump it as a binary String, and may cause other problems as the String is believed to be "binary", while it's not (#digest on the other hand has the ASCII-8BIT encoding, which is right).

#5 Updated by Anonymous over 2 years ago

On Fri, Jul 27, 2012 at 07:34:30PM +0900, Eregon (Benoit Daloze) wrote:

Issue #6799 has been updated by Eregon (Benoit Daloze).

duerst (Martin Dürst) wrote:

So what should it return? [If you create a bug, saying what the expected behavior is (and also why you expect it) is quite important.]

I thought it was clear enough, but indeed I should have mentioned it.
An hexadecimal String like this should have the US-ASCII encoding, because, by definition, it only has ASCII characters ([a-f0-9]).

FWIW, I would like this feature too. In Rails, we have to work around
this issue in some database drivers because the ASCII-8BIT string will
be considered to be binary.

--
Aaron Patterson
http://tenderlovemaking.com/

#6 Updated by Martin Dürst over 2 years ago

  • Status changed from Feedback to Assigned
  • Assignee set to Benoit Daloze

Given that the string has only ASCII characters, an encoding of US-ASCII indeed seems best. Benoit, unless somebody objects soon, I think you should go ahead with your patch.

#7 Updated by Benoit Daloze over 2 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r36588.
Benoit, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


ext/digest/digest.c (hexencode_str_new): return an ASCII string

  • test/digest: tests for all kind of digests encodings [Bug #6799]

#8 Updated by Benoit Daloze over 2 years ago

duerst (Martin Dürst) wrote:

Given that the string has only ASCII characters, an encoding of US-ASCII indeed seems best. Benoit, unless somebody objects soon, I think you should go ahead with your patch.

Thank you, I just did.

#9 Updated by Aaron Patterson over 2 years ago

Could we do this for SecureRandom.hex as well? I've attached a patch. /cc akr

#10 Updated by Benoit Daloze over 2 years ago

@Aaron: Maybe String#unpack should be fixed instead to return an ASCII string for the 'H', 'h', 'B' and 'b' directives?
It makes sense to me since these will always contain 0-9a-f (0-1 for 'B','b') chars.
This is already done for the 'm', 'M' and 'u' directives in Array#pack.
What do you think?

P.S.: Does anyone know the rationale behind which method to use for 'u','m','M' and 'H','h','B','b' ?
To have a base64 representation you need [s].pack('m') (and so #pack is used for 'u','m','M' to obtain the specified format), but for 'H','h','B','b' you need s.unpack(directive). I think the latter makes more sense when compared to other types such as 'l', in which #unpack is used to obtain the desired format.

#11 Updated by Anonymous over 2 years ago

On Fri, Sep 14, 2012 at 07:35:46PM +0900, Eregon (Benoit Daloze) wrote:

Issue #6799 has been updated by Eregon (Benoit Daloze).

@Aaron: Maybe String#unpack should be fixed instead to return an ASCII string for the 'H', 'h', 'B' and 'b' directives?
It makes sense to me since these will always contain 0-9a-f (0-1 for 'B','b') chars.
This is already done for the 'm', 'M' and 'u' directives in Array#pack.
What do you think?

I agree (and would prefer that). It just seems like a larger change
than making this method work for me. ;-)

--
Aaron Patterson
http://tenderlovemaking.com/

Also available in: Atom PDF