Project

General

Profile

Bug #6799

Digest::*.hexdigest returns an ASCII-8BIT String

Added by Benoit Daloze about 4 years ago. Updated almost 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
ruby -v:
ruby 2.0.0dev (2012-07-15 trunk 36395) [x86_64-darwin10.8.0]
[ruby-core:46792]

Description

$ ruby -rdigest/sha1 -e 'p Digest::SHA1.hexdigest("a").encoding'

#Encoding:ASCII-8BIT

I'm happy to provide a patch.

0001-ext-digest-digest.c-hexencode_str_new-return-an-ASCI.patch View (2.59 KB) Benoit Daloze, 07/27/2012 09:01 PM

noname (500 Bytes) Anonymous, 07/29/2012 10:53 AM

secure_random.patch View (863 Bytes) Aaron Patterson, 09/06/2012 08:58 AM

noname (500 Bytes) Anonymous, 09/15/2012 03:53 PM

Associated revisions

Revision 36588
Added by Benoit Daloze about 4 years ago

ext/digest/digest.c (hexencode_str_new): return an ASCII string

  • test/digest: tests for all kind of digests encodings [Bug #6799]

Revision 36588
Added by Benoit Daloze about 4 years ago

ext/digest/digest.c (hexencode_str_new): return an ASCII string

  • test/digest: tests for all kind of digests encodings [Bug #6799]

Revision 36588
Added by Benoit Daloze about 4 years ago

ext/digest/digest.c (hexencode_str_new): return an ASCII string

  • test/digest: tests for all kind of digests encodings [Bug #6799]

Revision 37269
Added by Benoit Daloze almost 4 years ago

set encoding to ASCII for appropriate String#unpack modifiers

  • pack.c (pack_unpack): set encoding of the 'H','h','B' and 'B' modifiers to US-ASCII.
  • test/ruby/test_pack.rb: tests for the above. [Bug #7050]
  • test/test_securerandom.rb: tests for SecureRandom.hex from tenderlove. [Bug #6799]

Revision 37269
Added by Benoit Daloze almost 4 years ago

set encoding to ASCII for appropriate String#unpack modifiers

  • pack.c (pack_unpack): set encoding of the 'H','h','B' and 'B' modifiers to US-ASCII.
  • test/ruby/test_pack.rb: tests for the above. [Bug #7050]
  • test/test_securerandom.rb: tests for SecureRandom.hex from tenderlove. [Bug #6799]

Revision 37269
Added by Benoit Daloze almost 4 years ago

set encoding to ASCII for appropriate String#unpack modifiers

  • pack.c (pack_unpack): set encoding of the 'H','h','B' and 'B' modifiers to US-ASCII.
  • test/ruby/test_pack.rb: tests for the above. [Bug #7050]
  • test/test_securerandom.rb: tests for SecureRandom.hex from tenderlove. [Bug #6799]

History

#1 [ruby-core:46795] Updated by Martin Dürst about 4 years ago

So what should it return? [If you create a bug, saying what the expected behavior is (and also why you expect it) is quite important.]

#2 [ruby-core:46796] Updated by Shyouhei Urabe about 4 years ago

  • Status changed from Open to Feedback

#3 [ruby-core:46808] Updated by Benoit Daloze about 4 years ago

duerst (Martin Dürst) wrote:

So what should it return? [If you create a bug, saying what the expected behavior is (and also why you expect it) is quite important.]

I thought it was clear enough, but indeed I should have mentioned it.
An hexadecimal String like this should have the US-ASCII encoding, because, by definition, it only has ASCII characters ([a-f0-9]).

This for example, makes YAML dump it as a binary String, and may cause other problems as the String is believed to be "binary", while it's not (#digest on the other hand has the ASCII-8BIT encoding, which is right).

#5 [ruby-core:46846] Updated by Anonymous about 4 years ago

On Fri, Jul 27, 2012 at 07:34:30PM +0900, Eregon (Benoit Daloze) wrote:

Issue #6799 has been updated by Eregon (Benoit Daloze).

duerst (Martin Dürst) wrote:

So what should it return? [If you create a bug, saying what the expected behavior is (and also why you expect it) is quite important.]

I thought it was clear enough, but indeed I should have mentioned it.
An hexadecimal String like this should have the US-ASCII encoding, because, by definition, it only has ASCII characters ([a-f0-9]).

FWIW, I would like this feature too. In Rails, we have to work around
this issue in some database drivers because the ASCII-8BIT string will
be considered to be binary.

--
Aaron Patterson
http://tenderlovemaking.com/

#6 [ruby-core:46871] Updated by Martin Dürst about 4 years ago

  • Assignee set to Benoit Daloze
  • Status changed from Feedback to Assigned

Given that the string has only ASCII characters, an encoding of US-ASCII indeed seems best. Benoit, unless somebody objects soon, I think you should go ahead with your patch.

#7 Updated by Benoit Daloze about 4 years ago

  • % Done changed from 0 to 100
  • Status changed from Assigned to Closed

This issue was solved with changeset r36588.
Benoit, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


ext/digest/digest.c (hexencode_str_new): return an ASCII string

  • test/digest: tests for all kind of digests encodings [Bug #6799]

#8 [ruby-core:46914] Updated by Benoit Daloze about 4 years ago

duerst (Martin Dürst) wrote:

Given that the string has only ASCII characters, an encoding of US-ASCII indeed seems best. Benoit, unless somebody objects soon, I think you should go ahead with your patch.

Thank you, I just did.

#9 [ruby-core:47440] Updated by Aaron Patterson almost 4 years ago

Could we do this for SecureRandom.hex as well? I've attached a patch. /cc akr

#10 [ruby-core:47528] Updated by Benoit Daloze almost 4 years ago

@Aaron: Maybe String#unpack should be fixed instead to return an ASCII string for the 'H', 'h', 'B' and 'b' directives?
It makes sense to me since these will always contain 0-9a-f (0-1 for 'B','b') chars.
This is already done for the 'm', 'M' and 'u' directives in Array#pack.
What do you think?

P.S.: Does anyone know the rationale behind which method to use for 'u','m','M' and 'H','h','B','b' ?
To have a base64 representation you need [s].pack('m') (and so #pack is used for 'u','m','M' to obtain the specified format), but for 'H','h','B','b' you need s.unpack(directive). I think the latter makes more sense when compared to other types such as 'l', in which #unpack is used to obtain the desired format.

#11 [ruby-core:47539] Updated by Anonymous almost 4 years ago

On Fri, Sep 14, 2012 at 07:35:46PM +0900, Eregon (Benoit Daloze) wrote:

Issue #6799 has been updated by Eregon (Benoit Daloze).

@Aaron: Maybe String#unpack should be fixed instead to return an ASCII string for the 'H', 'h', 'B' and 'b' directives?
It makes sense to me since these will always contain 0-9a-f (0-1 for 'B','b') chars.
This is already done for the 'm', 'M' and 'u' directives in Array#pack.
What do you think?

I agree (and would prefer that). It just seems like a larger change
than making this method work for me. ;-)

--
Aaron Patterson
http://tenderlovemaking.com/

Also available in: Atom PDF