Bug #8286

Can't decode non-MIME Base64

Added by Alan Da Costa 12 months ago. Updated 12 months ago.

[ruby-core:54415]
Status:Closed
Priority:Normal
Assignee:-
Category:-
Target version:-
ruby -v:2.0.0-p0
Backport:1.9.3: UNKNOWN, 2.0.0: UNKNOWN

Description

=begin
In https://github.com/ruby/ruby/blob/trunk/lib/base64.rb#L42 , RFC 2045 is mentioned for encode64/decode64 support, which is the MIME RFC. I don't believe this is the correct RFC to reference, as RFC 4648 is the correct RFC for Base64. Further, RFC 4648 has an explicit section about Line Feeds in Encoded Data, http://tools.ietf.org/html/rfc4648#section-3.1 . This section states:

MIME [4] is often used as a reference for base 64 encoding. However,
MIME does not define "base 64" per se, but rather a "base 64 Content-
Transfer-Encoding" for use within MIME. As such, MIME enforces a
limit on line length of base 64-encoded data to 76 characters. MIME
inherits the encoding from Privacy Enhanced Mail (PEM) [3], stating
that it is "virtually identical"; however, PEM uses a line length of
64 characters. The MIME and PEM limits are both due to limits within
SMTP.

Implementations MUST NOT add line feeds to base-encoded data unless
the specification referring to this document explicitly directs base
encoders to add line feeds after a specific number of characters.

In my case, I have a separate implementation that does not add line feeds to the Base64 (non-MIME) output, and as a result Base64.decode64 cannot decode the non-MIME encoded data. I believe this also indicates that Base64.encode64 has the wrong behavior, since it MIME-encodes its Base64 output.

I have an example of the issue at https://github.com/adacosta/base64_compatible/blob/master/test/test_coding.rb#LC25 .
=end
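
For readers following along, here is a minimal sketch of the line-feed behavior under discussion. It assumes a current Ruby with the standard base64 library available; the variable names are only illustrative. Base64.encode64 wraps its output with line feeds, Base64.strict_encode64 does not, and Base64.decode64 skips line feeds when decoding.

```ruby
require "base64"

text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

# encode64 inserts a line feed after every 60 encoded characters
# and after the final line.
wrapped = Base64.encode64(text)

# strict_encode64 emits a single line with no line feeds.
single_line = Base64.strict_encode64(text)

puts wrapped.count("\n")          # number of line feeds added
puts single_line.include?("\n")   # false

# decode64 ignores line feeds, so both forms decode to the same bytes.
puts Base64.decode64(wrapped) == Base64.decode64(single_line)
```

If this holds on your Ruby version, line feeds alone should not prevent decode64 from reading single-line input.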

Associated revisions

Revision 40342
Added by Yui NARUSE 12 months ago

  • pack.c (pack_unpack): output characters even if the input doesn't have paddings. [Bug #8286]

Revision 40344
Added by Nobuyoshi Nakada 12 months ago

pack.c: refix unpack base64

  • pack.c (pack_unpack): increase buffer size to fix buffer overflow, and fix garbages just after unpacking without missing paddings. [Bug #8286]

History

#1 Updated by Yui NARUSE 12 months ago

Could you show self-contained reproducible example?

#2 Updated by Yui NARUSE 12 months ago

  • Status changed from Open to Rejected

Your Base64Compatible.encode64 is buggy.

Base64Compatible.encode64("Lorem ipsum dolor sit amet, consectetur adipiscing elit.") returns
"TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4",
but its length is 75.
It must add padding and it shall be "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4=".

Anyway you can use Base64.strict_encode64 if you don't need line feeds.
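
A short sketch of the padding arithmetic behind this comment, assuming the standard base64 library. The helper `pad_base64` is hypothetical (it is not part of Base64) and only illustrates how stripped padding could be restored: the 56-byte input encodes to 19 groups of 4 characters, i.e. 76 characters, the last of which is "=".

```ruby
require "base64"

text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
encoded = Base64.strict_encode64(text)

puts text.bytesize    # 56 bytes: 18 full 3-byte groups plus 2 leftover bytes
puts encoded.length   # 76 characters: 19 groups of 4
puts encoded[-1]      # the padding character "="

# Hypothetical helper, not part of the Base64 module: append "=" until the
# length is a multiple of 4. It does not validate the input.
def pad_base64(str)
  str + "=" * ((4 - str.length % 4) % 4)
end

unpadded = encoded.delete("=")
puts unpadded.length                    # 75
puts pad_base64(unpadded) == encoded    # true
```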

#3 Updated by Alan Da Costa 12 months ago

Hi Naruse, thank you for looking at my issue and sorry for wasting your time. I should have spent more time verifying my code.

#4 Updated by Alan Da Costa 12 months ago

My confusion on this issue might have stemmed from a client sending Base64 without padding, and decode64 not managing to decode the base64, for example:

Base64.decode64 "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4"
=> "Lorem ipsum dolor sit amet, consectetur adipiscing eli"

Using python
base64.b64decode('TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4')
TypeError: Incorrect padding

Using node.js

new Buffer("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4", 'base64').toString('ascii')
'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'

This was an old issue for me. My problem seems to have been the padding is not necessary to decode the string. I suppose the behavior of how this is handled is not explicit. I don't expect Ruby to magically figure out the issue, but its behavior seems to be different from other languages in that it doesn't raise an error or succeed with the output.

What are your thoughts on this? Are there other common language implementations that won't decode the "t." ?

#5 Updated by Martin Dürst 12 months ago

On 2013/04/18 8:20, naruse (Yui NARUSE) wrote:

> Issue #8286 has been updated by naruse (Yui NARUSE).
>
> Status changed from Open to Rejected
>
> Your Base64Compatible.encode64 is buggy.
>
> Base64Compatible.encode64("Lorem ipsum dolor sit amet, consectetur adipiscing elit.") returns
> "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4",
> but its length is 75.
> It must add padding and it shall be "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4".

The two strings are exactly the same (checked mechanically). Yui, can
you give the correct example?

Thanks, Martin.

> Anyway you can use Base64.strict_encode64 if you don't need line feeds.

#6 Updated by Yui NARUSE 12 months ago

duerst (Martin Dürst) wrote:

> On 2013/04/18 8:20, naruse (Yui NARUSE) wrote:
>
> > Issue #8286 has been updated by naruse (Yui NARUSE).
> >
> > Status changed from Open to Rejected
> >
> > Your Base64Compatible.encode64 is buggy.
> >
> > Base64Compatible.encode64("Lorem ipsum dolor sit amet, consectetur adipiscing elit.") returns
> > "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4",
> > but its length is 75.
> > It must add padding and it shall be "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4".
>
> The two strings are exactly the same (checked mechanically). Yui, can
> you give the correct example?

Oops, correct one is "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4=".

#7 Updated by Yui NARUSE 12 months ago

2013/4/18 adacosta (Alan Da Costa) alandacosta@gmail.com

> Issue #8286 has been updated by adacosta (Alan Da Costa).
>
> My confusion on this issue might have stemmed from a client sending Base64
> without padding, and decode64 not managing to decode the base64, for
> example:
>
> Base64.decode64 "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4"
> => "Lorem ipsum dolor sit amet, consectetur adipiscing eli"
>
> Using python
>
> base64.b64decode('TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4')
> TypeError: Incorrect padding
>
> Using node.js
>
> new Buffer("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4", 'base64').toString('ascii')
> 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
>
> This was an old issue for me. My problem seems to have been the padding is
> not necessary to decode the string. I suppose the behavior of how this is
> handled is not explicit. I don't expect Ruby to magically figure out the
> issue, but its behavior seems to be different from other languages in that
> it doesn't raise an error or succeed with the output.
>
> What are your thoughts on this? Are there other common language
> implementations that won't decode the "t." ?

After reading the RFC again, I found that base64url allows padding to be omitted.
So I'll make Base64.decode64 accept input with implicit (missing) padding.
Note that Base64.strict_decode64 raises an error, like Python does.
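
The behavior described here can be checked with a small sketch. It assumes a Ruby release that includes the revisions listed under "Associated revisions"; older versions such as 2.0.0-p0 truncate the lenient result instead.

```ruby
require "base64"

# The 75-character input from this report, with the trailing "=" missing.
unpadded = "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4"

# Lenient decoding: the missing padding is treated as implicit.
lenient = Base64.decode64(unpadded)
puts lenient

# Strict decoding: unpadded input is rejected with ArgumentError.
strict_error = begin
  Base64.strict_decode64(unpadded)
  nil
rescue ArgumentError => e
  e
end
puts strict_error.class
```

Callers that want an error on malformed input can use strict_decode64; callers that receive unpadded data from other clients can use decode64.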

#8 Updated by Nobuyoshi Nakada 12 months ago

  • Status changed from Rejected to Closed
  • % Done changed from 0 to 100

#9 Updated by Martin Bosslet 12 months ago

Apologies for the shameless plug, but I thought it might help Alan:

In krypt[1], we follow the lenient parsing/strict encoding principle.

require 'krypt'

decoded1 = Krypt::Base64.decode("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4")
decoded2 = Krypt::Base64.decode("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4=")
decoded3 = Krypt::Base64.decode("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC5=")

puts decoded1
puts decoded2
puts decoded3
puts decoded1 == decoded2 # => true
puts decoded2 == decoded3 # => true

Even if the input is not strictly by the (RFC) book, it will still try to make sense of the input.
This works because of how Base64 decoding operates internally: some of the input bits are simply irrelevant to the decoding process,
so it is possible to flip them and still get the correct answer.
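
To illustrate the irrelevant bits with the standard library rather than krypt, here is a sketch that assumes the lenient Base64.decode64 from Ruby's base64 library. The final group encodes 2 bytes (16 bits) in 3 characters (18 bits), so the lowest 2 bits of the third character are unused; "4" and "5" differ only in those bits.

```ruby
require "base64"

canonical = "dC4="   # final group of the padded string in this report
variant   = "dC5="   # third character differs only in its lowest bit

# Standard Base64 alphabet, used here only to show the 6-bit values.
alphabet = [*"A".."Z", *"a".."z", *"0".."9", "+", "/"]
puts format("%06b", alphabet.index("4"))   # 111000
puts format("%06b", alphabet.index("5"))   # 111001

# The lenient decoder drops the unused low bits, so both decode to "t.".
puts Base64.decode64(canonical) == Base64.decode64(variant)
```

A strict decoder may reject the non-canonical variant, which is why only the lenient call is shown.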

When encoding however, it will always produce the canonical form. By default, it won't generate any line breaks,
but you may tell it to produce line breaks after every n-th character by passing n as an optional second argument:

plain_text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
p Krypt::Base64.encode(plain_text) # with the '=' at the end 
p Krypt::Base64.encode(plain_text, 4) # produces \r\n after every fourth character

If you are dealing with large inputs, there is also a streaming version[2] for encoding and decoding.

[1] https://github.com/krypt/krypt
[2] https://github.com/krypt/krypt/blob/master/lib/krypt/codec/base64.rb
