Bug #8286

closed

Can't decode non-MIME Base64

Added by adacosta (Alan Da Costa) about 11 years ago. Updated about 11 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
2.0.0-p0
[ruby-core:54415]

Description

=begin
In https://github.com/ruby/ruby/blob/trunk/lib/base64.rb#L42 , RFC 2045 is mentioned for encode64/decode64 support, which is the MIME RFC. I don't believe this is the correct RFC to reference, as RFC 4648 is the correct RFC for Base64. Further, RFC 4648 has an explicit section about Line Feeds in Encoded Data, http://tools.ietf.org/html/rfc4648#section-3.1 . This section states:

MIME [4] is often used as a reference for base 64 encoding. However,
MIME does not define "base 64" per se, but rather a "base 64 Content-
Transfer-Encoding" for use within MIME. As such, MIME enforces a
limit on line length of base 64-encoded data to 76 characters. MIME
inherits the encoding from Privacy Enhanced Mail (PEM) [3], stating
that it is "virtually identical"; however, PEM uses a line length of
64 characters. The MIME and PEM limits are both due to limits within
SMTP.

Implementations MUST NOT add line feeds to base-encoded data unless
the specification referring to this document explicitly directs base
encoders to add line feeds after a specific number of characters.

In my case, a separate implementation produced Base64 (non-MIME) without line feeds, and Base64.decode64 cannot decode that data. I believe this also indicates that Base64.encode64 is wrong to apply MIME line wrapping when encoding Base64.

I have an example of the issue at https://github.com/adacosta/base64_compatible/blob/master/test/test_coding.rb#LC25 .
=end

Updated by naruse (Yui NARUSE) about 11 years ago

Could you show a self-contained, reproducible example?

Updated by naruse (Yui NARUSE) about 11 years ago

  • Status changed from Open to Rejected

Your Base64Compatible.encode64 is buggy.

Base64Compatible.encode64("Lorem ipsum dolor sit amet, consectetur adipiscing elit.") returns
"TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4",
but its length is 75, which is not a multiple of 4.
It must add padding; the correct result is "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4=".

Anyway, you can use Base64.strict_encode64 if you don't need line feeds.
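
As a minimal sketch of the difference between the two encoders, using only Ruby's standard base64 library (the variable names are illustrative and not from the report): encode64 follows the MIME convention and inserts line feeds, while strict_encode64 returns one unbroken line.

```ruby
require "base64"

text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

# MIME style: a "\n" after every 60 encoded characters and at the very end.
mime = Base64.encode64(text)

# RFC 4648 style: no line feeds at all.
strict = Base64.strict_encode64(text)

puts mime.inspect
puts strict.inspect
puts strict.length                # 76 characters, including the trailing "="
puts mime.delete("\n") == strict  # the two differ only in the line feeds
```

Removing the line feeds from the MIME output yields the strict output, so a peer that rejects line feeds should be sent the strict form.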

Updated by adacosta (Alan Da Costa) about 11 years ago

Hi Naruse, thank you for looking at my issue and sorry for wasting your time. I should have spent more time verifying my code.

Updated by adacosta (Alan Da Costa) about 11 years ago

My confusion on this issue might have stemmed from a client sending Base64 without padding, and decode64 not managing to decode the base64, for example:

Base64.decode64 "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4"
=> "Lorem ipsum dolor sit amet, consectetur adipiscing eli"

Using python
base64.b64decode('TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4')
TypeError: Incorrect padding

Using node.js

new Buffer("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4", 'base64').toString('ascii')
'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'

This was an old issue for me. My problem seems to have been assuming that the padding is not necessary to decode the string. I suppose how this case should be handled is not explicitly specified. I don't expect Ruby to magically figure out the issue, but its behavior differs from other languages in that it neither raises an error nor returns the complete output.

What are your thoughts on this? Are there other common language implementations that won't decode the trailing "t."?

Updated by duerst (Martin Dürst) about 11 years ago

On 2013/04/18 8:20, naruse (Yui NARUSE) wrote:

> Issue #8286 has been updated by naruse (Yui NARUSE).
>
> Status changed from Open to Rejected
>
> Your Base64Compatible.encode64 is buggy.
>
> Base64Compatible.encode64("Lorem ipsum dolor sit amet, consectetur adipiscing elit.") returns
> "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4",
> but its length is 75.
> It must add padding and it shall be "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4".

The two strings are exactly the same (checked mechanically). Yui, can
you give the correct example?

Thanks, Martin.

> Anyway you can use Base64.strict_encode64 if you don't need line feeds.

Updated by naruse (Yui NARUSE) about 11 years ago

duerst (Martin Dürst) wrote:

> On 2013/04/18 8:20, naruse (Yui NARUSE) wrote:
>
>> Issue #8286 has been updated by naruse (Yui NARUSE).
>>
>> Status changed from Open to Rejected
>>
>> Your Base64Compatible.encode64 is buggy.
>>
>> Base64Compatible.encode64("Lorem ipsum dolor sit amet, consectetur adipiscing elit.") returns
>> "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4",
>> but its length is 75.
>> It must add padding and it shall be "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4".
>
> The two strings are exactly the same (checked mechanically). Yui, can
> you give the correct example?

Oops, the correct one is "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4=".

Updated by naruse (Yui NARUSE) about 11 years ago

2013/4/18 adacosta (Alan Da Costa)

> Issue #8286 has been updated by adacosta (Alan Da Costa).
>
> My confusion on this issue might have stemmed from a client sending Base64
> without padding, and decode64 not managing to decode the base64, for
> example:
>
> Base64.decode64 "TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4"
> => "Lorem ipsum dolor sit amet, consectetur adipiscing eli"
>
> Using python
>
> base64.b64decode('TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4')
> TypeError: Incorrect padding
>
> Using node.js
>
> new Buffer("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4", 'base64').toString('ascii')
> 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
>
> This was an old issue for me. My problem seems to have been the padding is
> not necessary to decode the string. I suppose the behavior of how this is
> handled is not explicit. I don't expect Ruby to magically figure out the
> issue, but its behavior seems to be different from other languages in that
> it doesn't raise an error or succeed with the output.
>
> What are your thoughts on this? Are there other common language
> implementations that won't decode the "t." ?

After reading the RFC again, I found that base64url allows the padding to be omitted.
So I'll make Base64.decode64 accept input with implicit (missing) padding.
Note that Base64.strict_decode64 raises an error, like Python.
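
A minimal sketch of what this means for callers who receive unpadded input but still want strict validation. The helper restore_padding is hypothetical (it is not part of the Base64 module); it appends the "=" characters a sender left off so the length is a multiple of 4 again.

```ruby
require "base64"

# Hypothetical helper, not part of the standard library: re-append the "="
# padding so that the encoded length becomes a multiple of 4.
def restore_padding(encoded)
  encoded + "=" * ((4 - encoded.length % 4) % 4)
end

text     = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
unpadded = Base64.strict_encode64(text).delete("=")  # 75 characters, as in the report

# Strict decoding rejects the unpadded form with ArgumentError.
strict_result =
  begin
    Base64.strict_decode64(unpadded)
    :accepted
  rescue ArgumentError
    :rejected
  end

# After restoring the padding, strict decoding succeeds.
repaired = Base64.strict_decode64(restore_padding(unpadded))

puts strict_result
puts repaired == text
```

An input whose length is 1 modulo 4 can never be valid Base64, so the sketch leaves it to strict_decode64 to reject that case.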

Updated by nobu (Nobuyoshi Nakada) about 11 years ago

  • Status changed from Rejected to Closed
  • % Done changed from 0 to 100

Updated by MartinBosslet (Martin Bosslet) about 11 years ago

Apologies for the shameless plug, but I thought it might help Alan:

In krypt[1], we follow the lenient parsing/strict encoding principle.

require 'krypt'

decoded1 = Krypt::Base64.decode("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4")
decoded2 = Krypt::Base64.decode("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4=")
decoded3 = Krypt::Base64.decode("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC5=")

puts decoded1
puts decoded2
puts decoded3
puts decoded1 == decoded2 # => true
puts decoded2 == decoded3 # => true

Even if the input is not strictly by the (RFC) book, it will still try to make sense of the input.
This is possible because of how Base64 decoding works internally: some bits of the final character can be
flipped without changing the result, because those input bits are simply irrelevant to the decoding process.
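
The point about irrelevant bits can be sketched with Ruby's standard base64 library rather than krypt (an assumption made so the example runs without extra gems). The last two groups in the example above, "dC4=" and "dC5=", differ only in the lowest bit of their third character, and a lenient decoder discards that bit.

```ruby
require "base64"

# The standard Base64 alphabet; a character's index is its 6-bit value.
alphabet = [*"A".."Z", *"a".."z", *"0".."9", "+", "/"]

four = alphabet.index("4")  # 56 = 0b111000
five = alphabet.index("5")  # 57 = 0b111001

# Three characters carry 18 bits, but two bytes need only 16 bits.
# The lowest 2 bits of the third character are therefore never used.
same_data_bits = (four >> 2) == (five >> 2)

canonical     = Base64.decode64("dC4=")
non_canonical = Base64.decode64("dC5=")

puts same_data_bits
puts canonical.inspect
puts canonical == non_canonical
```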

When encoding, however, it will always produce the canonical form. By default, it won't generate any line breaks,
but you may tell it to produce a line break after every n-th character by passing n as an optional second argument:

plain_text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
p Krypt::Base64.encode(plain_text) # with the '=' at the end 
p Krypt::Base64.encode(plain_text, 4) # produces \r\n after every fourth character
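
For readers without krypt installed, a rough equivalent can be sketched with the standard library. The helper encode_wrapped is hypothetical and is not krypt's implementation; in particular, this sketch does not append a line break after the last chunk, which may differ from what krypt does.

```ruby
require "base64"

# Hypothetical helper, not krypt's API: canonical encoding, optionally broken
# into lines of at most line_length characters separated by "\r\n".
def encode_wrapped(data, line_length = nil)
  encoded = Base64.strict_encode64(data)
  return encoded if line_length.nil?

  encoded.scan(/.{1,#{line_length}}/).join("\r\n")
end

plain_text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

unwrapped = encode_wrapped(plain_text)     # one line, ends with "="
wrapped   = encode_wrapped(plain_text, 4)  # "\r\n" between 4-character chunks

p unwrapped
p wrapped
p Base64.decode64(wrapped) == plain_text   # lenient decoding skips the line breaks
```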

If you are dealing with large inputs, there is also a streaming version[2] for encoding and decoding.

[1] https://github.com/krypt/krypt
[2] https://github.com/krypt/krypt/blob/master/lib/krypt/codec/base64.rb
