Bug #20412 (Open)
UTF-8 String encoding behavior differs between 3.2, 3.3 and master
Description
When a String that contains only a \0 byte is mutated by an extension to an invalid UTF-8 sequence, calling .encode('UTF-8') does not consistently raise UndefinedConversionError across Ruby versions. When the string is longer than 1 byte, all versions I've tested correctly raise UndefinedConversionError.
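For reference, the behaviour the report treats as correct can be observed without any extension. This is a minimal sketch, assuming binary (ASCII-8BIT) strings as in the reproducer; the helper name `utf8_encode_result` is illustrative and does not appear in the report.

```ruby
# Hypothetical helper (not part of the report): returns :ok when the
# conversion succeeds and :undefined_conversion when it raises.
def utf8_encode_result(str)
  str.encode('UTF-8')
  :ok
rescue Encoding::UndefinedConversionError
  :undefined_conversion
end

nul  = "\0".b    # one NUL byte, binary encoding
high = "\xC0".b  # one non-ASCII byte, binary encoding

p utf8_encode_result(nul)   # ASCII-only bytes convert cleanly
p utf8_encode_result(high)  # a byte >= 0x80 has no mapping from binary to UTF-8
```

The bug is about the second case silently behaving like the first once the byte has been written by an extension rather than by Ruby code.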
For Ruby 3.2, UndefinedConversionError being raised appears to depend on where the string was originally allocated.
For Ruby 3.3, UndefinedConversionError is never raised in the 1-byte case.
For master ad90fdd24c, UndefinedConversionError is always correctly raised.
I haven't been able to find a bug for this, but it seems like there is a fix in master that should be backported to at least 3.2 and 3.3.
I have not tested 3.1.
The attached reproducer depends on rbnacl because it is minimized from a cryptographic project, and I wasn't able to reduce it further.
Expected Output
For all versions:
$ ruby repro.rb 1
"RUBY: [version]"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
$ ruby repro.rb 2
"RUBY: [version]"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
Actual Output
Ruby 3.2
$ ASDF_RUBY_VERSION=3.2.3 ruby -v; ASDF_RUBY_VERSION=3.2.3 ruby repro.rb 1
ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8"
$ ASDF_RUBY_VERSION=3.2.3 ruby -v; ASDF_RUBY_VERSION=3.2.3 ruby repro.rb 2
ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
Ruby 3.3
$ ASDF_RUBY_VERSION=3.3.0 ruby -v; ASDF_RUBY_VERSION=3.3.0 ruby repro.rb 1
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
"RUBY: 3.3.0"
"FAIL: ciphertext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8"
"FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8"
"FAIL: plaintext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8"
"FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8"
$ ASDF_RUBY_VERSION=3.3.0 ruby -v; ASDF_RUBY_VERSION=3.3.0 ruby repro.rb 2
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
"RUBY: 3.3.0"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
Ruby Master
$ ASDF_RUBY_VERSION=ruby-dev ruby -v; ASDF_RUBY_VERSION=ruby-dev ruby repro.rb 1
ruby 3.4.0dev (2024-04-06T17:33:16Z master ad90fdd24c) [x86_64-linux]
"RUBY: 3.4.0"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
$ ASDF_RUBY_VERSION=ruby-dev ruby -v; ASDF_RUBY_VERSION=ruby-dev ruby repro.rb 2
ruby 3.4.0dev (2024-04-06T17:33:16Z master ad90fdd24c) [x86_64-linux]
"RUBY: 3.4.0"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
Updated by nobu (Nobuyoshi Nakada) 5 months ago
Maybe related to code range cached flags (#19902 ?).
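As background on what a cached code range is: once Ruby has scanned a String it remembers whether the contents are 7-bit only, and mutations made through the String API reset that cache. The sketch below shows only this Ruby-level behaviour. Whether an extension writing through a raw pointer leaves a stale cache behind is the hypothesis being raised here, not something this snippet demonstrates.

```ruby
s = "\0".b              # binary string holding a single NUL byte
before = s.ascii_only?  # scanning the string lets Ruby cache "7-bit only"

s.setbyte(0, 0xC0)      # mutation through the String API resets the cache
after = s.ascii_only?   # rescanned, so the new byte is seen

raised = begin
  s.encode('UTF-8')
  false
rescue Encoding::UndefinedConversionError
  true
end

p [before, after, raised]
```

If the cache were not reset after the write, `encode` could plausibly treat the string as ASCII-only and skip the conversion check, which would match the FAIL lines in the report.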
Updated by etienne (Étienne Barrié) 5 months ago
Hey,
I cannot reproduce using the ruby:3.2.3 docker image and with my local installation of Ruby 3.2.3 and 3.2.2.
In all these cases, I get "OK" "is not valid UTF-8". I just changed the script to always use size 1 and bundler/inline:
# encoding: ASCII-8BIT
# frozen_string_literal: false
require "bundler/inline"
gemfile(true) do
  source "https://rubygems.org"
  gem "rbnacl"
end
p "RUBY: #{RUBY_VERSION}"
require 'rbnacl'
class Encrypter
  extend RbNaCl::Sodium
  sodium_type :stream
  sodium_primitive :xchacha20
  sodium_function :stream_xchacha20_xor,
                  :crypto_stream_xchacha20_xor,
                  %i[pointer pointer ulong_long pointer pointer]
  attr_reader :key
  def initialize(key)
    @key = key
  end
  def encrypt_with_rbnacl_buffer(nonce, message)
    c = RbNaCl::Util.zeros(message.bytesize)
    self.class.stream_xchacha20_xor(c, message, message.bytesize, nonce, key)
    c
  end
  def encrypt_with_local_buffer(nonce, message)
    c = "\0" * message.bytesize
    self.class.stream_xchacha20_xor(c, message, message.bytesize, nonce, key)
    c
  end
end
begin
  "\xC0".encode('UTF-8')
  p 'FAIL: plaintext is not valid UTF-8 and did not error during encoding to UTF-8'
rescue StandardError
end
SIZE = 1
input = ("\xC0" * SIZE) + ' '
nonce = 'B' * 24
key = 'A' * 32
enc = Encrypter.new(key)
ciphertext_rbnacl = enc.encrypt_with_rbnacl_buffer(nonce, input)
ciphertext_local = enc.encrypt_with_local_buffer(nonce, input)
plaintext_rbnacl = enc.encrypt_with_rbnacl_buffer(nonce, ciphertext_rbnacl)
plaintext_local = enc.encrypt_with_local_buffer(nonce, ciphertext_local)
begin
  input.encode('UTF-8')
  p 'FAIL: input is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
end
begin
  ciphertext_rbnacl.encode('UTF-8')
  p 'FAIL: ciphertext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
  p 'OK: ciphertext_rbnacl is not valid UTF-8'
end
begin
  ciphertext_local.encode('UTF-8')
  p 'FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
  p 'OK: ciphertext_local is not valid UTF-8'
end
begin
  plaintext_rbnacl.encode('UTF-8')
  p 'FAIL: plaintext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
  p 'OK: plaintext_rbnacl is not valid UTF-8'
end
begin
  plaintext_local.encode('UTF-8')
  p 'FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
  p 'OK: plaintext_local is not valid UTF-8'
end
Which version of libsodium are you using? Perhaps some specific version mutates a char * string?
Updated by bannable (Joe Truba) 5 months ago · Edited
etienne (Étienne Barrié) wrote in #note-3:
Hey,
I cannot reproduce using the ruby:3.2.3 docker image and with my local installation of Ruby 3.2.3 and 3.2.2.
In all these cases, I get "OK" "is not valid UTF-8". I just changed the script to always use size 1 and bundler/inline:
The input size isn't set correctly after your change to the script:
➜ ~ ASDF_RUBY_VERSION=3.2.3 ruby repro.eti.rb 1
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
➜ ~ ASDF_RUBY_VERSION=3.2.3 ruby repro.rb
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8"
➜ ~ diff repro.eti.rb repro.rb
44,46c44
< SIZE = ARGV[0].to_i || 32
<
< input = ("\xC0" * SIZE) + ' '
---
> input = "\xC0"
➜ ~
Edit: Looking again, I think I uploaded the wrong version of my repro script originally. It looks like it was also adding the space to the input.
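A side note on the `SIZE = ARGV[0].to_i || 32` line shown in the diff: `nil.to_i` is 0, and 0 is truthy in Ruby, so the `|| 32` fallback never applies. Run without an argument, that script gets SIZE = 0 and the input is just the trailing space. A sketch of an argument parser that does fall back, where the method name `size_from_args` is made up for illustration:

```ruby
# Hypothetical helper (not from either script): use the first argument
# as the size, or fall back to a default when no argument is given.
def size_from_args(argv, default = 32)
  arg = argv.first
  arg.nil? ? default : Integer(arg)
end

p(nil.to_i || 32)          # the fallback is never reached
p size_from_args([])       # no argument: the default is used
p size_from_args(["1"])    # explicit size
```

`Integer(arg)` also raises on non-numeric input instead of silently returning 0, which makes a mistyped size visible.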