Bug #20412

open

UTF-8 String encoding behavior differs between 3.2, 3.3 and master

Added by bannable (Joe Truba) about 2 months ago. Updated about 2 months ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:117449]

Description

When a String that contains only a \0 byte is mutated by an extension into an invalid UTF-8 sequence, calling .encode('UTF-8') does not consistently raise Encoding::UndefinedConversionError across Ruby versions. When the string is longer than 1 byte, every version I've tested correctly raises UndefinedConversionError.

For Ruby 3.2, whether UndefinedConversionError is raised appears to depend on where the string was originally allocated.

For Ruby 3.3, UndefinedConversionError is never raised.

For master ad90fdd24c, UndefinedConversionError is always correctly raised.

I haven't been able to find a bug for this, but it seems like there is a fix in master that should be backported to at least 3.2 and 3.3.

I have not tested 3.1.

The attached reproducer depends on rbnacl because it is minimized from a cryptographic project, and I wasn't able to reduce it further.

Expected Output

For all versions:

$ ruby repro.rb 1
"RUBY: [version]"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"

$ ruby repro.rb 2
"RUBY: [version]"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"

Actual Output

Ruby 3.2

$ ASDF_RUBY_VERSION=3.2.3 ruby -v; ASDF_RUBY_VERSION=3.2.3 ruby repro.rb 1
ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8"

$ ASDF_RUBY_VERSION=3.2.3 ruby -v; ASDF_RUBY_VERSION=3.2.3 ruby repro.rb 2
ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"

Ruby 3.3

$ ASDF_RUBY_VERSION=3.3.0 ruby -v; ASDF_RUBY_VERSION=3.3.0 ruby repro.rb 1
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
"RUBY: 3.3.0"
"FAIL: ciphertext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8"
"FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8"
"FAIL: plaintext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8"
"FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8"

$ ASDF_RUBY_VERSION=3.3.0 ruby -v; ASDF_RUBY_VERSION=3.3.0 ruby repro.rb 2
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
"RUBY: 3.3.0"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"

Ruby Master

$ ASDF_RUBY_VERSION=ruby-dev ruby -v; ASDF_RUBY_VERSION=ruby-dev ruby repro.rb 1
ruby 3.4.0dev (2024-04-06T17:33:16Z master ad90fdd24c) [x86_64-linux]
"RUBY: 3.4.0"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"

$ ASDF_RUBY_VERSION=ruby-dev ruby -v; ASDF_RUBY_VERSION=ruby-dev ruby repro.rb 2
ruby 3.4.0dev (2024-04-06T17:33:16Z master ad90fdd24c) [x86_64-linux]
"RUBY: 3.4.0"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"

Files

repro.rb (2.31 KB) bannable (Joe Truba), 04/06/2024 08:50 PM
Actions #1

Updated by bannable (Joe Truba) about 2 months ago

  • Description updated (diff)

Updated by nobu (Nobuyoshi Nakada) about 2 months ago

Maybe related to code range cached flags (#19902 ?).
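
For readers unfamiliar with the code range cache, here is a minimal sketch of the normal, pure-Ruby path. It only shows that a mutation made through Ruby's own API (String#setbyte) is noticed by later encoding checks. It is an assumption, following the hypothesis above, that an extension writing straight into the string's buffer skips this bookkeeping and leaves a stale "7-bit" flag behind. The variable names are illustrative and not taken from repro.rb.

```ruby
# Build a one-byte binary string, similar to the repro's local buffer.
buf = ("\0" * 1).b            # .b returns a copy tagged ASCII-8BIT

# Scanning the string lets Ruby cache its code range (7-bit here).
was_ascii = buf.ascii_only?

# Mutating through the Ruby API updates or clears that cached flag.
buf.setbyte(0, 0xC0)
still_ascii = buf.ascii_only?

# With an accurate code range, transcoding binary \xC0 to UTF-8 fails.
result =
  begin
    buf.encode('UTF-8')
    :no_error
  rescue Encoding::UndefinedConversionError
    :raised
  end

p was_ascii, still_ascii, result
```

If the hypothesis holds, the FAIL lines in the report would correspond to the case where the flag still says "7-bit" after the buffer has been overwritten, so encode skips the conversion check.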

Updated by etienne (Étienne Barrié) about 2 months ago

Hey,

I cannot reproduce using the ruby:3.2.3 docker image and with my local installation of Ruby 3.2.3 and 3.2.2.

In all these cases, I get "OK" "is not valid UTF-8". I just changed the script to always use size 1 and bundler/inline:

# encoding: ASCII-8BIT
# frozen_string_literal: false

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "rbnacl"
end

p "RUBY: #{RUBY_VERSION}"
require 'rbnacl'

class Encrypter
  extend RbNaCl::Sodium

  sodium_type :stream

  sodium_primitive :xchacha20

  sodium_function :stream_xchacha20_xor,
                  :crypto_stream_xchacha20_xor,
                  %i[pointer pointer ulong_long pointer pointer]

  attr_reader :key

  def initialize(key)
    @key = key
  end

  # XORs the message into a zero-filled buffer allocated by RbNaCl::Util.zeros.
  def encrypt_with_rbnacl_buffer(nonce, message)
    c = RbNaCl::Util.zeros(message.bytesize)
    self.class.stream_xchacha20_xor(c, message, message.bytesize, nonce, key)
    c
  end

  # XORs the message into a zero-filled buffer built from a literal in this file.
  def encrypt_with_local_buffer(nonce, message)
    c = "\0" * message.bytesize
    self.class.stream_xchacha20_xor(c, message, message.bytesize, nonce, key)
    c
  end
end

begin
  "\xC0".encode('UTF-8')
  p 'FAIL: plaintext is not valid UTF-8 and did not error during encoding to UTF-8'
rescue StandardError
end

SIZE = 1

input = ("\xC0" * SIZE) + ' '
nonce = 'B' * 24
key = 'A' * 32

enc = Encrypter.new(key)

ciphertext_rbnacl = enc.encrypt_with_rbnacl_buffer(nonce, input)
ciphertext_local = enc.encrypt_with_local_buffer(nonce, input)

plaintext_rbnacl = enc.encrypt_with_rbnacl_buffer(nonce, ciphertext_rbnacl)
plaintext_local = enc.encrypt_with_local_buffer(nonce, ciphertext_local)

begin
  input.encode('UTF-8')
  p 'FAIL: input is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
end

begin
  ciphertext_rbnacl.encode('UTF-8')
  p 'FAIL: ciphertext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
  p 'OK: ciphertext_rbnacl is not valid UTF-8'
end

begin
  ciphertext_local.encode('UTF-8')
  p 'FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
  p 'OK: ciphertext_local is not valid UTF-8'
end

begin
  plaintext_rbnacl.encode('UTF-8')
  p 'FAIL: plaintext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
  p 'OK: plaintext_rbnacl is not valid UTF-8'
end

begin
  plaintext_local.encode('UTF-8')
  p 'FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8'
rescue Encoding::UndefinedConversionError
  p 'OK: plaintext_local is not valid UTF-8'
end

Which version of libsodium are you using? Perhaps some specific version mutates a char * string?

Updated by bannable (Joe Truba) about 2 months ago · Edited

@eti

etienne (Étienne Barrié) wrote in #note-3:

Hey,

I cannot reproduce using the ruby:3.2.3 docker image and with my local installation of Ruby 3.2.3 and 3.2.2.

In all these cases, I get "OK" "is not valid UTF-8". I just changed the script to always use size 1 and bundler/inline:

The input size isn't set correctly after your change to the script:

➜  ~ ASDF_RUBY_VERSION=3.2.3 ruby repro.eti.rb 1
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
➜  ~ ASDF_RUBY_VERSION=3.2.3 ruby repro.rb
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8"
➜  ~ diff repro.eti.rb repro.rb
44,46c44
< SIZE = ARGV[0].to_i || 32
<
< input = ("\xC0" * SIZE) + ' '
---
> input = "\xC0"
➜  ~

Edit: Looking again, I think I originally uploaded the wrong version of my repro script. It looks like it was also adding the space to the input.
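
A side note on the `SIZE = ARGV[0].to_i || 32` line visible in the diff: `nil.to_i` returns 0, and 0 is truthy in Ruby, so the `|| 32` fallback never takes effect when no argument is given. One possible way to make the size argument explicit is sketched below. `parse_size` is a hypothetical helper for illustration and is not part of either version of the script.

```ruby
# Hypothetical helper: read the repeat count from the argument list,
# falling back to a default only when no argument was supplied.
def parse_size(argv, default = 32)
  raw = argv.first
  return default if raw.nil?

  Integer(raw, 10) # raises ArgumentError on non-numeric input
end

size_without_arg = parse_size([])
size_with_arg    = parse_size(["1"])
fallback_attempt = nil.to_i || 32 # stays 0, because 0 is truthy

p size_without_arg, size_with_arg, fallback_attempt
```

With a helper like this, running the script without an argument and running it with `1` would exercise clearly different input sizes, which makes it easier to compare results between the two script versions.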
