Bug #20745

IO::Buffer#copy triggers UB when src/dest buffers overlap

Added by hanazuki (Kasumi Hanazuki) about 1 month ago. Updated 20 days ago.

Status:
Open
Target version:
-
ruby -v:
ruby 3.4.0dev (2024-09-15T01:06:11Z master 532af89e3b) +PRISM [x86_64-linux]
[ruby-core:119208]

Description

The current implementation of IO::Buffer#copy uses memcpy to copy data between the two memory regions. memcpy has a requirement that the source and destination must not overlap; otherwise the behavior is undefined.

When copying within the same instance of IO::Buffer (or between slices sharing the same underlying memory), this rule can be violated, and the data is corrupted on some combinations of libc implementation and architecture (note that Alpine uses musl libc).

% docker run --platform=linux/amd64 --rm ruby:3.3.5-alpine3.20 ruby -e 'b=IO::Buffer.new(10); b.set_string("0123456789"); b.copy(b, 3, 7, 0); p b'
-e:1: warning: IO::Buffer is experimental and both the Ruby and C interface may change in the future!
#<IO::Buffer 0x00007fb439d2c450+10 INTERNAL>
0x00000000  30 31 32 30 31 32 30 31 32 30                   0120120120

% docker run --platform=linux/arm64 --rm ruby:3.3.5-alpine3.20 ruby -e 'b=IO::Buffer.new(10); b.set_string("0123456789"); b.copy(b, 3, 7, 0); p b'
-e:1: warning: IO::Buffer is experimental and both the Ruby and C interface may change in the future!
#<IO::Buffer 0x00007fcc5c580360+10 INTERNAL>
0x00000000  30 31 32 30 31 32 33 34 35 36                   0120123456

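The call copies the first 7 bytes of "0123456789" onto the same buffer at offset 3, so "0120123456" is the expected result. Only the arm64 run above produces it; the amd64 run yields "0120120120".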
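
As an illustration only, the two outputs above can be reproduced with a pure-Ruby model of the copy. The helper names `forward_copy` and `overlap_safe_copy` are hypothetical, and the model does not claim to describe how any particular libc implements memcpy. It only shows that a byte-by-byte forward copy that re-reads already overwritten bytes is consistent with the corrupted output, while a copy that reads the whole source first gives the expected one.

```ruby
# Hypothetical helper: copy len bytes one at a time, front to back, reading
# from the array that is being written to (overwritten bytes get re-read).
def forward_copy(bytes, dest, src, len)
  out = bytes.dup
  len.times { |i| out[dest + i] = out[src + i] }
  out
end

# Hypothetical helper: read the whole source range from the untouched input
# before writing, which gives memmove-like results for overlapping ranges.
def overlap_safe_copy(bytes, dest, src, len)
  out = bytes.dup
  out[dest, len] = bytes[src, len]
  out
end

data = "0123456789".bytes

# Same parameters as b.copy(b, 3, 7, 0): destination offset 3, length 7, source offset 0.
naive = forward_copy(data, 3, 0, 7).pack("C*")
safe  = overlap_safe_copy(data, 3, 0, 7).pack("C*")

puts naive # 0120120120
puts safe  # 0120123456
```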


This error can also be detected with ASAN.

% CXX=clang++ CC=clang ./configure cppflags="-fsanitize=address -fno-omit-frame-pointer" optflags=-O0 LDFLAGS="-fsanitize=address -fno-omit-frame-pointer"

% ./ruby -e 'b=IO::Buffer.new(10); b.copy(b, 0, 9, 1)'
`RubyGems' were not loaded.
`error_highlight' was not loaded.
`did_you_mean' was not loaded.
`syntax_suggest' was not loaded.
-e:1: warning: IO::Buffer is experimental and both the Ruby and C interface may change in the future!
=================================================================
==1655425==ERROR: AddressSanitizer: memcpy-param-overlap: memory ranges [0x5020000107b0,0x5020000107b9) and [0x5020000107b1, 0x5020000107ba) overlap
    #0 0x55dcfbb72d90 in __asan_memcpy (/home/kasumi/.local/src/github.com/ruby/ruby/ruby+0x1fdd90) (BuildId: 2591ca8e9e713537a8f388383df19d1f4284b722)
    #1 0x55dcfbcb31e3 in ruby_nonempty_memcpy /home/kasumi/.local/src/github.com/ruby/ruby/./include/ruby/internal/memory.h:662:16
    #2 0x55dcfbcb3867 in io_buffer_memcpy /home/kasumi/.local/src/github.com/ruby/ruby/io_buffer.c:2347:5
    #3 0x55dcfbcb354a in io_buffer_copy_from /home/kasumi/.local/src/github.com/ruby/ruby/io_buffer.c:2384:5
    #4 0x55dcfbcafd2f in io_buffer_copy /home/kasumi/.local/src/github.com/ruby/ruby/io_buffer.c:2490:12
    #5 0x55dcfc0b955f in ractor_safe_call_cfunc_m1 /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:3597:12
    #6 0x55dcfc09ca28 in vm_call_cfunc_with_frame_ /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:3788:11
    #7 0x55dcfc09cf2a in vm_call_cfunc_with_frame /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:3834:12
    #8 0x55dcfc09c0e9 in vm_call_cfunc_other /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:3860:16
    #9 0x55dcfc081811 in vm_call_cfunc /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:3942:12
    #10 0x55dcfc07f347 in vm_call_method_each_type /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:4766:16
    #11 0x55dcfc07ebc2 in vm_call_method /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:4892:20
    #12 0x55dcfc02d9a4 in vm_call_general /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:4936:12
    #13 0x55dcfc02fb24 in vm_sendish /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:5955:15
    #14 0x55dcfc03eb00 in vm_exec_core /home/kasumi/.local/src/github.com/ruby/ruby/insns.def:898:11
    #15 0x55dcfc0306a2 in rb_vm_exec /home/kasumi/.local/src/github.com/ruby/ruby/vm.c:2564:22
    #16 0x55dcfc06ee9f in rb_iseq_eval_main /home/kasumi/.local/src/github.com/ruby/ruby/vm.c:2830:11
    #17 0x55dcfbbb7a84 in rb_ec_exec_node /home/kasumi/.local/src/github.com/ruby/ruby/eval.c:281:9
    #18 0x55dcfbbb74b2 in ruby_run_node /home/kasumi/.local/src/github.com/ruby/ruby/eval.c:319:30
    #19 0x55dcfbbaf81e in rb_main /home/kasumi/.local/src/github.com/ruby/ruby/./main.c:43:12
    #20 0x55dcfbbaf699 in main /home/kasumi/.local/src/github.com/ruby/ruby/./main.c:62:12
    #21 0x7fb604914db9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #22 0x7fb604914e74 in __libc_start_main csu/../csu/libc-start.c:360:3
    #23 0x55dcfbad8fe0 in _start (/home/kasumi/.local/src/github.com/ruby/ruby/ruby+0x163fe0) (BuildId: 2591ca8e9e713537a8f388383df19d1f4284b722)
[snip]

Updated by hanazuki (Kasumi Hanazuki) about 1 month ago

#set_string in combination with .for with a block has the same problem.

% docker run --platform=linux/amd64 --rm ruby:3.3.5-alpine3.20 ruby -e 's=+"0123456789"; IO::Buffer.for(s) {|b| b.set_string(s, 3, 7, 0) }; p s'
"0120120120"

Updated by nobu (Nobuyoshi Nakada) about 1 month ago

Since the doc says "using memcpy", it may be intentionally undetermined/unsupported.

Efficiently copy from a source IO::Buffer into the buffer, at +offset+
using +memcpy+. For copying String instances, see #set_string.

Updated by hanazuki (Kasumi Hanazuki) about 1 month ago

If this is an intentional restriction to gain performance, I would like to propose adding documentation that the methods are unsafe and should not be used for overlapping copies. Not all Ruby users are familiar with C.
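
For callers who need an overlapping copy while the restriction stands, one possible workaround is to stage the source bytes in a temporary String. This is a sketch rather than a recommendation from this thread: it assumes the extra allocation is acceptable and relies on #get_string returning a new String whose memory is separate from the buffer.

```ruby
# Workaround sketch: stage the source bytes in a temporary String first.
buffer = IO::Buffer.new(10)
buffer.set_string("0123456789")

# get_string(offset, length) returns a new String, so the staged bytes
# do not share memory with the buffer.
staged = buffer.get_string(0, 7)

# Copy the staged bytes back into the buffer at offset 3.
buffer.set_string(staged, 3)

result = buffer.get_string
puts result # 0120123456
```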

But IO::Buffer is designed to be a safe abstraction of raw memory (e.g. it does not allow out-of-bounds access or double free in exchange for a small runtime overhead), and so, IMO, it's not a good idea to directly expose the unsafety rooted in C memcpy to the Ruby world.

Here is my proposal to make overlapped copies safe: https://github.com/ruby/ruby/pull/11640. In theory this incurs a slight performance penalty in non-overlapping cases.

Updated by byroot (Jean Boussier) about 1 month ago

I don't think we can legitimately expose an API that can lead to a crash if used incorrectly.

So yeah, I think we should check for overlapping pointers and either fall back to another copy routine that supports overlapping pointers or raise an error.
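
A minimal sketch of such an overlap check, written in Ruby over byte offsets for illustration (a real check would compare pointers in C). The name `ranges_overlap?` is hypothetical and is not part of IO::Buffer. Two half-open ranges of equal length overlap exactly when each one starts before the other ends.

```ruby
# Hypothetical helper: do [dest, dest + len) and [src, src + len) share any byte?
def ranges_overlap?(dest, src, len)
  len > 0 && dest < src + len && src < dest + len
end

# The two calls from this report, as (destination offset, source offset, length).
p ranges_overlap?(3, 0, 7) # b.copy(b, 3, 7, 0)
p ranges_overlap?(1, 0, 9) # b.copy(b, 0, 9, 1), the ASAN case

# Adjacent ranges do not overlap, so memcpy would be allowed here.
p ranges_overlap?(5, 0, 5)
```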

Updated by hanazuki (Kasumi Hanazuki) 26 days ago

After reviewing memcpy and memmove in open-source libc implementations, I found that some of them optimize for small copies that fit within registers. These optimizations handle overlapping source and destination memory without explicitly checking for overlaps. Therefore, checking for buffer overlap on the Ruby side to choose between memcpy and memmove would negate these optimizations, and I think using memmove alone (as in my proposal) will suffice.


  • glibc:
  • FreeBSD:
    • x86_64:
    • aarch64:
      • memmove calls memcpy for <= 96B as it provides efficient copy for overlapped memory.
  • musl:
    • x86_64:
      • memmove calls memcpy if possible. No special optimization for either.
    • aarch64:
      • memmove is the generic one, calling the optimized memcpy if the regions do not overlap.

Updated by mame (Yusuke Endoh) 20 days ago

  • Assignee set to ioquatix (Samuel Williams)