Bug #20745
Closed
IO::Buffer#copy triggers UB when src/dest buffers overlap
Description
The current implementation of IO::Buffer#copy uses memcpy to copy data between the two memory regions. memcpy has a requirement that the source and destination must not overlap; otherwise the behavior is undefined.
When copying within the same instance of IO::Buffer (or between slices sharing the same underlying memory), this rule can be violated, and the data is corrupted with some libc implementation / architecture combinations (note that Alpine uses musl libc).
% docker run --platform=linux/amd64 --rm ruby:3.3.5-alpine3.20 ruby -e 'b=IO::Buffer.new(10); b.set_string("0123456789"); b.copy(b, 3, 7, 0); p b'
-e:1: warning: IO::Buffer is experimental and both the Ruby and C interface may change in the future!
#<IO::Buffer 0x00007fb439d2c450+10 INTERNAL>
0x00000000 30 31 32 30 31 32 30 31 32 30 0120120120
% docker run --platform=linux/arm64 --rm ruby:3.3.5-alpine3.20 ruby -e 'b=IO::Buffer.new(10); b.set_string("0123456789"); b.copy(b, 3, 7, 0); p b'
-e:1: warning: IO::Buffer is experimental and both the Ruby and C interface may change in the future!
#<IO::Buffer 0x00007fcc5c580360+10 INTERNAL>
0x00000000 30 31 32 30 31 32 33 34 35 36 0120123456
The call copies the first seven bytes of "0123456789" onto the same buffer at offset 3, so "0120123456" is the expected result. The amd64 run above produces "0120120120" instead.
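One way to see where the corrupted amd64 output could come from is to model the copy in plain Ruby. This is only an illustrative sketch: forward_copy! and snapshot_copy! are hypothetical helpers, not part of IO::Buffer, and a real memcpy is free to copy in any order or width. A strictly ascending byte-by-byte copy happens to reproduce the corrupted output, while copying from a snapshot of the source range gives the expected one.

```ruby
# Hypothetical model, not the libc implementation: copy +len+ bytes inside
# one byte array, reading and writing from the lowest index upward.
def forward_copy!(bytes, dest, src, len)
  len.times { |i| bytes[dest + i] = bytes[src + i] }
  bytes
end

# Overlap-safe variant: take a snapshot of the source range before writing.
def snapshot_copy!(bytes, dest, src, len)
  bytes[dest, len] = bytes[src, len]
  bytes
end

data = "0123456789".bytes
puts forward_copy!(data.dup, 3, 0, 7).pack("C*")   # prints 0120120120
puts snapshot_copy!(data.dup, 3, 0, 7).pack("C*")  # prints 0120123456
```

In the forward model, the bytes at offsets 3..5 are overwritten before they are read as source bytes, which is why the "012" pattern repeats.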
This error can also be detected with ASAN.
% CXX=clang++ CC=clang ./configure cppflags="-fsanitize=address -fno-omit-frame-pointer" optflags=-O0 LDFLAGS="-fsanitize=address -fno-omit-frame-pointer"
% ./ruby -e 'b=IO::Buffer.new(10); b.copy(b, 0, 9, 1)'
`RubyGems' were not loaded.
`error_highlight' was not loaded.
`did_you_mean' was not loaded.
`syntax_suggest' was not loaded.
-e:1: warning: IO::Buffer is experimental and both the Ruby and C interface may change in the future!
=================================================================
==1655425==ERROR: AddressSanitizer: memcpy-param-overlap: memory ranges [0x5020000107b0,0x5020000107b9) and [0x5020000107b1, 0x5020000107ba) overlap
#0 0x55dcfbb72d90 in __asan_memcpy (/home/kasumi/.local/src/github.com/ruby/ruby/ruby+0x1fdd90) (BuildId: 2591ca8e9e713537a8f388383df19d1f4284b722)
#1 0x55dcfbcb31e3 in ruby_nonempty_memcpy /home/kasumi/.local/src/github.com/ruby/ruby/./include/ruby/internal/memory.h:662:16
#2 0x55dcfbcb3867 in io_buffer_memcpy /home/kasumi/.local/src/github.com/ruby/ruby/io_buffer.c:2347:5
#3 0x55dcfbcb354a in io_buffer_copy_from /home/kasumi/.local/src/github.com/ruby/ruby/io_buffer.c:2384:5
#4 0x55dcfbcafd2f in io_buffer_copy /home/kasumi/.local/src/github.com/ruby/ruby/io_buffer.c:2490:12
#5 0x55dcfc0b955f in ractor_safe_call_cfunc_m1 /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:3597:12
#6 0x55dcfc09ca28 in vm_call_cfunc_with_frame_ /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:3788:11
#7 0x55dcfc09cf2a in vm_call_cfunc_with_frame /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:3834:12
#8 0x55dcfc09c0e9 in vm_call_cfunc_other /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:3860:16
#9 0x55dcfc081811 in vm_call_cfunc /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:3942:12
#10 0x55dcfc07f347 in vm_call_method_each_type /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:4766:16
#11 0x55dcfc07ebc2 in vm_call_method /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:4892:20
#12 0x55dcfc02d9a4 in vm_call_general /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:4936:12
#13 0x55dcfc02fb24 in vm_sendish /home/kasumi/.local/src/github.com/ruby/ruby/./vm_insnhelper.c:5955:15
#14 0x55dcfc03eb00 in vm_exec_core /home/kasumi/.local/src/github.com/ruby/ruby/insns.def:898:11
#15 0x55dcfc0306a2 in rb_vm_exec /home/kasumi/.local/src/github.com/ruby/ruby/vm.c:2564:22
#16 0x55dcfc06ee9f in rb_iseq_eval_main /home/kasumi/.local/src/github.com/ruby/ruby/vm.c:2830:11
#17 0x55dcfbbb7a84 in rb_ec_exec_node /home/kasumi/.local/src/github.com/ruby/ruby/eval.c:281:9
#18 0x55dcfbbb74b2 in ruby_run_node /home/kasumi/.local/src/github.com/ruby/ruby/eval.c:319:30
#19 0x55dcfbbaf81e in rb_main /home/kasumi/.local/src/github.com/ruby/ruby/./main.c:43:12
#20 0x55dcfbbaf699 in main /home/kasumi/.local/src/github.com/ruby/ruby/./main.c:62:12
#21 0x7fb604914db9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#22 0x7fb604914e74 in __libc_start_main csu/../csu/libc-start.c:360:3
#23 0x55dcfbad8fe0 in _start (/home/kasumi/.local/src/github.com/ruby/ruby/ruby+0x163fe0) (BuildId: 2591ca8e9e713537a8f388383df19d1f4284b722)
[snip]
Updated by hanazuki (Kasumi Hanazuki) 2 months ago
#set_string in combination with .for with a block has the same problem.
% docker run --platform=linux/amd64 --rm ruby:3.3.5-alpine3.20 ruby -e 's=+"0123456789"; IO::Buffer.for(s) {|b| b.set_string(s, 3, 7, 0) }; p s'
"0120120120"
Updated by nobu (Nobuyoshi Nakada) 2 months ago
Since the doc says "using memcpy", it may be intentionally undetermined/unsupported.
Efficiently copy from a source IO::Buffer into the buffer, at +offset+
using +memcpy+. For copying String instances, see #set_string.
Updated by hanazuki (Kasumi Hanazuki) 2 months ago
If this is an intentional restriction to gain performance, I would like to propose adding documentation that the methods are unsafe and should not be used for overlapping copies. Not all Ruby users are familiar with C.
But IO::Buffer is designed to be a safe abstraction of raw memory (e.g. it does not allow out-of-bounds access or double free in exchange for a small runtime overhead), and so, IMO, it's not a good idea to directly expose unsafeness rooted in C memcpy to the Ruby world.
Here is my proposal to make overlapped copies safe: https://github.com/ruby/ruby/pull/11640. In theory this should carry a slight performance penalty in non-overlapping cases.
Updated by byroot (Jean Boussier) 2 months ago
I don't think we can legitimately expose an API that can lead to a crash if used incorrectly.
So yeah, I think we should check for overlapping pointers and either fall back to another copy routine that supports overlapping pointers or raise an error.
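The overlap check suggested here can be sketched in a few lines. This is a minimal illustration under assumptions, not the code in io_buffer.c: ranges_overlap? and copy_strategy are hypothetical names, and the arguments stand for addresses (or offsets into the same underlying memory) of equal-length source and destination ranges.

```ruby
# Hypothetical helper: true when the half-open ranges [a, a + len) and
# [b, b + len) share at least one address.
def ranges_overlap?(a, b, len)
  len > 0 && a < b + len && b < a + len
end

# Hypothetical dispatch: reports which routine a checked copy would pick.
# Raising an error instead of returning :overlap_safe is the other option.
def copy_strategy(dest, src, len)
  ranges_overlap?(dest, src, len) ? :overlap_safe : :plain
end

p ranges_overlap?(0x7b0, 0x7b1, 9)  # ranges shaped like those in the ASAN report
p copy_strategy(3, 0, 7)            # destination starts inside the source range
p copy_strategy(7, 0, 3)            # disjoint ranges
```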
Updated by hanazuki (Kasumi Hanazuki) 2 months ago
After reviewing memcpy and memmove from open-source libc implementations, I found that some optimize for small copies that fit within registers. These optimizations handle overlapping source and destination memory without explicitly checking for overlaps. Therefore, checking for buffer overlap on the Ruby side to choose between memcpy or memmove would negate these optimizations, and I think using memmove alone (as in my proposal) will suffice.
- glibc:
  - x86_64 for various vector extensions:
    - memcpy is aliased to memmove.
    - Has optimization for small copies (<= 8 * vector_size).
  - aarch64 generic:
    - Has optimization to skip overlap check for copies of <= 128B.
- FreeBSD:
- musl:
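Why memmove alone suffices can be illustrated with a small model. This is a sketch and is not taken from any of the libc implementations listed above, which work with wider loads and stores: the hypothetical memmove_model! picks the copy direction from the relative position of source and destination, so it is correct whether or not the ranges overlap and the caller never needs a separate overlap check.

```ruby
# Hypothetical model of memmove semantics on a byte array. When the
# destination starts after the source, copy from the highest index downward
# so that source bytes are read before they are overwritten.
def memmove_model!(bytes, dest, src, len)
  if dest <= src
    len.times { |i| bytes[dest + i] = bytes[src + i] }
  else
    (len - 1).downto(0) { |i| bytes[dest + i] = bytes[src + i] }
  end
  bytes
end

puts memmove_model!("0123456789".bytes, 3, 0, 7).pack("C*")  # prints 0120123456
puts memmove_model!("0123456789".bytes, 0, 3, 7).pack("C*")  # prints 3456789789
```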
Updated by mame (Yusuke Endoh) about 2 months ago
- Assignee set to ioquatix (Samuel Williams)
Updated by hanazuki (Kasumi Hanazuki) 20 days ago
- Status changed from Open to Closed
Applied in changeset git|35bf6603372360c7680653328274a670fa1d9f38.
io_buffer.c: Allow copies between overlapping buffers with #copy and #set_string (#11640)
The current implementation of IO::Buffer#copy and #set_string has undefined behavior when the source and destination memory overlap, due to the underlying use of the memcpy C function.
This patch guarantees the methods to be safe even when copying between overlapping buffers by replacing memcpy with memmove.
Fixes: [Bug #20745]
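The reproduction from the description can be rerun as a short script to check a given Ruby build. This is a sketch that assumes a Ruby where IO::Buffer is available (it may print the experimental-API warning). According to this report, "0120123456" is the expected result. A build without the changeset may print something else, depending on the libc and architecture.

```ruby
# Overlapping self-copy from the bug report: copy 7 bytes starting at
# source offset 0 to destination offset 3 within the same buffer.
buffer = IO::Buffer.new(10)
buffer.set_string("0123456789")
buffer.copy(buffer, 3, 7, 0)

result = buffer.get_string
puts result
```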
Updated by ioquatix (Samuel Williams) 20 days ago
Thanks for your work and research about the performance cost.