Bug #20237
closed
Unable to unshare(CLONE_NEWUSER) in Linux because of timer thread
Description
Background
unshare(2) is a Linux syscall that moves the calling process into a fresh execution context. With unshare(CLONE_NEWUSER) you can move a process into a new user namespace (see user_namespaces(7)), where the process gains full capabilities over the resources within the namespace. This is fundamental for Linux containers to achieve privilege separation. unshare(CLONE_NEWUSER) requires the calling process to be single-threaded, that is, no other threads may be running in the process. It is therefore often invoked right after fork(2), since forking propagates only the calling thread to the child process.
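As a small illustration of how a process can tell which user namespace it is in, the sketch below reads the /proc/<pid>/ns/user symlink and parses its target, which has the form user:[inode]. The helper names parse_ns_link and current_user_namespace are hypothetical and introduced only for this example; the second helper assumes a Linux /proc filesystem.

```ruby
# Hypothetical helpers for illustration; not part of Ruby or of this report.

# Link targets under /proc/<pid>/ns look like "user:[4026531837]".
NS_LINK_PATTERN = /\A(?<type>\w+):\[(?<inode>\d+)\]\z/

# Split a namespace link target into its type and inode number.
def parse_ns_link(target)
  m = NS_LINK_PATTERN.match(target)
  raise ArgumentError, "unexpected namespace link: #{target.inspect}" unless m
  [m[:type], m[:inode].to_i]
end

# Assumes Linux: read the user namespace link of the given process.
def current_user_namespace(pid = "self")
  parse_ns_link(File.readlink("/proc/#{pid}/ns/user"))
end
```

Comparing the inode number before and after a successful unshare(CLONE_NEWUSER) should show two different values, since each user namespace has its own inode.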
Problem
The problem is that Ruby 3.3 on Linux uses a timer thread even for a single-Threaded application. Because Kernel#fork spawns this thread in the child process before control returns to user code, there is no chance to call unshare(CLONE_NEWUSER) from Ruby.
The following snippet is a reproducer of this problem. This program first forks and then shows the user namespace to which the process belongs before and after calling unshare(2). It also shows the threads of the child process after forking.
p(RUBY_DESCRIPTION:)
require 'fiddle/import'

module C
  extend Fiddle::Importer
  dlload 'libc.so.6'
  extern 'int unshare(int flags)'
  CLONE_NEWUSER = 0x10000000
  def self.raise_system_call_error
    raise SystemCallError.new(Fiddle.last_error)
  end
end

pid = fork do
  system("ps -O tid -T -p #$$")
  system("ls -l /proc/self/ns/user")
  if C.unshare(C::CLONE_NEWUSER) != 0
    C.raise_system_call_error # => EINVAL with Ruby 3.3
  end
  system("ls -l /proc/self/ns/user")
end
p Process.wait2(pid)
The program successfully changes the user namespace with Ruby 3.2, but it raises EINVAL with Ruby 3.3. You can see Ruby 3.3 has two threads running after forking.
% rbenv shell 3.2 && ruby ./test.rb
{:RUBY_DESCRIPTION=>"ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]"}
PID TID S TTY TIME COMMAND
1585787 1585787 S pts/12 00:00:00 ruby ./test.rb
lrwxrwxrwx 1 kasumi kasumi 0 Feb 5 02:25 /proc/self/ns/user -> 'user:[4026531837]'
lrwxrwxrwx 1 nobody nogroup 0 Feb 5 02:25 /proc/self/ns/user -> 'user:[4026532675]'
[1585787, #<Process::Status: pid 1585787 exit 0>]
% rbenv shell 3.3 && ruby ./test.rb
{:RUBY_DESCRIPTION=>"ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]"}
PID TID S TTY TIME COMMAND
1585849 1585849 S pts/12 00:00:00 ruby ./test.rb
1585849 1585851 S pts/12 00:00:00 ruby ./test.rb
lrwxrwxrwx 1 kasumi kasumi 0 Feb 5 02:25 /proc/self/ns/user -> 'user:[4026531837]'
./test.rb:10:in `raise_system_call_error': Invalid argument (Errno::EINVAL)
from ./test.rb:24:in `block in <main>'
from ./test.rb:19:in `fork'
from ./test.rb:19:in `<main>'
[1585849, #<Process::Status: pid 1585849 exit 1>]
% rbenv shell master && ruby ./test.rb
{:RUBY_DESCRIPTION=>"ruby 3.4.0dev (2024-02-04T16:05:02Z master 8bc6fff322) [x86_64-linux]"}
PID TID S TTY TIME COMMAND
1585965 1585965 S pts/12 00:00:00 ruby ./test.rb
1585965 1585967 S pts/12 00:00:00 ruby ./test.rb
lrwxrwxrwx 1 kasumi kasumi 0 Feb 5 02:25 /proc/self/ns/user -> 'user:[4026531837]'
./test.rb:10:in `raise_system_call_error': Invalid argument (Errno::EINVAL)
from ./test.rb:24:in `block in <main>'
from ./test.rb:19:in `fork'
from ./test.rb:19:in `<main>'
[1585965, #<Process::Status: pid 1585965 exit 1>]
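The thread listing that ps prints above can also be obtained from inside the process. The following is a minimal sketch, assuming a Linux /proc filesystem in which /proc/self/task contains one directory per OS thread; os_thread_ids and single_threaded? are hypothetical names introduced for this example.

```ruby
# Hypothetical helpers for illustration; they assume Linux's /proc layout.

# Return the sorted OS thread IDs listed under the given task directory.
def os_thread_ids(task_dir = "/proc/self/task")
  Dir.children(task_dir).grep(/\A\d+\z/).map(&:to_i).sort
end

# True when exactly one OS thread is listed for the process.
def single_threaded?(task_dir = "/proc/self/task")
  os_thread_ids(task_dir).size == 1
end
```

Calling os_thread_ids at the top of the fork block in the reproducer would be expected to list one ID on Ruby 3.2 and two on Ruby 3.3, in line with the ps output.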
Workaround
My workaround is to rebuild ruby with rb_thread_stop_timer_thread and rb_thread_start_timer_thread exported, and use a C extension that stops the timer thread before calling unshare. This does not seem robust, because the process cannot know when the kernel has reclaimed the terminated thread, and only after that point is the process considered single-threaded.
#define _GNU_SOURCE 1
#include <sched.h>
#include <unistd.h> /* usleep */
#include <ruby/ruby.h>

/* Not declared in Ruby's public headers; available only when ruby is rebuilt with these symbols exported. */
void rb_thread_stop_timer_thread(void);
void rb_thread_start_timer_thread(void);

static VALUE Unshare_s_unshare(VALUE _self, VALUE rflags) {
    int const flags = NUM2INT(rflags);
    rb_thread_stop_timer_thread();
    usleep(1000); // FIXME: It takes some time for the kernel to remove the stopped thread?
    int const ret = unshare(flags);
    rb_thread_start_timer_thread();
    if (ret != 0) rb_sys_fail_str(rb_sprintf("unshare(%#x)", flags));
    return Qnil;
}

RUBY_FUNC_EXPORTED void
Init_unshare(void) {
    VALUE rb_mUnshare = rb_define_module("Unshare");
    rb_define_singleton_method(rb_mUnshare, "unshare", Unshare_s_unshare, 1);
    rb_define_const(rb_mUnshare, "CLONE_NEWUSER", INT2FIX(CLONE_NEWUSER));
}
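One possible alternative to the fixed usleep is to poll until only one thread is listed for the process, with a timeout. The sketch below shows that logic in Ruby only to illustrate the idea; in the extension above the loop would have to live in C between the stop and unshare calls. The name wait_until_single_threaded and its parameters are hypothetical. It is also an assumption, not something verified here, that a thread vanishing from /proc/self/task coincides with the kernel treating the process as single-threaded for unshare.

```ruby
# Hypothetical helper for illustration. The default counter assumes Linux,
# where /proc/self/task has one entry per OS thread.
DEFAULT_THREAD_COUNTER = -> { Dir.children("/proc/self/task").size }

# Poll count_threads until it reports 1 or the timeout (seconds) expires.
# Returns true if the process became single-threaded in time, false otherwise.
def wait_until_single_threaded(timeout: 1.0, interval: 0.001,
                               count_threads: DEFAULT_THREAD_COUNTER)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if count_threads.call == 1
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(interval)
  end
end
```

The counter is injectable so the loop can be exercised without real threads; a caller would fail with an error instead of calling unshare when the helper returns false.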
Questions
- Is this a limitation of Ruby?
- Is it safe (or even possible) to stop the timer thread during execution?
  - If so, can we export it as a public API?
  - But it may not be so useful for this problem, as explained in the workaround.
- Is it guaranteed that no other threads are running after forks?
- Are there any better ways to solve this issue?
  - Can we somehow delay the start of the timer thread after forking, or hook into fork to run some code in the child process immediately after it spawns?
  - Can they be Ruby API instead of C API?