Bug #20237

open

Unable to unshare(CLONE_NEWUSER) in Linux because of timer thread

Added by hanazuki (Kasumi Hanazuki) 3 months ago. Updated 3 months ago.

Status: Assigned
Target version: -
ruby -v: ruby 3.4.0dev (2024-02-04T16:05:02Z master 8bc6fff322) [x86_64-linux]
[ruby-core:116581]

Description

Background

unshare(2) is a Linux system call that moves the calling process into a fresh execution context, such as new namespaces. With unshare(CLONE_NEWUSER), the process moves into a new user_namespace(7), where it gains the full set of capabilities over the resources governed by that namespace. This is fundamental to how Linux containers achieve privilege separation. However, unshare(CLONE_NEWUSER) requires the calling process to be single-threaded: no other threads may be running, or the call fails with EINVAL. For this reason it is often called right after fork(2), because fork copies only the calling thread into the child process.
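The single-thread requirement concerns kernel-level threads, not Ruby Thread objects. Below is a minimal Linux-only sketch, assuming procfs is mounted at /proc, of how a process can see how many kernel threads it currently has: each thread appears as one directory under /proc/self/task. The helper names are hypothetical and not part of any Ruby API.

```ruby
# Linux-only sketch (assumes procfs at /proc): every kernel-level thread of
# the calling process appears as one directory entry in /proc/self/task.
def kernel_thread_count
  Dir.children("/proc/self/task").size
end

# Hypothetical helper: approximates the "caller is not multithreaded"
# condition that unshare(CLONE_NEWUSER) checks.
def single_threaded?
  kernel_thread_count == 1
end

puts "kernel threads: #{kernel_thread_count}, single-threaded: #{single_threaded?}"
```

Because it reads /proc rather than Thread.list, this also counts threads that Ruby creates internally and never exposes as Thread objects.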

Problem

This becomes a problem because Ruby 3.3 on Linux runs a timer thread even in a single-Threaded application. Kernel#fork starts this thread in the child process before control returns to user code, so there is no opportunity to call unshare(CLONE_NEWUSER) from Ruby.

The following snippet reproduces the problem. It forks, and the child first lists its own threads, then shows the user namespace it belongs to before and after calling unshare(2).

p(RUBY_DESCRIPTION:)
require 'fiddle/import'
module C
  extend Fiddle::Importer
  dlload 'libc.so.6'

  extern 'int unshare(int flags)'
  CLONE_NEWUSER = 0x10000000

  def self.raise_system_call_error
    raise SystemCallError.new(Fiddle.last_error)
  end
end

pid = fork do
  system("ps -O tid -T -p #$$")
  system("ls -l /proc/self/ns/user")

  if C.unshare(C::CLONE_NEWUSER) != 0
    C.raise_system_call_error  # => EINVAL with Ruby 3.3
  end

  system("ls -l /proc/self/ns/user")
end

p Process.wait2(pid)

The program successfully changes the user namespace with Ruby 3.2, but raises EINVAL with Ruby 3.3 and master. The ps output shows that under Ruby 3.3 and master, the child already has two threads running right after forking.
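According to the report, the second TID in the ps output is the timer thread and not a Ruby Thread object. That is why the application counts as single-Threaded from Ruby's point of view. The minimal Linux-only sketch below makes this distinction explicit using only core Ruby (Thread.list, IO.pipe, Dir.children) instead of ps. The child reports how many Thread objects Ruby knows about and how many tasks the kernel lists under /proc/self/task, and unshare(CLONE_NEWUSER) only cares about the latter.

```ruby
# Linux-only sketch: compare Ruby-level threads with kernel-level threads
# inside a freshly forked child.
reader, writer = IO.pipe

pid = fork do
  reader.close
  ruby_threads   = Thread.list.size                      # Thread objects Ruby knows about
  kernel_threads = Dir.children("/proc/self/task").size  # tasks the kernel sees
  writer.puts "#{ruby_threads} #{kernel_threads}"
  writer.close
end

writer.close
ruby_threads, kernel_threads = reader.read.split.map(&:to_i)
reader.close
Process.wait(pid)

puts "child: #{ruby_threads} Ruby Thread(s), #{kernel_threads} kernel thread(s)"
```

A kernel count larger than the Ruby count in the child means some thread exists that user code cannot see or stop through the Thread API.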

% rbenv shell 3.2 && ruby ./test.rb
{:RUBY_DESCRIPTION=>"ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]"}
    PID     TID S TTY          TIME COMMAND
1585787 1585787 S pts/12   00:00:00 ruby ./test.rb
lrwxrwxrwx 1 kasumi kasumi 0 Feb  5 02:25 /proc/self/ns/user -> 'user:[4026531837]'
lrwxrwxrwx 1 nobody nogroup 0 Feb  5 02:25 /proc/self/ns/user -> 'user:[4026532675]'
[1585787, #<Process::Status: pid 1585787 exit 0>]

% rbenv shell 3.3 && ruby ./test.rb
{:RUBY_DESCRIPTION=>"ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]"}
    PID     TID S TTY          TIME COMMAND
1585849 1585849 S pts/12   00:00:00 ruby ./test.rb
1585849 1585851 S pts/12   00:00:00 ruby ./test.rb
lrwxrwxrwx 1 kasumi kasumi 0 Feb  5 02:25 /proc/self/ns/user -> 'user:[4026531837]'
./test.rb:10:in `raise_system_call_error': Invalid argument (Errno::EINVAL)
        from ./test.rb:24:in `block in <main>'
        from ./test.rb:19:in `fork'
        from ./test.rb:19:in `<main>'
[1585849, #<Process::Status: pid 1585849 exit 1>]

% rbenv shell master && ruby ./test.rb
{:RUBY_DESCRIPTION=>"ruby 3.4.0dev (2024-02-04T16:05:02Z master 8bc6fff322) [x86_64-linux]"}
    PID     TID S TTY          TIME COMMAND
1585965 1585965 S pts/12   00:00:00 ruby ./test.rb
1585965 1585967 S pts/12   00:00:00 ruby ./test.rb
lrwxrwxrwx 1 kasumi kasumi 0 Feb  5 02:25 /proc/self/ns/user -> 'user:[4026531837]'
./test.rb:10:in `raise_system_call_error': Invalid argument (Errno::EINVAL)
        from ./test.rb:24:in `block in <main>'
        from ./test.rb:19:in `fork'
        from ./test.rb:19:in `<main>'
[1585965, #<Process::Status: pid 1585965 exit 1>]

Workaround

My workaround is to rebuild Ruby with rb_thread_stop_timer_thread and rb_thread_start_timer_thread exported, and to use a C extension that stops the timer thread before calling unshare. This does not seem robust, because the process cannot know when the kernel has reclaimed the terminated thread. Only after that point is the process considered single-threaded.

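#define _GNU_SOURCE 1
#include <errno.h>
#include <sched.h>   /* unshare(), CLONE_NEWUSER */
#include <unistd.h>  /* usleep() */
#include <ruby/ruby.h>

/* Not part of Ruby's public C API: only linkable against a Ruby that was
 * rebuilt to export these functions, as described above. */
void rb_thread_stop_timer_thread(void);
void rb_thread_start_timer_thread(void);

static VALUE Unshare_s_unshare(VALUE _self, VALUE rflags) {
  int const flags = NUM2INT(rflags);
  rb_thread_stop_timer_thread();
  usleep(1000);  // FIXME: It takes some time for the kernel to remove the stopped thread?
  int const ret = unshare(flags);
  int const err = errno;  // save errno before restarting the timer thread can clobber it
  rb_thread_start_timer_thread();
  if (ret != 0) rb_syserr_fail_str(err, rb_sprintf("unshare(%#x)", flags));
  return Qnil;
}

RUBY_FUNC_EXPORTED void
Init_unshare(void) {
  VALUE rb_mUnshare = rb_define_module("Unshare");
  rb_define_singleton_method(rb_mUnshare, "unshare", Unshare_s_unshare, 1);
  rb_define_const(rb_mUnshare, "CLONE_NEWUSER", INT2FIX(CLONE_NEWUSER));
}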

Questions

  • Is this a limitation of Ruby?
  • Is it safe (or even possible) to stop the timer thread during execution?
    • If so, can we export it as a public API?
    • However, that may not be very useful for this problem, for the reason explained in the workaround.
  • Is it guaranteed that no other threads are running after fork?
  • Are there any better ways to solve this issue?
    • Can we somehow delay the start of the timer thread after forking, or hook into fork to run some code in the child process immediately after it is spawned?
    • Could these be provided as a Ruby API rather than a C API?
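On the question of hooking into fork from Ruby: since Ruby 3.1, Kernel#fork, Process.fork and IO.popen("-") go through Process._fork, which is documented as overridable for running code around fork events. The minimal sketch below is Linux-only, and its module name ForkThreadProbe is hypothetical. It uses the hook only to observe how many kernel threads exist in the child at the earliest point where Ruby code runs. It does not make unshare(CLONE_NEWUSER) succeed by itself, and judging from the ps output above, the extra thread may already be running by the time super returns.

```ruby
# Ruby 3.1+: Process._fork is called by Kernel#fork, Process.fork and
# IO.popen("-"), and may be overridden to run code around every fork.
module ForkThreadProbe # hypothetical name
  class << self
    attr_accessor :kernel_threads_after_fork
  end

  def _fork
    pid = super
    if pid == 0
      # Child: runs only after Ruby's own post-fork work has finished.
      ForkThreadProbe.kernel_threads_after_fork = Dir.children("/proc/self/task").size
    end
    pid
  end
end

Process.singleton_class.prepend(ForkThreadProbe)

reader, writer = IO.pipe
pid = fork do
  reader.close
  writer.puts ForkThreadProbe.kernel_threads_after_fork
  writer.close
end
writer.close
threads_seen_by_hook = reader.read.to_i
reader.close
Process.wait(pid)

puts "kernel threads visible to the _fork hook in the child: #{threads_seen_by_hook}"
```

If the hook reports more than one thread, an unshare(CLONE_NEWUSER) call placed at the same spot would presumably fail with EINVAL, just as in the reproducer.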