Feature #6648

open

Provide a standard API for retrieving all command-line flags passed to Ruby

Added by headius (Charles Nutter) about 12 years ago. Updated 22 days ago.

Status:
Assigned
Target version:
-
[ruby-core:45867]

Description

Currently there are no standard mechanisms to get the flags passed to the currently running Ruby implementation. The available mechanisms are not ideal:

  • Scanning globals and hoping they have not been tweaked to new settings
  • Using external wrappers to launch Ruby
  • ???

Inability to get the full set of command-line flags, including flags passed to the VM itself (and probably VM-specific) makes it impossible to launch subprocess Ruby instances with the same settings.

A real world example of this is "((%bundle exec%))" when called with a command line that sets various flags, a la ((%jruby -Xsome.vm.setting --1.9 -S bundle exec%)). None of these flags can propagate to the subprocess, so odd behaviors result. The only option is to put the flags into an env var (((|JRUBY_OPTS|)) or ((|RUBYOPT|))) but this breaks the flow of calling a simple command line.

JRuby provides mechanisms to get all its command-line options, but they require calling Java APIs from Ruby's API set. Rubinius provides its own API for accessing command-line options, but I do not know if it includes VM-level flags as well as standard Ruby flags.

I know there is a (({RubyVM})) namespace in the 2.0 line. If that namespace is intended to be general-purpose for VM-level features, it would be a good host for this API. Something like...

  class << RubyVM
    def vm_args; end # returns array of command line args *not* passed to the target script

    def script; end # returns the script being executed...though this overlaps with $0

    def script_args; end # returns args passed to the script...though this overlaps with ARGV, but that is perhaps warranted since ARGV can be modified (i.e. you probably want the original args)
  end

Related issues: 1 (0 open, 1 closed)

Is duplicate of Ruby master - Feature #4046: Saving C's **argv and cwd allows Ruby programs to reliably restart themselves (Feedback)

Updated by headius (Charles Nutter) about 12 years ago

Oops, this should be a feature request.

Actions #2

Updated by nobu (Nobuyoshi Nakada) about 12 years ago

  • Tracker changed from Bug to Feature

I'm positive about the feature, but RubyVM wouldn't be the right place.
It is CRuby-specific and is not expected to exist in other implementations.

Updated by nobu (Nobuyoshi Nakada) about 12 years ago

  • Description updated (diff)
Actions #4

Updated by headius (Charles Nutter) almost 12 years ago

ARGV is a special class; perhaps ARGV could have the methods?

Updated by headius (Charles Nutter) almost 12 years ago

I was mistaken...it is ARGF, not ARGV that is a special class. ARGV is a normal array.

Another option: ENV, which is a special Hash-like class. ENV.vm_args, ENV.script, and ENV.script_args aren't bad.

Updated by headius (Charles Nutter) over 11 years ago

Ping...we'd still like to have this to be able to build a unifying benchmark tool, which needs to be able to report the actual command-line arguments passed to the runtime. Current tricks are too ugly (parsing ps output, for example), and this would not be difficult to add.

I'm leaning toward ENV having a few special methods for this, but I'm open to other ideas.

Updated by mame (Yusuke Endoh) over 11 years ago

  • Status changed from Open to Assigned
  • Assignee set to matz (Yukihiro Matsumoto)
  • Target version set to 2.6

I'm sorry that matz didn't notice this.

--
Yusuke Endoh

Actions #8

Updated by naruse (Yui NARUSE) over 6 years ago

  • Target version deleted (2.6)
Actions #9

Updated by hsbt (Hiroshi SHIBATA) 3 months ago

  • Description updated (diff)

Updated by Dan0042 (Daniel DeLorme) 2 months ago · Edited

I'd like to revive this proposal.

The OP mentions calling a subcommand with the same options/flags as the current interpreter, and that's a fine use case. As for me I'm also interested in re-executing the current script while keeping ruby options/flags.

Some time ago I tried writing a rbenv alternative based on the idea of adding "-r versionchecker" to RUBYOPT and then re-executing the current script with a different interpreter if we find a .ruby-version file that specifies a different version. No bash, no shims! But it was not to be; the lack of this proposed API made it infeasible. In particular if ruby is executed with the -e argument it appears impossible to get back the value.

I imagine this feature would also be very useful for web servers that need to re-execute upon receiving USR2. Currently they need to have all their options in RUBYOPT.

Since the path to the current interpreter is already in RbConfig.ruby I would suggest RbConfig.ruby_args for this API.

Then we could have a copy of the original $0 in RbConfig.script and a copy of the original ARGV in RbConfig.script_args, and to re-execute we can do

exec(RbConfig.ruby, *RbConfig.ruby_args, *RbConfig.script, *RbConfig.script_args)

Extra features I'd like, if possible:

  1. if ruby is invoked with -e argument(s), $0 is "-e" but RbConfig.script should be an array of the arguments:
$ ruby -e 'p 42' -e 'p RbConfig.script'
42
["-e", "p 42", "-e", "p RbConfig.script"]
  2. if ruby is invoked with script on stdin, $0 is "-" but RbConfig.script should be an array with "-e":
$ echo 'p RbConfig.script' | ruby
["-e", "p RbConfig.script"]

If either of those extra features is impossible/undesirable, RbConfig.script should be false so that exec/system fails with TypeError rather than executing random things.
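A minimal sketch of how the re-exec argument list above would be assembled. RbConfig.ruby_args, RbConfig.script and RbConfig.script_args do not exist today, so the hypothetical helper reexec_argv takes them as plain parameters; the sample values are made up for illustration.

```ruby
require "rbconfig"

# Hypothetical helper: the keyword arguments stand in for the proposed
# RbConfig.ruby_args, RbConfig.script and RbConfig.script_args.
def reexec_argv(ruby_args:, script:, script_args:, ruby: RbConfig.ruby)
  # Splatting an array spreads its elements; splatting `false` yields a
  # single `false` element, which is what the post relies on to make
  # exec/system reject the command instead of running something else.
  [ruby, *ruby_args, *script, *script_args]
end

# Script given as a file name (a copy of the original $0).
from_file = reexec_argv(ruby_args: ["--yjit"], script: "foo.rb", script_args: ["a", "b"])

# Script given with -e, represented as an array as suggested in item 1.
from_e = reexec_argv(ruby_args: [], script: ["-e", "p 42"], script_args: [])

# Script not recoverable: `false` stays in the list as a non-String element.
unrecoverable = reexec_argv(ruby_args: [], script: false, script_args: [])
```

Passing from_file or from_e to exec(*argv) would replace the current process; the list is only built here, not executed.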

Updated by Eregon (Benoit Daloze) 2 months ago · Edited

I fully agree with the proposal of @Dan0042 (Daniel DeLorme).
This is also needed for MSpec, which currently works around the lack of it by requiring to pass any ruby option through -T-option (which is awkward and error-prone), it would be much nicer if we could have RbConfig.ruby_args.

In fact MSpec is also forced to create an extra process due to the lack of this API (which is a noticeable overhead, even more so on Ruby implementations with a slower startup than CRuby), because that's currently the only way to ensure the main process specs and subprocesses created by specs have the same VM options.
We cannot know if the initial process from the mspec executable has the same ruby options as the options passed through -T (typically not), hence the extra process.

Updated by kddnewton (Kevin Newton) 2 months ago

As another note, this would be useful within CRuby itself. Right now there are lots of tests that run assert_in_out_err, which in turn calls EnvUtil.invoke_ruby. EnvUtil.invoke_ruby does not pass along some command-line options like RJIT, YJIT, Prism, etc. So there appear to be some tests that are being run in the CRuby CI that aren't testing what they should be testing.

Updated by Eregon (Benoit Daloze) about 2 months ago

@matz (Yukihiro Matsumoto) Do you agree with RbConfig.ruby_args, is it OK to add it?

Updated by matz (Yukihiro Matsumoto) about 2 months ago

If RbConfig is a convenient place for you, it is OK to add ruby_args.

Matz.

Updated by nobu (Nobuyoshi Nakada) about 2 months ago

RbConfig is for build-time information, and does not look like the right place for runtime information.

Updated by headius (Charles Nutter) about 2 months ago

Note that for this to be most effective it would be the arguments unprocessed as they appear on the command line, but that may not be possible to do if the shell removes quoting.

I don't think that should be a reason not to implement this issue, but if there are quoted arguments on the command line they might have to be re-quoted by the user if they are passed through another shell to launch. They should work fine if passed as direct arguments to spawn.

Updated by Dan0042 (Daniel DeLorme) about 2 months ago

nobu (Nobuyoshi Nakada) wrote in #note-15:

RbConfig is for build-time information, and does not look like the right place for runtime information.

Isn't it ok to relax the semantics a little bit? RbConfig seems to me the most logical place for "ruby configuration", both run time and build time.

But this does bring the excellent point that RbConfig.ruby is not necessarily the location of the ruby interpreter as I previously thought:

$ ruby -e 'p RbConfig.ruby'
"/opt/ruby/3.2/bin/ruby"
$ cp /opt/ruby/3.2/bin/ruby rubyyyy
$ ./rubyyyy -e 'p RbConfig.ruby'
"/opt/ruby/3.2/bin/ruby"

So it's not quite suitable for re-executing. So we could either

  • change RbConfig.ruby to be the current ruby interpreter (because TBH I'm not sure what's the use of this current RbConfig.ruby)
  • add a new method like RbConfig.ruby_executable
  • use a different namespace like Process.ruby and Process.ruby_args
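For illustration only: on Linux, the running binary can be located through /proc/self/exe, which makes the difference between the static RbConfig.ruby path and the actual interpreter visible. The helper name current_executable is hypothetical and not part of any proposal in this thread; it returns nil where /proc is not available.

```ruby
require "rbconfig"

# Hypothetical helper, Linux-specific: resolves the /proc/self/exe symlink
# to the binary of the running process. Returns nil on platforms without it.
def current_executable
  File.realpath("/proc/self/exe")
rescue SystemCallError
  nil
end

configured = RbConfig.ruby      # path recorded in RbConfig
running    = current_executable # path of the process actually running, or nil

puts "RbConfig.ruby:      #{configured}"
puts "current executable: #{running.inspect}"
```

With the copied ./rubyyyy binary from the example above, the two values would be expected to differ on a default CRuby build.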

headius (Charles Nutter) wrote in #note-16:

if there are quoted arguments on the command line they might have to be re-quoted by the user if they are passed through another shell to launch.

Wouldn't you normally use Shellwords for this? The original quoting is not available to ruby anyway.
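A short sketch of the Shellwords approach, assuming the flags are already available as an array (the sample values are made up). Shellwords.join escapes each element for a POSIX shell and Shellwords.split reverses it, so the original quoting is not needed.

```ruby
require "shellwords"

# Sample arguments as they would arrive in the process, quoting already removed.
args = ["--enable=jit", "-e", "puts 'hello world'"]

# Build a single string that is safe to hand to a POSIX shell.
command_line = Shellwords.join(args)

# Splitting the string the way a shell would gives back the same array.
round_trip = Shellwords.split(command_line)

puts command_line
```

This only matters when going through a shell; passing the array directly to spawn or exec needs no escaping.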

Updated by Dan0042 (Daniel DeLorme) about 1 month ago · Edited

  • change RbConfig.ruby to be the current ruby interpreter (because TBH I'm not sure what's the use of this current RbConfig.ruby)

@nobu (Nobuyoshi Nakada) what are your thoughts on the above?
For example in the test suite, in test/set/test_sorted_set.rb we can see r = system(RbConfig.ruby, *options, '-e', ruby)
and it seems to me like that's wrong; the system method is executing the installed ruby rather than the compiled ruby that is supposedly under test.

If you think this is correct and it's fine that RbConfig.ruby returns a static path, we need a different place for ruby_args
Or if you're not ok with relaxing the semantics of RbConfig then we also need a different place, maybe Process.ruby_args

Updated by Eregon (Benoit Daloze) 27 days ago

@Dan0042 (Daniel DeLorme)
I think it's everyone's understanding that RbConfig.ruby should always be the path of the currently-running ruby.
In fact it is already the case e.g. on TruffleRuby.
And I suspect it's also already the case on CRuby with --enable-load-relative (but it would be nice if someone can check, if it's not we should fix that).
cp /opt/ruby/3.2/bin/ruby rubyyyy is simply unsupported on CRuby built without --enable-load-relative.
Finding the path of the current executable is not possible on every platform, yet it is supported on all major platforms.

Given the existence of RbConfig.ruby, I think RbConfig.ruby_args is the best fit.

(BTW there is Process.argv0 which is about (Ruby) ARGV[0] and not (C) argv[0], so it seems better to me to put the method somewhere else than Process, to avoid mixing levels there)

Updated by Dan0042 (Daniel DeLorme) 26 days ago

I think it's everyone's understanding that RbConfig.ruby should always be the path of the currently-running ruby.

Yes I believe that is everyone's understanding. At least it was mine. And it turns out to be incorrect. Sure in the vast majority of cases the static install path and the currently-running ruby are going to be the same thing, so one might say it's too small a detail to care about. But I happen to care about small details.

And I suspect it's also already the case on CRuby with --enable-load-relative (but it would be nice if someone can check, if it's not we should fix that).

I tried, and --enable-load-relative doesn't appear to be a supported option in any version of ruby.

Given the existence of RbConfig.ruby, I think RbConfig.ruby_args is the best fit.

I agree.

(BTW there is Process.argv0 which is about (Ruby) ARGV[0] and not (C) argv[0]

I'm afraid not; Process.argv0 is about ruby $0 which is very different from ARGV[0]

Updated by Eregon (Benoit Daloze) 25 days ago

Dan0042 (Daniel DeLorme) wrote in #note-20:

I tried, and --enable-load-relative doesn't appear to be a supported option in any version of ruby.

It's a ./configure option: ./configure --enable-load-relative.
Sorry I should have made that clear.

(BTW there is Process.argv0 which is about (Ruby) ARGV[0] and not (C) argv[0]

I'm afraid not; Process.argv0 is about ruby $0 which is very different from ARGV[0]

Ah right, the name of that method is so confusing (IMO it shouldn't exist, redundant with $0).
It's mostly like argv[0] in C but it returns the Ruby script path being run (vs path of the current executable) and yet it's not ARGV[0].
So sort of related to this issue, but so awfully confusing I don't think we want to follow that unfortunate naming.

I like your proposed naming in https://bugs.ruby-lang.org/issues/6648#note-10 but I think we should add RbConfig.ruby_args before the rest and file a new ticket for the rest.
(re-executing the same script with the same arguments is a special case, there are more use cases for RbConfig.ruby_args)

Updated by Dan0042 (Daniel DeLorme) 25 days ago

IMO it shouldn't exist, redundant with $0

Keep in mind that $0 can be set as process name, so Process.argv0 is not redundant (despite the unfortunate naming).
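A small sketch of that difference: assigning to $0 changes what $0 returns, while Process.argv0 keeps the original value. The name "renamed-process" is arbitrary.

```ruby
# Remember both values before touching anything.
argv0_before  = Process.argv0
original_name = $0

# Assigning to $0 sets the process name as seen from Ruby.
$0 = "renamed-process"
renamed     = $0
argv0_after = Process.argv0 # not affected by the assignment

# Restore the original name so the rest of the program is unaffected.
$0 = original_name

puts "$0 after assignment:      #{renamed}"
puts "Process.argv0 afterwards: #{argv0_after}"
```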

I like your proposed naming in https://bugs.ruby-lang.org/issues/6648#note-10 but I think we should add RbConfig.ruby_args before the rest and file a new ticket for the rest.

Agreed. This will also allow me to make a clearer point for the security risk of re-executing $0 when it is equal to "-e"

Updated by mame (Yusuke Endoh) 24 days ago

I am afraid it may be more difficult than expected to "launch subprocess Ruby instances with the same settings".

I am not very familiar with Windows, but I have heard that there is no concept of "an array of command-line arguments" in Windows. A command line is represented as a single string. On Windows, system("exe", "ary1", "ary2") is converted to a single string and executed via the shell (sometimes; I am not sure under which conditions). This exotic command-line argument handling in Windows can lead to vulnerabilities.

What I'm trying to say is, it could be difficult to guarantee exec(RbConfig.ruby, *RbConfig.ruby_args, RbConfig.script, *RbConfig.script_args) will always achieve "launch subprocess Ruby instances with the same settings".

If you really want to "launch subprocess Ruby instances with the same settings", we might want to consider a more dedicated API for it, instead of parsing the command line to a string array and passing it to Kernel#exec.

Updated by Eregon (Benoit Daloze) 24 days ago

@mame (Yusuke Endoh) CRuby already needs to get arguments as an array to parse command-line flags, so RbConfig.ruby_args just exposes that.
If CRuby can parse these Ruby command-line flags, for sure we can save them in some kind of array.

IIRC these extra complications are only relevant in .bat files, the C main still receives an array of arguments on Windows.

Updated by shyouhei (Shyouhei Urabe) 24 days ago

Eregon (Benoit Daloze) wrote in #note-24:

@mame (Yusuke Endoh) CRuby already needs to get arguments as an array to parse command-line flags, so RbConfig.ruby_args just exposes that.
If CRuby can parse these Ruby command-line flags, for sure we can save them in some kind of array.

This is true. Technically we can provide such array. But for what reason? The question is its usage.

IIRC these extra complications are only relevant in .bat files, the C main still receives an array of arguments on Windows.

Background: This is how we execute external process in Windows: https://github.com/ruby/ruby/blob/029d92b8988d26955d0622f0cbb8ef3213200749/win32/win32.c#L1541-L1544
Also background: Windows API for creating a process: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessw

So this is not about receiving arguments but calling a process. As you see there is no Windows API that takes char**. We cannot safely pass through what we have. You have to concatenate them into one argument string (LPWSTR lpCommandLine), with proper escaping of whitespace etc. This is where the security concern arises. Because process arguments come from out of the process itself by nature, there is no guarantee that they are written by good will. I have to say it is at least dangerous to "escape" them to be "safe" to pass to a process invoking API. Our current implementation is not ready for that... Is it even possible?

Updated by shyouhei (Shyouhei Urabe) 23 days ago

In short the problem we see is feeding strings from untrusted sources to generic Kernel#exec. Sounds ultra risky, no?

Let's not do so. If what is needed is just launching a ruby process, we could perhaps design a workaround.

Updated by Dan0042 (Daniel DeLorme) 23 days ago

shyouhei (Shyouhei Urabe) wrote in #note-26:

In short the problem we see is feeding strings from untrusted sources to generic Kernel#exec. Sounds ultra risky, no?

It also sounds nothing like what this proposal is about. If the current script was executed with ruby --enable=jit foo.rb then it is, by definition, safe to run exec("ruby", "--enable=jit", "foo.rb")
It's already possible to run exec("ruby", "foo.rb"); changing it to exec("ruby", *RbConfig.ruby_args, "foo.rb") does not reduce security, in fact it increases security.

Updated by Dan0042 (Daniel DeLorme) 23 days ago

As you see there is no Windows API that takes char**. We cannot safely pass through what we have. You have to concatenate them into one argument string (LPWSTR lpCommandLine), with proper escaping of whitespace etc.

But this issue is not specific to RbConfig.ruby_args is it? You have to do the concatenation in exec/system anyway; RbConfig.ruby_args will not change this situation either for better or worse.

Because process arguments come from out of the process itself by nature, there is no guarantee that they are written by good will.

Can you explain that one? I don't understand how valid ruby options like --enable=jit could be "not written by good will".

Updated by shyouhei (Shyouhei Urabe) 23 days ago

Please note that I'm not necessarily against a way to call the current ruby executable. I just say doing so using exec is a bad idea, because exec is not designed for that purpose.

The current situation is that ruby is not the only valid executable that the method takes. Allowing untrusted inputs for it means it has to be secure for everything. This is too much of a hassle. Better to find a fine-grained alternative.

Updated by Eregon (Benoit Daloze) 22 days ago

shyouhei (Shyouhei Urabe) wrote in #note-29:

The current situation is that ruby is not the only valid executable that the method takes. Allowing untrusted inputs for it means it has to be secure for everything. This is too much of a hassle. Better to find a fine-grained alternative.

There is no untrusted input involved here, because the user chooses what flags to pass to ruby.
If ruby flags can be injected by an attacker, then all is lost regardless of this change (e.g. they can just inject -r/backdoor.rb).

Regarding the Windows concern, it is the exact same problem for e.g. spawn("dir", "*.mp3", "/s").
From what I can see, it is completely separate from this ticket.
The code for this on Windows must already escape as much as feasible, and if it fails it's a bug of that code which should be fixed to fix spawn etc in general, nothing to change in RbConfig.ruby_args.
Or if the escaping fails maybe it's just considered a Windows limitation, independent of this ticket.

For example, the user is running ruby --yjit -rmytracing script.rb, the only addition is the script can now find out the ruby flags it was called with (["--yjit", "-rmytracing"]).
If the script spawns subprocesses, it already did before, so nothing changes there.
It can choose to use RbConfig.ruby_args, and that's fine, the user running the script is responsible for whether it's safe to run the script, as always.
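As a sketch of how a script could opt in, assuming the method ends up being named RbConfig.ruby_args as proposed (it does not exist at the time of this discussion): the hypothetical helpers below fall back to an empty list when the method is missing, so the same code works before and after the feature lands.

```ruby
require "rbconfig"

# Hypothetical helper: the flags the current interpreter was started with,
# or an empty array when the proposed RbConfig.ruby_args is not available.
def inherited_ruby_args
  RbConfig.respond_to?(:ruby_args) ? RbConfig.ruby_args : []
end

# Hypothetical helper: command array for a Ruby subprocess that inherits
# the parent's flags, suitable for spawn(*cmd) or system(*cmd).
def subprocess_command(*script_and_args)
  [RbConfig.ruby, *inherited_ruby_args, *script_and_args]
end

cmd = subprocess_command("script.rb", "--some-script-option")
puts cmd.inspect
```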

Let's take the MSpec use-case (mentioned before in https://bugs.ruby-lang.org/issues/6648#note-11), what we want is to run Ruby subprocess with the same Ruby flags.
So if e.g. ruby --yjit -rmytracing path/to/mspec is called, then if specs create subprocesses (via ruby_exe()), then those subprocesses (running some fixture) also have --yjit -rmytracing, as desired.
You might argue RUBYOPT could be used instead, but that is problematic for various reasons: some flags are not allowed in RUBYOPT, RUBYOPT gets propagated arbitrarily far which is not necessarily desired (including to other Ruby implementations and executables written in Ruby, etc).
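To make the propagation point concrete, here is a small sketch (not taken from MSpec) in which a flag set through RUBYOPT for a child process is also picked up by a grandchild that the child starts without asking for it. It assumes RbConfig.ruby points at a working interpreter and that RbConfig is loaded by default in the child.

```ruby
require "rbconfig"

# The child prints its own $VERBOSE, then starts a grandchild that does the same.
child_code = 'p $VERBOSE; system(RbConfig.ruby, "-e", "p $VERBOSE")'

# Only the child is started with RUBYOPT=-w, but the environment variable is
# inherited, so the grandchild runs with warnings enabled as well.
output = IO.popen([{ "RUBYOPT" => "-w" }, RbConfig.ruby, "-e", child_code], &:read)

verbose_flags = output.lines.map(&:strip)
puts verbose_flags.inspect
```

Flags passed on the command line stop at the process they were given to, which is the behavior an explicit RbConfig.ruby_args would let the script control.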

A concrete example I often run into is running ruby/spec with TruffleRuby,
I pass --core-load-path=.../src/main/ruby/truffleruby to use core library files from disk in development.
It is critical that subprocesses in specs also use that (otherwise we'd get an inconsistent core library).
The current workaround is to pass that flag both to the ruby process and as -T, which is quite ugly and also slow, because it means mspec must create an extra subprocess just to apply these -T flags (it actually uses exec but that's just as slow):

$ mxbuild/truffleruby-jvm/bin/ruby \
  --experimental-options --core-load-path=src/main/ruby/truffleruby \
  spec/mspec/bin/mspec run \
  --config spec/truffleruby.mspec \
  -t .../mxbuild/truffleruby-jvm/bin/ruby \
  --excl-tag fails --excl-tag slow \
  -T--vm.ea -T--vm.esa \
  -T--experimental-options -T--core-load-path=src/main/ruby/truffleruby

(as you can see there are already some bugs there because the -T and regular flags don't match exactly).
(if you think this could be wrapped in some helper script, it already is, but it changes nothing because it must accept arbitrary ruby flags to be passed)

With RbConfig.ruby_args, MSpec can know which ruby flags it was passed, which would avoid needing the extra subprocess, and it would be:

$ mxbuild/truffleruby-jvm/bin/ruby \
  --vm.ea --vm.esa \
  --experimental-options --core-load-path=src/main/ruby/truffleruby \
  spec/mspec/bin/mspec run \
  --config spec/truffleruby.mspec \
  -t .../mxbuild/truffleruby-jvm/bin/ruby \
  --excl-tag fails --excl-tag slow

This would be a killer feature when attaching a debugger, because then one could just myruby -rmydebug spec/mspec/bin/mspec and it would start running specs with the debugger (e.g. for TruffleRuby with the Java debugger).
Instead of the current situation where the debugger is started on this mspec "wrapper" which just exec's to handle -T flags and is very annoying.

This happens in CRuby just as much, for example https://github.com/ruby/ruby/blob/69c0b1438a45938e79e63407035f116de4634dcb/spec/default.mspec#L27-L31 is a workaround causing some duplication.
And it makes e.g. running a single spec under gdb/lldb much messier.

A very similar situation happens for make test-all I would imagine.
I guess currently any Ruby subprocess incorrectly omits Ruby flags, which means the test coverage is lower than intended (e.g. for --yjit, --rjit and all other flags).
This would also be convenient for the way to run built-but-not-installed-ruby, which every CRuby developer uses.

Updated by Eregon (Benoit Daloze) 22 days ago

mame (Yusuke Endoh) wrote in #note-23:

[...] instead of parsing the command line to a string array and passing it to Kernel#exec.

Don't we use execve() (which takes char**) as well on Windows for Kernel#exec?
It does seem to exist: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/execve-wexecve?view=msvc-170

Did you mean Kernel#spawn/Kernel#system maybe?

Of course, RbConfig.ruby_args would return an Array, not a String.

Updated by Eregon (Benoit Daloze) 22 days ago

And from a quick look there is also _spawnv which does take a char**, maybe we could use that on Windows?
