Project

General

Profile

Actions

Feature #16168

closed

Add keyword argument separation to C functions using rb_scan_args

Added by jeremyevans0 (Jeremy Evans) over 4 years ago. Updated over 4 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:94941]

Description

Most Ruby methods implemented as C functions that accept a variable number of arguments use rb_scan_args for argument handling. rb_scan_args supports a : character for option hash/keyword arguments, which operates similarly to how keyword arguments work for Ruby methods in Ruby 2.6.

If a C function accepts optional or variable arguments in addition to the option hash/keyword arguments, the keyword argument issues discussed in #14183 also apply. If the C function only accepts mandatory arguments and an option hash, then the argument handling is not ambiguous, and there shouldn't be issues with it. For this reason, I propose to change rb_scan_args so that if : is used with optional or variable arguments, it is treated as keywords (keyword mode), whereas if only used with mandatory arguments, it is treated as an optional positional argument (regular mode).

For example, I would like to treat the following rb_scan_args calls similar to the following Ruby method definitions:

rb_scan_args(argc, argv, ":", ...)     # def foo(opts = {})
rb_scan_args(argc, argv, "1:", ...)    # def foo(v1, opts = {})

rb_scan_args(argc, argv, "01:", ...)   # def foo(v1=nil, **opts)
rb_scan_args(argc, argv, "11:", ...)   # def foo(v1, v2=nil, **opts)

rb_scan_args(argc, argv, "*:", ...)    # def foo(*args, **opts)
rb_scan_args(argc, argv, "1*:", ...)   # def foo(v1, *args, **opts)
rb_scan_args(argc, argv, "01*:", ...)  # def foo(v1=nil, *args, **opts)
rb_scan_args(argc, argv, "11*:", ...)  # def foo(v1, v2=nil, *args, **opts)

In 2.7, I would like to keep rb_scan_args behavior the same as it is now, but add warnings for cases that will change in 3.0. These mirror the handling of keyword arguments for Ruby methods in 2.7 and 3.0.

One behavior change in 2.7 compared to previous versions is that in keyword mode, calling the C method with an empty keyword splat (e.g. **{}) will make rb_scan_args not consider the last positional argument as possible keyword arguments. This is an important change to make, so that you can call the C method with a way to disable the option parsing.

There are also some related issues in rb_scan_args that I would like to fix (with warnings in 2.7 and behavior changes in 3.0):

  • rb_scan_args currently splits hashes, separating the non-symbol keys into a separate positional argument. I would like to remove this splitting in both keyword mode and regular mode, as Ruby will no longer split hashes in 3.0 for Ruby methods.

  • rb_scan_args currently treat a nil value as an empty hash in some cases. Ruby methods do not do this, and I think it would be best if C methods stopped doing this in both regular mode and keyword mode.

  • rb_scan_args will use keyword arguments if necessary to fill a mandatory positional argument. I think we should keep this behavior in regular mode and remove it in keyword mode, similarly to how Ruby methods handle the situation.

  • rb_scan_args, when called with an empty keyword splat, currently breaks compatibility with Ruby 2.6, which passes an empty hash in this case. Restore backwards compatibility by adding the empty hash if needed for a mandatory positional argument, but this behavior should be removed in 3.0.

There are a few places in Ruby where rb_scan_args is called indirectly, where a Ruby method is implemented in C, and it calls rb_scan_args with the arguments passed from Ruby, but then it calls another C function with a different set of arguments. In this case, we cannot use the rb_keyword_given_p and rb_empty_keyword_given_p functions to determine how the arguments should be handled, as these rely on VM frame flags, and the frame flags may not match the arguments passed to rb_scan_args. Handle this case by allowing prepending the rb_scan_args string with a k, e, or n prefix:

  • k: Treat as if the keyword-given flag is set (last argument should be a hash).

  • e: Treat as if the empty-keyword-given flag is set (in keyword mode, do not look for a last positional hash).

  • n: In keyword mode, assume the call did not set the keyword or empty keyword flags, but do not issue a warning if the last argument is a hash that is treated as keywords.

With this approach, external C extensions will be able to get backwards-compatible behavior on Ruby < 2.7 by using a macro. On 2.7+, the macro can be defined to "k", "e", or "n", and on < 2.7, it can be defined to "".

I have a pull request that has passed CI (https://github.com/ruby/ruby/pull/2460) that implements all of the above. It is also attached as patch. It fixes all cases in the tests, core, extensions, stdlib where warnings were raised due to changes. The stdlib and test fixes (Ruby level) are mostly adding keyword splats instead of passing as positional arguments, or making sure to use an empty hash instead of nil as an option hash. The core and extension changes are mostly switching to *_kw functions to appropriately pass that arguments should be treated as keywords if called with keywords.


Files

rb_scan_args-keyword-argument-separation.patch (61.8 KB) rb_scan_args-keyword-argument-separation.patch jeremyevans0 (Jeremy Evans), 09/15/2019 08:56 PM

Updated by jeremyevans0 (Jeremy Evans) over 4 years ago

At the last developer meeting, matz requested that instead of "k", "e", and "n" modifiers to rb_scan_args, a new rb_scan_args_kw method be added for that behavior. I have updated the pull request to add that method.

matz also had a question:

Why is rb_scan_args(":") treated as def foo(opt = {})? If there is no special reason, I’d like to handle it as def foo(**opt)

The benefit of treating it as opt = {} is that it is better for backwards compatibility with existing code, because you can continue to pass in a hash instead of a keyword splat. Passing in a hash is also better for performance, as a keyword splat requires a hash allocation.

The benefit of treating it as **opt is that you could add optional or variable arguments to the method later and retain backwards compatibility. If you treat it as opt = {}, then adding optional or variable arguments would not be backwards compatible.

If a method accepts optional or variable arguments and the method accepts keyword arguments, you cannot treat : as opt = {} as it would not be backwards compatible. Also, methods that accept optional or variable arguments and keyword arguments are the main source of issues with keywords, and the primary reason we are introducing keyword argument separation.

Treating : as **opt in methods that otherwise only accept mandatory arguments requires changing all call sites that pass in a positional hash to pass a keyword splat. This causes additional work when upgrading Ruby, it hurts performance, and it can actually change behavior, because the empty keyword splat and an empty hash can be treated differently by the code. That's likely to be a bug in the related code, but it is going to happen, and actually does happen in stdlib (in the json library).

As a test, I worked on always treating : as **opt, and the changes necessary to get make check passing were not that extensive (which surprised me). If you want to try out this behavior, I have added a branch on GitHub for it: https://github.com/jeremyevans/ruby/tree/rb_scan_args-colon-always-keyword

I still recommend treating : as opt={} for methods that do not accept optional or variable arguments, as I think backwards compatibility with existing code and better performance outweigh the ability to add optional or variable arguments later in a backwards compatible manner. However, treating : always as **opt may be more consistent. Whatever the decision is, hopefully the decision on this can be made before 2.7.0 preview2 is released.

Updated by jeremyevans0 (Jeremy Evans) over 4 years ago

  • Status changed from Open to Closed

matz decided to always treat : in rb_scan_args as **opt, so I have merged my rb_scan_args-colon-always-keyword branch at 80b5a0ff2a7709367178f29d4ebe1c54122b1c27.

Actions #3

Updated by sawa (Tsuyoshi Sawada) over 4 years ago

  • Description updated (diff)
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0