Bug #18007
closedHelp developers of C extensions meet requirements in "doc/extension.rdoc"
Description
A pull request for this feature has been submitted at https://github.com/ruby/ruby/pull/4604
Problem being solved¶
This option is intended to help developers of C extensions to check if their code meets the requirements explained in "doc/extension.rdoc". Specifically, I want to ensure that T_DATA
object classes undefine or redefine the alloc function.
Since Object.allocate allocates an ordinary T_OBJECT type (instead of T_DATA), it's important to either use rb_define_alloc_func() to overwrite it or rb_undef_alloc_func() to delete it.
(note: which matters when using TypedData_Make_Struct/TypedData_Wrap_Struct as the native pointer is supplied without calling the class alloc function).
There is currently no easy way for an author of a C extension to easily see where they have made the mistake of letting the default alloc function remain for their class (and therefore class.new
creating a T_OBJECT instead of T_DATA and not setting the data pointer).
Description of the solution¶
Compiled with this option, Ruby will warn when a T_DATA
object is created whose class has not undefined or redefined the alloc function.
A new function is defined, rb_data_object_check
. That function is called from rb_data_object_wrap()
and
rb_data_typed_object_wrap()
(which implement the Data_Wrap_Struct
family of macros).
The warning, when emitted, looks like this:
warning: T_DATA class Nokogiri::XML::Document should undefine or redefine the alloc function, please see doc/extension.rdoc
Examples of this problem in the wild¶
Using this option, I found that many of Nokogiri's classes needed to undefine allocate
.
This PR also updates these core Ruby classes by undefining allocate
:
ObjectSpace::InternalObjectWrapper
Socket::Ifaddr
Questions for reviewers¶
Does this check really need to be behind a configuration option? Performance impact is very small (see benchmarks below), but I put it behind a flag because I am worried that there may be a many C extensions that would emit warnings at runtime, and the users of those extensions cannot fix the problem and so would mostly just be annoyed.
Should this warning be emitted with the deprecated
category?
Benchmarking¶
I benchmarked this code by allocating Nokogiri::XML::NodeSet
s in a loop. This is a class with a relatively simple allocate
function.
The runs cover the four combinations of enabled/disabled, and warnings/no-warnings.
ruby 3.1.0dev (2021-06-25T04:02:18Z flavorjones-extens.. de943189aa) [x86_64-linux]
Warming up --------------------------------------
disabled, warn=false 490.143k i/100ms
Calculating -------------------------------------
disabled, warn=false 4.863M (± 1.5%) i/s - 49.014M in 10.081177s
ruby 3.1.0dev (2021-06-25T04:02:18Z flavorjones-extens.. de943189aa) [x86_64-linux]
Warming up --------------------------------------
disabled, warn=true 483.070k i/100ms
Calculating -------------------------------------
disabled, warn=true 4.839M (± 1.4%) i/s - 48.790M in 10.083899s
Comparison:
disabled, warn=false: 4863064.0 i/s
disabled, warn=true: 4839310.1 i/s - same-ish: difference falls within error
ruby 3.1.0dev (2021-06-25T04:02:18Z flavorjones-extens.. de943189aa) [x86_64-linux]
Warming up --------------------------------------
enabled, warn=false 484.398k i/100ms
Calculating -------------------------------------
enabled, warn=false 4.840M (± 1.9%) i/s - 48.440M in 10.011854s
Comparison:
disabled, warn=false: 4863064.0 i/s
enabled, warn=false: 4840123.2 i/s - same-ish: difference falls within error
disabled, warn=true: 4839310.1 i/s - same-ish: difference falls within error
ruby 3.1.0dev (2021-06-25T04:02:18Z flavorjones-extens.. de943189aa) [x86_64-linux]
Warming up --------------------------------------
enabled, warn=true 492.200k i/100ms
Calculating -------------------------------------
enabled, warn=true 4.866M (± 2.1%) i/s - 48.728M in 10.017455s
Comparison:
enabled, warn=true: 4866434.8 i/s
disabled, warn=false: 4863064.0 i/s - same-ish: difference falls within error
enabled, warn=false: 4840123.2 i/s - same-ish: difference falls within error
disabled, warn=true: 4839310.1 i/s - same-ish: difference falls within error
My conclusion is that the performance impact is very small, and we could omit the option if the Ruby core maintainers decide this behavior should be on by default.
Updated by nobu (Nobuyoshi Nakada) over 3 years ago
I don't think the configuration option is needed.
And Check_Type(klass, T_CLASS);
can be in rb_data_object_check
.
Updated by mdalessio (Mike Dalessio) over 3 years ago
nobu (Nobuyoshi Nakada) wrote in #note-1:
I don't think the configuration option is needed.
AndCheck_Type(klass, T_CLASS);
can be inrb_data_object_check
.
Thank you for your review. I've updated the pull request.
Updated by naruse (Yui NARUSE) over 3 years ago
I want Ruby 3.1 to be compatible with 3.0. Therefore even if it is accepted, you can merge this in 3.2.
Updated by mdalessio (Mike Dalessio) over 3 years ago
Naruse, thank you for your time.
Would you consider for 3.1 if this change was behind a configuration option that defaulted to disabled?
Updated by mame (Yusuke Endoh) over 3 years ago
How about raising an exception when attempting to allocate a T_DATA
object with the default allocator?
Updated by nobu (Nobuyoshi Nakada) over 3 years ago
mame (Yusuke Endoh) wrote in #note-5:
How about raising an exception when attempting to allocate a
T_DATA
object with the default allocator?
Though I agreed it once, no way to tell if the given class is T_DATA
in the default allocator.
Updated by Eregon (Benoit Daloze) over 3 years ago
I'm confused, why does Class#allocate
not simply use the alloc function set with rb_define_alloc_func
?
Then there would be no need to undefine/redefine allocate, isn't it?
(IMHO users should never call allocate
directly because it returns an uninitialized object which is likely to break when used, but that probably needs documentation, more thoughts, etc)
Updated by nobu (Nobuyoshi Nakada) over 3 years ago
Eregon (Benoit Daloze) wrote in #note-7:
I'm confused, why does
Class#allocate
not simply use the alloc function set withrb_define_alloc_func
?
You're confused by Class#allocate
which calls the allocator set for each T_DATA
and rb_class_allocate
which implements the default allocator for T_OBJECT
.
The former method now calls the latter function set with rb_define_alloc_func
.
Updated by matz (Yukihiro Matsumoto) over 3 years ago
In my opinion, it should be merged, and it should cause errors (not warnings) in the future. But we need a migration path to ease the pain of the change.
Matz.
Updated by nobu (Nobuyoshi Nakada) over 3 years ago
@naruse (Yui NARUSE) objects to that warning appearing every time that instance is created.
So I propose to undefine the allocator in that check, so an exception will be raised when new
or allocate
are called later.
static inline void
rb_data_object_check(VALUE klass)
{
if (klass != rb_cObject && (rb_get_alloc_func(klass) == rb_class_allocate_instance)) {
rb_undef_alloc_func(klass);
#if 0 // TODO: enable at the next release
rb_warn("allocator of T_DATA class %"PRIsVALUE" got undefined", klass);
#endif
}
}
Updated by mdalessio (Mike Dalessio) over 3 years ago
Thank you for your time, Matz, Naruse, and Nobu.
I've made the requested change in the pull request to undefine the allocator in the check, and the warning is hidden by #if 0
.
Updated by Eregon (Benoit Daloze) over 3 years ago
My confusion was that the description of this issue talks about the allocate method, but actually it's about the alloc function.
In CRuby there is typically only one allocate method, Class#allocate, and that uses the alloc function associated with that class and goes up superclasses until it find one or finds UNDEF_ALLOC_FUNC
(which causes TypeError "allocator undefined for Foo").
rb_define_alloc_func()
and rb_undef_alloc_func()
set the alloc function, they don't change any method (but of course they affect the behavior of Class#allocate which looks at the alloc function).
Updated by mdalessio (Mike Dalessio) over 3 years ago
Benoit, thanks for helping clarify the language in the description. I've also updated the PR commit log to use this language.
Updated by nobu (Nobuyoshi Nakada) over 3 years ago
- Status changed from Open to Closed
Updated by nobu (Nobuyoshi Nakada) over 3 years ago
- Tracker changed from Feature to Bug
- Backport set to 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED
At least c0f4e4ca needs to be backported, since p ObjectSpace::InternalObjectWrapper.new
segfaults.
Updated by nagachika (Tomoyuki Chikanaga) over 3 years ago
- Backport changed from 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED to 2.6: REQUIRED, 2.7: REQUIRED, 3.0: DONE
ruby_3_0 c42208f8e24402fe1aa8747901fba275bfb0d56b merged revision(s) c0f4e4ca6d0f76985bca79314b232b787c8f008e.