Project

General

Profile

Actions

Bug #18007

closed

Help developers of C extensions meet requirements in "doc/extension.rdoc"

Added by mdalessio (Mike Dalessio) 4 months ago. Updated about 1 month ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:104401]

Description

A pull request for this feature has been submitted at https://github.com/ruby/ruby/pull/4604

Problem being solved

This option is intended to help developers of C extensions to check if their code meets the requirements explained in "doc/extension.rdoc". Specifically, I want to ensure that T_DATA object classes undefine or redefine the alloc function.

https://github.com/ruby/ruby/blob/6963f8f743b42f9004a0879cd66c550f18987352/doc/extension.rdoc#label-Write+the+C+Code says:

Since Object.allocate allocates an ordinary T_OBJECT type (instead of T_DATA), it's important to either use rb_define_alloc_func() to overwrite it or rb_undef_alloc_func() to delete it.

(note: which matters when using TypedData_Make_Struct/TypedData_Wrap_Struct as the native pointer is supplied without calling the class alloc function).

There is currently no easy way for an author of a C extension to easily see where they have made the mistake of letting the default alloc function remain for their class (and therefore class.new creating a T_OBJECT instead of T_DATA and not setting the data pointer).

Description of the solution

Compiled with this option, Ruby will warn when a T_DATA object is created whose class has not undefined or redefined the alloc function.

A new function is defined, rb_data_object_check. That function is called from rb_data_object_wrap() and
rb_data_typed_object_wrap() (which implement the Data_Wrap_Struct family of macros).

The warning, when emitted, looks like this:

warning: T_DATA class Nokogiri::XML::Document should undefine or redefine the alloc function, please see doc/extension.rdoc

Examples of this problem in the wild

Using this option, I found that many of Nokogiri's classes needed to undefine allocate.

This PR also updates these core Ruby classes by undefining allocate:

  • ObjectSpace::InternalObjectWrapper
  • Socket::Ifaddr

Questions for reviewers

Does this check really need to be behind a configuration option? Performance impact is very small (see benchmarks below), but I put it behind a flag because I am worried that there may be a many C extensions that would emit warnings at runtime, and the users of those extensions cannot fix the problem and so would mostly just be annoyed.

Should this warning be emitted with the deprecated category?

Benchmarking

I benchmarked this code by allocating Nokogiri::XML::NodeSets in a loop. This is a class with a relatively simple allocate function.

The runs cover the four combinations of enabled/disabled, and warnings/no-warnings.

ruby 3.1.0dev (2021-06-25T04:02:18Z flavorjones-extens.. de943189aa) [x86_64-linux]
Warming up --------------------------------------
disabled, warn=false   490.143k i/100ms
Calculating -------------------------------------
disabled, warn=false      4.863M (± 1.5%) i/s -     49.014M in  10.081177s

ruby 3.1.0dev (2021-06-25T04:02:18Z flavorjones-extens.. de943189aa) [x86_64-linux]
Warming up --------------------------------------
 disabled, warn=true   483.070k i/100ms
Calculating -------------------------------------
 disabled, warn=true      4.839M (± 1.4%) i/s -     48.790M in  10.083899s

Comparison:
disabled, warn=false:  4863064.0 i/s
 disabled, warn=true:  4839310.1 i/s - same-ish: difference falls within error


ruby 3.1.0dev (2021-06-25T04:02:18Z flavorjones-extens.. de943189aa) [x86_64-linux]
Warming up --------------------------------------
 enabled, warn=false   484.398k i/100ms
Calculating -------------------------------------
 enabled, warn=false      4.840M (± 1.9%) i/s -     48.440M in  10.011854s

Comparison:
disabled, warn=false:  4863064.0 i/s
 enabled, warn=false:  4840123.2 i/s - same-ish: difference falls within error
 disabled, warn=true:  4839310.1 i/s - same-ish: difference falls within error


ruby 3.1.0dev (2021-06-25T04:02:18Z flavorjones-extens.. de943189aa) [x86_64-linux]
Warming up --------------------------------------
  enabled, warn=true   492.200k i/100ms
Calculating -------------------------------------
  enabled, warn=true      4.866M (± 2.1%) i/s -     48.728M in  10.017455s

Comparison:
  enabled, warn=true:  4866434.8 i/s
disabled, warn=false:  4863064.0 i/s - same-ish: difference falls within error
 enabled, warn=false:  4840123.2 i/s - same-ish: difference falls within error
 disabled, warn=true:  4839310.1 i/s - same-ish: difference falls within error

My conclusion is that the performance impact is very small, and we could omit the option if the Ruby core maintainers decide this behavior should be on by default.

Updated by nobu (Nobuyoshi Nakada) 4 months ago

I don't think the configuration option is needed.
And Check_Type(klass, T_CLASS); can be in rb_data_object_check.

Updated by mdalessio (Mike Dalessio) 4 months ago

nobu (Nobuyoshi Nakada) wrote in #note-1:

I don't think the configuration option is needed.
And Check_Type(klass, T_CLASS); can be in rb_data_object_check.

Thank you for your review. I've updated the pull request.

Updated by naruse (Yui NARUSE) 3 months ago

I want Ruby 3.1 to be compatible with 3.0. Therefore even if it is accepted, you can merge this in 3.2.

Updated by mdalessio (Mike Dalessio) 3 months ago

Naruse, thank you for your time.

Would you consider for 3.1 if this change was behind a configuration option that defaulted to disabled?

Updated by mame (Yusuke Endoh) 2 months ago

How about raising an exception when attempting to allocate a T_DATA object with the default allocator?

Updated by nobu (Nobuyoshi Nakada) 2 months ago

mame (Yusuke Endoh) wrote in #note-5:

How about raising an exception when attempting to allocate a T_DATA object with the default allocator?

Though I agreed it once, no way to tell if the given class is T_DATA in the default allocator.

Updated by Eregon (Benoit Daloze) 2 months ago

I'm confused, why does Class#allocate not simply use the alloc function set with rb_define_alloc_func?
Then there would be no need to undefine/redefine allocate, isn't it?

(IMHO users should never call allocate directly because it returns an uninitialized object which is likely to break when used, but that probably needs documentation, more thoughts, etc)

Updated by nobu (Nobuyoshi Nakada) 2 months ago

Eregon (Benoit Daloze) wrote in #note-7:

I'm confused, why does Class#allocate not simply use the alloc function set with rb_define_alloc_func?

You're confused by Class#allocate which calls the allocator set for each T_DATA and rb_class_allocate which implements the default allocator for T_OBJECT.
The former method now calls the latter function set with rb_define_alloc_func.

Updated by matz (Yukihiro Matsumoto) about 2 months ago

In my opinion, it should be merged, and it should cause errors (not warnings) in the future. But we need a migration path to ease the pain of the change.

Matz.

Updated by nobu (Nobuyoshi Nakada) about 2 months ago

naruse (Yui NARUSE) objects to that warning appearing every time that instance is created.
So I propose to undefine the allocator in that check, so an exception will be raised when new or allocate are called later.

static inline void
rb_data_object_check(VALUE klass)
{
    if (klass != rb_cObject && (rb_get_alloc_func(klass) == rb_class_allocate_instance)) {
        rb_undef_alloc_func(klass);
#if 0 // TODO: enable at the next release
        rb_warn("allocator of T_DATA class %"PRIsVALUE" got undefined", klass);
#endif
    }
}

Updated by mdalessio (Mike Dalessio) about 2 months ago

Thank you for your time, Matz, Naruse, and Nobu.

I've made the requested change in the pull request to undefine the allocator in the check, and the warning is hidden by #if 0.

Actions #12

Updated by Eregon (Benoit Daloze) about 2 months ago

  • Description updated (diff)
Actions #13

Updated by Eregon (Benoit Daloze) about 2 months ago

  • Description updated (diff)
Actions #14

Updated by Eregon (Benoit Daloze) about 2 months ago

  • Description updated (diff)

Updated by Eregon (Benoit Daloze) about 2 months ago

My confusion was that the description of this issue talks about the allocate method, but actually it's about the alloc function.
In CRuby there is typically only one allocate method, Class#allocate, and that uses the alloc function associated with that class and goes up superclasses until it find one or finds UNDEF_ALLOC_FUNC (which causes TypeError "allocator undefined for Foo").
rb_define_alloc_func() and rb_undef_alloc_func() set the alloc function, they don't change any method (but of course they affect the behavior of Class#allocate which looks at the alloc function).

Actions #16

Updated by Eregon (Benoit Daloze) about 2 months ago

  • Description updated (diff)

Updated by mdalessio (Mike Dalessio) about 2 months ago

Benoit, thanks for helping clarify the language in the description. I've also updated the PR commit log to use this language.

Actions #18

Updated by nobu (Nobuyoshi Nakada) about 2 months ago

  • Status changed from Open to Closed

Updated by nobu (Nobuyoshi Nakada) about 2 months ago

  • Backport set to 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED
  • Tracker changed from Feature to Bug

At least c0f4e4ca needs to be backported, since p ObjectSpace::InternalObjectWrapper.new segfaults.

Updated by nagachika (Tomoyuki Chikanaga) about 1 month ago

  • Backport changed from 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED to 2.6: REQUIRED, 2.7: REQUIRED, 3.0: DONE

ruby_3_0 c42208f8e24402fe1aa8747901fba275bfb0d56b merged revision(s) c0f4e4ca6d0f76985bca79314b232b787c8f008e.

Actions

Also available in: Atom PDF