Feature #21459
closedAdd Set C-API
Description
I would like to add a minimal C-API for Set:
void rb_set_foreach(VALUE set, int (*func)(VALUE element, VALUE arg), VALUE arg);
VALUE rb_set_new(void);
VALUE rb_set_new_capa(unsigned long capa);
bool rb_set_lookup(VALUE set, VALUE element);
bool rb_set_add(VALUE set, VALUE element);
VALUE rb_set_clear(VALUE set);
bool rb_set_delete(VALUE set, VALUE element);
size_t rb_set_size(VALUE set);
I think this should allow extension libraries to start benefiting from core Set without having to resort to method calls (dangerous in an C extension as they could be redefined to return objects of an unexpected type).
I've submitted a pull request for this: https://github.com/ruby/ruby/pull/13735
Updated by Eregon (Benoit Daloze) 25 days ago
jeremyevans0 (Jeremy Evans) wrote:
I think this should allow extension libraries to start benefiting from core Set without having to resort to method calls (dangerous in an C extension as they could be redefined to return objects of an unexpected type).
What would be the danger with that?
It could cause a NoMethodError or so, but that's fine.
IMO rb_funcall() is enough for this, and anyway needs to be used for Ruby version < 3.5.
Updated by jeremyevans0 (Jeremy Evans) 25 days ago
Eregon (Benoit Daloze) wrote in #note-1:
jeremyevans0 (Jeremy Evans) wrote:
I think this should allow extension libraries to start benefiting from core Set without having to resort to method calls (dangerous in an C extension as they could be redefined to return objects of an unexpected type).
What would be the danger with that?
It could cause a NoMethodError or so, but that's fine.
When you are using rb_funcall, you cannot rely on the type of returned objects. If you want to pass returned objects to other C-API functions, you need defensive type checks, or it can result in undefined behavior (including segfaults) depending on how the returned objects are used.
Let's say you want to call the Set#size
method and pass the result to rb_fix2str
. If a user overrides the Set#size
method to return an object of a different type, you get undefined behavior, as rb_fix2str
does not perform type checks. By having an rb_set_size
C-API function, you know what type will be returned.
Having C-API functions for common methods is more efficient performance wise, and increases programmer happiness, since calling the functions is simpler than using rb_funcall. This is especially true in the rb_set_foreach
case. Without rb_set_foreach
, instead of passing a C function pointer, you would have to build a Ruby block in C that you could pass to Set#each
or Set#delete_if
. Passing a C function is also more flexible, since the same function can deal with both iteration and deletion.
IMO rb_funcall() is enough for this, and anyway needs to be used for Ruby version < 3.5.
The idea that we shouldn't add this because it cannot be used on older Ruby versions is basically an argument against adding any feature at all. If we add this, code targeting Ruby 3.5+ can benefit from it, and eventually not working on versions older than Ruby 3.5 will be a non-issue.
Updated by Eregon (Benoit Daloze) 24 days ago
If you want to pass returned objects to other C-API functions, you need defensive type checks, or it can result in undefined behavior (including segfaults) depending on how the returned objects are used.
I was thinking about methods returning a Set, those are no risk for using rb_funcall()
on it.
In fact these direct C API functions without type checks is what causes the issue in the first place.
Let's say you want to call the Set#size method and pass the result to rb_fix2str.
They could just use rb_funcall(returned_value, "to_s")
on the result then, but yeah it would be a problem for FIX2LONG(returned_value)
, fair point.
Having C-API functions for common methods is more efficient performance wise, and increases programmer happiness, since calling the functions is simpler than using rb_funcall.
I wouldn't say it increases programmer happiness in general, this is only relevant for native extensions using Set
and e.g. needing to iterate a Set, that's quite a small fraction of native extensions.
This is especially true in the rb_set_foreach case.
This convinces me, good point, I'm fine with adding the C API functions proposed here.
The idea that we shouldn't add this because it cannot be used on older Ruby versions is basically an argument against adding any feature at all.
I'm just highlighting that the utility/value of this will be very low until gems can assume 3.5+ and only relevant for gems using native extensions using Set. But still useful for those, yes.
Updated by matz (Yukihiro Matsumoto) 15 days ago
I accept the proposal.
Matz.
Updated by jeremyevans0 (Jeremy Evans) 14 days ago
- Status changed from Open to Closed
Feature added in 2ab38691a2683c992bf2886159094afd5e461233