Feature #21853

open

Make Embedded TypedData a public API

Added by byroot (Jean Boussier) about 2 months ago. Updated about 9 hours ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:124635]

Description

As part of Ruby 3.3, we added a private RUBY_TYPED_EMBEDDABLE flag to the TypedData API to allow TypedData objects to use variable-width allocation.

Technically, we inadvertently exposed that flag in public headers, so third-party extensions can make use of it. However, it isn't considered public API because it's undocumented, so relying on it would be a poor decision.

This API has both memory and speed benefits, as it avoids some malloc/free churn, reduces pointer chasing, etc.
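The following standalone C model makes "malloc/free churn" and "pointer chasing" concrete. It does not use Ruby headers, and the struct and function names are hypothetical. They do not reflect CRuby's actual `RTypedData` layout. A non-embedded object needs one allocation for its header and another for its payload, and reaching the payload means loading a pointer first. An embedded object keeps its payload inline, at a fixed offset within a single allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object layouts for this model; NOT CRuby's real RTypedData. */
typedef struct {
    size_t flags;
    void *data;                /* payload lives in a separate allocation */
} wrapped_obj;

typedef struct {
    size_t flags;
    unsigned char payload[];   /* payload stored inline (C99 flexible array member) */
} embedded_obj;

static int malloc_calls = 0;   /* number of allocations made through counted_calloc */

static void *counted_calloc(size_t size) {
    void *p = calloc(1, size);
    assert(p != NULL);
    malloc_calls++;
    return p;
}

/* Non-embedded: one allocation for the header, a second one for the payload. */
static wrapped_obj *wrapped_new(size_t payload_size) {
    wrapped_obj *obj = counted_calloc(sizeof(wrapped_obj));
    obj->data = counted_calloc(payload_size);
    return obj;
}

/* Embedded: header and payload share a single allocation. */
static embedded_obj *embedded_new(size_t payload_size) {
    return counted_calloc(sizeof(embedded_obj) + payload_size);
}

/* Non-embedded access loads a pointer first (one extra dereference). */
static void *wrapped_get(wrapped_obj *obj) { return obj->data; }

/* Embedded access is just the object address plus a fixed offset. */
static void *embedded_get(embedded_obj *obj) { return obj->payload; }

static void wrapped_free(wrapped_obj *obj) { free(obj->data); free(obj); }
static void embedded_free(embedded_obj *obj) { free(obj); }
```

With a 64-byte payload, `wrapped_new` performs two allocations and `embedded_new` performs one. Freeing likewise takes two `free()` calls versus one.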

For instance, when we converted Time to be embedded, it improved allocation performance by 30% and also reduced memory usage by 20%: https://github.com/ruby/ruby/commit/aa6642de630cfc10063154d84e45a7bff30e9103

I believe numerous third-party native extensions could benefit from it (I would certainly make use of it in ruby/json). Now that we have used it internally for several years, I'd like to work on making it a public API for Ruby 4.1.

Updated by Eregon (Benoit Daloze) about 2 months ago #1 [ruby-core:124636]

I'm thinking about this in the context of TruffleRuby, where RTypedData never moves (it's allocated via system calloc()).
I think the best approach there would be to ignore this new flag entirely, so the public API should be designed such that it can be implemented as if the data were not embedded.

Related: https://github.com/truffleruby/truffleruby/issues/4130
So on TruffleRuby I think we could always use the same allocation for the RTypedData + data struct, when using TypedData_Make_Struct(), effectively the same as embedded TypedData but never moving.
But not when using TypedData_Wrap_Struct() since that uses an existing data pointer.
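Below is a rough sketch of the co-allocation idea. It assumes the header keeps a real data pointer field. The `typed_header` type and its helpers are hypothetical and are not TruffleRuby's or CRuby's implementation. A `TypedData_Make_Struct()`-style constructor places the payload right after the header, in one allocation that never moves. A `TypedData_Wrap_Struct()`-style constructor can only record a pointer the caller already owns.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical header used only in this sketch. */
typedef struct {
    const char *type_name;
    void *data;   /* always a real pointer, so the slot stays readable and assignable */
} typed_header;

/* TypedData_Make_Struct()-style: one calloc holds the header followed by the
 * payload, and `data` points at the payload. The block is never moved.
 * Assumes the payload needs no stricter alignment than typed_header itself. */
static typed_header *make_struct_coallocated(const char *type_name, size_t payload_size) {
    typed_header *obj = calloc(1, sizeof(typed_header) + payload_size);
    assert(obj != NULL);
    obj->type_name = type_name;
    obj->data = obj + 1;          /* first byte after the header */
    return obj;
}

/* TypedData_Wrap_Struct()-style: the caller already owns `existing`, so it
 * cannot share the header's allocation; only the header is allocated. */
static typed_header *wrap_struct(const char *type_name, void *existing) {
    typed_header *obj = calloc(1, sizeof(typed_header));
    assert(obj != NULL);
    obj->type_name = type_name;
    obj->data = existing;
    return obj;
}

/* True when the payload sits in the same allocation, right after the header. */
static int is_coallocated(const typed_header *obj) {
    return obj->data == (const void *)(obj + 1);
}
```

In this model, the data slot stays assignable for both kinds of object. Co-allocation is purely a choice about allocation layout.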

Updated by byroot (Jean Boussier) about 2 months ago #2 [ruby-core:124637]

> So on TruffleRuby I think we could always use the same allocation for the RTypedData + data struct, when using TypedData_Make_Struct(), effectively the same as embedded TypedData but never moving.

I don't think so, because you still need to support DATA_PTR(obj) = ptr, which isn't allowed for embedded TypedData objects.

Updated by Eregon (Benoit Daloze) about 1 month ago · Edited #3 [ruby-core:124647]

Good point! How do embedded TypedData objects handle this? Do they raise an exception in such a case?
That seems tricky, given that the DATA_PTR(obj) API returns a pointer.

I'd actually love it if we had a separate API for changing the data pointer as a macro or function (e.g. RTYPEDDATA_SET_DATA(obj, new_data_pointer), to mirror RTYPEDDATA_GET_DATA), so we'd know better when it can change.
Currently, TruffleRuby has to work around this by checking, after every native call that accesses a T_DATA, whether the data pointer has changed :/

Of course, we wouldn't be able to remove DATA_PTR() yet, but we could deprecate it and/or eventually make it return a const pointer (or similar) to prevent writes.
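The proposed setter can be modeled as a single choke point for pointer writes. In this sketch, `typed_set_data()` stands in for the hypothetical `RTYPEDDATA_SET_DATA(obj, new_data_pointer)`. It refuses to touch embedded objects. It also records each change, here as a simple version counter, which is an assumption made for illustration. That way a runtime like TruffleRuby could react at write time instead of re-checking the pointer after every native call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object model for this sketch only. */
typedef struct {
    bool embedded;               /* payload is inline; the pointer slot must not change */
    void *data;                  /* meaningful only when !embedded */
    unsigned long data_version;  /* incremented on every effective pointer change */
} tracked_obj;

/* Stand-in for the proposed RTYPEDDATA_SET_DATA(obj, new_data_pointer).
 * Every pointer write goes through here, so the runtime learns about the
 * change at write time (modeled by data_version) instead of polling. */
static bool typed_set_data(tracked_obj *obj, void *new_data) {
    assert(obj != NULL);
    if (obj->embedded) {
        return false;            /* refuse rather than corrupt; a real API might raise */
    }
    if (obj->data != new_data) {
        obj->data = new_data;
        obj->data_version++;
    }
    return true;
}
```

Setting the same pointer twice bumps the counter only once. Any attempt on an embedded object is rejected and leaves the object untouched.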

Updated by byroot (Jean Boussier) about 1 month ago #4 [ruby-core:124648]

> How do embedded TypedData objects handle this? Do they raise an exception in such a case?

Unfortunately not. It ends up corrupting data.
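A simplified model shows why an unchecked write corrupts data. It assumes that an embedded object's payload occupies the same bytes the data pointer would otherwise use. The `overlap_obj` layout below is illustrative and is not CRuby's definition. Writing the pointer slot, as `DATA_PTR(obj) = ptr` would, silently overwrites the first bytes of the payload.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified layout, assumed for illustration: an embedded object's payload
 * starts at the same bytes the data pointer would otherwise occupy. */
typedef struct {
    uintptr_t flags;
    union {
        void *data;                        /* non-embedded: pointer to the payload */
        unsigned char inline_payload[32];  /* embedded: the payload itself */
    } u;
} overlap_obj;

#define OVERLAP_EMBEDDED ((uintptr_t)1)

/* Models RTYPEDDATA_GET_DATA(): inline storage or the stored pointer. */
static void *overlap_get_data(overlap_obj *obj) {
    assert(obj != NULL);
    return (obj->flags & OVERLAP_EMBEDDED) ? (void *)obj->u.inline_payload : obj->u.data;
}

/* Models an unchecked `DATA_PTR(obj) = ptr`: writes the slot with no flag check. */
static void overlap_raw_set_data_ptr(overlap_obj *obj, void *ptr) {
    obj->u.data = ptr;
}
```

Suppose you store 42 in the payload (for example with `memcpy`) and then call `overlap_raw_set_data_ptr()`. The leading payload bytes now hold an address instead of 42, and nothing reports an error.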

> I'd actually love it if we had a separate API for changing the data pointer as a macro or function

Makes sense.

Updated by Eregon (Benoit Daloze) about 11 hours ago · Edited #5 [ruby-core:125007]

One tricky aspect of RUBY_TYPED_EMBEDDABLE is that if the struct contains a pointer into itself, that pointer becomes invalid when the object is moved.
Is there a way to correctly update such pointers? (EDIT: looking at this, it seems not.)
If the struct is ever passed to a native library I would consider it extremely dangerous to use RUBY_TYPED_EMBEDDABLE.
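The self-pointer problem can be reproduced without a GC. The sketch below uses a hypothetical `parser_state` type, and `move_object()` merely models a move as a plain byte copy, with no hook that knows which fields point into the object. After the copy, `cursor` still points into the old location. Storing an offset instead of an absolute pointer is one common workaround. It is offered here as an illustration, not as something proposed in this thread, and it does not help once the absolute address has been handed to a native library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical payload holding a cursor into its own buffer. */
typedef struct {
    char buf[16];
    char *cursor;       /* absolute interior pointer: stale after a move */
    size_t cursor_off;  /* offset alternative: still valid after a move */
} parser_state;

static void parser_init(parser_state *s, const char *text) {
    strncpy(s->buf, text, sizeof s->buf - 1);
    s->buf[sizeof s->buf - 1] = '\0';
    s->cursor = s->buf + 2;
    s->cursor_off = 2;
}

/* Models moving an embedded object: a byte copy with no knowledge of which
 * fields are interior pointers. */
static void move_object(parser_state *dst, const parser_state *src) {
    memcpy(dst, src, sizeof *dst);
}

/* Recomputes the cursor relative to wherever the struct currently lives. */
static char *cursor_from_offset(parser_state *s) {
    return s->buf + s->cursor_off;
}
```

After `move_object(&b, &a)`, `b.cursor` still equals `a.buf + 2`, while `cursor_from_offset(&b)` correctly yields `b.buf + 2`.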

Overall it sounds quite error-prone, also considering there is no safeguard to avoid writing to DATA_PTR, so I'm not sure it's appropriate to expose this to user extensions.
EDIT: it also needs RB_GC_GUARD() calls to prevent the object from being moved or freed while it is still in use on the stack, as shown in this commit. Those are notoriously easy to forget.

Updated by Eregon (Benoit Daloze) about 9 hours ago #6 [ruby-core:125008]

Eregon (Benoit Daloze) wrote in #note-5:

> also considering there is no safeguard to avoid writing to DATA_PTR

One idea to address this (but not the other two concerns) would be to raise on DATA_PTR() for RUBY_TYPED_EMBEDDABLE objects and only allow RTYPEDDATA_GET_DATA().
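The idea can be sketched as two accessors over a hypothetical `guarded_obj`. The first is a checked stand-in for `DATA_PTR()` that hands out the assignable pointer slot only for non-embedded objects; it returns `NULL` where the proposal would raise. The second is a stand-in for `RTYPEDDATA_GET_DATA()` that returns the payload address for either layout. The names and layout are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object layout for this sketch. */
typedef struct {
    int embedded;               /* stand-in for an object created with RUBY_TYPED_EMBEDDABLE */
    void *data;                 /* pointer slot, meaningful only when !embedded */
    unsigned char payload[16];  /* inline payload, used only when embedded */
} guarded_obj;

/* Stand-in for a checked DATA_PTR(): returns the assignable slot only for
 * non-embedded objects. NULL models the "raise" case of the proposal. */
static void **checked_data_ptr(guarded_obj *obj) {
    assert(obj != NULL);
    return obj->embedded ? NULL : &obj->data;
}

/* Stand-in for RTYPEDDATA_GET_DATA(): read access that works for both layouts. */
static void *get_data(guarded_obj *obj) {
    assert(obj != NULL);
    return obj->embedded ? (void *)obj->payload : obj->data;
}
```

Code that already reads through the `get_data()` stand-in works unchanged in this model. Any `DATA_PTR()`-style access to an embedded object, whether read or write, is refused.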
