Project

General

Profile

Feature #14819

Efficient cstring to RVALUE typecasting for c extension gems

Added by sam.saffron (Sam Saffron) 6 months ago. Updated 6 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:87368]

Description

A general pattern I notice in the PG / MySQL and other gems is a general need for a C string to RVALUE type casting.

A lot of the problem is that the libpq likes returning strings and you do not want to allocate an extra RVALUE string for an IP address or Date and so on when returning data for the data set.

The result is that many c extension gems carry around very similar code for casting C strings to dates/ip addresses and so on.

An example of such a function is: https://github.com/ged/ruby-pg/blob/master/ext/pg_text_decoder.c#L551-L715

Ruby already provides quite a few helpers such as rb_cstr2inum I wonder if there are some gaps we can fill like

rb_cstr2ipaddr rb_cstr2time rb_cstr2date rb_cstr2macaddr

Having these implementation could help reduce code carried in gems and offer a faster method for adoption of faster patterns. For the interim gems could copy and paste MRI implementation and longer term just add an #if to lean on MRI.

An alternative "wild" idea here would be to allow Ruby code to operate in an "unsafe" way on strings that are not RVALUES yet, .NET has a similar mechanism with unsafe. Eg:

unsafe do
  non_rvalue_string = api.getcstr
  # operate on string with a limited set of operators that only allocate on stack
  api.free non_rvalue_string
end

If the wild idea here is interesting perhaps lets open a new topic, for this purpose let's discuss rb_cstr2ipaddr rb_cstr2time rb_cstr2date rb_cstr2macaddr

History

#1 [ruby-core:87369] Updated by jeremyevans0 (Jeremy Evans) 6 months ago

IPAddr and Date are stdlib, not core, so I don't think a C-API (rb_cstr2ipaddr, rb_cstr2date) method could be added for them unless it was part of the related extensions . Date is written in C, but IPAddr is written in ruby and doesn't have a C extension. In either case, I don't believe it would work on Windows, since I don't believe Windows allows a dlopened library to call functions in another dlopened libary.

I don't think there is a macaddr library in stdlib or core, though there is a gem for it, so I don't think rb_cstr2macaddr makes sense.

The big problem with rb_cstr2time is you would have to decide which format(s) you want to support. You would probably want a separate function per format.

I think these types of conversion methods are better implemented in C-extension gems specific to the conversions required. For example, in sequel_pg, I have C functions for converting to Time/Date/DateTime quickly, but they are specific to PostgreSQL's date/time formats and the specific Sequel configuration in use.

#2 [ruby-core:87370] Updated by sam.saffron (Sam Saffron) 6 months ago

Interesting, maybe what we need is a magic typecaster gem that centralizes all this work and is implemented in Ruby for Windows. Yeah ... not sure what the perfect solution is here, but it would be nice to centralize the implementations, even if we just cut-and-paste from the central repo.

btw did you see, PG gem now does native cstr2time I wonder if its time to just merge in functionality of sequel_pg into the pg gem in light of the type casters and so on. More winners here having one fast implementation of the building block.

#3 [ruby-core:87372] Updated by jeremyevans0 (Jeremy Evans) 6 months ago

sam.saffron (Sam Saffron) wrote:

btw did you see, PG gem now does native cstr2time I wonder if its time to just merge in functionality of sequel_pg into the pg gem in light of the type casters and so on. More winners here having one fast implementation of the building block.

As I mentioned, sequel_pg's implementation is dependent on Sequel's configuration (Date/Time, UTC/local database/application timezone, 8 separate cases, with a fallback to a pure-ruby version for non-UTC/local timezones), a generic implementation could definitely not handle things correctly. I suspect that the sequel_pg case is not the only case where this is true.

#4 [ruby-core:87377] Updated by normalperson (Eric Wong) 6 months ago

sam.saffron@gmail.com wrote:

A general pattern I notice in the PG / MySQL and other gems is
a general need for a C string to RVALUE type casting.

A lot of the problem is that the libpq likes returning strings
and you do not want to allocate an extra RVALUE string for an
IP address or Date and so on when returning data for the data
set.

What Jeremy said about Date and IPAddr being in stdlib and not core.

So I don't think this doable unless we start stuffing those
things into core.

Fwiw, the overhead of those objects (especially Time) has always
been frustrating to me; I'd much rather have Time functionality
built around Integers.

So I wonder if we should support more operations on Integers
instead of having fancy objects...

rb_cstr2ipaddr rb_cstr2time rb_cstr2date rb_cstr2macaddr

Another thing: No more "cstr" functions, please.
Any parsers should take an explicit size arg for safety
and performance instead of relying on '\0' terminator.

#5 [ruby-core:87380] Updated by sam.saffron (Sam Saffron) 6 months ago

I like the idea of a building block that converts to Integer or (Integer pair) if it is acceptable, you can already init a Time cheaply with a number, so we could have a similar method for IPAddr and MacAddr etc. and get away with shipping the common building block without pulling these things into core.

It is not a trivial win though so I get the reluctance here.

#6 [ruby-core:87385] Updated by larskanis (Lars Kanis) 6 months ago

I don't think it's really helpful to have C-str to Time converters in core. Time string representation differs in various details between PG and MySQL and even more between other sources.

However while implementing the time decoder for pg I wished to have a fast version of:

rb_funcall(rb_cTime, rb_intern("new"), 7, INT2NUM(year), INT2NUM(mon), INT2NUM(day), INT2NUM(hour), INT2NUM(min), sec_and_usec_rational, INT2NUM(gmt_offset));

We have rb_time_timespec_new(ts, offset) since ruby-2.3, which is more than 10 times faster than the above, but it requires to convert the time to timespec first. This conversion is pretty difficult and involves some less standard functions (timegm() and mktime()) which didn't work as expected on Windows and termix/Android.

Since Ruby core has everything builtin to do the timespec conversion, it would be nice to have a public C function to create a Time object from integer values.

#7 [ruby-core:87400] Updated by normalperson (Eric Wong) 6 months ago

lars@greiz-reinsdorf.de wrote:

However while implementing the time decoder for pg I wished to have a fast version of:

rb_funcall(rb_cTime, rb_intern("new"), 7, INT2NUM(year), INT2NUM(mon), INT2NUM(day), INT2NUM(hour), INT2NUM(min), sec_and_usec_rational, INT2NUM(gmt_offset));

We have rb_time_timespec_new(ts, offset) since ruby-2.3,
which is more than 10 times faster than the above, but it
requires to convert the time to timespec first. This
conversion is pretty difficult and involves some less standard
functions (timegm() and mktime()) which didn't work as
expected on Windows and termix/Android.

I think a better path would be to bundle timegm for systems
without it. It avoid increasing the C-API footprint of Ruby
and reduces the cognitive overhead for anybody reading the
code.

Unfortunately the Ruby community doesn't like GPL; since gnulib
provides exactly the functionality required (and much more in
terms of portability) in a well-organized repo:

https://git.savannah.gnu.org/cgit/gnulib.git

But at least for timegm, musl and any BSDs have it, too.

#8 [ruby-core:87426] Updated by larskanis (Lars Kanis) 6 months ago

I think a better path would be to bundle timegm for systems without it.

And that's the problem. We have 3 implementations (each at least 500 LOC) of timegm/mktime then: one in ruby core, one in libc and another one in pg. And we can only hope that they all return the same result, when called with the same input values. To my experience they don't.

#9 [ruby-core:87434] Updated by normalperson (Eric Wong) 6 months ago

lars@greiz-reinsdorf.de wrote:

I think a better path would be to bundle timegm for systems without it.

And that's the problem. We have 3 implementations (each at
least 500 LOC) of timegm/mktime then: one in ruby core, one in
libc and another one in pg. And we can only hope that they all
return the same result, when called with the same input
values. To my experience they don't.

Interesting (and sad) that they don't :<

Fwiw, Ruby doesn't have an exact timegm replacement.
We (the core team) generally prefer to keep the C API small
to allow future improvements to be made more easily, so I'm
not sure how readily exposing something like rb_timegm(struct tm *)
would be accepted.

A bigger problem is probably rb_funcall*, which doesn't benefit
from inline caching at all...

Also available in: Atom PDF