Misc #20519

closed

Porting regexp to pure ruby?

Added by brightbits (Michael Baldry) 27 days ago. Updated 23 days ago.

Status:
Feedback
Assignee:
-
[ruby-core:118147]

Description

Would there be any benefit in porting Regexp from Onigmo to a pure Ruby implementation that could take advantage of YJIT?

Compiling a pattern could mean translating it into a Ruby method, which YJIT could then optimize easily.

Has this been explored, or has any work been done around this kind of thing, before I look into it more?

Many thanks

Updated by shyouhei (Shyouhei Urabe) 26 days ago

  • Status changed from Open to Feedback

Ruby (especially its multilingualized string) is built on top of Onigmo and not vice versa. You must first decouple them, which alone is not an easy task.

Updated by brightbits (Michael Baldry) 26 days ago

shyouhei (Shyouhei Urabe) wrote in #note-1:

Ruby (especially its multilingualized string) is built on top of Onigmo and not vice versa. You must first decouple them, which alone is not an easy task.

Ah yes, I see now that everything in enc has an Oniguruma copyright header.

I think that could all remain, with only the actual regexp matching functions being replaced. However, after some quick benchmarking of Ruby code implementing the logic of a relatively simple date-parsing regexp, I couldn't get anywhere near the speed of Onigmo, even with YJIT. That doesn't mean it's not possible: I didn't dig too deep or do any kind of profiling to see what was taking the time.
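For illustration, a comparison along those lines might look like the sketch below. The thread does not say which date pattern was benchmarked, so `DATE_RE` is an assumption, and `match_date` is a hypothetical hand-written equivalent of the kind a regexp-to-Ruby compiler might emit. It is not the code that was actually measured.

```ruby
require "benchmark"

# Assumed pattern: an ISO-style date such as "2024-05-17".
DATE_RE = /\A(\d{4})-(\d{2})-(\d{2})\z/

# Byte offsets that must hold ASCII digits in a 10-byte "YYYY-MM-DD" string.
DIGIT_POSITIONS = [0, 1, 2, 3, 5, 6, 8, 9].freeze

# Hypothetical hand-written equivalent of DATE_RE.
# Returns the three captured groups, or nil when the input does not match.
def match_date(str)
  return nil unless str.bytesize == 10
  return nil unless str.getbyte(4) == 45 && str.getbyte(7) == 45 # 45 is "-"

  DIGIT_POSITIONS.each do |i|
    byte = str.getbyte(i)
    return nil if byte < 48 || byte > 57 # outside "0".."9"
  end

  [str.byteslice(0, 4), str.byteslice(5, 2), str.byteslice(8, 2)]
end

input = "2024-05-17"
n = 200_000

Benchmark.bm(12) do |x|
  x.report("regexp")       { n.times { DATE_RE.match(input) } }
  x.report("hand-written") { n.times { match_date(input) } }
end
```

Running the script with and without `ruby --yjit` gives a rough comparison; the timings will vary with the Ruby version, the pattern and the input.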

The thought came about while my team was benchmarking a change: one person suggested a regexp for matching and replacing a string prefix, and it was tested against using start_with? followed by a string range accessor to drop the prefix, which seemed to be faster for that case.
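As a rough sketch, the two approaches being compared might look like this. The prefix `"feature/"` and the method names are assumptions made for illustration; the actual strings and code from that benchmark are not given here.

```ruby
require "benchmark"

PREFIX    = "feature/"       # assumed example prefix
PREFIX_RE = /\Afeature\//    # the same prefix, anchored at the start of the string

# Regexp approach: replace the prefix with an empty string.
def strip_with_regexp(str)
  str.sub(PREFIX_RE, "")
end

# start_with? approach: check for the prefix, then slice it off with a range.
# When the prefix is absent, the original string object is returned unchanged.
def strip_with_start_with(str)
  str.start_with?(PREFIX) ? str[PREFIX.length..] : str
end

input = "feature/regexp-in-ruby"
n = 200_000

Benchmark.bm(12) do |x|
  x.report("regexp")      { n.times { strip_with_regexp(input) } }
  x.report("start_with?") { n.times { strip_with_start_with(input) } }
end
```

`String#delete_prefix` (Ruby 2.5 and later) does the same job in a single call and would be worth including in any such comparison.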

I agree it sounds like a very big job and, based on initial testing, it is unlikely to be an improvement in most cases.

Updated by kddnewton (Kevin Newton) 23 days ago

Hi @brightbits (Michael Baldry)! I've investigated this one at length, and can give some context.

As you already discovered, Onigmo stretches well beyond regular expressions. It also provides all of the encoding support within CRuby, stretching all of the way into the parser. This has led most other Ruby implementations to have to vendor Onigmo in order to match behavior 1:1. For example TruffleRuby uses it as a fallback (https://github.com/oracle/truffleruby/blob/master/lib/cext/include/ruby/onigmo.h), Artichoke uses it as a fallback (https://github.com/artichoke/artichoke/blob/77434156f30188a6e27f321b9b0f8437acfc0834/spinoso-regexp/Cargo.toml#L27), Natalie uses it as its regexp engine (https://github.com/natalie-lang/natalie/blob/556e8c195423daddf1c5aba49bb67dda22fb36d7/Rakefile#L467-L480), etc. For these reasons replacing Onigmo entirely may be possible, but it would certainly be an extremely long and arduous process because of concerns about backward compatibility.

That being said, there are things that could be done. The various options would be:

Updated by brightbits (Michael Baldry) 23 days ago

I was at the Kaigi but unfortunately missed that talk! I didn't realise that a few weeks later I'd be digging into it :) Looks like some interesting work has gone into this area already. I'm going to spend some time looking into this.

Thanks for the detailed response, I really appreciate it!
