Project

General

Profile

Actions

Misc #20238

open

Use prism for mk_builtin_loader.rb

Added by kddnewton (Kevin Newton) 5 months ago. Updated about 1 month ago.

Status:
Open
Assignee:
-
[ruby-core:116589]

Description

I would like to propose that we use prism for mk_builtin_loader.rb.

Right now the Ruby syntax that you can use in builtin classes is restricted to the base Ruby version (2.7). This means you can't use a lot of the nicer syntax that Ruby has shipped in the last couple of years.

If we switch to using prism to parse the builtin files instead of using ripper, then we can always use the latest version of Ruby syntax. A pull request for this is here: https://github.com/kddnewton/ruby/pull/65. The approach for the PR is taken from how RJIT bindgen works.

Updated by hsbt (Hiroshi SHIBATA) 5 months ago

Do you have any reason this use bundler and gem version of prism? It seems not working if BASERUBY is system ruby on Linux distribution because these environment is separated ruby, rubygems and bundler packages. It should be using repository version of prism.

Updated by kddnewton (Kevin Newton) 5 months ago

I tried to get that working, but it seems like the extensions are built too late. We need prism earlier in the process to build the core files first. Maybe there's a way, but @k0kubun (Takashi Kokubun) and I couldn't find it.

Updated by Eregon (Benoit Daloze) 5 months ago

hsbt (Hiroshi SHIBATA) wrote in #note-1:

It seems not working if BASERUBY is system ruby on Linux distribution because these environment is separated ruby, rubygems and bundler packages. It should be using repository version of prism.

Is that a blocker?
Because I think running mk_builtin_loader.rb is only needed for builds from the repository, not from release archives (since it needs BASERUBY).
And for building from repository, there is BASERUBY needed and it seems fair enough to also rely on RubyGems & Bundler being there.
OS packagers could install those if they want to build for some reason from the repository instead of release archives, no?

Updated by k0kubun (Takashi Kokubun) 4 months ago

It seems not working if BASERUBY is system ruby on Linux distribution because these environment is separated ruby, rubygems and bundler packages.

I tried it on Ubuntu 22.04, and it worked. We already use ERB of baseruby, so it should have default gems, including Bundler. The prism installation goes to tool/.bundle, not the package directory. You need apt install ruby-dev (header files) in addition to apt install ruby though.

It should be using repository version of prism.

I think this idea itself is feasible. However, because the prism for baseruby needs to be built using baseruby's headers, you'd still need to install ruby-dev. So it probably doesn't make things easier.

If we don't want to require ruby-dev, we might be able to tweak the PR to discover prism.gem of baseruby for Ruby 3.3+. You'd still need ruby-dev for Ruby 3.2 and older, but it'd be at least optional for newer Rubies.

Updated by hsbt (Hiroshi SHIBATA) 4 months ago

Ex. System ruby didn't contain Bundler in ruby package of FreeBSD 13. rubygem-bundler is separated package from ruby.

[hsbt@freebsd13 /usr/home/hsbt]$ ruby -v -rrubygems -rbundler -e "puts Bundler::VERSION"
ruby 3.1.4p223 (2023-03-30 revision 957bb7cb81) [amd64-freebsd13]
<internal:/usr/local/lib/ruby/3.1/rubygems/core_ext/kernel_require.rb>:85:in `require': cannot load such file -- bundler (LoadError)
        from <internal:/usr/local/lib/ruby/3.1/rubygems/core_ext/kernel_require.rb>:85:in `require'

ERB is working fine.

[hsbt@freebsd13 /usr/home/hsbt]$ ruby -v -rerb -e "puts ERB"
ruby 3.1.4p223 (2023-03-30 revision 957bb7cb81) [amd64-freebsd13]
ERB

It means hard to use bundler/inline with BASERUBY for build toolchain in supported platform.

Updated by Eregon (Benoit Daloze) 4 months ago

That means that ruby package on FreeBSD 13 can also not use any bundled gem, e.g. bigdecimal when 3.4 comes out. So it's very clearly an incomplete Ruby installation.
A complete Ruby installation includes RubyGems and bundled gems (including Bundler), they are part of the release archive (and AFAIK always built).
If OS packagers like to cut things smaller than CRuby provide in the release archive, they can, but it seems no big deal to install extra packages when building CRuby from the repository.

In general to compile Ruby from the repository, one needs ruby-dev/ruby-devel/ruby-all (or using a Ruby installer, and BTW they all provide a complete Ruby installation). What the ruby package includes is just too inconsistent between platforms.
I think we should update the documentation for building Ruby from the repository to mention this, and then maybe install a couple extra packages for rubyci.org machines.

Updated by ko1 (Koichi Sasada) 4 months ago

I'm not negative to use Prism (or using parse.y) to make it to support newest syntax in builtins.

There are some concerns.

  • Installing prism.gem requires bundler for BASERUBY (already commented)
  • Using prism gem it can be newer than the source code (prism gem for ruby 3.5 syntax to build ruby 3.4, for example)
    • So I believe it should use Prism in source code.
  • Using prism in source code, it also needs extra dependencies to build an extension for BASERUBY (as k0kubun-san commented)
    • rbconfig.rb contains building tools information, but it doesn't mean such tools are available and it can cause troubles.

Another idea is making small C tool using prism.
To build MRI, developers should prepare a compiler and make command.
With such minimum requirement tools, we can build "Ruby source code to structured AST representation" tool with prism as C program.
AST representation is anything okay, JSON, YAML, XML or Array/Hash with Ruby's syntax for examples.
After converting, we can process them with BASERUBY.

Updated by kddnewton (Kevin Newton) 4 months ago

Using prism gem it can be newer than the source code (prism gem for ruby 3.5 syntax to build ruby 3.4, for example)

You can pass Prism.parse(code, version: "3.3.0") and it will parse as if it were 3.3.0. I'm fine supporting older Ruby versions for this purpose.

Another idea is making small C tool using prism.

I like this idea if we can't use the vendored prism.

Updated by ko1 (Koichi Sasada) 4 months ago

kddnewton (Kevin Newton) wrote in #note-8:

You can pass Prism.parse(code, version: "3.3.0") and it will parse as if it were 3.3.0. I'm fine supporting older Ruby versions for this purpose.

(off-topic)

Oh, Prism supports multi-versions?

Updated by k0kubun (Takashi Kokubun) 4 months ago

install a couple extra packages for rubyci.org machines

Note that RubyCI doesn't have this problem. On RubyCI servers, which are provisioned by ruby-infra-recipe, we build Ruby 3.2.3 using ruby-build (from a tarball) and use it as a BASERUBY.

Using prism in source code
Another idea is making small C tool using prism

It looks like we're converging on using Prism in the source code instead of installing one from rubygems.org, whether we use it as prism.gem or C code. It would obviate Bundler and allow using the latest syntax without releasing prism.gem, so there's a benefit in doing so.

I think the question is: Do CRuby developers (want to) use a BASERUBY that cannot build a C extension? If the answer is yes, we'd be forced to use Prism as a C library, serializing Ruby AST using C code. If not, it leaves an option to use prism.gem built from the source code.

Updated by k0kubun (Takashi Kokubun) 4 months ago

We discussed this at DevMeeting 2024-02-14.

  • The idea of using Prism for supporting the latest syntax in builtin classes is accepted.
  • We should try making a small C program using Prism to achieve this. If it turns out to be too complicated, we'll revisit the discussion.
    • The way to prepare an environment for building C extensions varies across different platforms, so we worry that installing C extensions with a BASERUBY may make things harder.

Updated by kddnewton (Kevin Newton) 4 months ago

Thank you for the discussion! I will look into a C program and update this ticket, that sounds like a good approach.

Updated by naruse (Yui NARUSE) 4 months ago

I think there is a risk to use syntax which is not released yet. If the syntax is reverted by some reason, mk_builtin_loader.rb also needs to be also fixed.

We should try making a small C program using Prism to achieve this. If it turns out to be too complicated, we'll revisit the discussion.

I doubt this idea even if the initial implementation of it is simple. I'm afraid that it becomes more complex over the years.
I think it should use latest stable Ruby as BASERUBY instead.

Updated by k0kubun (Takashi Kokubun) 4 months ago

I think there is a risk to use syntax which is not released yet. If the syntax is reverted by some reason, mk_builtin_loader.rb also needs to be also fixed.

Note that I'm fine with the latest released syntax as well. The syntax I've wanted to use are: pattern matching (Ruby 2.7, already added to BASERUBY), endless method definition (Ruby 3.0), and omitted keyword arguments (Ruby 3.1), which had been already released when I tried to use them.

So it's fine to use either Prism with PM_OPTIONS_VERSION_CRUBY_3_3_0 or the Prism version for the latest release Ruby.

I doubt this idea even if the initial implementation of it is simple. I'm afraid that it becomes more complex over the years.

It all depends on the design of the C program, so that's why we should try making one. If the new mk_builtin_loader.rb and the C program seem harder to maintain than today's mk_builtin_loader.rb, we can revisit alternatives (bumping BASERUBY, using prism.gem from BASERUBY, etc).

I think it should use latest stable Ruby as BASERUBY instead.

Based on Matz's conclusion (ref: meeting notes), the "latest" we can use is still Ubuntu 22.04's Ruby 3.0. That's too old for me since I want to use Ruby 3.1's feature.

However, if no other option is left, I'd be happy to wait until Ubuntu 24.04's release, which is already close enough, and bump BASERUBY to Ruby 3.1+.

Updated by kddnewton (Kevin Newton) 4 months ago

I built the C program here: https://github.com/ruby/ruby/pull/9998. It keeps mk_builtin_loader.rb in place and uses a new tool/dump_ast.c to dump the AST to JSON. Then we parse the AST and walk it in mk_builtin_loader.rb.

I added many, many comments to explain how everything is working in the C program. I tried to be as verbose as possible to make it very clear how it worked. @naruse (Yui NARUSE) I hope this assuages your fears about it being too complex.

I've got some build failures because I'm not creating a directory in the right order, which I'll address next week.

Updated by kddnewton (Kevin Newton) 4 months ago

Sorry the previous link was incorrect. The correct PR is here: https://github.com/ruby/ruby/pull/10005. The comments aren't in the C file, they're in the Ruby file. The C file is relatively minimal as it is — it parses the file and if it's valid it serializes to JSON and dumps to stdout, otherwise it builds the error message, prints to stderr, and exits. The Ruby file performs the tree walk in pretty much the same way that the Ripper one did.

Updated by hsbt (Hiroshi SHIBATA) 4 months ago

We should close this because @k0kubun (Takashi Kokubun) is working to update BASERUBY version for using new syntax. There is no reason why we use prism for it.

Updated by kddnewton (Kevin Newton) 4 months ago

Respectfully, I disagree. If we switch to prism for this now, then every time we want to bump the version of syntax that can be used in builtin files, it will involve changing a single line of code (the version that prism should support). If we don't, then we will need to wait for BASERUBY to be updated every time, and also will need to continue to support/debug ripper's AST.

@matz (Yukihiro Matsumoto) has said repeatedly that the AST going forward will be prism's AST. That means that even if prism isn't the default parser going forward, the AST will be consistent. This means that regardless of the future of prism, this code will not need to change. This seems to me to be a more maintainable solution, requiring fewer updates and maintenance going forward.

Updated by naruse (Yui NARUSE) about 1 month ago

I talked with k0kubun and disscussed about multiple options as below.
I understand the background and agree to go ahead.

Upgrading BASERUBY

  • more work for each upgrade
  • there is less motivation to upgrade BASERUBY other than mk_builtin_loader.rb, but mk_builtin_loader.rb has much needs to track latest ruby

About BASERUBY with GEM_HOME

  • BASERUBY may not have gem for system ruby of some distribution

prism library with C program

  • only run for git checkout (not tarball)
  • not commit the resulted source
  • simpler than above options
Actions

Also available in: Atom PDF

Like1
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0