Feature #22097
Open
Add Proc#with_refinements
Description
Abstract
I propose Proc#with_refinements(mod, ...) to support block-level refinements.
module StringExt
  refine String do
    def shout = upcase + "!"
  end
end

original = ->(s) { s.shout }
refined = original.with_refinements(StringExt)
p refined.call("hello") # "HELLO!"
p original.call("hello") # NoMethodError
When no argument is given, ArgumentError is raised.
When a non-Module argument is given, TypeError is raised.
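Because Proc#with_refinements exists only on builds that include the proposed patch, a script that explores the two error cases above may want to feature-detect first. The following is a minimal sketch; the helper names probe_with_refinements and exception_class are hypothetical, and the expected exception classes are taken from the description above rather than from any released Ruby.

```ruby
# Hypothetical helper: runs the block and returns the class of the
# exception it raises, or nil when nothing is raised.
def exception_class
  yield
  nil
rescue StandardError => e
  e.class
end

# Hypothetical helper: reports which exception class each misuse raises,
# or :unsupported when this Ruby has no Proc#with_refinements.
def probe_with_refinements
  return :unsupported unless Proc.method_defined?(:with_refinements)

  prc = ->(s) { s }
  {
    no_args: exception_class { prc.with_refinements },        # described as ArgumentError
    non_module: exception_class { prc.with_refinements(42) }, # described as TypeError
  }
end

p probe_with_refinements
```

On a Ruby without the patch this prints :unsupported, so the script stays runnable either way.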
Background and Motivation
I previously proposed Proc#using in [Feature #16461], but it introduced semantic complexities because it mutated existing blocks.
Instead of mutating the existing block, Proc#with_refinements returns a new Proc object with its own isolated call sites.
This approach makes its semantics much simpler than Proc#using, and it avoids thread-safety issues and plays nicely with inline caches.
Limitations
- Similar to Proc#binding, Proc#with_refinements raises ArgumentError if the receiver is not created from a Ruby block.
  :to_s.to_proc.with_refinements(StringExt) #=> ArgumentError
- Chained application of Proc#with_refinements is not allowed. ArgumentError is raised if the receiver is a Proc returned by Proc#with_refinements.
  refined = prc.with_refinements(StringExt)
  refined.with_refinements(IntegerExt) #=> ArgumentError
- define_method (and define_singleton_method) rejects a Proc with refinements. ArgumentError is raised if the return value of Proc#with_refinements is given to define_method.
  refined = prc.with_refinements(StringExt)
  define_method(:foo, &refined) #=> ArgumentError
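The first limitation is modeled on existing Proc#binding behavior, which can be observed on current CRuby without the patch. A small sketch (the helper name binding_failure is hypothetical):

```ruby
# Returns the exception class raised by Proc#binding, or nil on success.
def binding_failure(prc)
  prc.binding
  nil
rescue ArgumentError => e
  e.class
end

p binding_failure(proc { 1 })     # nil: created from a Ruby block
p binding_failure(:to_s.to_proc)  # ArgumentError: not created from a Ruby block
```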
Implementation
I've opened a pull request: https://github.com/ruby/ruby/pull/17248
A PoC for JRuby is also available at: https://github.com/jruby/jruby/pull/9486
Data structure changes
- Added a bit field has_refinements to rb_proc_t.
- Added a hidden instance variable to Proc to store a cref with the applied refinements.
- Added a single-entry cache refinement_memo to rb_iseq_constant_body.
Deep copy of iseq and caching
Proc#with_refinements performs a deep copy of the receiver's iseq to isolate its call sites from the original Proc.
While a deep copy can be an expensive operation, the single-entry cache in rb_iseq_constant_body mitigates this overhead effectively for most practical use cases where the same refinements are applied repeatedly.
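The effect of a single-entry cache can be modeled in plain Ruby. This is only an illustrative sketch, not the C implementation in the pull request: the class SingleEntryMemo and its methods are hypothetical, the key stands in for the list of refinement modules, and the block stands in for the expensive iseq copy.

```ruby
# Hypothetical Ruby model of a single-entry memo (not the actual C code).
class SingleEntryMemo
  attr_reader :copies

  def initialize
    @filled = false
    @key = nil
    @value = nil
    @copies = 0
  end

  # Returns the cached value when key matches the stored key; otherwise
  # runs the block (the "expensive copy") and overwrites the single entry.
  def fetch(key)
    return @value if @filled && @key == key

    @copies += 1
    @value = yield
    @key = key
    @filled = true
    @value
  end
end

same = SingleEntryMemo.new
3.times { same.fetch([:A]) { Object.new } }
p same.copies         # 1: repeated use of the same modules hits the cache

alternating = SingleEntryMemo.new
3.times do
  alternating.fetch([:A]) { Object.new }
  alternating.fetch([:B]) { Object.new }
end
p alternating.copies  # 6: alternating keys evict each other every time
```

The second case shows why a single entry helps only when the same refinements are applied repeatedly to a given block.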
Overhead for code not using Proc#with_refinements
- Memory footprint: Neither internal structure grows in size. has_refinements is a 1-bit field added to rb_proc_t's existing bit field, and refinement_memo shares a union with mandatory_only_iseq in rb_iseq_constant_body.
- Execution speed: The common Proc#call path is kept frameless and only adds a single has_refinements bit check.
- GC: The mark/free/memsize functions add a single branch per iseq to select the union member.
Benchmark results: https://gist.github.com/shugo/ddfe92f28ea31e6527a2f270e6daee7c
Here's an excerpt from the results, where compare-ruby is master and built-ruby is the branch for this feature (focusing on Proc/Block operations):
| | compare-ruby | built-ruby |
|---|---|---|
| vm_proc | 47.215M | 46.149M |
| | 1.02x | - |
| vm_yield | 1.649 | 1.754 |
| | - | 1.06x |
Updated by shugo (Shugo Maeda) 5 days ago
- Related to Feature #16461: Proc#using added
- Related to Feature #12086: using: option for instance_eval etc. added
Updated by headius (Charles Nutter) 5 days ago
Thank you for considering JRuby! I will review your PR and also start a ruby-4.1 branch you can target.
Updated by shugo (Shugo Maeda) 5 days ago
- Description updated (diff)
Updated by shugo (Shugo Maeda) 5 days ago
headius (Charles Nutter) wrote in #note-2:
Thank you for considering JRuby! I will review your PR and also start a ruby-4.1 branch you can target.
Thank you!
I've opened a new pull request at: https://github.com/jruby/jruby/pull/9486
Updated by shugo (Shugo Maeda) 5 days ago
- Description updated (diff)
Updated by shugo (Shugo Maeda) 5 days ago
- Description updated (diff)
Updated by shugo (Shugo Maeda) 4 days ago
For maintainability, I've replaced the hand-written iseq deep-copy with an in-memory IBF dump+load round-trip in: https://github.com/ruby/ruby/pull/17248/changes/f27cf1d98c18f4137ace0243ca696ba3e17834af
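The public CRuby API offers a comparable dump and load round trip, which may help picture the idea. This sketch uses RubyVM::InstructionSequence#to_binary and .load_from_binary; it only illustrates the concept and is an assumption about the general approach, not the internal in-memory code path used by the pull request. The helper name round_trip is hypothetical.

```ruby
# CRuby-only: RubyVM::InstructionSequence is not available on other
# Ruby implementations.
def round_trip(source)
  iseq = RubyVM::InstructionSequence.compile(source)
  binary = iseq.to_binary                              # dump to a binary String
  RubyVM::InstructionSequence.load_from_binary(binary) # load a fresh iseq
end

copy = round_trip("[1, 2, 3].map { |x| x * 2 }")
p copy.class  # RubyVM::InstructionSequence
p copy.eval   # [2, 4, 6]
```

The loaded iseq, including the nested block, is a separate object from the one that was dumped, which is the property a deep copy needs.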
Updated by Eregon (Benoit Daloze) 3 days ago
· Edited
Since the performance relies on having with_refinements called always with the same Refinement module for a given block, how about raising an exception if it doesn't hold?
Then we effectively have a guarantee vs very slow performance for e.g. loop { original.with_refinements(A); original.with_refinements(B) } (silly example, but could happen naturally in a bigger app).
Semantically, nested blocks also get access to the refinements, as shown in test_with_refinements_nested_block, or for clarity:
module StringExt
  refine String do
    def shout = upcase + "!"
  end
end
original = ->(s) { -> { s.shout }.call }
refined = original.with_refinements(StringExt)
p refined.call("hello") # "HELLO!"
This is what I would expect, just I didn't see that in the description.
Copying a block's IR, and the IR of all nested blocks (IR = bytecode for CRuby), is quite expensive.
It's cached but it's still going to be a significant cost on either application startup/on the first request/etc.
It would be good to get some numbers on that, e.g. creating and calling N blocks vs the same but also using with_refinements.
The increased memory footprint would be worth documenting.
Semantically, this means a given block (the lexical construct) can behave significantly differently based on calls to the original Proc or the with_refinements Proc.
It's a bit like a given block being both a lambda and a proc, that's confusing and generally forbidden (except send(rand < 0.5 ? :lambda : :proc) { ... } but that's obvious; lambda(&b) is forbidden for this reason).
Or similar to the issues we had with Ractor.make_shareable (which we solved by making the semantics much more similar and error if it would be too different).
In summary: observably different semantics for the same block are always surprising, because they are hard to explain and to debug.
IOW, it can break the author of the block's intention, by changing what a given piece of Ruby code means.
I suppose the general expectation here is only the refined block is called and the original block is never called.
If that holds I think it's fine, the problem is how to make it hold?
To make the semantics cleaner, maybe we should prevent the original block from being called (i.e. raise an exception if it's called) once with_refinements has been called on it?
(note: this would be stored in the block, so for all Proc instances of that block)
One might still call the original block, then use with_refinements and observe the mixed semantics but that becomes a much narrower case.
One way to fully address that would be to make this lexical, like:
proc_using_refinements(A) do
  ...
end
and error if proc_using_refinements is not called with a literal block.
Or maybe tweak the lambda operator like e.g.:
->(s) [StringExt] { s.shout }
But I guess the use case here wants more flexibility?
Updated by shugo (Shugo Maeda) 3 days ago
Thank you for the feedback!
Eregon (Benoit Daloze) wrote in #note-8:
Since the performance relies on having with_refinements called always with the same Refinement module for a given block, how about raising an exception if it doesn't hold?
I would prefer not to. The memo is just a cache, and this restriction would make the cache observable: whether prc.with_refinements(B) succeeds would depend on whether some other code called it with A before. For example, two libraries applying different refinements to Procs created from the same block would conflict, and the failure would depend on call order. That seems harder to debug than the performance issue it prevents.
Instead, how about emitting a performance warning (Warning[:performance], like the object shapes warnings) when with_refinements discards the cached copy because it was called with different modules for the same block? That makes the performance issue visible without changing the semantics.
Semantically, nested blocks also get access to the refinements, as shown in test_with_refinements_nested_block, or for clarity:
Yes, nested blocks also see the refinements. This is intended behavior. The refinements also apply to methods defined with def inside the body. I have documented this in the RDoc in https://github.com/ruby/ruby/pull/17248/changes/5c84f091bb3f01a646554b368c62b197c3d6c700
Copying a block's IR, and the IR of all nested blocks (IR = bytecode for CRuby), is quite expensive.
It's cached but it's still going to be a significant cost on either application startup/on the first request/etc.
It would be good to get some numbers on that, e.g. creating and calling N blocks vs the same but also using with_refinements.
That makes sense.
Here are some numbers, with 10,000 distinct blocks of a realistic size (about 15 lines, 2 nested blocks, so each copy duplicates 3 iseqs):
call 10000 original blocks 27.2 ms (2.72 us/block)
with_refinements x10000 (first time: copy) 261.8 ms (26.18 us/block)
call 10000 refined blocks 27.6 ms (2.76 us/block)
with_refinements x10000 (memoized) 3.2 ms (0.32 us/block)
with_refinements x10000, alternating A/B 251.8 ms (25.18 us/block)
iseq tree size: original 4040 bytes, copy 3992 bytes
So the copy costs about 26 us and 4 KB per block per module set, and it happens only once thanks to the memoization. Even 10,000 refined blocks add only ~0.3 seconds to startup. Call speed is the same as the original.
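The per-block figures follow directly from the measurements above; the short calculation below reproduces them. The aggregate memory figure and the copy-versus-memoized ratio are derived here for illustration and do not appear in the original numbers. The method name copy_cost_summary is hypothetical.

```ruby
# Derives per-block and aggregate costs from the measured totals.
def copy_cost_summary(blocks:, first_copy_ms:, memoized_ms:, copy_bytes:)
  {
    copy_us_per_block: (first_copy_ms * 1000 / blocks).round(2),
    total_copy_mb: (copy_bytes * blocks / 1_000_000.0).round(2),
    copy_vs_memoized: (first_copy_ms / memoized_ms).round(1),
  }
end

summary = copy_cost_summary(
  blocks: 10_000,
  first_copy_ms: 261.8, # "with_refinements x10000 (first time: copy)"
  memoized_ms: 3.2,     # "with_refinements x10000 (memoized)"
  copy_bytes: 3992      # "iseq tree size: ... copy 3992 bytes"
)

p summary[:copy_us_per_block]  # 26.18
p summary[:total_copy_mb]      # 39.92
p summary[:copy_vs_memoized]   # 81.8
```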
The full script is at: https://gist.github.com/shugo/07e62c44bc4765ecff6d2b8e704b5f38
The increased memory footprint would be worth documenting.
I have documented the memory footprint in the RDoc in https://github.com/ruby/ruby/pull/17248/changes/5c84f091bb3f01a646554b368c62b197c3d6c700
To make the semantics cleaner, maybe we should prevent the original block from being called (i.e. raise an exception if it's called) once with_refinements has been called on it?
In the intended use cases, only the refined Proc is called. But I would prefer not to enforce it, for two reasons:
- Calling both is well-defined: each Proc behaves consistently, and their inline caches are isolated. There is no "mixed" state.
- Storing a flag in the block would mutate state shared by all Proc instances of that block. Calling original.call would suddenly raise because some other code called with_refinements on a sibling Proc. This is the same action-at-a-distance problem that Proc#using had, which this proposal was redesigned to avoid.
Note that a block can already behave differently depending on how it is invoked (instance_exec changes self, instance variables, and method resolution). with_refinements is similar: an explicit, opt-in re-binding of the resolution context, and the new Proc object makes the boundary visible.
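The instance_exec comparison can be seen with current Ruby. In this small sketch the class names BlockAuthor and Receiver are hypothetical; the same block reads a different @name depending on how it is invoked.

```ruby
class BlockAuthor
  def initialize
    @name = "author"
  end

  # The block is written here, so a plain call sees BlockAuthor's @name.
  def make_block
    proc { @name }
  end
end

class Receiver
  def initialize
    @name = "receiver"
  end
end

blk = BlockAuthor.new.make_block
p blk.call                          # "author"
p Receiver.new.instance_exec(&blk)  # "receiver": self was re-bound
```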
One way to fully address that would be to make this lexical, like:
(snip)
But I guess the use case here wants more flexibility?
As you guessed, the main use case is the opposite direction: a library (e.g. a DSL) applies its own refinements to blocks written by its users, so that users do not need to write using or name the modules. A lexical form cannot express this, because the modules are chosen by the library that receives the block, not by the author of the block.
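The limitation of the lexical form can be demonstrated with current Ruby. In the sketch below, the module Library is a hypothetical stand-in for a library file that activates StringExt with using: a lambda written inside that scope sees the refinement, while a block passed in by a caller does not.

```ruby
module StringExt
  refine String do
    def shout
      upcase + "!"
    end
  end
end

# Hypothetical stand-in for a library that activates its own refinements.
module Library
  using StringExt

  # Written inside the scope of `using`, so the refinement is visible.
  INTERNAL = ->(s) { s.shout }

  # Calls a block written elsewhere. Its call sites are outside the scope
  # of `using`, so String#shout is not visible to them.
  def self.run(arg, &user_block)
    user_block.call(arg)
  rescue NoMethodError
    :no_method_error
  end
end

p Library::INTERNAL.call("hello")       # "HELLO!"
p Library.run("hello") { |s| s.shout }  # :no_method_error
```

This is the gap the proposal targets: the library, not the block's author, decides which refinements apply.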
For the case where the author of the block names the modules, a convenience method close to your proc_using_refinements can be built on top of the primitive:
module Kernel
  private def using_refinements(*modules, &block)
    block.with_refinements(*modules).call
  end
end
using_refinements(StringExt) { "hi".shout } #=> "HI!"
I think adding such a helper is fine (the name is open to discussion). But I want Proc#with_refinements as the primitive, because the case where a library receives a block cannot be expressed lexically.
Updated by shugo (Shugo Maeda) about 12 hours ago
- Description updated (diff)
Now Proc#with_refinements can be called in non-main Ractors:
https://github.com/ruby/ruby/pull/17248/changes/2cddb3fd99a7c68b4ab478f497f2610206edf335
Memo access is synchronized with RB_VM_LOCKING().
In single-Ractor mode it takes no actual lock (it is gated on rb_multi_ractor_p()), so the overhead is negligible:
call 10000 original blocks 23.5 ms (2.35 us/block)
with_refinements x10000 (first time: copy) 260.5 ms (26.05 us/block)
call 10000 refined blocks 23.5 ms (2.35 us/block)
with_refinements x10000 (memoized) 3.1 ms (0.31 us/block)
with_refinements x10000, alternating A/B 246.1 ms (24.61 us/block)
iseq tree size: original 4040 bytes, copy 3992 bytes