Bug #13358
closedOpenStruct overriding allocate
Description
In https://github.com/ruby/ruby/commit/15960b37e82ba60455c480b1c23e1567255d3e05 OpenStruct gained
class << self # :nodoc:
alias allocate new
end
Which is rather severely conflicting with expected behavior as Class.allocate
is meant to not call initialize. So, in fact, the change made allocate
of OpenStruct
do what allocate
is asserting not to do :-/
For OpenStruct
itself that isn't that big a deal, for classes inheriting from OpenStruct
it breaks allocate
though.
Example:
require 'ostruct'
class A < OpenStruct
def initialize(x, y = {})
super(y)
end
end
A.allocate
As allocate
is alias'd to new
in OpenStruct
this will attempt to initialize A
which will raise an ArgumentError
because A
cannot be initialized without arguments.
$ ruby x.rb
x.rb:4:in `initialize': wrong number of arguments (given 0, expected 1..2) (ArgumentError)
from x.rb:9:in `new'
from x.rb:9:in `<main>'
OpenStruct at the very least should document the fact that its allocate is behaving differently.
Ideally, OpenStruct
should not alias allocate at all.
Files
Updated by Eregon (Benoit Daloze) almost 8 years ago
I agree making OpenStruct.allocate an alias of Class#new is not a great idea and easily problematic.
We had a problem due to this in TruffleRuby (now fixed by renaming all our internal allocate methods).
The fix we came up with seems cleaner and let YAML deserialize OpenStruct (the original reason why this alias was introduced)
by making respond_to? work on a OpenStruct.allocate instance.
It only needs an extra "&" in
https://github.com/ruby/ruby/blob/d77214e8a37efecfa835ce56ce5f11e4125effd4/lib/ostruct.rb#L198
therefore becoming:
https://github.com/graalvm/truffleruby/blob/8338f62a711e2a1d816dd8393b9402c6d3238e8c/lib/patches/ostruct.rb#L10
Updated by nobu (Nobuyoshi Nakada) over 7 years ago
- Description updated (diff)
- Backport changed from 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN to 2.2: DONTNEED, 2.3: REQUIRED, 2.4: DONTNEED
Updated by nobu (Nobuyoshi Nakada) over 7 years ago
Note that OpenStruct.allocate
is intended to be used by only psych.
Not only OpenStruct
, Class#allocate
is not intended to be called directly.
Updated by nobu (Nobuyoshi Nakada) over 7 years ago
- Related to Bug #11884: Psych.load broken for OpenStruct in Ruby 2.3.0 added
Updated by nobu (Nobuyoshi Nakada) over 7 years ago
- Status changed from Open to Closed
Applied in changeset r58077.
ostruct.rb: fix OpenStruct.allocate
- lib/ostruct.rb (OpenStruct.allocate): initialize an instance
variable directly, without callingintialize
method which may
be overridden in a subclass. [ruby-core:80292] [Bug #13358]
Updated by Eregon (Benoit Daloze) over 7 years ago
nobu (Nobuyoshi Nakada) wrote:
Note that
OpenStruct.allocate
is intended to be used by only psych.
Not onlyOpenStruct
,Class#allocate
is not intended to be called directly.
Lots of Ruby code calls Class#allocate.
Of course, #initialize should be called directly after or some other way to initialize the instance.
Why another patch? It seems more hacky and less performant.
Updated by nobu (Nobuyoshi Nakada) over 7 years ago
Eregon (Benoit Daloze) wrote:
Lots of Ruby code calls Class#allocate.
Of course, #initialize should be called directly after or some other way to initialize the instance.
For what purpose?
It should be used by Class#new
, Kernel#dup
, Kernel#clone
, and Marshal.load
.
Why another patch? It seems more hacky and less performant.
As it is not called usually, it should take the performance penalty, not respond_to?
.
Updated by Eregon (Benoit Daloze) over 7 years ago
nobu (Nobuyoshi Nakada) wrote:
For what purpose?
It should be used byClass#new
,Kernel#dup
,Kernel#clone
, andMarshal.load
.
But not only, after all it is a public and well-known method (many hits on GitHub code search).
Many deserializing libraries will use it,
but also various libraries which need to build an instance without passing in state,
or use it as a way to replicate Class#new
without using super like (maybe to use a different initializing method):
def self.new(name)
if r = CACHE[name]
r
else
# Could use super(name), but not everybody does
allocate.tap { |obj| obj.send(:initialize, name) }
end
end
As it is not called usually, it should take the performance penalty, not
respond_to?
.
I think respond_to?
should work and not throw an exception on any Object
,
even if not fully initialized, just like say #instance_variable_defined?
, #equal?
and #object_id
.
The performance hit on respond_to?
is not significant, it's just an extra NIL_P
.
On the other hand, the one on allocate
is, and affects every caller of OpenStruct.allocate
.
require 'benchmark/ips'
require 'ostruct'
class SomeClass
end
Benchmark.ips do |x|
x.report("OpenStruct.allocate") { OpenStruct.allocate }
x.report("Class#allocate") { SomeClass.allocate }
x.compare!
end
class MyOpenStruct
def respond_to_missing?(mid, include_private = false) # :nodoc:
mname = mid.to_s.chomp("=").to_sym
@table&.key?(mname) || super
end
end
os = OpenStruct.new
my = MyOpenStruct.new
Benchmark.ips do |x|
x.report("OpenStruct#respond_to?") { os.respond_to?(:init_with?) }
x.report("MyOpenStruct#respond_to?") { my.respond_to?(:init_with?) }
x.compare!
end
Comparison:
Class#allocate: 7783868.7 i/s
OpenStruct.allocate: 3813277.6 i/s - 2.04x slower
Comparison:
OpenStruct#respond_to?: 1752400.3 i/s
MyOpenStruct#respond_to?: 1734649.5 i/s - same-ish: difference falls within error
@nobu (Nobuyoshi Nakada): Can I commit my fix?
It is faster and does not change the semantics of OpenStruct.allocate
.
Updated by nobu (Nobuyoshi Nakada) over 7 years ago
Eregon (Benoit Daloze) wrote:
But not only, after all it is a public and well-known method (many hits on GitHub code search).
Many deserializing libraries will use it,
Methods of constructed objects will be used more than construction usually.
but also various libraries which need to build an instance without passing in state,
or use it as a way to replicateClass#new
without using super like (maybe to use a different initializing method):
It doesn't need allocate
, CACHE[name] || super
.
The performance hit on
respond_to?
is not significant, it's just an extraNIL_P
.
And a branch.
On the other hand, the one on
allocate
is, and affects every caller ofOpenStruct.allocate
.
Why do you think the performance of allocate
matters?
Note that it is never used in common, since Class#new
never calls allocate
overridden in ruby level.
Updated by nagachika (Tomoyuki Chikanaga) over 7 years ago
- Backport changed from 2.2: DONTNEED, 2.3: REQUIRED, 2.4: DONTNEED to 2.2: DONTNEED, 2.3: DONE, 2.4: DONTNEED
ruby_2_3 r58161 merged revision(s) 58077.
Updated by Eregon (Benoit Daloze) over 7 years ago
nobu (Nobuyoshi Nakada) wrote:
Eregon (Benoit Daloze) wrote:
But not only, after all it is a public and well-known method (many hits on GitHub code search).
Many deserializing libraries will use it,Methods of constructed objects will be used more than construction usually.
but also various libraries which need to build an instance without passing in state,
or use it as a way to replicateClass#new
without using super like (maybe to use a different initializing method):It doesn't need
allocate
,CACHE[name] || super
.
Yes, nevertheless existing code uses it.
And if sby needed some special method like #initialize_native
,
calling #allocate
is the most natural way to do it in Ruby.
The performance hit on
respond_to?
is not significant, it's just an extraNIL_P
.And a branch.
Which is insignificant compared to the rest of the logic in #respond_to_missing?
.
On the other hand, the one on
allocate
is, and affects every caller ofOpenStruct.allocate
.Why do you think the performance of
allocate
matters?
Note that it is never used in common, sinceClass#new
never callsallocate
overridden in ruby level.
I think this bug is no reason to make a method 2x slower.
Ruby code might call #allocate
for various reasons.
But my main issue with this fix is it only addresses a specific use-case and not the general issue:
#respond_to?
should work on any object, even uninitialized and just #allocate
-d.
Kernel#respond_to_missing?
works on any object, but OpenStruct#respond_to_missing?
does not currently.
For instance, Class.instance_method(:allocate).bind(OpenStruct).call.respond_to?(:foo)
breaks.
Same for rb_obj_alloc()
+ rb_obj_respond_to()
or Ruby #respond_to?
Note that all of these used to work in Ruby 2.2.
I will commit my patch tomorrow unless you object.
It is more robust and has no significant downside.
Updated by nobu (Nobuyoshi Nakada) over 7 years ago
Eregon (Benoit Daloze) wrote:
But my main issue with this fix is it only addresses a specific use-case and not the general issue:
#respond_to?
should work on any object, even uninitialized and just#allocate
-d.
Kernel#respond_to_missing?
works on any object, butOpenStruct#respond_to_missing?
does not currently.
I can't get your point.
OpenStruct#respond_to?
works as others, and OpenStruct#respond_to_missing?
too.
Anyway Kernel#respond_to_missing?
is private and not to be used directly.
For instance,
Class.instance_method(:allocate).bind(OpenStruct).call.respond_to?(:foo)
breaks.
Why bypass overridden methods?
I will commit my patch tomorrow unless you object.
I object.
It is more robust and has no significant downside.
No merit.
Updated by Eregon (Benoit Daloze) over 7 years ago
nobu (Nobuyoshi Nakada) wrote:
Eregon (Benoit Daloze) wrote:
But my main issue with this fix is it only addresses a specific use-case and not the general issue:
#respond_to?
should work on any object, even uninitialized and just#allocate
-d.
Kernel#respond_to_missing?
works on any object, butOpenStruct#respond_to_missing?
does not currently.I can't get your point.
OpenStruct#respond_to?
works as others, andOpenStruct#respond_to_missing?
too.
AnywayKernel#respond_to_missing?
is private and not to be used directly.
No, OpenStruct#respond_to?
does not:
p Class.instance_method(:allocate).bind(OpenStruct).call.respond_to?(:foo) # or rb_obj_alloc() in C
Trunk:
.../ruby-trunk/lib/ruby/2.5.0/ostruct.rb:201:in 'respond_to_missing?': undefined method 'key?' for nil:NilClass (NoMethodError)
Ruby 2.2 (or on any other instance than a OpenStruct like Object.allocate.respond_to?(:foo)
):
false
For instance,
Class.instance_method(:allocate).bind(OpenStruct).call.respond_to?(:foo)
breaks.Why bypass overridden methods?
Because it is possible, and I don't see why that should raise an error.
Class#allocate
can allocate any object except OpenStruct
?
That seems a not needed exception.
It's also what rb_obj_alloc()
does.
I will commit my patch tomorrow unless you object.
I object.
It is more robust and has no significant downside.
No merit.
Robustness of Ruby code has no merit?
The reason of this bug is the previous workaround.
Now it's a workaround on top of another workaround,
using meta-programming just to avoid a nil check,
and incorrect if #allocate
is not called dynamically.
It also breaks the contract of #allocate
to not initialize the instance,
and still lets OpenStruct#respond_to?
raise an error when Kernel#respond_to?
never does.
I attach my proposed patch, please have a look and consider its merits.
Updated by Eregon (Benoit Daloze) over 7 years ago
nobu (Nobuyoshi Nakada) wrote:
No merit.
Beyond the reasons above, is there no merit for such a small and localized patch?
I thought that was quite valued in MRI.
Updated by nobu (Nobuyoshi Nakada) over 7 years ago
- Status changed from Closed to Assigned
- Assignee set to Eregon (Benoit Daloze)
OK, although I still think it is too artificial example, nobody would mind a little change of the OpenStruct
performance.
Updated by Eregon (Benoit Daloze) over 7 years ago
- Status changed from Assigned to Closed
Applied in changeset trunk|r58229.
ostruct.rb: improve fix for OpenStruct.allocate + #respond_to?
- lib/ostruct.rb (OpenStruct#respond_to_missing?): this makes
OpenStruct#respond_to? works on any OpenStruct instance,
just like Kernel#respond_to? does, without workarounds.
[ruby-core:80292] [Bug #13358]
Updated by Eregon (Benoit Daloze) over 7 years ago
- Backport changed from 2.2: DONTNEED, 2.3: DONE, 2.4: DONTNEED to 2.2: DONTNEED, 2.3: UNKNOWN, 2.4: DONTNEED
nobu (Nobuyoshi Nakada) wrote:
OK, although I still think it is too artificial example, nobody would mind a little change of the
OpenStruct
performance.
Thank you for your consideration, I committed as r58229.
I will let usa-san, the new 2.3 branch maintainer, decide whether this is worth backporting to 2.3.
I personally would recommend it.
P.S.:
Oddly enough, I did not receive your reply by email and it was not given a [ruby-core] id.
Maybe because it includes a Status update?
Updated by Eregon (Benoit Daloze) over 7 years ago
- Backport changed from 2.2: DONTNEED, 2.3: UNKNOWN, 2.4: DONTNEED to 2.2: DONTNEED, 2.3: UNKNOWN, 2.4: REQUIRED
I believe this should be backported to 2.4, this is notably the version the OP used.
Updated by nagachika (Tomoyuki Chikanaga) over 7 years ago
- Backport changed from 2.2: DONTNEED, 2.3: UNKNOWN, 2.4: REQUIRED to 2.2: DONTNEED, 2.3: UNKNOWN, 2.4: DONE
ruby_2_4 r59407 merged revision(s) 58077,58229.
Updated by usa (Usaku NAKAMURA) over 7 years ago
- Backport changed from 2.2: DONTNEED, 2.3: UNKNOWN, 2.4: DONE to 2.2: DONTNEED, 2.3: DONTNEED, 2.4: DONE
Updated by PSchambacher (Pierre Schambacher) about 7 years ago
Any chance this could be backported to ruby 2.3? It's creating issues because we're loading OpenStruct from the database and Ruby 2.4 is still several months ahead for this project.