Project

General

Profile

Actions

Bug #13358

closed

OpenStruct overriding allocate

Added by sitter (Harald Sitter) almost 8 years ago. Updated about 7 years ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
[ruby-core:80292]

Description

In https://github.com/ruby/ruby/commit/15960b37e82ba60455c480b1c23e1567255d3e05 OpenStruct gained

  class << self # :nodoc:
    alias allocate new
  end

Which is rather severely conflicting with expected behavior as Class.allocate is meant to not call initialize. So, in fact, the change made allocate of OpenStruct do what allocate is asserting not to do :-/

For OpenStruct itself that isn't that big a deal, for classes inheriting from OpenStruct it breaks allocate though.

Example:

require 'ostruct'

class A < OpenStruct
  def initialize(x, y = {})
    super(y)
  end
end

A.allocate

As allocate is alias'd to new in OpenStruct this will attempt to initialize A which will raise an ArgumentError because A cannot be initialized without arguments.

$ ruby x.rb
x.rb:4:in `initialize': wrong number of arguments (given 0, expected 1..2) (ArgumentError)
        from x.rb:9:in `new'
        from x.rb:9:in `<main>'

OpenStruct at the very least should document the fact that its allocate is behaving differently.
Ideally, OpenStruct should not alias allocate at all.


Files

0001-ostruct.rb-improve-fix-for-OpenStruct.allocate-respo.patch (1.19 KB) 0001-ostruct.rb-improve-fix-for-OpenStruct.allocate-respo.patch Proposed fix for OpenStruct#respond_to_missing? Eregon (Benoit Daloze), 03/28/2017 08:41 AM

Related issues 1 (0 open1 closed)

Related to Ruby master - Bug #11884: Psych.load broken for OpenStruct in Ruby 2.3.0Closedmarcandre (Marc-Andre Lafortune)Actions

Updated by Eregon (Benoit Daloze) almost 8 years ago

I agree making OpenStruct.allocate an alias of Class#new is not a great idea and easily problematic.
We had a problem due to this in TruffleRuby (now fixed by renaming all our internal allocate methods).

The fix we came up with seems cleaner and let YAML deserialize OpenStruct (the original reason why this alias was introduced)
by making respond_to? work on a OpenStruct.allocate instance.
It only needs an extra "&" in
https://github.com/ruby/ruby/blob/d77214e8a37efecfa835ce56ce5f11e4125effd4/lib/ostruct.rb#L198
therefore becoming:
https://github.com/graalvm/truffleruby/blob/8338f62a711e2a1d816dd8393b9402c6d3238e8c/lib/patches/ostruct.rb#L10

Actions #2

Updated by nobu (Nobuyoshi Nakada) over 7 years ago

  • Description updated (diff)
  • Backport changed from 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN to 2.2: DONTNEED, 2.3: REQUIRED, 2.4: DONTNEED

Updated by nobu (Nobuyoshi Nakada) over 7 years ago

Note that OpenStruct.allocate is intended to be used by only psych.
Not only OpenStruct, Class#allocate is not intended to be called directly.

Actions #4

Updated by nobu (Nobuyoshi Nakada) over 7 years ago

  • Related to Bug #11884: Psych.load broken for OpenStruct in Ruby 2.3.0 added
Actions #5

Updated by nobu (Nobuyoshi Nakada) over 7 years ago

  • Status changed from Open to Closed

Applied in changeset r58077.


ostruct.rb: fix OpenStruct.allocate

  • lib/ostruct.rb (OpenStruct.allocate): initialize an instance
    variable directly, without calling intialize method which may
    be overridden in a subclass. [ruby-core:80292] [Bug #13358]

Updated by Eregon (Benoit Daloze) over 7 years ago

nobu (Nobuyoshi Nakada) wrote:

Note that OpenStruct.allocate is intended to be used by only psych.
Not only OpenStruct, Class#allocate is not intended to be called directly.

Lots of Ruby code calls Class#allocate.
Of course, #initialize should be called directly after or some other way to initialize the instance.

Why another patch? It seems more hacky and less performant.

Updated by nobu (Nobuyoshi Nakada) over 7 years ago

Eregon (Benoit Daloze) wrote:

Lots of Ruby code calls Class#allocate.
Of course, #initialize should be called directly after or some other way to initialize the instance.

For what purpose?
It should be used by Class#new, Kernel#dup, Kernel#clone, and Marshal.load.

Why another patch? It seems more hacky and less performant.

As it is not called usually, it should take the performance penalty, not respond_to?.

Updated by Eregon (Benoit Daloze) over 7 years ago

nobu (Nobuyoshi Nakada) wrote:

For what purpose?
It should be used by Class#new, Kernel#dup, Kernel#clone, and Marshal.load.

But not only, after all it is a public and well-known method (many hits on GitHub code search).
Many deserializing libraries will use it,
but also various libraries which need to build an instance without passing in state,
or use it as a way to replicate Class#new without using super like (maybe to use a different initializing method):

def self.new(name)
  if r = CACHE[name]
    r
  else
    # Could use super(name), but not everybody does
    allocate.tap { |obj| obj.send(:initialize, name) }
  end
end

As it is not called usually, it should take the performance penalty, not respond_to?.

I think respond_to? should work and not throw an exception on any Object,
even if not fully initialized, just like say #instance_variable_defined?, #equal? and #object_id.

The performance hit on respond_to? is not significant, it's just an extra NIL_P.
On the other hand, the one on allocate is, and affects every caller of OpenStruct.allocate.

require 'benchmark/ips'
require 'ostruct'

class SomeClass
end

Benchmark.ips do |x|
  x.report("OpenStruct.allocate") { OpenStruct.allocate }
  x.report("Class#allocate") { SomeClass.allocate }
  x.compare!
end

class MyOpenStruct
  def respond_to_missing?(mid, include_private = false) # :nodoc:
    mname = mid.to_s.chomp("=").to_sym
    @table&.key?(mname) || super
  end
end

os = OpenStruct.new
my = MyOpenStruct.new

Benchmark.ips do |x|
  x.report("OpenStruct#respond_to?") { os.respond_to?(:init_with?) }
  x.report("MyOpenStruct#respond_to?") { my.respond_to?(:init_with?) }
  x.compare!
end
Comparison:
      Class#allocate:  7783868.7 i/s
 OpenStruct.allocate:  3813277.6 i/s - 2.04x  slower
Comparison:
  OpenStruct#respond_to?:  1752400.3 i/s
MyOpenStruct#respond_to?:  1734649.5 i/s - same-ish: difference falls within error

@nobu (Nobuyoshi Nakada): Can I commit my fix?
It is faster and does not change the semantics of OpenStruct.allocate.

Updated by nobu (Nobuyoshi Nakada) over 7 years ago

Eregon (Benoit Daloze) wrote:

But not only, after all it is a public and well-known method (many hits on GitHub code search).
Many deserializing libraries will use it,

Methods of constructed objects will be used more than construction usually.

but also various libraries which need to build an instance without passing in state,
or use it as a way to replicate Class#new without using super like (maybe to use a different initializing method):

It doesn't need allocate, CACHE[name] || super.

The performance hit on respond_to? is not significant, it's just an extra NIL_P.

And a branch.

On the other hand, the one on allocate is, and affects every caller of OpenStruct.allocate.

Why do you think the performance of allocate matters?
Note that it is never used in common, since Class#new never calls allocate overridden in ruby level.

Updated by nagachika (Tomoyuki Chikanaga) over 7 years ago

  • Backport changed from 2.2: DONTNEED, 2.3: REQUIRED, 2.4: DONTNEED to 2.2: DONTNEED, 2.3: DONE, 2.4: DONTNEED

ruby_2_3 r58161 merged revision(s) 58077.

Updated by Eregon (Benoit Daloze) over 7 years ago

nobu (Nobuyoshi Nakada) wrote:

Eregon (Benoit Daloze) wrote:

But not only, after all it is a public and well-known method (many hits on GitHub code search).
Many deserializing libraries will use it,

Methods of constructed objects will be used more than construction usually.

but also various libraries which need to build an instance without passing in state,
or use it as a way to replicate Class#new without using super like (maybe to use a different initializing method):

It doesn't need allocate, CACHE[name] || super.

Yes, nevertheless existing code uses it.
And if sby needed some special method like #initialize_native,
calling #allocate is the most natural way to do it in Ruby.

The performance hit on respond_to? is not significant, it's just an extra NIL_P.

And a branch.

Which is insignificant compared to the rest of the logic in #respond_to_missing?.

On the other hand, the one on allocate is, and affects every caller of OpenStruct.allocate.

Why do you think the performance of allocate matters?
Note that it is never used in common, since Class#new never calls allocate overridden in ruby level.

I think this bug is no reason to make a method 2x slower.
Ruby code might call #allocate for various reasons.

But my main issue with this fix is it only addresses a specific use-case and not the general issue:
#respond_to? should work on any object, even uninitialized and just #allocate-d.
Kernel#respond_to_missing? works on any object, but OpenStruct#respond_to_missing? does not currently.

For instance, Class.instance_method(:allocate).bind(OpenStruct).call.respond_to?(:foo) breaks.
Same for rb_obj_alloc() + rb_obj_respond_to() or Ruby #respond_to?
Note that all of these used to work in Ruby 2.2.

I will commit my patch tomorrow unless you object.
It is more robust and has no significant downside.

Updated by nobu (Nobuyoshi Nakada) over 7 years ago

Eregon (Benoit Daloze) wrote:

But my main issue with this fix is it only addresses a specific use-case and not the general issue:
#respond_to? should work on any object, even uninitialized and just #allocate-d.
Kernel#respond_to_missing? works on any object, but OpenStruct#respond_to_missing? does not currently.

I can't get your point.
OpenStruct#respond_to? works as others, and OpenStruct#respond_to_missing? too.
Anyway Kernel#respond_to_missing? is private and not to be used directly.

For instance, Class.instance_method(:allocate).bind(OpenStruct).call.respond_to?(:foo) breaks.

Why bypass overridden methods?

I will commit my patch tomorrow unless you object.

I object.

It is more robust and has no significant downside.

No merit.

Updated by Eregon (Benoit Daloze) over 7 years ago

nobu (Nobuyoshi Nakada) wrote:

Eregon (Benoit Daloze) wrote:

But my main issue with this fix is it only addresses a specific use-case and not the general issue:
#respond_to? should work on any object, even uninitialized and just #allocate-d.
Kernel#respond_to_missing? works on any object, but OpenStruct#respond_to_missing? does not currently.

I can't get your point.
OpenStruct#respond_to? works as others, and OpenStruct#respond_to_missing? too.
Anyway Kernel#respond_to_missing? is private and not to be used directly.

No, OpenStruct#respond_to? does not:

p Class.instance_method(:allocate).bind(OpenStruct).call.respond_to?(:foo) # or rb_obj_alloc() in C

Trunk:
.../ruby-trunk/lib/ruby/2.5.0/ostruct.rb:201:in 'respond_to_missing?': undefined method 'key?' for nil:NilClass (NoMethodError)
Ruby 2.2 (or on any other instance than a OpenStruct like Object.allocate.respond_to?(:foo)):
false

For instance, Class.instance_method(:allocate).bind(OpenStruct).call.respond_to?(:foo) breaks.

Why bypass overridden methods?

Because it is possible, and I don't see why that should raise an error.
Class#allocate can allocate any object except OpenStruct?
That seems a not needed exception.

It's also what rb_obj_alloc() does.

I will commit my patch tomorrow unless you object.

I object.

It is more robust and has no significant downside.

No merit.

Robustness of Ruby code has no merit?

The reason of this bug is the previous workaround.
Now it's a workaround on top of another workaround,
using meta-programming just to avoid a nil check,
and incorrect if #allocate is not called dynamically.
It also breaks the contract of #allocate to not initialize the instance,
and still lets OpenStruct#respond_to? raise an error when Kernel#respond_to? never does.

I attach my proposed patch, please have a look and consider its merits.

Updated by Eregon (Benoit Daloze) over 7 years ago

nobu (Nobuyoshi Nakada) wrote:

No merit.

Beyond the reasons above, is there no merit for such a small and localized patch?
I thought that was quite valued in MRI.

Actions #16

Updated by nobu (Nobuyoshi Nakada) over 7 years ago

  • Status changed from Closed to Assigned
  • Assignee set to Eregon (Benoit Daloze)

OK, although I still think it is too artificial example, nobody would mind a little change of the OpenStruct performance.

Actions #17

Updated by Eregon (Benoit Daloze) over 7 years ago

  • Status changed from Assigned to Closed

Applied in changeset trunk|r58229.


ostruct.rb: improve fix for OpenStruct.allocate + #respond_to?

  • lib/ostruct.rb (OpenStruct#respond_to_missing?): this makes
    OpenStruct#respond_to? works on any OpenStruct instance,
    just like Kernel#respond_to? does, without workarounds.
    [ruby-core:80292] [Bug #13358]

Updated by Eregon (Benoit Daloze) over 7 years ago

  • Backport changed from 2.2: DONTNEED, 2.3: DONE, 2.4: DONTNEED to 2.2: DONTNEED, 2.3: UNKNOWN, 2.4: DONTNEED

nobu (Nobuyoshi Nakada) wrote:

OK, although I still think it is too artificial example, nobody would mind a little change of the OpenStruct performance.

Thank you for your consideration, I committed as r58229.

I will let usa-san, the new 2.3 branch maintainer, decide whether this is worth backporting to 2.3.
I personally would recommend it.

P.S.:
Oddly enough, I did not receive your reply by email and it was not given a [ruby-core] id.
Maybe because it includes a Status update?

Updated by Eregon (Benoit Daloze) over 7 years ago

  • Backport changed from 2.2: DONTNEED, 2.3: UNKNOWN, 2.4: DONTNEED to 2.2: DONTNEED, 2.3: UNKNOWN, 2.4: REQUIRED

I believe this should be backported to 2.4, this is notably the version the OP used.

Updated by nagachika (Tomoyuki Chikanaga) over 7 years ago

  • Backport changed from 2.2: DONTNEED, 2.3: UNKNOWN, 2.4: REQUIRED to 2.2: DONTNEED, 2.3: UNKNOWN, 2.4: DONE

ruby_2_4 r59407 merged revision(s) 58077,58229.

Actions #21

Updated by usa (Usaku NAKAMURA) over 7 years ago

  • Backport changed from 2.2: DONTNEED, 2.3: UNKNOWN, 2.4: DONE to 2.2: DONTNEED, 2.3: DONTNEED, 2.4: DONE

Updated by PSchambacher (Pierre Schambacher) about 7 years ago

Any chance this could be backported to ruby 2.3? It's creating issues because we're loading OpenStruct from the database and Ruby 2.4 is still several months ahead for this project.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0