Bug #13397
closed#object_id should not be signed
Description
It is surprising that #object_id returns signed value. Let me explain show two examples. Working with 32b Ruby (ruby 2.4.0p0 (2016-12-24 revision 57164) [i386-linux]) to make this issue more apparent.
$ ruby << \EOR
GC.disable
3_000_000.times { p Object.new.inspect }
EOR
"#<Object:0x57d49a5c>"
"#<Object:0x57d499a8>"
"#<Object:0x57d49930>"
"#<Object:0x57d498b8>"
... snip ...
"#<Object:0x828bf164>"
"#<Object:0x828bf0ec>"
"#<Object:0x828bf074>"
"#<Object:0x828beffc>"
"#<Object:0x828bef84>"
^C-:2:in `p': Interrupt
from -:2:in `block in <main>'
from -:2:in `times'
from -:2:in `<main>'
"#<Object:0x8290b1f4>"
In this example, the "object_id", which is part of the inspect object is unsigned, since it is printed using C sprintf with %p format. There are other libraries, which tries to mimic the output [ 1 ]. The implementation is approximately following:
$ ruby << \EOR
GC.disable
class A
DEFAULT_OBJ_ID_STR_WIDTH = 0.size == 4 ? 7 : 14
def inspect
id_str = (object_id << 1).to_s(16).rjust(DEFAULT_OBJ_ID_STR_WIDTH, '0')
"#<#{self.class.name}:0x#{id_str}>"
end
end
3_000_000.times { p A.new.inspect }
EOR
"#<A:0x58585428>"
"#<A:0x585852d4>"
"#<A:0x585851bc>"
"#<A:0x5858507c>"
"#<A:0x58584ec4>"
"#<A:0x58584d5c>"
"#<A:0x58584c1c>"
"#<A:0x58584adc>"
... snip ...
"#<A:0x7fff4888>"
"#<A:0x7fff47c0>"
"#<A:0x7fff46f8>"
"#<A:0x7fff4630>"
"#<A:0x7fff4568>"
"#<A:0x7fff44a0>"
"#<A:0x7fff43d8>"
"#<A:0x7fff4310>"
"#<A:0x7fff4248>"
"#<A:0x7fff4180>"
"#<A:0x7fff40b8>"
"#<A:0x-7fffc034>"
"#<A:0x-7fffc110>"
"#<A:0x-7fffc1ec>"
"#<A:0x-7fffc2c8>"
"#<A:0x-7fffc3a4>"
"#<A:0x-7fffc480>"
"#<A:0x-7fffc55c>"
"#<A:0x-7fffc638>"
^C-:10:in `p': Interrupt
from -:10:in `block in <main>'
from -:10:in `times'
from -:10:in `<main>'
And the output is quite surprising to me. Why the object_id should be signed value? It doesn't make any sense to me. Is this implementation wrong or is Ruby wrong?
Updated by nobu (Nobuyoshi Nakada) about 6 years ago
It's not to make object_id
s Bignum as possible.
Updated by vo.x (Vit Ondruch) about 6 years ago
Ok, you want to prevent Bignums, but what is the suggested solution here? Do some pack("l").unpack("L") to get the expected value? Or you can provide object_hexid [ 1 ]. Or provide some %p equivalent formatter?
Because I am afraid that the "inspect" misuse is widespread. Here are some examples, even in Ruby codebase:
https://github.com/ruby/ruby/blob/trunk/lib/rubygems/dependency_list.rb#L107
https://github.com/ruby/ruby/blob/trunk/lib/rubygems/platform.rb#L115
https://github.com/ruby/ruby/blob/trunk/lib/rdoc/attr.rb#L90
https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent/map.rb#L218
https://github.com/rails/rails/blob/master/actionview/lib/action_view/template/resolver.rb#L59
Updated by vo.x (Vit Ondruch) about 6 years ago
Last CI build of concurrent-ruby in Fedora failed again:
https://apps.fedoraproject.org/koschei/package/rubygem-concurrent-ruby?collection=f27
https://kojipkgs.fedoraproject.org/work/tasks/6296/19116296/build.log
Is there a chance to find some generic reliable solution to this? Or is it just feature and I should persuade all the project to fix their Regexps 1 or implementation of #object_id?
Updated by nobu (Nobuyoshi Nakada) about 6 years ago
vo.x (Vit Ondruch) wrote:
Is there a chance to find some generic reliable solution to this? Or is it just feature and I should persuade all the project to fix their Regexps [1] or implementation of #object_id?
They seem to have the solution already.
def inspect
"#{to_s[0..-2]} intended_time: #{@IntendedTime}>"
end
The default Kernel#to_s
returns a string with the class name and the object ID.
Updated by vo.x (Vit Ondruch) about 6 years ago
I created PR fixing the projects listed above.
But still, this seems so generic and unexpected issue, possibly influencing libraries which are part of StdLib [3, 4], I'd like to see this improved. Couldn't the #format accept "object_id" formatter for example?
[1] https://github.com/ruby-concurrency/concurrent-ruby/pull/651
[2] https://github.com/rails/rails/pull/28902
[3] https://github.com/rubygems/rubygems/pull/1908
[4] https://github.com/rdoc/rdoc/pull/451
Updated by nobu (Nobuyoshi Nakada) about 6 years ago
Such feature would be nice, but can't help libraries which support older versions.
commit 1f7154a4ae0e480774dbfe79905b21d26d5b4cbc
Author: Nobuyoshi Nakada <nobu@ruby-lang.org>
Date: Fri Apr 28 16:35:48 2017
`%I` for object ID
diff --git a/sprintf.c b/sprintf.c
index f2d51f1c76..758e32fed4 100644
--- a/sprintf.c
+++ b/sprintf.c
@@ -25,6 +25,7 @@
static char *fmt_setup(char*,size_t,int,int,int,int);
static char *ruby_ultoa(unsigned long val, char *endp, int base, int octzero);
+static char *ruby_ptoa(VALUE val, char *endp, int base, int octzero);
static char
sign_bits(int base, const char *p)
@@ -781,6 +782,18 @@ rb_str_format(int argc, const VALUE *argv, VALUE fmt)
}
break;
+ case 'I':
+ {
+ VALUE arg = GETARG();
+ prec = (SIZEOF_VOIDP * CHAR_BIT + 3) / 4;
+ CHECK(prec + 2);
+ PUSH_("0x", 2);
+ t = ruby_ptoa(arg, &buf[blen + prec], 16, 0);
+ if (t > &buf[blen]) memset(&buf[blen], '0', t - &buf[blen]);
+ blen += prec;
+ }
+ break;
+
case 'd':
case 'i':
case 'o':
@@ -1255,6 +1268,18 @@ ruby_ultoa(unsigned long val, char *endp, int base, int flags)
return BSD__ultoa(val, endp, base, octzero, xdigs);
}
+static char *
+ruby_ptoa(VALUE val, char *endp, int base, int flags)
+{
+ const char *xdigs = lower_hexdigits;
+ int octzero = flags & FSHARP;
+#ifdef _HAVE_SANE_QUAD_
+ return BSD__uqtoa(val, endp, base, octzero, xdigs);
+#else
+ return BSD__ultoa(val, endp, base, octzero, xdigs);
+#endif
+}
+
int
ruby_vsnprintf(char *str, size_t n, const char *fmt, va_list ap)
{
diff --git a/test/ruby/test_sprintf.rb b/test/ruby/test_sprintf.rb
index 1bf65f1eab..0b7a11a33b 100644
--- a/test/ruby/test_sprintf.rb
+++ b/test/ruby/test_sprintf.rb
@@ -513,4 +513,9 @@
assert_equal before + 1, after, 'only new string is the created one'
assert_equal '1970-01-01', val
end
+
+ def test_object_id
+ x = Object.new
+ assert_equal(x.to_s, sprintf("#<%s:%I>", x.class, x))
+ end
end
Updated by vo.x (Vit Ondruch) about 6 years ago
The patch is missing documentation, otherwise I love it!
Updated by vo.x (Vit Ondruch) almost 6 years ago
Yet another instance of this issue:
Updated by headius (Charles Nutter) over 4 years ago
You would be well-advised to avoid object_id. It does not do what you think it does. By returning a pointer reference into the garbage-collected heap, it's possible for the same object_id to refer to different objects over time.
I have proposed deprecating and eventually removing both object_id and _id2ref, since implementing them safely would largely make them useless: https://bugs.ruby-lang.org/issues/15408
Updated by jeremyevans0 (Jeremy Evans) about 3 years ago
- Status changed from Open to Closed
With the change to use an incrementing object_id in Ruby 2.7, I don't think this is an issue anymore. The object_id counter is stored as a Ruby value, so at some point it probably goes from Fixnum to Bignum, but it should not go negative.