Feature #10038
Open
Extend ObjectSpace.dump to expose buffer addresses for String and Array
Description
ObjectSpace.dump() exposes internal information as JSON.
How about also exposing the buffer addresses of String and Array objects?
Index: ext/objspace/objspace_dump.c
===================================================================
--- ext/objspace/objspace_dump.c (revision 46821)
+++ ext/objspace/objspace_dump.c (working copy)
@@ -178,12 +178,16 @@ dump_object(VALUE obj, struct dump_confi
             dump_append(dc, ", \"broken\":true");
         if (FL_TEST(obj, RSTRING_FSTR))
             dump_append(dc, ", \"fstring\":true");
-        if (STR_SHARED_P(obj))
+
+        if (STR_SHARED_P(obj)) {
             dump_append(dc, ", \"shared\":true");
+        }
         else {
             dump_append(dc, ", \"bytesize\":%ld", RSTRING_LEN(obj));
-            if (!STR_EMBED_P(obj) && !STR_SHARED_P(obj) && (long)rb_str_capacity(obj) != RSTRING_LEN(obj))
+            if (!STR_EMBED_P(obj) && !STR_SHARED_P(obj) && (long)rb_str_capacity(obj) != RSTRING_LEN(obj)) {
                 dump_append(dc, ", \"capacity\":%ld", rb_str_capacity(obj));
+                dump_append(dc, ", \"ptr\":\"%p\"", RSTRING_PTR(obj));
+            }
             if (is_ascii_string(obj)) {
                 dump_append(dc, ", \"value\":");
@@ -205,8 +209,14 @@ dump_object(VALUE obj, struct dump_confi
         dump_append(dc, ", \"length\":%ld", RARRAY_LEN(obj));
         if (RARRAY_LEN(obj) > 0 && FL_TEST(obj, ELTS_SHARED))
             dump_append(dc, ", \"shared\":true");
-        if (RARRAY_LEN(obj) > 0 && FL_TEST(obj, RARRAY_EMBED_FLAG))
+        if (RARRAY_LEN(obj) > 0) {
+            if (FL_TEST(obj, RARRAY_EMBED_FLAG)) {
             dump_append(dc, ", \"embedded\":true");
+            }
+            else {
+                dump_append(dc, ", \"ptr\":\"%p\"", RARRAY_PTR(obj));
+            }
+        }
         break;
       case T_CLASS:
With this hack, we can learn the real memory addresses of these buffers and cooperate with other native tools.
BTW, ObjectSpace.dump() should also support T_SYMBOL.
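As a rough illustration of how a consumer might read such a dump, here is a minimal sketch. It assumes the "ptr" key exists only when the patch above is applied; on a stock Ruby the key is absent and the helper returns nil. The helper names dump_info and buffer_address are hypothetical and not part of objspace.

```ruby
require 'objspace'
require 'json'

# Parse the JSON that ObjectSpace.dump emits for a single object.
def dump_info(obj)
  JSON.parse(ObjectSpace.dump(obj))
end

# Return the buffer address as an Integer, or nil when the dump has no
# "ptr" key. ASSUMPTION: "ptr" is present only with the proposed patch.
def buffer_address(info)
  ptr = info['ptr']
  ptr && ptr.to_i(16) # "%p" output such as "0x7f..." parses in base 16
end

info = dump_info('x' * 100)
puts info['type']       # kind of object that was dumped
puts info['bytesize']   # byte length of the string
p buffer_address(info)  # nil unless the interpreter carries the patch
```

Returning nil for a missing key keeps the caller working on interpreters that do not expose the address.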
Updated by headius (Charles Nutter) over 10 years ago
-1 from me, mostly because exposing actual memory addresses will further limit what Ruby can do with object references. In the presence of RGenGC, some objects are already being relocated in memory, so those objects would either have to become "shady" or not expose their addresses. As MRI's GC improves further, this will become more and more common.
And another (perhaps minor) reason: other implementations that already have fully generational GCs won't be able to support this feature at all.
If a feature like this must be added, put it in an MRI-specific namespace like RubyVM. And I'd still recommend not adding another feature that limits Ruby's (and MRI's) evolution.
Updated by normalperson (Eric Wong) over 10 years ago
headius@headius.com wrote:
If a feature like this must be added, put it in an MRI-specific
namespace like RubyVM. And I'd still recommend not adding another
feature that limits Ruby's (and MRI's) evolution.
The objspace extension is already implementation-specific and documented
as having no compatibility guarantees (even with future/past MRI
versions).
I'm only mildly in favor of it; I'm not sure I would use or need this.
Updated by headius (Charles Nutter) over 10 years ago
Eric Wong wrote:
The objspace extension is already implementation-specific and documented
as having no compatibility guarantees (even with future/past MRI
versions).
I suppose that's fair. We haven't had anyone request features from 'objspace' in JRuby.
I would still prefer an MRI-specific namespace for features only MRI can implement. JRuby and Rubinius have both been good citizens, keeping our impl-specific APIs hidden. MRI should do the same.
Updated by ko1 (Koichi Sasada) over 10 years ago
(2014/07/17 23:52), headius@headius.com wrote:
I would still prefer an MRI-specific namespace for features only MRI can implement. JRuby and Rubinius have both been good citizens, keeping our impl-specific APIs hidden. MRI should do the same.
I'm neutral about it. There are many MRI-specific features and
behaviors. I would like to wait for further discussion.
(Could Ruby 3 do it?)
--
// SASADA Koichi at atdot dot net
Updated by enebo (Thomas Enebo) over 10 years ago
ko1, when you say native tools, do you mean things like valgrind? Can you elaborate a bit (curious since I am doing some stuff with Java Heap dumps atm)?
I am neutral on this because it feels like an extension to help with your implementation and does not seem like 'Ruby'. I would like there to be some convention in MRI source perhaps (or even docs) indicating that something is an impl-specific method or API. I don't mind that each impl has extra useful stuff, but I would like a standard for indicating that it is impl-specific.
Updated by enebo (Thomas Enebo) over 10 years ago
Oh! I should also note I know dump_all is marked as experimental and implementation-specific in the documents so I am not saying you are not trying to mark things. I am just hoping we all can agree on a standard for this.
Also, I am interested in supporting dump_all in JRuby and it would be great if we could get to a common subset of what both impls can dump.
Since the feature is experimental and JRuby does not implement it at the moment, that last paragraph is only a wish at this point.
Updated by ko1 (Koichi Sasada) over 10 years ago
Thomas Enebo wrote:
ko1, when you say native tools, do you mean things like valgrind? Can you elaborate a bit (curious since I am doing some stuff with Java Heap dumps atm)?
I wrote a tool like the following to detect which parts of memory are shared between parent and child processes.
(It makes a big Array and counts how many memory pages are shared or not, using a Linux-specific feature.)
require 'objspace'
require 'json'
PAGES = 1024 * 100 #* 1024
ary = Array.new(4096/8 * PAGES){|e| nil}
class SharedState
  PAGE_SIZE = 4096
  def initialize
    @compare_process = Process.pid
  end
  def each_page ptr, size
    page = first_page = (ptr/PAGE_SIZE) * PAGE_SIZE
    last_page = ((ptr+size)/PAGE_SIZE) * PAGE_SIZE
    while page <= last_page
      yield page/PAGE_SIZE
      page += PAGE_SIZE
    end
  end
  def open_proc_files
    open("/proc/#{@compare_process}/pagemap"){|f1|
      open("/proc/#{Process.pid}/pagemap"){|f2|
        yield f1, f2
      }
    }
  end
  def get_pfn f, page_index
    f.seek(page_index * 8)
    d = f.read(8).unpack('Q').first
    d & ~(0xff << 55)
  end
  def make_shared_ary ptr, size
    ary = []
    open_proc_files{|f1, f2|
      each_page(ptr, size){|page_index|
        pfn1 = get_pfn f1, page_index
        pfn2 = get_pfn f2, page_index
        ary << (pfn1 == pfn2)
      }
    }
    p true: ary.count(true), false: ary.count(false)
    ary
  end
  def shared? obj
    info = JSON.load(ObjectSpace.dump(obj))
    case info['type']
    when 'ARRAY'
      ptr = info['ptr'].to_i(16)
      size = info['memsize']
      return unless ptr
      make_shared_ary(ptr, size)
    else
      raise 'unsupported'
    end
  end
end
ss = SharedState.new
pid = fork{
  p
  ss.shared?(ary)
  # ary[0] = true
  (PAGES/4).times{|i|
    ary[4096/8 * i * 4] = true
  }
  p
  ss.shared?(ary)
}
p Process.waitpid(pid)
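The page arithmetic in the script above can be isolated into pure functions that run without /proc. The following is a minimal sketch, not the tool itself. It assumes 4096-byte pages and the Linux pagemap layout in which bits 0-54 of each 64-bit entry hold the page frame number and bit 63 is the "present" flag. The names page_indexes and decode_pagemap_entry are hypothetical. Unlike the script, it treats the end address as exclusive and masks all flag bits.

```ruby
PAGE_SIZE = 4096          # ASSUMPTION: 4 KiB pages
PFN_MASK  = (1 << 55) - 1 # bits 0-54 of a pagemap entry

# Indexes of every page touched by the byte range [ptr, ptr + size).
def page_indexes(ptr, size)
  first = ptr / PAGE_SIZE
  last  = (ptr + size - 1) / PAGE_SIZE
  (first..last).to_a
end

# Split a 64-bit pagemap entry into its present flag and frame number.
def decode_pagemap_entry(entry)
  { present: entry[63] == 1, pfn: entry & PFN_MASK }
end

p page_indexes(4000, 200)
p decode_pagemap_entry((1 << 63) | 42)
```

Two processes would then share a page when both entries are present and carry the same frame number. Depending on the kernel version and privileges, the frame number field may read as zero, so a real tool should check for that case.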
I could do this in a gem, but modifying this extension is easier for me than making a gem :)
I am neutral on this because it feels like an extension to help with your implementation and does not seem like 'Ruby'. I would like there to be some convention in MRI source perhaps (or even docs) indicating that something is an impl-specific method or API. I don't mind that each impl has extra useful stuff, but I would like a standard for indicating that it is impl-specific.
I'm also neutral; it is only an idea.
I proposed it because this feature is already implementation-specific (to MRI and to particular versions); for example, it provides implementation- and version-specific information such as WB-protected.
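To see the kind of implementation-specific information referred to here, one can inspect the "flags" object in the dump. This is a minimal sketch; the helper name gc_flags is hypothetical, and which flags appear (for example a write-barrier flag) depends on the MRI version, so no particular output is assumed.

```ruby
require 'objspace'
require 'json'

# Return the "flags" object from ObjectSpace.dump as a Hash.
# The key is treated as optional, so a missing key yields an empty Hash.
def gc_flags(obj)
  info = JSON.parse(ObjectSpace.dump(obj))
  info.fetch('flags', {})
end

flags = gc_flags([1, 2, 3])
p flags # contents vary by interpreter version
```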
BTW, the tool above is only a toy to find out whether I can detect such shared pages.
And I could, so I achieved my goal.
I will make more general analysis tools to make the Ruby interpreter more CoW-friendly.
Yes, like valgrind (or with valgrind).
Updated by hsbt (Hiroshi SHIBATA) over 10 years ago
- Status changed from Open to Assigned
Updated by naruse (Yui NARUSE) almost 7 years ago
- Target version deleted (2.2.0)