Feature #10185

[PATCH] iseq: free untranslated iseq->iseq at compile

Added by Eric Wong 9 months ago. Updated 9 months ago.

[ruby-core:64663]
Status:Closed
Priority:Normal
Assignee:Eric Wong

Description

running "ruby -rpp -e 'pp GC.stat'", a reduction in
malloc usage is shown:

before:
:malloc_increase=>118784,
:oldmalloc_increase=>1178736,

after:
:malloc_increase=>99832,
:oldmalloc_increase=>1031976,

(numbers from x86-64)

  • compile.c (rb_iseq_translate_threaded_code): free iseq->iseq
    (rb_vm_addr2insn): new function for debug
    (rb_iseq_untranslate_threaded_code): ditto

  • iseq.c (iseq_memsize): support iseq->iseq == 0
    (rb_iseq_disasm): untranslate for dump
    (iseq_data_to_ary): ditto
    (rb_iseq_line_trace_each): ditto

  • vm_dump.c (rb_vmdebug_debug_print_pre): ditto

I think we can remove iseq->iseq field from the rb_iseq_t structure in
the future and use transient allocation for the debug functions.
rb_iseq_untranslate_threaded_code is an uncommon code path for debug and
not a performance issue in normal code.

iseq-tc-diet.patch Magnifier (4.43 KB) Eric Wong, 08/29/2014 09:24 PM

iseq-iseq-diet.patch Magnifier (9.54 KB) Eric Wong, 08/29/2014 11:10 PM

0001-iseq-translate-iseq-inplace.patch Magnifier (10.1 KB) Eric Wong, 09/09/2014 08:21 AM

Associated revisions

Revision 47508
Added by normal 9 months ago

compile: translate iseq in-place

running "ruby -rpp -e 'pp GC.stat'", a reduction in
malloc usage is shown:

before:
:malloc_increase=>118784,
:oldmalloc_increase=>1178736,

after:
:malloc_increase=>99832,
:oldmalloc_increase=>1031976,

For "ruby -e exit", valgrind reports over 300K reduction in
overall allocations (and unnecessary memory copies).

before:
total heap usage: 49,622 allocs, 20,492 frees, 8,697,493 bytes allocated
after:
total heap usage: 48,935 allocs, 19,805 frees, 8,373,773 bytes allocated

(numbers from x86-64)

v2 changes based on ko1 recommendations :
- squashed in-place direct thread translation to avoid alloc+copy
- renamed rb_iseq_untranslate_threaded_code to rb_iseq_original_iseq,
cache new iseq->iseq_original field.

  • compile.c (rb_iseq_translate_threaded_code): modify in-place w/o copy
    (rb_vm_addr2insn): new function for debug
    (rb_iseq_original_iseq): ditto
    (iseq_set_sequence): assign iseq_encoded directly
    [Feature #10185]

  • vm_core (rb_iseq_t): move original ->iseq to bottom

  • iseq.c (iseq_free, iseq_free): adjust for new layout
    (rb_iseq_disasm): use original iseq for dump
    (iseq_data_to_ary): ditto
    (rb_iseq_line_trace_each): ditto
    (rb_iseq_build_for_ruby2cext): use iseq_encoded directly

  • vm_dump.c (rb_vmdebug_debug_print_pre): use original iseq

Revision 47508
Added by normal 9 months ago

compile: translate iseq in-place

running "ruby -rpp -e 'pp GC.stat'", a reduction in
malloc usage is shown:

before:
:malloc_increase=>118784,
:oldmalloc_increase=>1178736,

after:
:malloc_increase=>99832,
:oldmalloc_increase=>1031976,

For "ruby -e exit", valgrind reports over 300K reduction in
overall allocations (and unnecessary memory copies).

before:
total heap usage: 49,622 allocs, 20,492 frees, 8,697,493 bytes allocated
after:
total heap usage: 48,935 allocs, 19,805 frees, 8,373,773 bytes allocated

(numbers from x86-64)

v2 changes based on ko1 recommendations :
- squashed in-place direct thread translation to avoid alloc+copy
- renamed rb_iseq_untranslate_threaded_code to rb_iseq_original_iseq,
cache new iseq->iseq_original field.

  • compile.c (rb_iseq_translate_threaded_code): modify in-place w/o copy
    (rb_vm_addr2insn): new function for debug
    (rb_iseq_original_iseq): ditto
    (iseq_set_sequence): assign iseq_encoded directly
    [Feature #10185]

  • vm_core (rb_iseq_t): move original ->iseq to bottom

  • iseq.c (iseq_free, iseq_free): adjust for new layout
    (rb_iseq_disasm): use original iseq for dump
    (iseq_data_to_ary): ditto
    (rb_iseq_line_trace_each): ditto
    (rb_iseq_build_for_ruby2cext): use iseq_encoded directly

  • vm_dump.c (rb_vmdebug_debug_print_pre): use original iseq

History

#1 Updated by Eric Wong 9 months ago

Patch #2:

Subject: [PATCH] iseq (rb_iseq_t): remove iseq->iseq field

It is unnecessary and reduces transient malloc/free during compile.

For "ruby -e exit", valgrind reports over 300K reduction in
overall allocations.

before:
total heap usage: 49,622 allocs, 20,492 frees, 8,697,493 bytes allocated

after:
total heap usage: 48,935 allocs, 19,805 frees, 8,373,773 bytes allocated

#2 Updated by Eric Wong 9 months ago

https://bugs.ruby-lang.org/issues/10185#change-48562

---Files--------------------------------
iseq-tc-diet.patch (4.43 KB)
iseq-iseq-diet.patch (9.54 KB)

Ping? I hope to commit these two soon.

#3 Updated by Koichi Sasada 9 months ago

Thank you for waiting my comment.

I'm thinking about compatibility and expandability.

Now, in Ruby source, iseq->iseq is used only for disasm. So that your patch works completely.
However we have several compilers (iseq -> C) to use iseq->iseq directly.

Of course, most of people don't use this compiler.
So your proopsal is reasonable.

On the other hand, the name `iseq_encoded' is not only for direct threaded code. We can make another encoding scheme.


Summary: my proposal is:

  • remain iseq->iseq (NULL as default)
  • rename rb_iseq_untranslate_threaded_code to rb_iseq_decode_encoded_iseq() and fill iseq->iseq if iseq->iseq is not NULL.
  • rb_iseq_decode_encoded_iseq() can call rb_iseq_untranslate_threaded_code() (one of decode scheme).
  • expose (not in ruby/ but internal.h and expose symbol) rb_iseq_decode_encoded_iseq() for maniac users.

#4 Updated by Koichi Sasada 9 months ago

Koichi Sasada wrote:

  • rename rb_iseq_untranslate_threaded_code to rb_iseq_decode_encoded_iseq() and fill iseq->iseq if iseq->iseq is not NULL.

decode is also strange.

const VALUE *rb_iseq_orignal_iseq() ?

#5 Updated by Eric Wong 9 months ago

This combines both patches to do translation in-place.

I couldn't make rb_iseq_original_iseq return const because of
rb_iseq_line_trace_each, but it now caches iseq->iseq field
(at the bottom of rb_iseq_t, away from hotter fields).

Also, I hope to make compile_data and iseq->iseq fields a union,
so it might be iseq->u1.compile_dat/iseq->u1.iseq in the future.

#6 Updated by Anonymous 9 months ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

Applied in changeset r47508.


compile: translate iseq in-place

running "ruby -rpp -e 'pp GC.stat'", a reduction in
malloc usage is shown:

before:
:malloc_increase=>118784,
:oldmalloc_increase=>1178736,

after:
:malloc_increase=>99832,
:oldmalloc_increase=>1031976,

For "ruby -e exit", valgrind reports over 300K reduction in
overall allocations (and unnecessary memory copies).

before:
total heap usage: 49,622 allocs, 20,492 frees, 8,697,493 bytes allocated
after:
total heap usage: 48,935 allocs, 19,805 frees, 8,373,773 bytes allocated

(numbers from x86-64)

v2 changes based on ko1 recommendations :
- squashed in-place direct thread translation to avoid alloc+copy
- renamed rb_iseq_untranslate_threaded_code to rb_iseq_original_iseq,
cache new iseq->iseq_original field.

  • compile.c (rb_iseq_translate_threaded_code): modify in-place w/o copy
    (rb_vm_addr2insn): new function for debug
    (rb_iseq_original_iseq): ditto
    (iseq_set_sequence): assign iseq_encoded directly
    [Feature #10185]

  • vm_core (rb_iseq_t): move original ->iseq to bottom

  • iseq.c (iseq_free, iseq_free): adjust for new layout
    (rb_iseq_disasm): use original iseq for dump
    (iseq_data_to_ary): ditto
    (rb_iseq_line_trace_each): ditto
    (rb_iseq_build_for_ruby2cext): use iseq_encoded directly

  • vm_dump.c (rb_vmdebug_debug_print_pre): use original iseq

Also available in: Atom PDF