Project

General

Profile

Bug #16819

Line reporting off by one when reporting line of a hash?

Added by enebo (Thomas Enebo) 3 months ago. Updated about 2 months ago.

Status:
Assigned
Priority:
Normal
Target version:
-
[ruby-core:98075]

Description

If I run this program:

TracePoint.new(:line) { |t| p t.lineno}.enable
def foo(a, b) # 2
  a + b       # 3
end           # 4
              # 5
foo 1, 2      # 6
              # 7
A = {         # 8
  a: 1,       # 9
  b: 2        # 10
}             # 11

I see:

system ~/work/jruby no_sourceposition * 2388% mri26 ../snippets/ast1.rb 
2
6
3
9

I believe this 9 should be an 8 (it is what we currently emit for JRuby). I tried to figure out why this is the case and I patched RubyVM::AbstractSyntaxTree with the ability to note newline flag:

diff --git a/ast.c b/ast.c
index f0e8dd2eaf..df58006a96 100644
--- a/ast.c
+++ b/ast.c
@@ -7,6 +7,8 @@
 #include "vm_core.h"
 #include "iseq.h"

+#define RBOOL(v) ((v) ? Qtrue : Qfalse)
+
 static VALUE rb_mAST;
 static VALUE rb_cNode;

@@ -731,6 +733,16 @@ rb_ast_node_inspect(VALUE self)
     return str;
 }

+static VALUE
+rb_ast_node_newline(VALUE self)
+{
+    struct ASTNodeData *data;
+    TypedData_Get_Struct(self, struct ASTNodeData, &rb_node_type, data);
+
+    return RBOOL(data->node->flags & NODE_FL_NEWLINE);
+}
+
+
 void
 Init_ast(void)
 {
@@ -756,5 +768,6 @@ Init_ast(void)
     rb_define_method(rb_cNode, "last_lineno", rb_ast_node_last_lineno, 0);
     rb_define_method(rb_cNode, "last_column", rb_ast_node_last_column, 0);
     rb_define_method(rb_cNode, "children", rb_ast_node_children, 0);
+    rb_define_method(rb_cNode, "newline?", rb_ast_node_newline, 0);
     rb_define_method(rb_cNode, "inspect", rb_ast_node_inspect, 0);
 }

I also made a simple script:

source = File.read ARGV.shift
root = RubyVM::AbstractSyntaxTree.parse source

def print_node(node, indent = "")
  if node.respond_to? :first_lineno
    eol = node.newline? ? " <-- newline" : ""
    $stdout.write "#{indent}(#{node.type}@#{node.first_lineno-1}-#{node.last_lineno-1})"
    case node.type
    when :LIT, :STR
      puts " = #{node.children[0].inspect}#{eol}"
    when :ARRAY
      puts eol
      node.children[0..-2].each do |child|
        print_node(child, indent + "  ")
      end
    when :FCALL
      puts " = #{node.children[0]}#{eol}"
      node.children[1..-1].each do |child|
        print_node(child, indent + "  ")
      end
    else
      puts eol
      node.children.each do |child|
        print_node(child, indent + "  ")
      end
    end
  elsif node.nil?
    puts "#{indent}nil"
  else
    puts "#{indent}#{node.inspect}"
  end
end

print_node root


puts RubyVM::InstructionSequence.compile(source).disasm

If I run this I see MRI has line 8 marked as the newline (which JRuby also matches) but if we look at the disasm it would appear compile.c decided to put the line in a different location:

../ruby/ruby --disable-gems ../snippets/ast_mri.rb ../snippets/ast1.rb 
(SCOPE@0-10)
  []
  nil
  (BLOCK@0-10)
    (CALL@0-0) <-- newline
      (ITER@0-0)
        (CALL@0-0)
          (CONST@0-0)
            :TracePoint
          :new
          (ARRAY@0-0)
            (LIT@0-0) = :line
        (SCOPE@0-0)
          [:t]
          (ARGS@0-0)
            1
            nil
            nil
            nil
            0
            nil
            nil
            nil
            nil
            nil
          (FCALL@0-0) = p <-- newline
            (ARRAY@0-0)
              (CALL@0-0)
                (DVAR@0-0)
                  :t
                :lineno
                nil
      :enable
      nil
    (DEFN@1-3) <-- newline
      :foo
      (SCOPE@1-3)
        [:a, :b]
        (ARGS@1-1)
          2
          nil
          nil
          nil
          0
          nil
          nil
          nil
          nil
          nil
        (OPCALL@2-2) <-- newline
          (LVAR@2-2)
            :a
          :+
          (ARRAY@2-2)
            (LVAR@2-2)
              :b
    (FCALL@5-5) = foo <-- newline
      (ARRAY@5-5)
        (LIT@5-5) = 1
        (LIT@5-5) = 2
    (CDECL@7-10) <-- newline
      :A
      (HASH@7-10)
        (ARRAY@8-9)
          (LIT@8-8) = :a
          (LIT@8-8) = 1
          (LIT@9-9) = :b
          (LIT@9-9) = 2
== disasm: #<ISeq:<compiled>@<compiled>:1 (1,0)-(11,18)> (catch: FALSE)
== catch table
| catch type: break  st: 0000 ed: 0013 sp: 0000 cont: 0013
| == disasm: #<ISeq:block in <compiled>@<compiled>:1 (1,22)-(1,39)> (catch: FALSE)
| == catch table
| | catch type: redo   st: 0001 ed: 0010 sp: 0000 cont: 0001
| | catch type: next   st: 0001 ed: 0010 sp: 0000 cont: 0010
| |------------------------------------------------------------------------
| local table (size: 1, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
| [ 1] t@0<Arg>
| 0000 nop                                                              (   1)[Bc]
| 0001 putself                      [Li]
| 0002 getlocal_WC_0                t@0
| 0004 opt_send_without_block       <callinfo!mid:lineno, argc:0, ARGS_SIMPLE>, <callcache>
| 0007 opt_send_without_block       <callinfo!mid:p, argc:1, FCALL|ARGS_SIMPLE>, <callcache>
| 0010 nop
| 0011 leave                                                            (   1)[Br]
|------------------------------------------------------------------------
0000 opt_getinlinecache           7, <is:0>                           (   1)[Li]
0003 getconstant                  :TracePoint
0005 opt_setinlinecache           <is:0>
0007 putobject                    :line
0009 send                         <callinfo!mid:new, argc:1>, <callcache>, block in <compiled>
0013 nop
0014 opt_send_without_block       <callinfo!mid:enable, argc:0, ARGS_SIMPLE>, <callcache>(   1)
0017 pop
0018 putspecialobject             1                                   (   2)[Li]
0020 putobject                    :foo
0022 putiseq                      foo
0024 opt_send_without_block       <callinfo!mid:core#define_method, argc:2, ARGS_SIMPLE>, <callcache>
0027 pop
0028 putself                                                          (   6)[Li]
0029 putobject_INT2FIX_1_
0030 putobject                    2
0032 opt_send_without_block       <callinfo!mid:foo, argc:2, FCALL|ARGS_SIMPLE>, <callcache>
0035 pop
0036 duphash                      {:a=>1, :b=>2}                      (   9)[Li]
0038 dup                                                              (   8)
0039 putspecialobject             3
0041 setconstant                  :A
0043 leave

== disasm: #<ISeq:foo@<compiled>:2 (2,0)-(4,3)> (catch: FALSE)
local table (size: 2, argc: 2 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] a@0<Arg>   [ 1] b@1<Arg>
0000 getlocal_WC_0                a@0                                 (   3)[LiCa]
0002 getlocal_WC_0                b@1
0004 opt_plus                     <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache>
0007 leave                                                            (   4)[Re]

Updated by ko1 (Koichi Sasada) 3 months ago

  • Assignee set to ko1 (Koichi Sasada)
  • Status changed from Open to Assigned

Yes, it is intentional behavior because the first instruction of the A = { ... } expression is line 9 (a: 1).
Each line number attribute is attached to the instruction and there is no instruction at the line 8 for the performance, from Ruby 2.5 [Feature #14104].

I agree it is not good result.
Before Ruby 2.5, we inserted trace instruction and this instruction has "line 8" attribute just before line 9.
However, the trace instructions are almost nop and we removed it.

I wonder there is a good way to solve it.

Updated by enebo (Thomas Enebo) about 2 months ago

"Before Ruby 2.5, we inserted trace instruction and this instruction has "line 8" attribute just before line 9.
However, the trace instructions are almost nop and we removed it."

You mean that because you had two line instrs in a row with nothing between them you ended up eliminating it as unneeded work. We actually do a similar thing but I guess we somehow put an instr between line 8 and line 9 instr. Putting an instr between the two line instr which will be noop but somehow defeat that optimization maybe?

Updated by ko1 (Koichi Sasada) about 2 months ago

You mean that because you had two line instrs in a row with nothing between them you ended up eliminating it as unneeded work. We actually do a similar thing but I guess we somehow put an instr between line 8 and line 9 instr. Putting an instr between the two line instr which will be noop but somehow defeat that optimization maybe?

We changed the strategy: instead of inserting trace instructions, but replace instructions with trace- prefix instructions if needed. Inserting nop (as old trace instruction) is one idea (which will be trace_nop insn if needed).

Also available in: Atom PDF