Project

General

Profile

Actions

Bug #19857

closed

Eval coverage is reset after each `eval`.

Added by ioquatix (Samuel Williams) over 1 year ago. Updated 6 months ago.

Status:
Rejected
Target version:
-
ruby -v:
Backport:
[ruby-core:114590]

Description

It seems like eval based coverage is reset every time eval is invoked.

#!/usr/bin/env ruby

require 'coverage'

def measure(flag)
  c = Class.new
  c.class_eval(<<~RUBY, "foo.rb", 1)
    def foo(flag)
      if flag
        puts "foo"
      else
        puts "bar"
      end
    end
  RUBY
  
  return c.new.foo(flag)
end

Coverage.start(lines: true, eval: true)

# Depending on the order of these two operations, different coverage is calculated, because the evaluation of the code is considered different, even if the content/path is the same.
measure(false)
measure(true)

p Coverage.result

Further investigation is required.

Updated by ioquatix (Samuel Williams) over 1 year ago

Example output:

> ./test.rb
foo
bar
{"foo.rb"=>{:lines=>[1, 1, 0, nil, 1, nil, nil]}}

(swap lines)
> ./test.rb
bar
foo
{"foo.rb"=>{:lines=>[1, 1, 1, nil, 0, nil, nil]}}

Updated by mame (Yusuke Endoh) over 1 year ago

Currently only the coverage of the last call to Kernel#eval is kept. This seems somewhat reasonable to me because:

  • I don't think calling eval on the same filename has such an important use case.
  • If diffrent code is eval'd with the same filename, you get broken coverage results.

Updated by ioquatix (Samuel Williams) over 1 year ago

We found it was discussed here:

https://bugs.ruby-lang.org/issues/19008#note-5

I'm a little embarrassed I didn't consider or fix it at the time. Anyway...

Here is the proposed fix: https://github.com/ruby/ruby/pull/8330

Specifically, when an iseq is compiled code, it resets all the coverage to zero for lines that are executable. Instead of resetting to zero, we should leave it as is if it already contains coverage information.

Updated by ioquatix (Samuel Williams) over 1 year ago

  • Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED

Currently only the coverage of the last call to Kernel#eval is kept.

This is not true. Coverage is stored in a global map of path -> coverage. Every attempt is made to use the same coverage for a given path. If you load or eval the same path multiple times, the same coverage will be reused. The bug is because iseq_set_sequence sets the coverage to zero for executable lines, for the lines it covers. So, it will:

  1. Load the current shared coverage for a given path.
  2. Based on the eval line number, update that coverage.
  3. For each executable instruction, assign 0 to the coverage[instruction_line + eval_line_number].

So, it won't keep the last result, it just resets some or all of the coverage, depending on how the eval call is invoked. In theory, the same problem can apply to require and load, but would be, I expect, a little more involved to reproduce.

Actions #5

Updated by ioquatix (Samuel Williams) over 1 year ago

  • Description updated (diff)

Updated by mame (Yusuke Endoh) over 1 year ago

  • Status changed from Closed to Open

Hey, I am the maintainer of the coverage library. Please don't merge it until I say it's okay.

I'm really feeling bad smell about merging coverage results automatically. Please give me time to understand what the current situation is and to consider whether it is really OK to merge the results. So, please let me revert the commit once.

Updated by mame (Yusuke Endoh) over 1 year ago

@ioquatix (Samuel Williams) When creating a ticket, could you write up a use case?
Even if you believe it is a bug, it is not necessarily so to others.

You only showed an example of evaling a method definition many times.
As far as that case is concerned, I don't think that it is so important and frequent that we need to care in the coverage library.
Please explain how the current behavior is troubling in a real-world example.

Updated by ioquatix (Samuel Williams) over 1 year ago

Bake uses module_eval to load source files:

https://github.com/ioquatix/bake/blob/c6b0d313a216077ceff460e2e90324206f3c243f/lib/bake/scope.rb#L19

If you load the same file several times during testing, coverage of previous tests is lost/cleared.

Any code which uses a variant of this - e.g. rack https://github.com/rack/rack/blob/64610db2d2e7e910c4d3e17874dc316ed3eb227a/lib/rack/builder.rb#L6 can suffer the same problem. It's common to load config.ru fresh each time you run a test, for example.

I don't know if this shows up in the real world but I imagine it can: any time you use ERB to load the same template in several different tests.

This is clearly a bug that should be fixed. Do you see some problems introduced by my proposed fix?

Updated by ioquatix (Samuel Williams) over 1 year ago

Here is an example of the ERB test issue:

https://github.com/ioquatix/covered/blob/main/examples/erb/test.rb

I could imagine this could show up in any code which reloads ERB templates between tests.

Updated by mame (Yusuke Endoh) over 1 year ago

Thanks. In your case of ERB.new, the test function does not actually use its arguments. Consider a more realistic example:

def test(value = nil)
  template = ERB.new(File.read(path).gsub("VALUE") { value || "default" })
  template.result(binding)
end

In this case, if we ever pass a multi-line string to test, automatic merging of the coverage results would bring trouble. I think it is safer to take only the coverage of the first or last version of code when trying to compile with the same pathname, instead of automatically merging them.

Updated by ioquatix (Samuel Williams) over 1 year ago

Thanks. In your case of ERB.new, the test function does not actually use its arguments.

I'm not sure what you mean. Depending on the argument, different coverage is generated. Can you explain why you think "the test function does not actually use its arguments"?

In this case, if we ever pass a multi-line string to test, automatic merging of the coverage results would bring trouble.

ERB.new(File.read(path).gsub("VALUE") { value || "default" })

I understand your concern, and I have never seen such a code in the real world. Can you share the real world risk?

Updated by ioquatix (Samuel Williams) over 1 year ago

By the way, did you realise, eval coverage is already merged, if it's a subset of the eval'ed code? So the behaviour is only broken if the eval'ed segments overlap, in which case the behaviour is, frankly, unexpected and pretty weird.

Updated by jeremyevans0 (Jeremy Evans) over 1 year ago

Merging coverage results for eval would be inconsistent with how coverage for load is handled.

example code: t.rb:

if $a
  p 1
else
  p 2
end

Print coverage result:

# Loaded Once
$ ruby -r coverage -e 'Coverage.start; $a = true; load "t.rb"; p Coverage.result' 
1
{"t.rb"=>[1, 1, nil, 0, nil]}

# Loaded Twice, not merged
$ ruby -r coverage -e 'Coverage.start; $a = true; load "t.rb"; $a = false; load "t.rb"; p Coverage.result' 
1
2
{"t.rb"=>[1, 0, nil, 1, nil]}

I agree with @mame (Yusuke Endoh) that if there is no way to ensure that the same code is evaluated in each evaluation, it is wrong to merge the results.

I can understand why real world code would evaluate multiple times for the same file and line:

[Array, Hash].each do |klass|
  klass.class_eval(RUBY, __FILE__, __LINE__+1)
    if self == Hash
      def foo; :bar end
    else
      def foo; :baz end
    end
RUBY
end

This is a trivial example, but at least shows why you would want to do it. In this example, it would be easy to separate the example code per class, but that is probably non-trivial and a bad idea for more complex examples.

I'm not necessarily opposed to merging results, but as it is not possible to ensure result merging makes sense, it should be an option and not the default behavior (assuming that @mame (Yusuke Endoh) is open to supporting it as an option).

Updated by ioquatix (Samuel Williams) over 1 year ago

@jeremyevans0 (Jeremy Evans) I'm not proposing any change to load, only to eval and related methods. However, I actually believe your example is equally confusing because the coverage is not merged. If someone loads the same file several times, it's reasonable to expect coverage to be merged (and probably unexpected that it's not).

I agree with @mame (Yusuke Endoh) (Yusuke Endoh) that if there is no way to ensure that the same code is evaluated in each evaluation, it is wrong to merge the results.

We have no way to disambiguate coverage, since by its nature, coverage refers to files and line offsets. In practice, After the program executes, coverage is displayed by loading those files and displaying information to the user. If those file paths and line numbers are not consistent, coverage will become meaningless. So if the user does not desire coverage, they should simply avoid setting the path and line number. This kind of consistency is also important for the Ruby debugger, so this is not something specific to coverage.

I'm not necessarily opposed to merging results, but as it is not possible to ensure result merging makes sense, it should be an option

This is already the case: it already has an option, Coverage.start(eval: true).

When you select this mode, coverage is already merged. If you have several evals for the same file, they all get merged into one coverage report.

What this bug is trying to address, is the fact that when those eval ranges overlap, they get reset. In every case I can think of, this is undesirable and basically not reporting coverage correctly, as lines which are previously executed, with coverage, get reset to 0.

Updated by mame (Yusuke Endoh) over 1 year ago

By the way, did you realise, eval coverage is already merged, if it's a subset of the eval'ed code?

Yes, I think that that is bad. I think I missed it because I didn't review https://github.com/ruby/ruby/pull/6396 enough. I want to limit it even from now.

I think "coverage for eval" feature was a good idea for measuring the coverage of Rails view code, but I would like to design it to the minimum necessary for that purpose. Currently, I think it is too flexible than I expected.

The design I am currently considering is as follows.

  • If there are loaded and eval'ed codes with the same path, only the loaded code is measured
  • If there are multiple codes eval'd with the same path, the first (or the last) eval'ed code is measured

I need to take time to understand and organize the current situation and carefully consider how it should be designed. However, I am not likely to be able to spend sufficient time to the coverage library for a while, so please wait.

BTW, I'm not keen on having Coverage.start have a new keyword argument for this purpose. Coverage library already has many different modes. I don't want to complicate things further. (This is mostly my fault, of course.)

Updated by ioquatix (Samuel Williams) over 1 year ago

I think "coverage for eval" feature was a good idea for measuring the coverage of Rails view code, but I would like to design it to the minimum necessary for that purpose. Currently, I think it is too flexible than I expected.

Actually, this bug report affects that use case too - if the same view is loaded multiple times (as demonstrated) in different tests, the coverage will not be reported correctly.

I want to limit it even from now... If there are loaded and eval'ed codes with the same path, only the loaded code is measured

Is there some reason why you think we shouldn't measure the complete coverage of a file, including evaluated code?

The point of code coverage is to provide feedback on whether every line of code is exercised by tests.

Your definition of coverage does not seem to include "all lines of code executed". Why?

Updated by mame (Yusuke Endoh) 11 months ago

  • Status changed from Open to Rejected

Sorry, but I am not keen on this proposal. Let me reject this once.

When I started developing the covearge library, I decided that its target is "source files that actually exist".
The measured coverage is visually observed by a human, which requires source code, and Ruby interpreter did not keep the eval'ed source code (at that time). I think that the design decision to target only source files was natural.

Because the Rails view code (.erb) exists as a file, I thought that it is the target of covearge library. So I accepted #19008.

Beyond that, measuring coverage on dynamically generated code conflicts with the coverage library's initial design decision.

To be honest, I don't want to further complicate the coverage library. Frankly, its API is already very complicated. I no longer want to expand it beyond the initial design decision.


One constructive note.
When I first started implementing the coverage library, I had no choice but to implement it as a built-in, but now we have TracePoint, which is well-designed and general-purpose than the coverage library.
Also, Prism API is being developed, and it may be possible (although I think TracePoint needs to be extended) to radically improve the coverage measurement on a column-by-column basis rather than line-by-line basis.
I don't have time right now, but I would like to rebuild the library as a pure external gem in future, i.e. coverage2 or something, with dynamically generated code, etc. in mind from the beginning.

Updated by ioquatix (Samuel Williams) 11 months ago

When I started developing the covearge library, I decided that its target is "source files that actually exist".

I completely agree with this and I don't think that statement has any bearing on this proposal.

The measured coverage is visually observed by a human, which requires source code, and Ruby interpreter did not keep the eval'ed source code (at that time). I think that the design decision to target only source files was natural.

Makes sense.

Because the Rails view code (.erb) exists as a file, I thought that it is the target of covearge library. So I accepted #19008.

That's reasonable.

Beyond that, measuring coverage on dynamically generated code conflicts with the coverage library's initial design decision.

That's not what this PR is doing?

To be honest, I don't want to further complicate the coverage library. Frankly, its API is already very complicated. I no longer want to expand it beyond the initial design decision.

This PR addresses a real world problem I ran into.

I accept the example given isn't clear. However, I'd like to discuss this at developer meeting.

Actions #21

Updated by ioquatix (Samuel Williams) 6 months ago

  • Backport deleted (3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED)
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0