Bug #14464

MJIT & MinGW / gcc 7.3.0 seemed ok as of 62337, fail or skip after

Added by MSP-Greg (Greg L) 6 months ago. Updated about 2 months ago.

Status:
Closed
Priority:
Normal
Target version:
-
ruby -v:
ruby 2.6.0dev (2018-05-23 trunk 63492) [x64-mingw32]
[ruby-core:85500]

Description

First of all, a thank you to those working on MJIT.

At least three builds of ruby-loco MinGW passed the MJIT tests (62327, 62331, 62337), but after that, the tests have either failed or skipped. First fail was at 62341.

The most recent build (2018-02-11 trunk 62371) skipped the tests, with no timeout error in jit_supported?. I haven't looked at patching test_jit.rb to see if I can get more info.

I don't know if this is a MinGW issue or a gcc 7.3.0 issue, but, given that it did work for a few builds, I would appreciate it if someone could look into it. Anything I can help with, I'm happy to.

Thanks, Greg

TestJIT_info_62380.txt (33.4 KB), MSP-Greg (Greg L), 02/12/2018 05:12 AM
MJIT-MinGW-63333.txt (28.7 KB), test_jit.rb results w/63333, MSP-Greg (Greg L), 05/03/2018 08:57 PM

History

#1 [ruby-core:85504] Updated by k0kubun (Takashi Kokubun) 6 months ago

It seems that all JIT compilations on MinGW started to fail from r62340, and I fixed that in r62376. Now the tests won't be skipped on MinGW builds. For now, I recommend setting the environment variable RUBY_FORCE_TEST_JIT=1 when running test-all.
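A minimal sketch of how an environment variable can override a platform check, so that tests run instead of being skipped. Only the variable name RUBY_FORCE_TEST_JIT comes from this thread; the method name `jit_tests_enabled?` and the `probe` argument are hypothetical and are not taken from test_jit.rb.

```ruby
# Hypothetical helper, NOT the actual code in test_jit.rb.
# env:   a Hash-like object such as ENV
# probe: a callable that reports whether JIT appears to work on this platform
def jit_tests_enabled?(env, probe)
  # The mere presence of the variable forces the tests to run, so failures
  # become visible instead of being hidden behind a skip.
  return true if env.key?('RUBY_FORCE_TEST_JIT')

  probe.call
end

broken_platform = -> { false }
puts jit_tests_enabled?({}, broken_platform)                               # false
puts jit_tests_enabled?({ 'RUBY_FORCE_TEST_JIT' => '1' }, broken_platform) # true
```

Forcing the tests this way turns skips into real failures, which is useful when investigating but may be unwanted on a CI build.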

After r62376, the JIT infrastructure seems to be working, but some tests in test_jit.rb are failing. The failures seem to be caused by the generated code pointing to an invalid address. As Linux builds with gcc 7.2 work perfectly, the cause is probably a difference in the sizes of variable types combined with a wrong cast. This was NOT working as of r62337 either. We need deeper investigation for it...

#2 [ruby-core:85505] Updated by MSP-Greg (Greg L) 6 months ago

k0kubun,

Thank you for the response. A few notes:

The test-all log for 62337 showed the two MJIT tests passing, but subsequent builds skipped (or failed with RUBY_FORCE_TEST_JIT).

In the 62375 build (after my first post), both tests skipped.

RubyCI.org is rather 'tablet unfriendly', which biases me away from all use (gotta get over that), but I'll keep an eye on the Linux/gcc 7.2+ builds for the tests results.

Since ruby-loco is used for CI, I won't force fails with RUBY_FORCE_TEST_JIT, so I'll keep checking the logs...

> We need deeper investigation for it...

As stated previously, not being a C type, I'm not much help. If there is anything I can add to the build to help identify the issue, I'll be happy to.

NOTE: I just saw 62376 (thanks), the next scheduled build is done at Noon JST. I'll post back when done...

Thanks again, Greg

#3 [ruby-core:85506] Updated by MSP-Greg (Greg L) 6 months ago

k0kubun (Takashi Kokubun),

Appveyor run of 62377 had the following (added backticks for web view):

Retrying...
[1/2]      8 TestJIT#test_compile_insns = 11.33 s = F
[2/2]      7 TestJIT#test_jit_output = 5.58 s = .

  1) Failure:
TestJIT#test_compile_insns [C:/projects/ruby-loco/src/ruby/test/ruby/test_jit.rb:30]:
Failed to run script with JIT:
'``
def foo(&b)
  a = b
  b = 2
  a.call + 2
end

print foo { 1 }

'``


stdout:
'``

'``


stderr:
'``
JIT success (1318.5ms): foo@-e:1 -> C:/Users/appveyor/AppData/Local/temp/_ruby_mjit_p12476u0.c
-e:2:in `foo': wrong argument type Binding (expected Class) (TypeError)
    from -e:7:in `<main>'
Successful MJIT finish

'``

.
<true> expected but was
<false>.

I ran the test locally, and I'm wondering what is using the /Users/user name/AppData/Local/temp folder, as on my system, all TEMP/TMP env variables are set to different folders. I've built ruby for quite a while, and also MSYS2 packages, and I don't ever recall anything using that. For many windows users, their user name may have a space (as mine does). That hasn't been an issue with config files, --user-install gems, etc.

But, when I ran the tests, it mangles the path...
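The thread does not say how the path was mangled. One common cause, assumed here purely for illustration, is building a command line by string interpolation so that a path containing a space is split into two arguments. The sketch below uses Ruby's Shellwords, which follows POSIX shell rules (Windows quoting differs), and a hypothetical file path.

```ruby
require 'shellwords'

# Hypothetical path; the user name contains a space.
src = 'C:/Users/user name/AppData/Local/temp/example.c'

# Naive interpolation: the path is split at the space.
naive = "cc -c #{src}"
p Shellwords.split(naive)
# => ["cc", "-c", "C:/Users/user", "name/AppData/Local/temp/example.c"]

# Escaping each argument keeps the path in one piece.
safe = Shellwords.join(['cc', '-c', src])
p Shellwords.split(safe)
# => ["cc", "-c", "C:/Users/user name/AppData/Local/temp/example.c"]
```

Passing arguments as an array rather than as one string avoids the problem entirely where the API allows it.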

The test-all summary for TestJIT is attached for 62380. I believe there are 9 failures & 5 skips in 64 tests?

Thanks again,

Greg

#4 [ruby-core:85718] Updated by hsbt (Hiroshi SHIBATA) 6 months ago

  • Assignee set to k0kubun (Takashi Kokubun)
  • Status changed from Open to Assigned

#5 [ruby-core:86582] Updated by k0kubun (Takashi Kokubun) 4 months ago

> I'm wondering what is using the /Users/user name/AppData/Local/temp folder, as on my system, all TEMP/TMP env variables are set to different folders.

Now MJIT seems to use the result of rb_w32_system_tmpdir(), not just $TEMP or $TMP.
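To compare the candidates on a given machine, a small Ruby-level sketch can print the environment variables next to the directories Ruby itself reports. This only inspects values visible from Ruby; it does not call MJIT's C code, and whether any of these match what rb_w32_system_tmpdir() returns is not established in this thread.

```ruby
require 'tmpdir'
require 'etc'

# Environment variables commonly consulted for a temp directory.
%w[TMPDIR TMP TEMP].each do |name|
  puts format('%-14s %s', name, ENV[name].inspect)
end

# Directory Ruby picks for temporary files (influenced by the variables above).
puts format('%-14s %s', 'Dir.tmpdir', Dir.tmpdir)

# System-wide temp directory reported by the etc extension.
puts format('%-14s %s', 'Etc.systmpdir', Etc.systmpdir)
```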

> If there is anything I can add to the build to help identify the issue, I'll be happy to.

Current status of the investigation for this ticket: I've investigated why #test_compile_insn_intern_duparray in test_jit.rb fails. I minimized the reproduction code to ruby --disable-gems -e 'p proc { 0.to_s }.call' --jit-wait --jit-min-calls=1.
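A small driver, sketched under assumptions, that runs the one-liner above with and without the JIT flags and shows both outputs side by side. The script and flags are copied from the command above; the helper names `repro_argv` and `run` are hypothetical. On interpreters that do not accept these flags, the JIT run simply reports the interpreter's error text.

```ruby
require 'rbconfig'
require 'open3'

SCRIPT    = 'p proc { 0.to_s }.call'
JIT_FLAGS = %w[--jit-wait --jit-min-calls=1].freeze

# Build the argv for the reproduction, with or without the JIT flags.
def repro_argv(jit:)
  argv = [RbConfig.ruby, '--disable-gems', '-e', SCRIPT]
  jit ? argv + JIT_FLAGS : argv
end

# Run a command and return [combined stdout/stderr, success flag].
def run(argv)
  out, status = Open3.capture2e(*argv)
  [out, status.success?]
end

if $PROGRAM_NAME == __FILE__
  baseline, = run(repro_argv(jit: false))
  jit_out, ok = run(repro_argv(jit: true))
  puts "baseline: #{baseline.inspect}"
  puts "with JIT: #{jit_out.inspect} (exit ok: #{ok})"
end
```

A difference between the two lines of output indicates that the problem is in the JIT path rather than in the interpreter.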

As far as I've debugged with gdb, the code crashes because rb_class_of(FIXNUM(0)), which is called by vm_search_method in _mjit0, returns the value of &rb_cInteger instead of rb_cInteger, and then Module#to_s is dispatched for 0.to_s. As 0 is not a Module, it crashes. The rb_class_of code in the MJIT header looks like it properly returns rb_cInteger, so I have no idea why it returns &rb_cInteger.

For anyone helping with this issue, it would be useful to confirm that my understanding above is correct, or to point out any wrong assumption in it. Thanks.
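The dispatch difference described above can be seen from plain Ruby. This sketch does not reproduce the bug, which lives in the generated C code; it only shows which to_s each receiver normally resolves to and why Module#to_s cannot be applied to 0. It assumes Ruby 2.4 or later, where 0.class is Integer.

```ruby
# Integer instances find to_s on Integer.
p 0.class                      # Integer
p 0.method(:to_s).owner        # Integer

# The class object Integer is itself a Module, so it finds Module#to_s.
p Integer.method(:to_s).owner  # Module

# Module#to_s cannot be applied to 0, because 0 is not a Module.
begin
  Module.instance_method(:to_s).bind(0)
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```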

#6 [ruby-core:86869] Updated by MSP-Greg (Greg L) 3 months ago

k0kubun,

Thanks for looking into this. The motherboard on my home desktop stopped working, so I've been without a good dev system. The temp folder issue seems corrected; I looked through the C code, and added TEMPDIR, which I hadn't used before.

It may not be helpful, but I've attached a log of 63333 and test_jit.rb. It's at 9 failures...

Thanks, Greg

#7 [ruby-core:87227] Updated by MSP-Greg (Greg L) 3 months ago

  • ruby -v changed from ruby 2.6.0dev (2018-02-11 trunk 62371) [x64-mingw32] to ruby 2.6.0dev (2018-05-23 trunk 63492) [x64-mingw32]

k0kubun (Takashi Kokubun),

Just ran with 63492, all passed with 5 skips, partial log:

  1) Skipped:
TestJIT#test_compile_insn_getblockparamproxy [C:/Greg/GitHub/ruby/test/ruby/test_jit.rb:89]:
support this in mjit_compile

  2) Skipped:
TestJIT#test_compile_insn_opt_call_c_function [C:/Greg/GitHub/ruby/test/ruby/test_jit.rb:523]:
support this in opt_call_c_function (low priority)

  3) Skipped:
TestJIT#test_compile_insn_reput [C:/Greg/GitHub/ruby/test/ruby/test_jit.rb:246]:
write test

  4) Skipped:
TestJIT#test_compile_insn_tracecoverage [C:/Greg/GitHub/ruby/test/ruby/test_jit.rb:277]:
write test

  5) Skipped:
TestJIT#test_compile_insn_defineclass [C:/Greg/GitHub/ruby/test/ruby/test_jit.rb:281]:
support this in mjit_compile (low priority)

Finished tests in 226.166094s, 0.3228 tests/s, 1.9808 assertions/s.
73 tests, 448 assertions, 0 failures, 0 errors, 5 skips

ruby -v: ruby 2.6.0dev (2018-05-23 trunk 63492) [x64-mingw32]

Thanks for your work on this, Greg

#8 [ruby-core:87239] Updated by k0kubun (Takashi Kokubun) 3 months ago

  • Status changed from Assigned to Closed

Oh, that's very good to know. TBH I don't know which revision fixed the problem, but I'm very happy to know the tests can succeed on the platform. Thank you for reporting this.

#9 [ruby-core:87243] Updated by MSP-Greg (Greg L) 3 months ago

k0kubun (Takashi Kokubun),

I narrowed it down using ruby-loco builds:

All ruby 2.6.0dev
(2018-05-21 trunk 63475) [x64-mingw32]  73 tests, 429 assertions, 10 failures, 0 errors, 5 skips
(2018-05-21 trunk 63480) [x64-mingw32]  73 tests, 448 assertions,  0 failures, 0 errors, 5 skips

All failing builds left artifacts in TEMPDIR, but 63480, 63492, and 63496 left it clean.

Thanks again, Greg

#10 [ruby-core:87263] Updated by k0kubun (Takashi Kokubun) 3 months ago

Thanks for letting me know that. I also confirmed r63479 doesn't pass the tests but r63480 does. So somehow r63480 fixed the issue.
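The narrowing described above amounts to a search for the first passing revision. A minimal sketch using Range#bsearch follows; the `passes` predicate is hypothetical and hard-coded to the result reported in this thread, whereas in practice it would build the revision and run test_jit.rb. It also assumes that every revision after the fix keeps passing.

```ruby
# Hypothetical predicate, hard-coded to the outcome reported in this thread.
FIRST_PASSING = 63_480
passes = ->(rev) { rev >= FIRST_PASSING }

# Range#bsearch in find-minimum mode returns the first revision for which
# the block is true, assuming all later revisions also pass.
first_good = (63_475..63_492).bsearch { |rev| passes.call(rev) }
puts "first passing revision: r#{first_good}" # first passing revision: r63480
```

With 18 candidate revisions, a binary search needs at most five builds instead of one per revision.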

#11 [ruby-core:87265] Updated by MSP-Greg (Greg L) 3 months ago

k0kubun (Takashi Kokubun),

> Thanks for letting me know that

Not being a c type, it's the least I can do. I often feel like a dumb rock, as I can identify issues, but I can't fix them...

FYI, as of:

ruby 2.6.0dev (2018-05-27 trunk 63508) [x64-mingw32]

I re-enabled MJIT in ruby-loco, and the build passed. I've got quite a bit of logging for it, so -

When parallel testing ran, TestJIT#test_compile_insn_opt_aset failed at (I believe) the second assert. It passed three asserts on the retry.

Thanks again for your work on JIT, Greg

#12 [ruby-core:87525] Updated by MSP-Greg (Greg L) about 2 months ago

k0kubun,

First of all, thanks for all your work on MJIT. Lately, I've been testing a few extension gems, and I just decided to try JIT on EventMachine, which normally passes. Got thru quite a bit of it, but then a SEGV.

  1. Would you like a log, etc?

  2. If so, here, or open a new issue?

Re Windows, many gems are not testing with new Ruby versions, and they're also behind in releasing fat-binary gems for new Ruby versions. I've been involved in the first issue, now I'm moving to the second.

While working to help build Windows fat-binary gems on Appveyor, I realized that most of the work needed to build fat binaries is the same as the work needed to test on Appveyor.

Hence, to repro these issues, a fat-binary will be available...

Thanks, Greg

#13 [ruby-core:87531] Updated by k0kubun (Takashi Kokubun) about 2 months ago

A new issue would be helpful since it seems to be a different problem. And if possible, please minimize the reproduction code. A bug report with reproduction steps using EventMachine might help, but it would be more helpful if you could reproduce the SEGV without the EventMachine code, using only a few lines of code.
