Project

General

Profile

Bug #15316

rb_postponed_job_register not thread-safe

Added by normalperson (Eric Wong) 12 months ago. Updated 12 months ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:89848]

Description

Consider following execution timeline for two threads, t1 and t2.
(should be 2 columns, I only have fixed-width fonts for display)

t1                                      t2 (mjit-worker)
-----------------------------------------------------------------------------

 vm->postponed_job_index increment #1
 pjob[0]->func = ...
 pjob[0]->data = ...
                                        vm->postponed_job_index increment #2

 RUBY_VM_SET_POSTPONED_JOB_INTERRUPT

 rb_postponed_job_flush
 sees result of increment #2 from t2
 tries to access pjob[1] => CRASH
                                       pjob[1]->func = ...
                                       pjob[1]->data = ...

                                       RUBY_VM_SET_POSTPONED_JOB_INTERRUPT

So it looks like we need a THREAD-SAFE (not necessarily async-safe)
version of rb_postponed_job_register. Or a one-off API for MJIT...
job registration for MJIT.


Related issues

Related to Ruby master - Bug #15320: IO.popen with MJIT worker thread may deadlockClosedActions

Associated revisions

Revision 67485fee
Added by k0kubun (Takashi Kokubun) 12 months ago

vm_trace.c: MJIT-limited thread-safety for postponed_job

[Bug #15316]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66001 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 66001
Added by k0kubun (Takashi Kokubun) 12 months ago

vm_trace.c: MJIT-limited thread-safety for postponed_job

[Bug #15316]

Revision 66001
Added by k0kubun (Takashi Kokubun) 12 months ago

vm_trace.c: MJIT-limited thread-safety for postponed_job

[Bug #15316]

Revision eb38fb67
Added by normal 12 months ago

vm_trace.c: workqueue as thread-safe version of postponed_job

postponed_job is safe to use in signal handlers, but is not
thread-safe for MJIT. Implement a workqueue for MJIT
thread-safety.

[Bug #15316]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66100 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 66100
Added by normalperson (Eric Wong) 12 months ago

vm_trace.c: workqueue as thread-safe version of postponed_job

postponed_job is safe to use in signal handlers, but is not
thread-safe for MJIT. Implement a workqueue for MJIT
thread-safety.

[Bug #15316]

Revision 66100
Added by normal 12 months ago

vm_trace.c: workqueue as thread-safe version of postponed_job

postponed_job is safe to use in signal handlers, but is not
thread-safe for MJIT. Implement a workqueue for MJIT
thread-safety.

[Bug #15316]

History

Updated by normalperson (Eric Wong) 12 months ago

https://bugs.ruby-lang.org/issues/15316

So it looks like we need a THREAD-SAFE (not necessarily async-safe)
version of rb_postponed_job_register. Or a one-off API for MJIT...
job registration for MJIT.

I think I will introduce a new "workqueue" API dedicated to
thread-safety (but not async-signal-safety). We can use the
same interrupt flag, but it will use a separate data structure.

My goal is also to support MJIT in forked processes,
because (AFAIK) some production Ruby instances use fork
with long-running processes.

Updated by k0kubun (Takashi Kokubun) 12 months ago

Thanks to report that.

My goal is also to support MJIT in forked processes

Oh by the way, I have a patch which was created yesterday but I was too busy to commit that yet. (I'll commit that shortly.)

I think thread-safety of postponed_job and supporting MJIT in forked Ruby process are independent issues, but maybe your "workqueue" will be helpful for forked child somehow?

Updated by normalperson (Eric Wong) 12 months ago

takashikkbn@gmail.com wrote:

Oh by the way, I have a patch which was created yesterday but
I was too busy to commit that yet. (I'll commit that shortly.)

OK, please do.

I am also testing a minor cleanup:
https://80x24.org/spew/20181118081847.16293-1-e@80x24.org/raw
(but will probably be afk next 12 hours or so)

I think thread-safety of postponed_job and supporting MJIT in
forked Ruby process are independent issues, but maybe your
"workqueue" will be helpful for forked child somehow?

Somewhat related. We need to ensure a consistent state of the
queue (either postponed_job or workqueue).

Right now, we cannot guarantee a consistent state after forking;
either.

Updated by k0kubun (Takashi Kokubun) 12 months ago

OK, please do.

Done, in r65785. Comments from your point of view would be much appreciated.

I am also testing a minor cleanup

looks fine.

Somewhat related. We need to ensure a consistent state of the queue (either postponed_job or workqueue).

In a sense that postponed_job could be in an unexpected state by creating a child Ruby process which suddenly lost MJIT worker thread modifying postponed_job queue, agreed.

Right now, we cannot guarantee a consistent state after forking; either.

Yeah, I found that issue when I was creating r65785. In r65785, I added mjit_pause before fork and mjit_resumes after fork (in different places depending on whether it's child or parent). As long as MJIT worker thread is absent when Ruby is forked, I think the situation became a little better.

Updated by k0kubun (Takashi Kokubun) 12 months ago

Well, I seem to have introduced at least two deadlocks by that. I already fixed the first one, and I'm going to fix the second one http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1467162.

I'll take a deeper look when I come home today, but do you know who wakes up rb_sigwait_sleep and/or how the hang there could happen?

Updated by normalperson (Eric Wong) 12 months ago

takashikkbn@gmail.com wrote:

Well, I seem to have introduced at least two deadlocks by
that. I already fixed the first one, and I'm going to fix the
second one
http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1467162.

I'll take a deeper look when I come home today, but do you
know who wakes up rb_sigwait_sleep and/or how the hang there
could happen?

Any signal should wake it up...

If you can't solve it, can you wait until after the 11/22
developers meeting?

It's a small chance, but I really want to get Thread::Light
(auto-fiber) ready and into 2.6 (at least for Linux/FreeBSD),
and I barely have time for Ruby as it is.

It's not looking like I'll be able to afford to hack on Ruby
next year :<

Updated by k0kubun (Takashi Kokubun) 12 months ago

I see.

If you can't solve it, can you wait until after the 11/22
developers meeting?

Sure. I hope we'll release the fix on rc1, so I'll try to fix that anyway.

I hope auto-fiber will be in 2.6 too :)

#8

Updated by k0kubun (Takashi Kokubun) 12 months ago

  • Related to Bug #15320: IO.popen with MJIT worker thread may deadlock added

Updated by k0kubun (Takashi Kokubun) 12 months ago

If you can't solve it, can you wait until after the 11/22 developers meeting?

As tracked in [Bug #15320], I succeeded to fix it in r 65817. So, please never mind about that.
http://ci.rvm.jp/results/trunk-mjit-wait@silicon-docker/1468912
http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1468910

I'll take a look at postponed_job race then.

#10

Updated by k0kubun (Takashi Kokubun) 12 months ago

  • Status changed from Open to Closed

Applied in changeset trunk|r66001.


vm_trace.c: MJIT-limited thread-safety for postponed_job

[Bug #15316]

Updated by k0kubun (Takashi Kokubun) 12 months ago

r66100

Thanks to work on this, Eric.

Also available in: Atom PDF