Bug #14898

test/lib/test/unit/parallel.rb: TestSocket#test_timestamp stuck sometimes

Added by ko1 (Koichi Sasada) almost 2 years ago. Updated almost 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
[ruby-core:87836]

Description

With parallel tests (make test-all TESTS=-j4, i.e. 4 parallel workers), the test sometimes gets stuck.

http://ci.rvm.jp/results/trunk-test@ruby-sky3/1087178

We can see this hang even in very old revisions, but we are not sure how to solve it...

Can anyone help us?

Updated by normalperson (Eric Wong) almost 2 years ago

ko1@atdot.net wrote:

With parallel tests (make test-all TESTS=-j4, i.e. 4 parallel workers), the test sometimes gets stuck.

http://ci.rvm.jp/results/trunk-test@ruby-sky3/1087178

We can see this hang even in very old revisions, but we are not sure how to solve it...

I've never seen it stuck myself.

Is UDP over loopback supposed to be reliable?

I would not expect it to be (but am not sure), I think it's
possible the kernel could drop packets if under memory pressure.

Updated by ko1 (Koichi Sasada) almost 2 years ago

On 2018/07/06 18:47, Eric Wong wrote:

I've never seen it stuck myself.

Only a few times per several thousand trials. I have also never seen it in manual runs.

Is UDP over loopback supposed to be reliable?

Maybe yes because other tests passed.

I would not expect it to be (but am not sure), I think it's
possible the kernel could drop packets if under memory pressure.

Hmm. Can we rewrite the tests with this concern in mind?

--
// SASADA Koichi at atdot dot net

Updated by normalperson (Eric Wong) almost 2 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r63872.


test/socket/test_socket.rb (test_timestamp): retry send

I theorize there can be UDP packet loss even over loopback if
the kernel is under memory pressure. Retry sending periodically
until recvmsg succeeds.

[ruby-core:87842] [Bug #14898]
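
For readers unfamiliar with the idea, here is a minimal sketch of "retry sending until the receive succeeds" over loopback UDP. It is not the code from r63872: the helper name `send_until_received` and the `interval` and `max_tries` parameters are hypothetical, and it uses plain `recvfrom` rather than `recvmsg` with timestamp control messages.

```ruby
require "socket"

# Hypothetical helper (not from r63872): resend `payload` from `sender`
# until `receiver` actually gets a datagram, so that a single dropped
# UDP packet cannot leave the caller blocked forever.
def send_until_received(receiver, sender, payload, interval: 0.1, max_tries: 50)
  max_tries.times do
    sender.send(payload, 0)
    # Wait at most `interval` seconds for the receiver to become readable.
    if IO.select([receiver], nil, nil, interval)
      data, _addr = receiver.recvfrom(payload.bytesize)
      return data
    end
    # Timed out: assume the datagram was lost and send again.
  end
  raise "no datagram received after #{max_tries} tries"
end

receiver = UDPSocket.new
receiver.bind("127.0.0.1", 0)                  # let the kernel pick a free port
sender = UDPSocket.new
sender.connect("127.0.0.1", receiver.addr[1])  # addr[1] is the bound port

result = send_until_received(receiver, sender, "a")

sender.close
receiver.close
```

The bounded `max_tries` is a design choice of this sketch: it turns a potential hang into an explicit failure if the packet never arrives.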

Updated by normalperson (Eric Wong) almost 2 years ago

Koichi Sasada ko1@atdot.net wrote:

On 2018/07/06 18:47, Eric Wong wrote:

I would not expect it to be (but am not sure), I think it's
possible the kernel could drop packets if under memory pressure.

Hmm. Can we rewrite the tests with this concern in mind?

Maybe r63872 can help by retrying send.

Updated by ko1 (Koichi Sasada) almost 2 years ago

On 2018/07/07 14:36, Eric Wong wrote:

Maybe r63872 can help by retrying send.

Great! Thank you.

--
// SASADA Koichi at atdot dot net

Updated by ko1 (Koichi Sasada) almost 2 years ago

  • Status changed from Closed to Open

Updated by normalperson (Eric Wong) almost 2 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r64157.


test/socket/test_socket.rb (test_timestampns): retry send

It looks like we need to retry test_timestampns in addition
to test_timestamp; so share some code while we're at it.

cf. http://ci.rvm.jp/results/trunk-test@frontier/1153126
[ruby-core:88104] [Bug #14898]

Updated by normalperson (Eric Wong) almost 2 years ago

ko1@atdot.net wrote:

http://ci.rvm.jp/results/trunk-test@frontier/1153126

Oh, different test, that is test_timestampns getting stuck.
Trying r64157:

test/socket/test_socket.rb (test_timestampns): retry send

It looks like we need to retry test_timestampns in addition
to test_timestamp; so share some code while we're at it.

Updated by normalperson (Eric Wong) almost 2 years ago

http://ci.rvm.jp/results/trunk_clang_38@silicon-docker/1185552
:<

ko1@atdot.net wrote:

http://ci.rvm.jp/results/trunk-test@frontier/1153126

ko1: is frontier also on Docker? I seem to remember hearing of
some UDP problems in containers several years ago, but maybe it
was only UDP multicast... This was years ago, and I never tried
containers myself.

Updated by ko1 (Koichi Sasada) almost 2 years ago

Oh, different test, that is test_timestampns getting stuck.

sorry.

ko1: is frontier also on Docker?

No. It is a raw Linux machine.

Updated by normalperson (Eric Wong) almost 2 years ago

ko1@atdot.net wrote:

Bug #14898: test/lib/test/unit/parallel.rb: TestSocket#test_timestamp stuck sometimes
https://bugs.ruby-lang.org/issues/14898#change-73373

Still not solved. This might be a similar issue to r64478 with
too many pipes...
