Bug #14898

test/lib/test/unit/parallel.rb: TestSocket#test_timestamp stuck sometimes

Added by ko1 (Koichi Sasada) 3 months ago. Updated about 1 month ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
[ruby-core:87836]

Description

Parallel test runs (make test-all TESTS=-j4, i.e. 4 workers) sometimes get stuck in TestSocket#test_timestamp.

http://ci.rvm.jp/results/trunk-test@ruby-sky3/1087178

We have seen this hang even on very old revisions, but we are not sure how to solve it.

Can anyone help us?

Associated revisions

Revision eb78beda
Added by normal 3 months ago

test/socket/test_socket.rb (test_timestamp): retry send

I theorize there can be UDP packet loss even over loopback if
the kernel is under memory pressure. Retry sending periodically
until recvmsg succeeds.

[Bug #14898]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63872 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
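
The "retry send" idea in the commit message above can be illustrated roughly as follows. This is only a minimal sketch, not the code from the changeset: the helper name send_until_received and its interval/max_tries parameters are hypothetical, and it assumes two UDP sockets bound to 127.0.0.1 as in the test.

```ruby
require "socket"

# Hypothetical helper (not from the changeset): resend `payload` from
# `sender` until `receiver` has a datagram to read, or give up.
def send_until_received(sender, receiver, payload, interval: 0.1, max_tries: 50)
  1.upto(max_tries) do |tries|
    sender.send(payload, 0, receiver.local_address)
    begin
      # Non-blocking read: raises IO::WaitReadable if nothing has arrived.
      msg, = receiver.recvmsg_nonblock
      return [msg, tries]
    rescue IO::WaitReadable
      # Assume the datagram was lost or is late; wait, then send again.
      sleep(interval)
    end
  end
  raise "no datagram received after #{max_tries} sends"
end

receiver = Addrinfo.udp("127.0.0.1", 0).bind
sender   = Addrinfo.udp("127.0.0.1", 0).bind
begin
  msg, tries = send_until_received(sender, receiver, "a")
  warn "needed #{tries} sends over loopback" if tries > 1
ensure
  sender.close
  receiver.close
end
```

A bounded number of tries is used here so that a persistent failure raises instead of looping forever; the limit itself is an arbitrary choice for the sketch.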

Revision 63872
Added by normalperson (Eric Wong) 3 months ago

test/socket/test_socket.rb (test_timestamp): retry send

I theorize there can be UDP packet loss even over loopback if
the kernel is under memory pressure. Retry sending periodically
until recvmsg succeeds.

[Bug #14898]

Revision 12f11714
Added by normal about 2 months ago

test/socket/test_socket.rb (test_timestampns): retry send

It looks like we need to retry test_timestampns in addition
to test_timestamp; so share some code while we're at it.

cf. http://ci.rvm.jp/results/trunk-test@frontier/1153126
[Bug #14898]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64157 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 64157
Added by normalperson (Eric Wong) about 2 months ago

test/socket/test_socket.rb (test_timestampns): retry send

It looks like we need to retry test_timestampns in addition
to test_timestamp; so share some code while we're at it.

cf. http://ci.rvm.jp/results/trunk-test@frontier/1153126
[Bug #14898]

Revision ce48b558
Added by normal about 1 month ago

test/socket/test_socket.rb (timestamp_retry_rw): IO.select before recvmsg

CI failures are still happening from these tests, but try
to break out of it earlier instead of holding up the job.

[Bug #14898]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64484 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
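
As a rough illustration of "IO.select before recvmsg": waiting for readability with a timeout lets the caller give up early rather than block in recvmsg indefinitely. The sketch below is an assumption-laden example, not the changeset itself; recvmsg_with_timeout is a hypothetical name and the timeout values are arbitrary.

```ruby
require "socket"

# Hypothetical helper (not from the changeset): wait at most `timeout`
# seconds for `sock` to become readable, then read one datagram.
# Returns nil on timeout instead of blocking forever in recvmsg.
def recvmsg_with_timeout(sock, timeout)
  ready, = IO.select([sock], nil, nil, timeout) # nil when the timeout expires
  return nil unless ready
  msg, = sock.recvmsg
  msg
end

receiver = Addrinfo.udp("127.0.0.1", 0).bind
sender   = Addrinfo.udp("127.0.0.1", 0).bind
begin
  # Nothing has been sent yet, so this returns nil after about 0.2 seconds.
  nothing_yet = recvmsg_with_timeout(receiver, 0.2)
  sender.send("a", 0, receiver.local_address)
  got = recvmsg_with_timeout(receiver, 5)
  warn "timed out waiting for the datagram" if got.nil?
ensure
  sender.close
  receiver.close
end
```

A test built this way can turn a nil result into an assertion failure, so a lost datagram fails one test quickly instead of holding up the whole CI job.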

Revision 64484
Added by normalperson (Eric Wong) about 1 month ago

test/socket/test_socket.rb (timestamp_retry_rw): IO.select before recvmsg

CI failures are still happening from these tests, but try
to break out of it earlier instead of holding up the job.

[Bug #14898]

History

#1 [ruby-core:87838] Updated by normalperson (Eric Wong) 3 months ago

ko1@atdot.net wrote:

Parallel test runs (make test-all TESTS=-j4, i.e. 4 workers) sometimes get stuck in TestSocket#test_timestamp.

http://ci.rvm.jp/results/trunk-test@ruby-sky3/1087178

We have seen this hang even on very old revisions, but we are not sure how to solve it.

I've never seen it stuck myself.

Is UDP over loopback supposed to be reliable?

I would not expect it to be (but am not sure), I think it's
possible the kernel could drop packets if under memory pressure.

#2 [ruby-core:87842] Updated by ko1 (Koichi Sasada) 3 months ago

On 2018/07/06 18:47, Eric Wong wrote:

I've never seen it stuck myself.

Only a few times per thousand trials. I have also never seen it in manual trials.

Is UDP over loopback supposed to be reliable?

Maybe yes because other tests passed.

I would not expect it to be (but am not sure), I think it's
possible the kernel could drop packets if under memory pressure.

Hmm. Can we rewrite the tests with this concern in mind?

--
// SASADA Koichi at atdot dot net

#3 Updated by normalperson (Eric Wong) 3 months ago

  • Status changed from Open to Closed

Applied in changeset trunk|r63872.


test/socket/test_socket.rb (test_timestamp): retry send

I theorize there can be UDP packet loss even over loopback if
the kernel is under memory pressure. Retry sending periodically
until recvmsg succeeds.

[Bug #14898]

#4 [ruby-core:87843] Updated by normalperson (Eric Wong) 3 months ago

Koichi Sasada ko1@atdot.net wrote:

On 2018/07/06 18:47, Eric Wong wrote:

I would not expect it to be (but am not sure), I think it's
possible the kernel could drop packets if under memory pressure.

Hmm. Can we rewrite the tests with this concern in mind?

Maybe r63872 can help by retrying send.

#5 [ruby-core:87844] Updated by ko1 (Koichi Sasada) 3 months ago

On 2018/07/07 14:36, Eric Wong wrote:

Maybe r63872 can help by retrying send.

Great! Thank you.

--
// SASADA Koichi at atdot dot net

#7 Updated by ko1 (Koichi Sasada) about 2 months ago

  • Status changed from Closed to Open

#8 Updated by normalperson (Eric Wong) about 2 months ago

  • Status changed from Open to Closed

Applied in changeset trunk|r64157.


test/socket/test_socket.rb (test_timestampns): retry send

It looks like we need to retry test_timestampns in addition
to test_timestamp; so share some code while we're at it.

cf. http://ci.rvm.jp/results/trunk-test@frontier/1153126
[Bug #14898]

#9 [ruby-core:88268] Updated by normalperson (Eric Wong) about 2 months ago

ko1@atdot.net wrote:

http://ci.rvm.jp/results/trunk-test@frontier/1153126

Oh, different test, that is test_timestampns getting stuck.
Trying r64157:

test/socket/test_socket.rb (test_timestampns): retry send

It looks like we need to retry test_timestampns in addition
to test_timestamp; so share some code while we're at it.

#10 [ruby-core:88277] Updated by normalperson (Eric Wong) about 2 months ago

http://ci.rvm.jp/results/trunk_clang_38@silicon-docker/1185552
:<

ko1@atdot.net wrote:

http://ci.rvm.jp/results/trunk-test@frontier/1153126

ko1: is frontier also on Docker? I seem to remember hearing of
some UDP problems in containers several years ago, but maybe it
was only UDP multicast... This was years ago, and I never tried
containers myself.

#11 [ruby-core:88345] Updated by ko1 (Koichi Sasada) about 1 month ago

Oh, different test, that is test_timestampns getting stuck.

sorry.

ko1: is frontier also on Docker?

No. It is a raw Linux machine.

#12 [ruby-core:88562] Updated by normalperson (Eric Wong) about 1 month ago

ko1@atdot.net wrote:

Bug #14898: test/lib/test/unit/parallel.rb: TestSocket#test_timestamp stuck sometimes
https://bugs.ruby-lang.org/issues/14898#change-73373

Still not solved. This might be a similar issue to r64478 with
too many pipes...
