Bug #14898

test/lib/test/unit/parallel.rb: TestSocket#test_timestamp stuck sometimes

Added by ko1 (Koichi Sasada) 9 months ago. Updated 7 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
[ruby-core:87836]

Description

With parallel tests (make test-all TESTS=-j4, i.e. 4-way parallelism), TestSocket#test_timestamp gets stuck sometimes.

http://ci.rvm.jp/results/trunk-test@ruby-sky3/1087178

We can see this hang even in very old revisions, but we are not sure how to solve it...

Can anyone help us?

Associated revisions

Revision eb78beda
Added by normal 9 months ago

test/socket/test_socket.rb (test_timestamp): retry send

I theorize there can be UDP packet loss even over loopback if
the kernel is under memory pressure. Retry sending periodically
until recvmsg succeeds.

[ruby-core:87842] [Bug #14898]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63872 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
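
To make the "retry send" idea concrete, here is a minimal sketch. It is not the code from r63872. The helper name recv_with_resend and its interval parameter are made up for illustration. A background thread keeps resending the datagram while the main thread blocks in recvmsg:

```ruby
require "socket"

# Hypothetical helper (not the actual change in r63872): resend `payload`
# from `sender` every `interval` seconds until `receiver.recvmsg` returns.
# Returns [message, number_of_sends].
def recv_with_resend(receiver, sender, payload, interval: 0.1)
  received = false
  resender = Thread.new do
    sends = 0
    until received
      sender.send(payload, 0, receiver.local_address)
      sends += 1
      sleep interval
    end
    sends
  end
  begin
    msg, = receiver.recvmsg # blocks until some datagram arrives
  ensure
    received = true         # always let the resender thread finish
  end
  [msg, resender.value]
end

Addrinfo.udp("127.0.0.1", 0).bind do |receiver|
  Addrinfo.udp("127.0.0.1", 0).bind do |sender|
    msg, sends = recv_with_resend(receiver, sender, "a")
    warn "possible UDP loss over loopback: #{sends} sends needed" if sends > 1
    puts msg
  end
end
```

Note that recvmsg itself still blocks without a limit here, so this only helps if the hang really is caused by a lost datagram.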

Revision 63872
Added by normalperson (Eric Wong) 9 months ago

test/socket/test_socket.rb (test_timestamp): retry send

I theorize there can be UDP packet loss even over loopback if
the kernel is under memory pressure. Retry sending periodically
until recvmsg succeeds.

[ruby-core:87842] [Bug #14898]

Revision 12f11714
Added by normal 8 months ago

test/socket/test_socket.rb (test_timestampns): retry send

It looks like we need to retry test_timestampns in addition
to test_timestamp; so share some code while we're at it.

cf. http://ci.rvm.jp/results/trunk-test@frontier/1153126
[ruby-core:88104] [Bug #14898]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64157 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
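
As a rough illustration of sharing the retry logic between the two tests, the sketch below takes the timestamp type as a parameter. It is not the code from r64157. The helper name udp_timestamp and the parameters interval and max_tries are hypothetical, and polling with recvmsg_nonblock is a choice made for this sketch so that it never blocks:

```ruby
require "socket"

# Hypothetical shared helper: `type` is :TIMESTAMP or :TIMESTAMPNS.
# Returns [message, timestamp_control_or_nil], or nil when the platform
# lacks the option or nothing arrives within `max_tries` sends.
def udp_timestamp(type, payload: "a", interval: 0.1, max_tries: 50)
  return nil unless Socket.const_defined?("SO_#{type}")
  Addrinfo.udp("127.0.0.1", 0).bind do |receiver|
    Addrinfo.udp("127.0.0.1", 0).bind do |sender|
      receiver.setsockopt(:SOCKET, type, true)
      max_tries.times do
        sender.send(payload, 0, receiver.local_address)
        result = receiver.recvmsg_nonblock(exception: false)
        if result == :wait_readable
          sleep interval # nothing yet; resend after a short pause
          next
        end
        msg, _addr, _flags, *controls = result
        return [msg, controls.find { |c| c.cmsg_is?(:SOCKET, type) }]
      end
    end
  end
  nil
rescue Errno::ENOPROTOOPT, Errno::EINVAL
  nil # assumption: treat a rejected setsockopt as "unsupported"
end

[:TIMESTAMP, :TIMESTAMPNS].each do |type|
  msg, stamp = udp_timestamp(type)
  time = stamp && stamp.timestamp
  puts "#{type}: msg=#{msg.inspect} time=#{time.inspect}"
end
```

SO_TIMESTAMPNS is not available on every platform, which is why the helper checks for the constant first.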

Revision 64157
Added by normalperson (Eric Wong) 8 months ago

test/socket/test_socket.rb (test_timestampns): retry send

It looks like we need to retry test_timestampns in addition
to test_timestamp; so share some code while we're at it.

cf. http://ci.rvm.jp/results/trunk-test@frontier/1153126
[ruby-core:88104] [Bug #14898]

Revision ce48b558
Added by normal 7 months ago

test/socket/test_socket.rb (timestamp_retry_rw): IO.select before recvmsg

CI failures are still happening from these tests, but try
to break out of it earlier instead of holding up the job.

[Bug #14898]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@64484 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
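
The "IO.select before recvmsg" approach can be sketched as follows. This is not the code from r64484. The helper name recvmsg_with_timeout is hypothetical and the timeout values are arbitrary. The point is only that a bounded IO.select lets the caller give up early instead of blocking in recvmsg forever:

```ruby
require "socket"

# Hypothetical helper: wait at most `timeout` seconds for a datagram.
# Returns the message, or nil if the socket never became readable.
def recvmsg_with_timeout(sock, timeout)
  ready = IO.select([sock], nil, nil, timeout)
  return nil unless ready # timed out; the caller can fail the test right away
  msg, = sock.recvmsg
  msg
end

Addrinfo.udp("127.0.0.1", 0).bind do |receiver|
  # Nothing has been sent yet, so this returns nil after about 0.2 seconds
  # instead of holding up the job.
  p recvmsg_with_timeout(receiver, 0.2)

  Addrinfo.udp("127.0.0.1", 0).bind do |sender|
    sender.send("a", 0, receiver.local_address)
    p recvmsg_with_timeout(receiver, 5)
  end
end
```

A timeout does not fix the underlying problem, but it turns a stuck CI job into an ordinary test failure.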

Revision 64484
Added by normalperson (Eric Wong) 7 months ago

test/socket/test_socket.rb (timestamp_retry_rw): IO.select before recvmsg

CI failures are still happening from these tests, but try
to break out of it earlier instead of holding up the job.

[Bug #14898]

History

Updated by normalperson (Eric Wong) 9 months ago

ko1@atdot.net wrote:

> With parallel tests (make test-all TESTS=-j4 with 4 parallelism) stuck sometimes.
>
> http://ci.rvm.jp/results/trunk-test@ruby-sky3/1087178
>
> We can see this stuck very old revisions but not sure how to solve...

I've never seen it stuck myself.

Is UDP over loopback supposed to be reliable?

I would not expect it to be (but am not sure), I think it's
possible the kernel could drop packets if under memory pressure.

Updated by ko1 (Koichi Sasada) 9 months ago

On 2018/07/06 18:47, Eric Wong wrote:

> I've never seen it stuck myself.

Only a few times per thousand trials. I have also never seen it in manual runs.

> Is UDP over loopback supposed to be reliable?

Maybe yes, because the other tests passed.

> I would not expect it to be (but am not sure), I think it's
> possible the kernel could drop packets if under memory pressure.

Hmm. Can we rewrite the tests with this concern in mind?

--
// SASADA Koichi at atdot dot net

Updated by normalperson (Eric Wong) 9 months ago

  • Status changed from Open to Closed

Applied in changeset trunk|r63872.


test/socket/test_socket.rb (test_timestamp): retry send

I theorize there can be UDP packet loss even over loopback if
the kernel is under memory pressure. Retry sending periodically
until recvmsg succeeds.

[ruby-core:87842] [Bug #14898]

Updated by normalperson (Eric Wong) 9 months ago

Koichi Sasada <ko1@atdot.net> wrote:

> On 2018/07/06 18:47, Eric Wong wrote:
>
> > I would not expect it to be (but am not sure), I think it's
> > possible the kernel could drop packets if under memory pressure.
>
> mmm. can we rewrite tests with this concern?

Maybe r63872 can help by retrying send.

Updated by ko1 (Koichi Sasada) 9 months ago

On 2018/07/07 14:36, Eric Wong wrote:

> Maybe r63872 can help by retrying send.

Great! Thank you.

--
// SASADA Koichi at atdot dot net

Updated by ko1 (Koichi Sasada) 8 months ago

  • Status changed from Closed to Open

Updated by normalperson (Eric Wong) 8 months ago

  • Status changed from Open to Closed

Applied in changeset trunk|r64157.


test/socket/test_socket.rb (test_timestampns): retry send

It looks like we need to retry test_timestampns in addition
to test_timestamp; so share some code while we're at it.

cf. http://ci.rvm.jp/results/trunk-test@frontier/1153126
[ruby-core:88104] [Bug #14898]

Updated by normalperson (Eric Wong) 8 months ago

ko1@atdot.net wrote:

> http://ci.rvm.jp/results/trunk-test@frontier/1153126

Oh, different test, that is test_timestampns getting stuck.
Trying r64157:

test/socket/test_socket.rb (test_timestampns): retry send

It looks like we need to retry test_timestampns in addition
to test_timestamp; so share some code while we're at it.

Updated by normalperson (Eric Wong) 8 months ago

http://ci.rvm.jp/results/trunk_clang_38@silicon-docker/1185552
:<

ko1@atdot.net wrote:

> http://ci.rvm.jp/results/trunk-test@frontier/1153126

ko1: is frontier also on Docker? I seem to remember hearing of
some UDP problems in containers several years ago, but maybe it
was only UDP multicast... This was years ago, and I never tried
containers myself.

Updated by ko1 (Koichi Sasada) 8 months ago

> Oh, different test, that is test_timestampns getting stuck.

Sorry.

> ko1: is frontier also on Docker?

No. It is a raw Linux machine.

Updated by normalperson (Eric Wong) 7 months ago

ko1@atdot.net wrote:

> Bug #14898: test/lib/test/unit/parallel.rb: TestSocket#test_timestamp stuck sometimes
> https://bugs.ruby-lang.org/issues/14898#change-73373

Still not solved. This might be a similar issue to r64478 with
too many pipes...
