Bug #5463
closed
PTY or IO.select timing issue results in no EOF
Description
I have observed that when running a shell through PTY, the slave will sometimes fail to produce an EOF after an exit command. As a result, polling via IO.select can time out. A full example is attached. This simplified example illustrates the problematic loop:
PTY.spawn ...
master.write "exit 8\n"
str = ''
while true
  unless IO.select([slave],nil,nil,3)
    raise "timeout waiting for slave EOF"
  end
  if slave.eof?
    break
  end
  str << slave.read(1)
end
After 'exit' is written to master, the loop normally reads all of the slave's output into str. The select ensures the loop can time out, but under normal circumstances it will not (3 seconds is plenty of time to exit a shell). The bug is that it occasionally does time out having never seen an EOF, meaning either the select is not detecting EOF on the slave or an EOF is not being written to the slave.
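For reference, the loop above can be sketched as a standalone helper that works on any readable IO. This is only an illustration under stated assumptions: `read_until_eof` is a hypothetical name, it swaps the `eof?`/`read(1)` pair for `read_nonblock`, and it treats `Errno::EIO` as end of input because on some platforms (Linux, for one) reading a pty after the other side has closed raises that error rather than reporting EOF. The demo uses a pipe instead of a pty so the outcome is deterministic; it does not reproduce the bug.

```ruby
# Hypothetical helper: drain an IO until end of input, raising if nothing
# becomes readable within `timeout` seconds.
def read_until_eof(io, timeout)
  str = String.new
  loop do
    # IO.select returns nil when the timeout expires with nothing readable.
    unless IO.select([io], nil, nil, timeout)
      raise "timeout waiting for EOF"
    end
    begin
      str << io.read_nonblock(4096)
    rescue IO::WaitReadable
      next   # readiness was reported but no data arrived; select again
    rescue EOFError, Errno::EIO
      break  # EOFError: ordinary EOF; Errno::EIO: assumed closed-pty case
    end
  end
  str
end

# Deterministic demo with a pipe standing in for the pty.
reader, writer = IO.pipe
writer.write "exit 8\n"
writer.close                      # closing the write end produces EOF
result = read_until_eof(reader, 3)
puts result.inspect
```

If the write end is never closed and no data arrives, the helper raises after the timeout, which is the symptom described in this report.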
The bizarre thing is that I can confirm after the timeout that the pty process does exit with the correct status (8) regardless of whether the loop exits normally with an EOF or by timeout.
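The exit-status check can be sketched without a pty. This is a minimal illustration, not the attached script: `/bin/sh -c "exit 8"` is an assumed stand-in for the interactive shell, and the child is reaped with `Process.wait2`.

```ruby
# Spawn a shell that exits with status 8 (stand-in for the pty shell).
pid = Process.spawn("/bin/sh", "-c", "exit 8")

# Process.wait2 blocks until the child exits and returns [pid, Process::Status].
_, status = Process.wait2(pid)
code = status.exitstatus
puts "child exited with status #{code}"
```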
I'm not sure if this is an issue with the PTY implementation, an issue with the shell, or with the OS. I have observed the bug repeatedly using 1.9.2 on OS X 10.6.8, Ubuntu 11.04, and SLES 10, and with the shells bash, ksh, csh, and zsh (although mostly with bash). I suspect the PTY implementation plays some role because the bug does not appear to occur on 1.8.7 and 1.8.6. However, the frequency of the bug varies so much across OS and shell that it could very well be an issue outside of Ruby.
To reproduce, run the pty_fail.rb script for 10k (or more) iterations. On OS X it usually crops up within 10k. On Ubuntu 11.04 it is very rare; ~100k iterations may be needed. Example:
ruby pty_no_eof_example.rb 10000 /bin/bash
Files