Bug #18811

open

PTY I/O not working on AIX 7.x

Added by hspem (Per-Erik Martin) 28 days ago. Updated 20 days ago.

Status: Feedback
Priority: Normal
Target version: -
ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [powerpc-aix7.1.0]
[ruby-core:108737]

Description

The attached test script simply executes a command under a PTY and captures its output and exit code.
This works on Linux (all supported versions of Red Hat, Debian, Ubuntu, and SUSE) as well as on Solaris 11.4 on SPARC and x86.
On AIX 7.1 through 7.3, however, it does not. I tested this with a Ruby 3.1.2 that I built, but other Ruby versions behave the same way, for example 2.7.5 from IBM's freeware repository.

The core of the attached program is:

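#!/usr/bin/env ruby
require 'pty'

$command = ARGV[0] or abort "usage: #{$0} COMMAND"

begin
  PTY.spawn($command) do |r, w, pid|
    w.close                     # nothing is written to the child
    begin
      r.each do |line|
        puts "Got line \"#{line.chomp}\""
      end
    rescue Errno::EIO
      # ignore: on many platforms the master reports EIO once the child exits
    rescue => e
      $stderr.puts "Read error: #{e}"
    ensure
      Process.wait pid           # reap the child so $? holds its status
    end
    $exit_status = $?.exitstatus
  end
rescue => e
  $stderr.puts "PTY.spawn error: #{e}"
end

puts "Exit status #{$exit_status}"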

This should read the output from echo but doesn't:

# ./ptytest.rb 'echo b'
Exit status 0
# 

It does execute the command, but it seems it can't read the output:

# ./ptytest.rb 'exit 3'
Exit status 3
# cat /tmp/e.txt
cat: 0652-050 Cannot open /tmp/e.txt.
# ./ptytest.rb 'echo b > /tmp/e.txt'
Exit status 0
# cat /tmp/e.txt
b
# 

However, the unit tests for PTY pass, so how is this possible? It turns out that all the unit tests run ruby (tinyruby really), and this works:

# ./ptytest.rb 'ruby -e "puts \"b\""'
Got line "b"
Exit status 0
# 

as well as this:

# ./ptytest.rb 'ruby -e "system \"echo b\""'
Got line "b"
Exit status 0
# 

So I'm at a loss here. Why does it work with "ruby" but not with any other command? Is there something wrong with the script? If so, why does it work on all other platforms?


Files

ptytest.rb (534 Bytes), hspem (Per-Erik Martin), 05/30/2022 07:26 AM
pty.c.patch (655 Bytes), hspem (Per-Erik Martin), 06/07/2022 07:15 AM

Updated by hspem (Per-Erik Martin) 28 days ago

  • Subject changed from PTY I/O now working on AIX 7.x to PTY I/O not working on AIX 7.x

Updated by mame (Yusuke Endoh) 28 days ago

  • Assignee set to kanemoto (Yutaka Kanemoto)
  • Status changed from Open to Feedback

@kanemoto (Yutaka Kanemoto) Can you handle this ticket?

This is my impression, but the maintenance state of AIX is not good. The CI for AIX has not been working for a long time. This issue is probably unlikely to be fixed unless you provide a patch; even if a patch is created, we cannot promise that we will apply it.

Updated by hspem (Per-Erik Martin) 28 days ago

I had some instances with inconsistent behavior and got an idea...

# ./ptytest.rb 'echo foo'
Exit status 0
# ./ptytest.rb 'echo foo'
Exit status 0
# ./ptytest.rb 'echo foo'
Exit status 0
# ./ptytest.rb 'sleep 1;echo foo'
Got line "foo"
Exit status 0
# ./ptytest.rb 'sleep 1;echo foo'
Got line "foo"
Exit status 0
# ./ptytest.rb 'sleep 1;echo foo'
Got line "foo"
Exit status 0
# 

So it's apparently some kind of race condition: if the command takes some time before it generates output, it works. The systems I'm testing on are notoriously slow (virtual machines on fairly old hardware). That might explain why others haven't noticed this: on fast enough systems, or with commands that don't print output right away, it appears to work.
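To check the race hypothesis more systematically than by rerunning ptytest.rb by hand, the core of the script can be wrapped in a method and each command run several times. This is a minimal sketch: `capture_pty` and `capture_rate` are hypothetical helper names, and the run counts are arbitrary. If the race is the cause, an affected AIX system would be expected to show a lower capture rate for `echo foo` than for `sleep 1; echo foo`.

```ruby
require 'pty'

# Hypothetical helper: run +cmd+ under a pseudo-terminal, as ptytest.rb
# does, and return [captured_lines, exit_status]. Lines are chomped, which
# also strips the "\r\n" line endings the pty adds.
def capture_pty(cmd)
  lines = []
  status = nil
  PTY.spawn(cmd) do |r, w, pid|
    w.close
    begin
      r.each { |line| lines << line.chomp }
    rescue Errno::EIO
      # The master reports EIO once the child side is gone; treat it as EOF.
    ensure
      Process.wait(pid)
      status = $?.exitstatus
    end
  end
  [lines, status]
end

# Hypothetical helper: run +cmd+ +runs+ times and count the runs whose
# captured output contained the line +expected+.
def capture_rate(cmd, expected, runs)
  runs.times.count { capture_pty(cmd).first.include?(expected) }
end
```

For example, compare `capture_rate("echo foo", "foo", 20)` with `capture_rate("sleep 1; echo foo", "foo", 20)`. Going by the report, Linux should capture every run in both cases, while an affected AIX system would capture fewer in the first.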

Updated by hspem (Per-Erik Martin) 25 days ago

Another discovery: sleeping after the command makes it work as well:

# ./ptytest.rb 'echo foo; sleep 1'
Got line "foo"
Exit status 0
# 

So apparently the problem arises when the child process exits too soon.
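Until the underlying problem is fixed, this observation suggests a caller-side workaround: keep the child alive briefly after the command finishes. Appending `; sleep 1` literally, as in the transcript, has a side effect, though: the shell's exit status becomes that of `sleep`, so `./ptytest.rb 'false; sleep 1'` would report 0. The sketch below (the helper name `with_exit_delay` is hypothetical) preserves the command's own status. It only narrows a timing window and is not a fix.

```ruby
# Hypothetical helper: build a /bin/sh script that runs +cmd+, waits
# +delay+ seconds, and then exits with +cmd+'s own exit status.
# +cmd+ runs in a subshell "( ... )" so that an "exit N" inside it ends
# only the subshell and the delay still happens.
def with_exit_delay(cmd, delay = 1)
  "(\n#{cmd}\n)\nrc=$?\nsleep #{delay}\nexit $rc"
end

# Usage with the race-prone command from the transcripts:
#   PTY.spawn(with_exit_delay("echo foo")) { |r, w, pid| ... }
```

Because the resulting string contains shell metacharacters, `PTY.spawn` runs it through `/bin/sh`, so no extra quoting is needed.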

Updated by hspem (Per-Erik Martin) 20 days ago

mame (Yusuke Endoh) wrote in #note-2:

This is my impression, but the maintenance state of AIX is not good. The CI for AIX has not been working for a long time. This issue is probably unlikely to be fixed unless you provide a patch; even if a patch is created, we cannot promise that we will apply it.

I found a way to fix this. The problem seems to be a known quirk of the AIX pty implementation: it is sensitive to the order in which things happen relative to closing the descriptors. Getting the order wrong more commonly makes one of the processes hang, but apparently output can also be lost. We had a similar problem in sshd, which would hang unless the parent and child processes were synchronized so that the child waited for the parent (master) process to close the slave end before proceeding. (I have also seen processes hang in Ruby's pty, although less often.) So the issue seems to be when the parent process calls close(slave), relative to what the child process is doing at that moment.

Adding synchronization of that kind isn't easy in ext/pty/pty.c, but there's another way.
Since AIX does not define TIOCSTTY, chfunc() closes "slave" and reopens the device anyway. This means the parent process can close its slave descriptor before forking instead of afterwards, which removes the race condition.

The patch is simple, attached.
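The ordering change can be illustrated at the Ruby level. This is a sketch only: pty.c.patch itself is C and is not reproduced here, and `spawn_with_early_slave_close` and `collect_output` are hypothetical names. It uses `PTY.open` and `fork` rather than `PTY.spawn`. The parent records the slave's device path and closes its slave descriptor before forking; the child starts a new session and reopens the device by path, which is what chfunc() is described as doing when TIOCSTTY is not defined. The handshake pipe is an addition not described in the thread: on Linux, for instance, reading the master while no slave descriptor is open can fail immediately with EIO, so the parent waits until the child has reopened the slave. It has not been checked on AIX.

```ruby
require 'pty'

# Hypothetical helper: spawn +cmd+ on a new pty, closing the parent's slave
# descriptor BEFORE fork; the child reopens the slave by path.
# Returns [master, pid].
def spawn_with_early_slave_close(cmd)
  master, slave = PTY.open
  slave_path = slave.path
  slave.close # from here on, only the child will hold the slave

  # Handshake (not part of the patch): Ruby creates pipe descriptors
  # close-on-exec, so ready_r hits EOF once the child has exec'd (or died).
  ready_r, ready_w = IO.pipe

  pid = fork do
    ready_r.close
    master.close
    Process.setsid # new session, so the reopened pty can become its controlling tty
    tty = File.open(slave_path, File::RDWR)
    $stdin.reopen(tty)
    $stdout.reopen(tty)
    $stderr.reopen(tty)
    tty.close
    exec(cmd)
  end

  ready_w.close
  ready_r.read # returns "" at EOF, i.e. after the child reopened the slave
  ready_r.close
  [master, pid]
end

# Hypothetical helper: read everything the child writes, then reap it.
# Returns [lines, exit_status].
def collect_output(master, pid)
  lines = []
  begin
    master.each_line { |line| lines << line.chomp }
  rescue Errno::EIO
    # End of output once the child has closed the slave.
  end
  master.close
  Process.wait(pid)
  [lines, $?.exitstatus]
end
```

Usage: `collect_output(*spawn_with_early_slave_close("echo foo"))` returns the captured lines together with the exit status.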
