Bug #18768
closedInconsistent behavior of IO, StringIO and String each_line methods when return paragraph and chomp: true passed
Description
In IO, StringIO and String methods that return lines (e.g. each, each_line, gets, readline, readlines) behave in a different way when the following parameters are passed
- separator is
""and -
chompistrue.
They truncate the new line characters between paragraphs and the trailing one differently:
"a\n\nb\n\nc\n".each_line("", chomp: true).to_a
#=> ["a\n", "b\n", "c\n"]
StringIO.new("a\n\nb\n\nc\n").each_line("", chomp: true).to_a
#=> ["a\n", "b\n", "c"]
File.open('chomp.txt').each_line("", chomp: true).to_a
#=> ["a", "b", "c\n"]
The text file content is the same:
File.read('chomp.txt')
#=> "a\n\nb\n\nc\n"
Expected behavior - they return the same result.
Updated by andrykonchin (Andrew Konchin) over 3 years ago
- Description updated (diff)
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
- Related to Bug #18770: Inconsistent behavior of IO/StringIO's each methods when called with nil as a separator, limit and chomp: true added
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
I agree that the behavior for all of these methods should be the same. I think the IO behavior makes the most sense, and String and StringIO should be changed to match it.
I submitted pull requests to Ruby (https://github.com/ruby/ruby/pull/5960) and StringIO (https://github.com/ruby/stringio/pull/29) to fix this.
Updated by jeremyevans (Jeremy Evans) over 3 years ago
- Status changed from Open to Closed
Applied in changeset git|609d73e8925f807786686caf635178bb1de74256.
[ruby/stringio] Fix handling of chomp with paragraph separator
Try to mirror IO behavior, where chomp takes out the entire paragraph
separators between entries, but does not chomp a single line separator
at the end of the string.
Partially Fixes [Bug #18768]
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
- Status changed from Closed to Open
Updated by mame (Yusuke Endoh) over 3 years ago
I have no strong opinion about this issue and am never against Jeremy's change, but I'm just curious. I thought that each_line(..., chomp: true) {|s| ... } was equal to each_line(...) {|s| s = s.chomp; ... }. In the above examples, StringIO's result (["a\n", "b\n", "c"]) was the simplest to me. @jeremyevans0 (Jeremy Evans) How did you think the IO behavior made the most sense?
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
mame (Yusuke Endoh) wrote in #note-6:
I have no strong opinion about this issue and am never against Jeremy's change, but I'm just curious. I thought that
each_line(..., chomp: true) {|s| ... }was equal toeach_line(...) {|s| s = s.chomp; ... }. In the above examples, StringIO's result (["a\n", "b\n", "c"]) was the simplest to me. @jeremyevans0 (Jeremy Evans) How did you think the IO behavior made the most sense?
My thought here is that chomp: true, does not mean chomp with no arguments, it means chomp with the separator used for the each_line. So when you call each_line with '' (paragraph separator), it only chomps the paragraph separator. That's what the IO behavior already was, and I thought it best to make String/StringIO consistent with that.
Updated by mame (Yusuke Endoh) over 3 years ago
@jeremyevans0 (Jeremy Evans) Thank you for your explanation, I understand. The keyword name "chomp" might be a bit confusing.
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
mame (Yusuke Endoh) wrote in #note-8:
@jeremyevans0 (Jeremy Evans) Thank you for your explanation, I understand. The keyword name "chomp" might be a bit confusing.
I agree, the behavior is not obvious from the name chomp. However, it is obvious that chomp is supposed to remove the separator and not the generic newline. You can see this when providing an explicit line ending character:
'abc'.each_line('b', chomp: true).to_a
# => ["a", "c"]
require 'stringio'
StringIO.new('abc').each_line('b', chomp: true).to_a
# => ["a", "c"]
r, w = IO.pipe
w.write('abc')
w.close
r.each_line('b', chomp: true).to_a
# => ["a", "c"]
Updated by mame (Yusuke Endoh) over 3 years ago
We discussed this at the dev meeting, and @matz (Yukihiro Matsumoto) agreed with @jeremyevans0 (Jeremy Evans): chomp: true should remove the separator. Could you please merge the change?
Updated by jeremyevans (Jeremy Evans) over 3 years ago
- Status changed from Open to Closed
Applied in changeset git|423b41cba77719b4f62aa530593ad36a990f7c74.
Make String#each_line work correctly with paragraph separator and chomp
Previously, it was including one newline when chomp was used,
which is inconsistent with IO#each_line behavior. This makes
behavior consistent with IO#each_line, chomping all paragraph
separators (multiple consecutive newlines), but not single
newlines.
Partially Fixes [Bug #18768]