Bug #20101

closed

rb_file_open and rb_io_fdopen don't perform CRLF -> LF conversion when encoding is set

Added by kjtsanaktsidis (KJ Tsanaktsidis) about 1 year ago. Updated about 1 year ago.


Description

When opening a file with File.open, as long as 'b' is not set in the mode, Ruby will perform CRLF -> LF conversion on Windows when reading text files - i.e. CRLF line endings on disk get converted to Ruby strings with only "\n" in them. If you explicitly set the encoding with IO#set_encoding, this still works properly.
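A minimal sketch of the Ruby-level behaviour described above, for comparison. The helper name and file name are made up for illustration. The CRLF -> LF conversion itself is only expected on Windows; on other platforms both text-mode reads should simply return the bytes unchanged, so the interesting check is that the two text-mode reads agree with each other.

```ruby
require "tmpdir"

# Hypothetical helper: writes a CRLF file byte-for-byte, then reads it back
# three ways so the results can be compared.
def read_variants
  Dir.mktmpdir do |dir|
    path = File.join(dir, "crlf.txt")
    File.binwrite(path, "one\r\ntwo\r\n") # binary write: no newline translation

    # Text mode, default encoding.
    plain = File.open(path, "r") { |f| f.read }
    # Text mode, encoding set explicitly after opening.
    with_enc = File.open(path, "r") { |f| f.set_encoding("UTF-8"); f.read }
    # Binary mode: always returns the bytes as they are on disk.
    binary = File.open(path, "rb") { |f| f.read }

    { plain: plain, with_enc: with_enc, binary: binary }
  end
end

result = read_variants
p result[:plain] == result[:with_enc] # the two text-mode reads should agree
p result[:binary]                     # raw bytes, CRLF preserved
```

If the two text-mode reads ever differ on Windows, that would suggest set_encoding is disturbing the newline conversion for File.open as well.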

If you open the file in C with either the rb_io_fdopen or rb_file_open APIs in text mode, CRLF -> LF conversion also works. However, if you then call IO#set_encoding on this file, the CRLF -> LF conversion stops happening.

Concretely, this means that the conversion doesn't happen in the following circumstances:

  • When loading Ruby files with require (which calls rb_io_fdopen)
  • When parsing Ruby files with RubyVM::AbstractSyntaxTree (which calls rb_file_open).
    This then causes the ErrorHighlight tests to fail on Windows if Git has checked them out with CRLF line endings: the error messages being tested wind up with literal \r\n sequences in them, because the iseq text from the parser contains strings that have not been newline-converted.
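One way to look at the source lines the parser keeps is sketched below. It assumes CRuby 3.1 or later (for the keep_script_lines keyword), and the helper and file names are hypothetical. On platforms without newline conversion the CRLF bytes are expected to be preserved regardless; the case of interest is Windows, where per this report the lines were not being converted.

```ruby
require "tmpdir"

# Hypothetical helper: parse a CRLF-terminated Ruby file and return the raw
# source lines the parser kept. Assumes CRuby 3.1+ (keep_script_lines keyword).
def crlf_script_lines
  Dir.mktmpdir do |dir|
    path = File.join(dir, "crlf_sample.rb")
    File.binwrite(path, "puts 1\r\nputs 2\r\n") # CRLF endings written verbatim

    root = RubyVM::AbstractSyntaxTree.parse_file(path, keep_script_lines: true)
    root.script_lines
  end
end

lines = crlf_script_lines
p lines
# true means the kept source text still carries CRLF line endings
p lines.any? { |line| line.end_with?("\r\n") }
```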

This seems to happen because, in File.open, the file's encflags get the flag ECONV_DEFAULT_NEWLINE_DECORATOR in rb_io_extract_modeenc; however, this method isn't called for rb_io_fdopen or rb_file_open, so encflags doesn't get set to ECONV_DEFAULT_NEWLINE_DECORATOR. Without that flag, the underlying file descriptor's mode gets changed to binary mode by the NEED_NEWLINE_DECORATOR_ON_READ_CHECK macro.
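For comparison, newline decoration can also be requested explicitly from Ruby. This is only a Ruby-level analogue of the flag discussed above, not the C-level fix. It assumes the documented `universal_newline: true` open option, which is a different decorator from ECONV_DEFAULT_NEWLINE_DECORATOR and applies on every platform. The helper and file names are hypothetical.

```ruby
require "tmpdir"

# Hypothetical helper: read a CRLF file with the universal-newline decorator
# requested explicitly at open time, instead of relying on the platform default.
def read_with_universal_newline
  Dir.mktmpdir do |dir|
    path = File.join(dir, "crlf.txt")
    File.binwrite(path, "one\r\ntwo\r\n") # CRLF endings written verbatim

    File.open(path, "r", universal_newline: true) { |f| f.read }
  end
end

p read_with_universal_newline
```

With the decorator requested this way, the returned string should contain only "\n" line endings, whatever the platform's default text-mode behaviour is.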

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 1 year ago

  • Status changed from Open to Closed

Fix merged in 31371b2e24b03ccb0a03b622faf8c65e6cf6a31a
