IO#flush causes unnecessary fsync on Windows
On Windows, calling IO#flush is effectively identical to calling IO#fsync, i.e. the contents of the file are committed to disk platters instead of just being flushed. I traced it back to bug #776, where the original "bug" was worked around by forcing an fsync on every flush. Unfortunately, this change makes IO#flush unusable, because fsync is very expensive: on one of my machines a single fsync took up to 150ms, and I have heard stories of machines where fsync takes on the order of 2000ms.
I originally discovered this problem when a script of mine printed a couple hundred lines using Kernel#p and, to my astonishment, started taking several seconds to complete once I redirected its output to a file.
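To make the cost visible, here is a minimal timing sketch. It is not the script from the report: the helper name `timed_write`, the line count, and the file names are made up for illustration. It writes the same lines twice, once calling flush after every line and once calling fsync after every line. On a Ruby build affected by this bug the two timings on Windows would be expected to come out similar; elsewhere fsync should be noticeably slower, though the actual numbers depend on the disk and filesystem.

```ruby
require "benchmark"
require "tmpdir"

LINES = 200 # roughly "a couple hundred lines", an assumed figure

# Hypothetical helper: writes LINES lines to a fresh file in dir, calling
# sync_method (:flush or :fsync) after every line.
# Returns [path, elapsed_seconds].
def timed_write(dir, sync_method)
  path = File.join(dir, "#{sync_method}.txt")
  elapsed = Benchmark.realtime do
    File.open(path, "w") do |f|
      LINES.times do |i|
        f.puts("line #{i}")
        f.public_send(sync_method)
      end
    end
  end
  [path, elapsed]
end

RESULTS = {}
Dir.mktmpdir do |dir|
  [:flush, :fsync].each do |m|
    path, elapsed = timed_write(dir, m)
    RESULTS[m] = { lines: File.readlines(path).size, seconds: elapsed }
    puts format("%-5s %d lines in %.4fs", m, RESULTS[m][:lines], elapsed)
  end
end
```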
The problem with the original fix (adding fsync during flush) is that there was no issue to begin with. The file size not being updated is not even due to Windows per se; it is due to the NTFS driver being optimized not to update the file size (in the directory entry) until the file is closed. Please read this blog post for details about what's going on: http://blogs.msdn.com/b/oldnewthing/archive/2011/12/26/10251026.aspx
What I mean is that IO#flush without fsync properly flushes all the data to the file, and you can read all of this data from another process. The only thing that is not updated is the directory entry metadata (until the file is closed), which is by design: it's how it's supposed to work on Windows with the NTFS filesystem. That the workaround (i.e. fsync) works is more of an accident. When the OS is forced to write all that data to disk, it currently tries to create a consistent picture and updates the directory metadata as well, but nothing says it will keep doing that in the future. Worst of all, the original bug was about temporary files, and fsync during IO#flush forces them to be written to disk even if they are short-lived.
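A small sketch of that claim, under stated assumptions: the helper `flush_then_read` and the file name are hypothetical, and a second file handle in the same process stands in for "another process". After a plain flush the data is readable through the second handle. The size reported through the path while the writer is still open is printed but not relied on, since on NTFS that is exactly the value that may lag until close.

```ruby
require "tmpdir"

# Hypothetical helper: writes to path, flushes without fsync, then reads
# the file back through a second, independent handle while the writer is
# still open.
def flush_then_read(path)
  File.open(path, "w") do |writer|
    writer.write("hello")
    writer.flush # Ruby's buffer is handed to the OS; no fsync requested
    seen = File.open(path, "r", &:read)
    # Size as reported via the path while the writer is open; on NTFS this
    # may not reflect the flushed data yet (assumption taken from the report).
    { seen: seen, size_while_open: File.size(path) }
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "flush_demo.txt")
  result = flush_then_read(path)
  puts "read through second handle: #{result[:seen].inspect}"
  puts "size while writer open:     #{result[:size_while_open]}"
  puts "size after close:           #{File.size(path)}"
end
```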
Please remove fsync from IO#flush on Windows. You shouldn't work around correct Windows behavior at the cost of making it unbearably slow. Instead, people need to learn how filesystems work on Windows and learn to close files once they have finished writing to them and really need the directory metadata to be updated. (Most of the time, however, people shouldn't care about directory metadata like file size; it's just a cached value and is not necessarily accurate all of the time.)
- io.c (rb_io_flush_raw, rb_io_fsync): [EXPERIMENTAL] remove force syncing for Win32 to speed up IO. this may break some tests, and they'll be fixed later. [Bug #9153]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@45254 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
#3 [ruby-core:58836] Updated by snaury (Alexey Borzenkov) over 4 years ago
usa (Usaku NAKAMURA) wrote:
Thank you for your long description.
I would like to also know how we educate all the people to take care of Windows.
I'm not sure what you mean? One idea would be to change test_size_flushes_buffer_before_determining_file_size in test/test_tempfile.rb to skip the last assert on /mswin|mingw/, with a comment explaining why. Then the next time this comes up there would be a record of why flushing is not supposed to change the file size in a directory entry. Also, in lib/minitest/unit.rb it should be a.close and b.close instead of a.flush and b.flush (you don't need to keep files open when you only need a filename).
Also, maybe update tempfile.rb to mention that on Windows the file size may not be in sync as expected, and that people should close their tempfiles before giving filenames to other processes, etc.? (It seems the old unexpected behavior showed up mostly with tempfiles, and not closing tempfiles in a block is strangely common.)
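The pattern suggested above might look like the following sketch. The helper name `with_closed_tempfile` and the "handoff" basename are made up for illustration. The tempfile is closed (but not unlinked) before its path is handed to whoever needs it, and it is removed afterwards.

```ruby
require "tempfile"

# Hypothetical helper: writes data to a tempfile, closes the handle so the
# file's metadata is final, yields the path, and removes the file afterwards.
def with_closed_tempfile(data)
  tmp = Tempfile.new("handoff")
  begin
    tmp.write(data)
    tmp.close # closes the handle but keeps the file on disk
    yield tmp.path
  ensure
    tmp.close! # close (a no-op if already closed) and unlink
  end
end

with_closed_tempfile("x" * 1024) do |path|
  # At this point the path could be passed to another process.
  puts File.size(path)
end
```

Closing instead of flushing costs nothing extra here, because the writer is finished with the file anyway; only the path is needed from that point on.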
#4 [ruby-core:61230] Updated by usa (Usaku NAKAMURA) over 4 years ago
- Status changed from Feedback to Closed
- % Done changed from 0 to 100
#5 [ruby-core:61231] Updated by usa (Usaku NAKAMURA) over 4 years ago
- ruby -v changed from ruby 2.0.0p353 (2013-11-22) [i386-mingw32] to -
Ah, sorry, I can see now that it was already reverted. However it was
reverted together with #ifndef _WIN32. That #ifndef is not needed, i.e.
rb_thread_io_blocking_region(nogvl_fsync, fptr, fptr->fd) should be called
Oops, I mistook!
You are completely right.