Project

General

Profile

Actions

Feature #18784

open

`FileUtils.rm_f` and `FileUtils.rm_rf` should not mask exceptions

Added by deivid (David Rodríguez) over 2 years ago. Updated over 1 year ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:108565]

Description

In recent times, I've been having issues with these methods because they don't let you know when some issue happened while trying to remove the given folders/files.

IMO most users expect all pre-existing folders/files that are passed to these methods to be actually removed by the methods. Instead, when this happens, errors are silently swallowed and normally the result is that you will get some other issue further down the road, making the problem hard to debug.

The current workaround I'm using is to double check whether the files still exist after the method, and raise a custom error if they do, but I still can't see the original problem, so issues are similarly hard to debug.

This is also a deviation from how rm -rf and rm -f work, since these tools finish with a failure exit code when they fail to remove the given files. Given that fileutils is intended to mimic shell functionality, I think this is just a bug.

I think the intention of the force flag here is to:

  • Don't prompt for confirmation.
  • Ignore given arguments that are not files that already exist.

But any issue other than that should not be swallowed, and in general I think the method should succeed if and only if the given list of file names does not exist after the methods are done.

I think this is in line with the following note I get when I run man rm, but also suggests that this is not the standard behavior of "historical implementations"

COMPATIBILITY

The rm utility differs from historical implementations in that the -f option only masks attempts to remove non-existent files instead of masking a large variety of errors.

I implemented this at https://github.com/ruby/fileutils/pull/58, but treating this as a bug. I can also implement a more conservative for approach for users that might be using FileUtils.rm_rf or FileUtils.rm_f but don't really care if the files are removed or not.

Alternative proposals would be FileUtils.rm_rf(force: strict), or FileUtils.strict_rm_rf, but to be honest, if this is considered a breaking change, I would ship it as a new major version, and let users update their code to swallow errors themselves if they need to.

Happy to hear any feedback!


Related issues 1 (0 open1 closed)

Related to Ruby master - Feature #9260: make FileUtils.rm_rf raise on errorsClosedActions

Updated by mame (Yusuke Endoh) over 2 years ago

It would be helpful to write the issue in code:

$ mkdir foo                                    # create "foo/bar" with permission 555
$ touch foo/bar
$ chmod 555 foo

$ ruby -rfileutils -e 'FileUtils.rm_rf("foo")' # expected: an exception, actual: say nothing

$ ls foo                                       # the directory "foo" is not deleted
bar

FYI, rm -rf fails with an explicit message:

$ rm -rf foo
rm: cannot remove 'foo/bar': Permission denied

Updated by deivid (David Rodríguez) over 2 years ago

Thank you @mame (Yusuke Endoh) for doing that!

I wrote the issue in code in the upstream issue: https://github.com/ruby/fileutils/issues/57, but I forgot to also add that kind of repro here. So thanks again! :)

Updated by mame (Yusuke Endoh) over 2 years ago

At the dev meeting, we had a long discussion about this issue.

@aamine, who is the original author of fileutils, said that this behavior is by design. rm_r(path, force: true) means "ignore all errors (by catching StandardError)". He knew this meaning was different from -f option of rm command which means "ignore nonexistent files". @matz (Yukihiro Matsumoto) agreed with @aamine and is not so positive about the change.

@matz (Yukihiro Matsumoto) also said that he is open to a feature request to change/improve this behavior, but in that case, we need more precise specification proposal, its rationale, and consideration of compatibility. For example,

  • If a file that cannot be deleted is found during rm_rf, should an exception be raised as soon as possible? Or should it be raised after all deletable files have been removed? (The behavior of rm -rf is the latter.)
  • (If the latter of the above is chosen) what exception should be raised if there are multiple files that cannot be deleted? It seems that rm -rf reports an error whenever it finds a file that cannot be deleted, and exits with status non-zero if any file cannot be deleted. But this is difficult to imitate in FileUtils, so a new design decision for FileUtils is needed.
  • Should rm_rf ignore nonexistent directory directly passed in its argument? Or should it ignore any nonexistent files during recursion? We need to investigate how rm -rf behave in such a case. (rm_rf gets a list of files in a directory and then delete them. If rm -rf is performed against the same directory in parallel by another process, some files that are included in the list might be deleted by another process before rm_rf attempts to delete them.)

Updated by deivid (David Rodríguez) over 2 years ago

Thanks for discussing it!

In my opinion, the current behavior is hardly useful for anyone, so I think it should be removed. We got bug reports in Bundler for a long time that were very hard to investigate and replicate because the errors were so confusing due to this issue (they happened at random places, namely, whenever the assumption that the folder should've been removed was first broken).

I think the simplest would be to raise when a non deletable file is first found, even if it doesn't match rm -rf behavior exactly.

Regarding the behavior when a non existent file is passed, ignoring that error is how I found the :force flag documented. And about finding it during recursion, how would that be possible? You mean a broken symlink or something? If that's what you mean I would also raise there.

That all said, I actually found a workaround using Fileutils that does what I want:

def rm_rf(path)
  FileUtils.remove_entry_secure(path) if File.exist?(path)
end

So I'm no longer motivated anymore to fix this issue "the right way", specially given that it turns out that the behaviour is intentional and that nobody else has reported the headaches I reported.

Updated by deivid (David Rodríguez) over 2 years ago

I just saw the "deletion in parallel` explanation. In that case, I guess it makes sense to ignore all "file does not exist" errors, but not anything else. That still satisfies the assertion that the method would succeed if and only if the given list of file names does not exist after the methods are done.

Updated by mame (Yusuke Endoh) over 2 years ago

In my opinion, the current behavior is hardly useful for anyone

Personally, I agree. I want FileUtils.rm_rf to report an error if it fails to delete a file/directory passed as an argument. In the dev meeting, @knu (Akinori MUSHA) and @naruse (Yui NARUSE) had the same opinion.

I think the simplest would be to raise when a non deletable file is first found, even if it doesn't match rm -rf behavior exactly.

After some consideration, I second this. This is different from the behavior of rm -rf, but I think it is a difference between a command and a method. A command can report an error whenever it finds an undeletable file/directory, but it is difficult for a method to do the same.

One alternative design is to record an undeletable file name, go on to any remaining files, and at the end raise an exception with all undeletable file names. However, this approach may make the error message too long.

Another design I came up with is for FileUtils.rm_rf to directly write an error message to stderr whenever an undeletable file is found. This is very similar to rm -rf command, but this seems like too much of a side effect.

Also, I can't think of much benefit of the rm -rf behavior of deleting as much as possible. It may have an advantage of reducing disk space a bit, though...

Or should it ignore any nonexistent files during recursion?

I have confirmed that rm -rf has this behavior (ignore ENOENT even during recursion). My interpretation is that POSIX spec of rm also supports this behavior.

This is a PR to make FileUtils.rm* method with force option ignore ENOENT, not StandardError.

https://github.com/ruby/fileutils/pull/99

Updated by mame (Yusuke Endoh) over 2 years ago

Unfortunately, this change affected mkmf on Appveyor (Windows).

https://ci.appveyor.com/project/ruby/ruby/builds/44579924/job/n81xmb2mqs6no7dm#L23792

  2) Error:
TestMkmfTryConstant#test_simple:
Errno::EACCES: Permission denied @ apply2files - conftest.exe
    C:/projects/ruby/lib/fileutils.rb:2300:in `unlink'
    C:/projects/ruby/lib/fileutils.rb:2300:in `block in remove_file'
    C:/projects/ruby/lib/fileutils.rb:2308:in `platform_support'
    C:/projects/ruby/lib/fileutils.rb:2299:in `remove_file'
    C:/projects/ruby/lib/fileutils.rb:1449:in `remove_file'
    C:/projects/ruby/lib/fileutils.rb:1189:in `block in rm'
    C:/projects/ruby/lib/fileutils.rb:1188:in `each'
    C:/projects/ruby/lib/fileutils.rb:1188:in `rm'
    C:/projects/ruby/lib/fileutils.rb:1211:in `rm_f'
    C:/projects/ruby/lib/mkmf.rb:260:in `rm_f'
    C:/projects/ruby/lib/mkmf.rb:768:in `ensure in try_constant'
    C:/projects/ruby/lib/mkmf.rb:768:in `try_constant'
    C:/projects/ruby/test/mkmf/test_constant.rb:6:in `block in test_simple'
    C:/projects/ruby/test/mkmf/base.rb:132:in `instance_eval'
    C:/projects/ruby/test/mkmf/base.rb:132:in `mkmf'
    C:/projects/ruby/test/mkmf/test_constant.rb:6:in `test_simple'

mkmf uses FileUtils.rm_f to delete conftest.exe, which is a temporary executable. But this seems to fail occasionally.
According to @nobu (Nobuyoshi Nakada), mkmf tries to delete conftest.exe after the child process terminated. However, it may fail to delete an executable even immediately after execution ends. (Or, another process might lock the file somehow.)
This issue cannot reproduce on my Windows machine, so I cannot figure out the cause of the problem.
Note that @k0kubun (Takashi Kokubun) has disabled the tests in mkmf on Appveyor, so I think this failure of mkmf will decrease.

I am wondering if I should revert the change of FileUtils. I know that reverting will not solve the problem of mkmf. But this problem is harder to fix than I expected. I concern that this change will make users to just ignore exceptions of FileUtils.rm_rf, like begin; FileUtils.rm_rf(...); rescue SystemCallError; end. This is not what I wanted.

I will bring this reconsideration to the next dev meeting.

Updated by koic (Koichi ITO) over 2 years ago

I also encountered a possibly related Errno::EACCES error on Windows (mingw) of GitHub Actions.

Errno::EACCES:
  Permission denied @ apply2files - D:/a/_temp/d20220824-1696-gdx8qj/path/does/not/exist

Below are the background PRs.

Personally, I'm not already in trouble and this is informational.

Updated by deivid (David Rodríguez) over 2 years ago

In my opinion, this is ok. I also encountered some issues in RubyGems test suite when I first enabled this behavior that led me to correct some bad tests. And it also helped me figure out realworld bugs. I understand that sometimes it may not be easy to figure out the root cause, so people might go with begin; FileUtils.rm_rf(...); rescue SystemCallError; end. But even if that case they will be back to the previous situation, except it should be clear for them that something is off and they will hopefully add a TODO or something :)

I do understand the backwards compatibility concern of fixing this issue though. Even if this will surface issues in people's test, and that's a good thing, it could break CI's that were otherwise green. Would backwards compatible alternatives be considered like I suggest in the OP if this is reverted?

Updated by deivid (David Rodríguez) over 2 years ago

Another observation is: could we opt-out of this strict behavior only on Windows, given how picky it is about deleting used files? So far the only issues found in the realworld are under Windows I believe. Fileutils already has several special cases for dealing with Windows behavior.

Updated by mame (Yusuke Endoh) over 2 years ago

We faced another problem on Linux.

A rubygems test uses FileUtils.rm_rf in teardown. But it randomly fails for an unknown reason (mysteriously, only on MJIT test?). This makes the teardown method fail to restore the environment variables which were temporarily changed by the "setup" method. Unfortunately, the failure makes subsequent tests also fail.

http://ci.rvm.jp/results/trunk-mjit@phosphorus-docker/4211129

  3) Error:
TestGemExtRakeBuilder#test_class_build_with_args:
Errno::ENOTEMPTY: Directory not empty @ dir_s_rmdir - /tmp/ruby/v3/build/trunk-mjit/tmp/test_rubygems_20220826-23953-v9b8kw
    /tmp/ruby/v3/src/trunk-mjit/lib/fileutils.rb:2294:in `rmdir'
    /tmp/ruby/v3/src/trunk-mjit/lib/fileutils.rb:2294:in `block in remove_dir1'
    /tmp/ruby/v3/src/trunk-mjit/lib/fileutils.rb:2305:in `platform_support'
    /tmp/ruby/v3/src/trunk-mjit/lib/fileutils.rb:2293:in `remove_dir1'
    /tmp/ruby/v3/src/trunk-mjit/lib/fileutils.rb:2286:in `remove'
    /tmp/ruby/v3/src/trunk-mjit/lib/fileutils.rb:1425:in `block in remove_entry'
    /tmp/ruby/v3/src/trunk-mjit/lib/fileutils.rb:2351:in `postorder_traverse'
    /tmp/ruby/v3/src/trunk-mjit/lib/fileutils.rb:1423:in `remove_entry'
    /tmp/ruby/v3/src/trunk-mjit/lib/fileutils.rb:1276:in `block in rm_r'
    /tmp/ruby/v3/src/trunk-mjit/lib/fileutils.rb:1272:in `each'
    /tmp/ruby/v3/src/trunk-mjit/lib/fileutils.rb:1272:in `rm_r'
    /tmp/ruby/v3/src/trunk-mjit/lib/fileutils.rb:1300:in `rm_rf'
    /tmp/ruby/v3/src/trunk-mjit/test/rubygems/helper.rb:468:in `teardown'

I think that this pattern (FileUtils.rm_rf in teardown) is quite common, not only in rubygems. I am now afraid that this kind of trouble will occur in many other projects.

Unfortunately, this change caused much more problems than I had anticipated. I guess I'll just have to revert it.

Updated by deivid (David Rodríguez) over 2 years ago

Alright, I guess it makes sense to leave things as they are.

One question I have is: I though we had removed an ensure that prevented the original exception from being displayed and would instead show the error when removing the root directory. This one https://github.com/ruby/fileutils/pull/99/commits/ec5d3b84ea1e1d1b13c7dcb5bbe6cd5208c20a49.

Yet this exception is showing the error when trying to remove the root directory, and not the real culprit of failing to remove one file inside. Do you know why is that?

Updated by mame (Yusuke Endoh) over 2 years ago

deivid (David Rodríguez) wrote in #note-13:

Yet this exception is showing the error when trying to remove the root directory, and not the real culprit of failing to remove one file inside. Do you know why is that?

I have no idea. Perhaps another thread or process is creating files in the directory that is being deleted by FileUtils.rm_rf? This may be the case because the CI (ci.rvm.jp) runs tests in parallel. But I can't explain why it happens only on MJIT test.

Updated by Eregon (Benoit Daloze) over 2 years ago

I think those issues are revealing real problems which are worth investigating and fixing.
For instance if files are created and deleted in parallel in the same directory that seems like a good source of transient bugs waiting to happen.

Before the change I guess those directories were not removed? That seems a pretty big violation for the semantics of rm_rf.
Some file might be reused due to that and lead to a much more obscure error.

Updated by mame (Yusuke Endoh) over 2 years ago

mame (Yusuke Endoh) wrote in #note-14:

deivid (David Rodríguez) wrote in #note-13:

Yet this exception is showing the error when trying to remove the root directory, and not the real culprit of failing to remove one file inside. Do you know why is that?

I have no idea. Perhaps another thread or process is creating files in the directory that is being deleted by FileUtils.rm_rf? This may be the case because the CI (ci.rvm.jp) runs tests in parallel. But I can't explain why it happens only on MJIT test.

I think I have identified the mechanism. In short, my guess was correct.

  • rubygems attempts to delete TMPDIR by FileUtils.rm_rf.
  • FileUtils.rm_rf obtains a list of its child files, and starts deleting each of them.
  • During the deletion, MJIT (precisely, GCC invoked by MJIT) creates a temporary file in TMPDIR.
  • FileUtils.rm_rf finished deleting the children and then attempts to delete TMPDIR itself.
  • But it fails since TMPDIR is not empty.

I confirmed the mechanism by printing the remaining files in TMPDIR after FileUtils.rm_rf failed.
https://github.com/ruby/ruby/commit/7bdb999d0f28c7bb9d7a35ca775e405674527e5f

@k0kubun (Takashi Kokubun) put MJIT.pause before FileUtils.rm_rf. Since then, no more failures have occurred.
https://github.com/ruby/ruby/commit/95d2d7920c97d0502ebed4ba439177325ad05e57

Actions #17

Updated by mame (Yusuke Endoh) over 2 years ago

  • Assignee set to mame (Yusuke Endoh)

I have talked with @matz (Yukihiro Matsumoto) about this issue. He said let's revert this change. Instead, it will print a warning whenever it fails to delete a file (only when $VERBOSE = true). I will create a patch.

Updated by Eregon (Benoit Daloze) over 2 years ago

Those temporary directories should be separate, but I guess the issue is this line which means other things might get stored in that test temp dir:
https://github.com/ruby/ruby/blob/7bdb999d0f28c7bb9d7a35ca775e405674527e5f/test/rubygems/helper.rb#L337
That's the bug and what should be removed.

Updated by deivid (David Rodríguez) over 2 years ago

Thanks @mame (Yusuke Endoh), a warning in $VERBOSE mode is definitely better than nothing :)

Thanks @Eregon (Benoit Daloze)! I originally introduced that because one maintainer was getting test failures on her system otherwise, and because I thought we completely "owned" the tmpdir during execution of our own tests, see https://github.com/rubygems/rubygems/pull/3476. However, I tried the repro in that PR and it does no longer fail, even when removing that line, and I learned that at least in MJIT mode the temp folder is used outside of RubyGems. So I agree we should remove that line.

Updated by Eregon (Benoit Daloze) over 2 years ago

I'd suggest having the warning for $VERBOSE=false too (and so no warning for $VERBOSE=nil).
When such an error happens it means something is going wrong, and I'd imagine it's a lot more helpful for anyone using rm_rf and getting a bug report if they also see the file which could not be removed.
It's pretty rare to use -v/$VERBOSE=true, so only in $VERBOSE=true means losing most of the value.

The exception would be even better IMO, maybe we can first warn and then in the next version actually raise?

Updated by mame (Yusuke Endoh) over 2 years ago

Here is a patch. Revert fa65d676ece93a1380b9e6564efa4b4566c7a44b and apply:

diff --git a/lib/fileutils.rb b/lib/fileutils.rb
index 4ba7d18..178db6e 100644
--- a/lib/fileutils.rb
+++ b/lib/fileutils.rb
@@ -1422,6 +1422,7 @@ module FileUtils
       begin
         ent.remove
       rescue
+        warn "failed to remove: %s" % $!.message if $VERBOSE
         raise unless force
       end
     end

I would like to discuss this at the next dev meeting with other committers.


In fact, I think raising an exception is more "correct". I have even tried that change.

However, I am afraid that the advantage of the correctness is not likely to pay its disadvantage.
This change will allow us to find a bug, but none of the bugs found so far are critical.
On the other hand, fixing the bugs correctly was very time consuming (or almost impossible).
So the change will force users to write a patch to swallow exceptions with rescue Exception.
In fact, when I talked this issue to @ko1 (Koichi Sasada), he said he would add a defensive rescue for rm_rf in debug.gem.
Users have to work, code gets messy, and bugs don't get fixed.

Updated by deivid (David Rodríguez) over 2 years ago

@mame (Yusuke Endoh) I agree with your analysis, but I have to say from experience that realworld bugs hidden by this behavior are pretty hard to fix too. The warning in verbose mode sounds great, but perhaps we could also mention this in documentation and add an opt-in strict alternative like I suggested in the original post.

Updated by deivid (David Rodríguez) over 2 years ago

I just removed the line pointed out by @Eregon (Benoit Daloze), so I guess all the workarounds added around the FileUtils.rm_rf in RubyGems teardown method can now be removed too!

Updated by mame (Yusuke Endoh) over 2 years ago

We discussed the issue at the dev meeting (for very long time) and agreed to revert it once.

There are two possible use cases for FileUtils.rm_rf: (1) delete the target at all costs, and (2) want to delete the target if possible.

For (1), it is not sufficient to raise an exception when it fails to delete some files. Instead, you have to make an effort to delete the target by using chmod and/or sleep/retry. This is obviously too incompatible to the original FileUtils.rm_rf. So the users should write their own code to delete the directory, or we should provide another method like FileUtils.super_rm_rf (tentative name :-) for that, if this use case is so common.

For (2), the original behavior that swallows the failure is good enough. We considered that this use case is more common than (1). For example, we often use rm_rf to delete a temporal directory created under tmp/. In this case, it is not a big deal even if rm_rf failed to delete the directory. We'd like to leave the OS to delete the directory eventually (such as on restarting the OS).

It is possible to introduce FileUtils.strict_rm_rf which throws an exception on failure. But it does not meet use cases (1) and (2). We need to clarify the benefit of the method if we introduce it. Also, we need a better name. (strict is not clear.)

We didn't have enough time to discuss and agree on printing a warning in case of failure under VERBOSE mode.

Updated by Dan0042 (Daniel DeLorme) over 2 years ago

Here's a use case for a rm_f that only ignores ENOENT errors. Before processing, you want to delete a cache file (that may or may not exist) so the processing regenerates it. But if the file exists, it must be deleted, otherwise the processing would use the stale cache file instead of regenerate it.

I think it would be useful to have that method in FileUtils (and Pathname) instead of File.delete(path) if File.exist?(path). A simple name like FileUtils.rm! would encourage this as the "correct" method to use while preserving backward compatibility for rm_f.

Actions #26

Updated by hsbt (Hiroshi SHIBATA) over 2 years ago

  • Related to Feature #9260: make FileUtils.rm_rf raise on errors added

Updated by mame (Yusuke Endoh) about 2 years ago

I have reverted fa65d676ece93a1380b9e6564efa4b4566c7a44b: https://github.com/ruby/fileutils/pull/102

Updated by mame (Yusuke Endoh) about 2 years ago

  • Assignee deleted (mame (Yusuke Endoh))
Actions #29

Updated by jeremyevans0 (Jeremy Evans) over 1 year ago

  • Tracker changed from Bug to Feature
  • ruby -v deleted (ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21])
  • Backport deleted (2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN)
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0