Misc #21035
closedClarify or redefine Module#autoload? and Module#const_defined?
Description
The documentation for Module#autoload?
says:
Returns filename to be loaded if name is registered as autoload in the namespace of mod or one of its ancestors.
As a user, if I declare an autoload, I expect this API:
module M
autoload :Foo, 'foo'
constants # => [:Foo]
const_defined?(:Foo) # => true
autoload?(:Foo) # => 'foo'
end
That it is indeed how it generally works. Even if the autoload path does not exist. But there is an edge case.
While constants
does include always :Foo
as far as I can tell, the return value of const_defined?
and autoload?
depends on the stack of features being loaded: The autoload path is resolved and if seen to be in the stack of features being loaded, the predicates return false
and nil
, respectively. Do you think that is intuitive?
I find that logic totally unexpected. I just defined an autoload, therefore, I think it would be natural for autoload?
to return what I just configured. Why should const_defined?
return nothing but true
? And why is it not consistent with constants
?
To me, it would make more sense that in the previous example const_defined?
returns true
, and autoload?
returns foo
unconditionally (and instantly, nowadays it takes a relative long time due to the lookup).
Now, if the autoload is triggered in a lookup then I would expect Kernel#require
logic to apply. But not when calling some simple predicates.
Please, note that the present behavior is not documented, so on paper the change would not be backwards incompatible.
If, on the other side, it is preferred to keep the behavior as it is, I guess it should be documented with precision (accounting for symlinks, relative paths in $LOAD_PATH
, etc.)
Updated by mame (Yusuke Endoh) 11 days ago
@fxn Please write expected behavior and actual result clearly. I understood your issue as the following, is it correct?
# main.rb
module M
autoload :Foo, './foo'
constants # => [:Foo] (as expected)
const_defined?(:Foo) # => true (as expected)
autoload?(:Foo) # => 'foo' (as expected)
Foo # fires the autoload
end
# foo.rb
module M
p constants # => [:Foo] (as expected)
p const_defined?(:Foo) # => expected: true, actual: false
p autoload?(:Foo) # => expected: 'foo', actual: nil
class Foo
end
end
Updated by fxn (Xavier Noria) 11 days ago
· Edited
Not exactly. Let's consider your first example:
# main.rb
module M
autoload :Foo, './foo'
constants # => [:Foo] (as expected)
const_defined?(:Foo) # => true (as expected), (1)
autoload?(:Foo) # => 'foo' (as expected), (2)
Foo # fires the autoload
end
Point of my ticket is that in that example (1) can be false
, and (2) can be nil
.
How is that possible? It is because within those predicates Ruby seems to check if the target file is a feature being loaded at the moment. If it is, you get false
and nil
. If it isn't, you get the expected values (even if the target file does not exist).
So my question would be: Is this deliberate and a wanted logic? In that case, I think it should be documented (could volunteer). Otherwise, it should be fixed.
Is that more clear?
Updated by fxn (Xavier Noria) 11 days ago
· Edited
Your example kind of shows this by firing an autoload. But that is not the general case, even if the autoload is not fired you get that extra logic. Let me give you a generic example that does not fire an autoload:
# main.rb
require 'foo'
# foo.rb
require 'm'
# m.rb
module M
autoload :Foo, 'foo'
p constants
p const_defined?(:Foo)
p autoload?(:Foo)
end
This is the execution:
% ruby -I. main.rb
[:Foo]
false
nil
Updated by fxn (Xavier Noria) 11 days ago
· Edited
BTW, my personal take is that the behavior should change, if you present that code above to anyone, expectation would be to return true
and foo
always, since the autoload was just defined and nothing happened (and constants
lists the constant, this is at least inconsistent).
But perhaps there is a reason for this, so I am presenting this as a question.
Updated by Eregon (Benoit Daloze) 11 days ago
Right that last example is weird, it's like it considers the autoload :Foo
is ongoing even though Foo
was never accessed.
I think it does consider the autoload to be ongoing because there is a feature foo.rb
being loaded.
In general this is the very messy interactions when doing a require "xxx"
which matches an autoload "xxx"
(see #15663).
I would love if we can simplify those somehow and/or document the behavior well.
@fxn Maybe you would be interested to start a document covering the behavior of autoload in details? (I think that could leave under https://github.com/ruby/ruby/tree/master/doc, or maybe a comment in load.c
).
It would be good to compare it to the actual implementation to make sure it's accurate.
BTW, const_defined?(:Foo)
is similar to defined?(Foo)
and I guess when performing an autoload it makes some sense for those to return false/nil, because the autoloaded file should define those constants (and defining those constants should e.g. not warn that the constant is already defined or something like that).
But I'm not sure it needs to be that complicated, maybe they should always return true/"constant" even during the autoload, that would be a nice simplification.
Please, note that the present behavior is not documented, so on paper the change would not be backwards incompatible.
If, on the other side, it is preferred to keep the behavior as it is, I guess it should be documented with precision (accounting for symlinks, relative paths in $LOAD_PATH, etc.)
In general I never trust Ruby methods documentation, especially about edge cases like that, the documentation is usually pretty short and only covers the happy cases.
Sometimes it's completely wrong, and I gave up on trying to fix those documentation errors because there are too many.
I think part of the reason of that is the documentation for many of the methods was added long after the method was implemented, by people who didn't write those methods or have time to look at their implementation in details.
Updated by Eregon (Benoit Daloze) 11 days ago
- Related to Feature #15663: Documenting autoload semantics added
Updated by fxn (Xavier Noria) 11 days ago
· Edited
BTW, const_defined?(:Foo) is similar to defined?(Foo) and I guess when performing an autoload it makes some sense for those to return false/nil, because the autoloaded file should define those constants (and defining those constants should e.g. not warn that the constant is already defined or something like that).
We don't see it the same way here.
For me, I defined metadata and I am asking for the metadata I just set. Did I define an autoload? Yes I did, if triggered it may do nothing, or it may fail because the file does not even exist. But we'll cross that bridge when we get to it, you did define an autoload, yes.
The logic of an actual autoload may check for features, but the predicates?
Updated by Eregon (Benoit Daloze) 11 days ago
Yeah, I don't disagree (see the sentence I wrote below that one), I was just guessing why it's the way it is.
Updated by Eregon (Benoit Daloze) 11 days ago
I recalled and searched for a couple places that rely on defined?(Foo)
to be nil
while autoloading Foo
.
This is actually if defined?(Bundler::Deprecate) && !autoload?(:Deprecate)
so I think it's safe as long as this change changes both defined?
and autoload?
to be unaffected by ongoing autoloads.
I did a gem-codesearch for autoload?
: https://gist.github.com/eregon/f19ab4347c521f860dc198446113ee01
Not easy to know if they would be affected by this change without looking deeper (which I don't have time for).
I think other places do something like:
module M
unless defined?(Foo)
Foo = ...
end
end
And this would not be safe with this change, because then defined?(Foo)
would be true while autoloading that file, and M::Foo would never be assigned, leading to NameError in the caller accessing M::Foo
.
This pattern seems hard to search with gem-codesearch, gem-codesearch 'unless defined\?[( ][A-Z]'
gives too many results.
One example, if M::VERSION
is an autoload, module M; VERSION = "1.2.3" unless defined? VERSION; end
and that autoload is used, then M::VERSION
would never be set (it's questionable why there is even that defined?
check, but that pattern is used a lot, maybe to workaround when some files are loaded with Kernel#load
and not require
).
Another example, if YAML is an autoload, require 'yaml' unless defined?(YAML)
should be fine, it would just trigger the actual loading later.
I also found https://github.com/oracle/truffleruby/blob/f1c77805d8d25d65c5dba811f1e9b3c58f4667ca/lib/patches/rubygems/specification.rb
and I think this actually relies on the current behavior, i.e., if that is being required while we are autoloading ::Gem
, then we need the else
branch.
IOW it can be useful or even necessary to find out if we are currently autoloading some constant.
That specific usage could be worked around though since it is specific to TruffleRuby and so could use some internals if needed.
It might be interesting to just try the change in a PR and see if it would break anything in ruby/ruby's CI.
Updated by mame (Yusuke Endoh) 10 days ago
This ticket was discussed briefly at the dev meeting.
fxn (Xavier Noria) wrote in #note-3:
Let me give you a generic example that does not fire an autoload:
# main.rb require 'foo' # foo.rb require 'm' # m.rb module M autoload :Foo, 'foo' p constants p const_defined?(:Foo) p autoload?(:Foo) end
This case was in a circular require pattern, which is unrecommended, so that it was not considered worth discussing anything based on this case.
Instead of arguing whether the current behavior is "intuitive" or not, it would be more constructive to explain shortly what you are trying to do and how the current behavior is troubling for that purpose, by using a short example code.
Updated by fxn (Xavier Noria) 10 days ago
· Edited
This case was in a circular require pattern, which is unrecommended, so that it was not considered worth discussing anything based on this case.
I don't think there is a circular require, because m.rb
is not doing a require of foo.rb
. Note that no autoload is being triggered.
I would understand that if the autoload was later triggered after m.rb
is loaded, then Kernel#require
does nothing due to idempotency, and a NameError
happens. But that is independent of the logic of the predicates about the autoload metadata.
I would also understand that Module#autoload
or constant lookup has special logic to handle circularity.
But the predicates basically ask for metadata about the autoload that was just declared.
Regarding use cases:
-
In some logic while scanning the file system, Zeitwerk needs to know if a given constant is already set, either exists for real or has an autoload, because in that case it should decline managing the corresponding file, and move on with the scanning. Due to this logic, I need to manually test and register those edge cases because the API is insufficient.
-
Module#constants
is inconsistent with the predicates.
Updated by mame (Yusuke Endoh) 10 days ago
I'm not going to argue with the definition of the word "circular require", but I don't think that code is valid anyway. It does not use autoload correctly. The circular require/autoload should be resolved.
In some logic while scanning the file system, Zeitwerk needs to know if a given constant is already set, either exists for real or has an autoload, because in that case it should decline managing the corresponding file, and move on with the scanning. Due to this logic, I need to manually test and register those edge cases because the API is insufficient.
This one is difficult to understand. It would be nice if you could write a demo with short valid code.
Module#constants
is inconsistent with the predicates.
I understand this. This issue is observable in the code I wrote in #note-1. I think that, in the context of requiring foo.rb, Module#constants
should not contain :Foo
until it is actually defined.
Updated by fxn (Xavier Noria) 9 days ago
· Edited
Introduction¶
OK, thanks for taking the time to think about this @mame (Yusuke Endoh).
I can explain the use case in Zeitwerk, but I think I understand what you have in mind. Problem is, it is difficult for me to see it because documentation does not cover this.
Zeitwerk usage of autoload?¶
Zeitwerk basically scans the file system to set autoloads dynamically. If it finds a top-level file foo.rb
, it defines
autoload :Foo, '/full/path/to/foo.rb'
That is dynamic, no source code involved. And AFAIK I have no way to know if a given file is in the stack of features being loaded. Then, some logic needs to know if Foo
has an autoload, because different, unrelated projects managed by different loaders could have a foo.rb
. And only the first one wins.
But I cannot figure that out just with the API. Because of the way this works today, after a dynamic autoload
call I need to invoke autoload?
and register the constant path in a special collection if the return value is not the path I just set. Then, my own "autoload?" logic is "autoload? || registered?", conceptually.
Two ways to look at Module#autoload¶
But once you said Module#constant
should agree with the predicates, I believe I see our different point of views.
The way I think about autoload is, very briefly:
-
Module#autoload
defines a file to be required when the constant is looked up in the receiver. -
Module#autoload?
says whether there is an autoload for the argument. - On constant lookup, an autoload is discarded if the target file is present in the stack of features being loaded.
Could it be that you think about this in this manner?
-
Module#autoload
defines a file to be required when the constant is looked up in the receiver, unless the target file is in the stack of loaded features, in which case it is discarded. -
Module#autoload?
says whether there is an autoload for the argument, unless the argument is now in the stack of features being loaded, in which case it returnsnil
and discards the autoload. - On constant lookup, an autoload is discarded if the target file is now present in the stack of features being loaded.
Emphasis in now, because maybe the file was not in the stack when the autoload was set, but it is later when the predicate is checked, or when the constant lookup hits the autoload.
And constants
and const_defined?
would document what is consistent with each option (in my version, their output is trivial, it is metadata, in the second version, all of them run a validation.)
Source of discrepancy (I think)¶
The documentation of Module#autoload
does not mention any discards, so from the point of view of the user, you set an autoload, and it is there. Discard happens on constant lookup. That is why I find my point of view more intuitive, or in agreement with the current documentation.
Resolution¶
Now, if the Ruby team prefers the second version and we fix Module#constants
, then I'd be OK with the resolution (would not agree, but it is your call :).
But then, as I said in the ticket description, I think all this semantics should be precisely documented.
Additionally, if that is the resolution, it would be nice if Module#autoload
returned a flag indicating if the autoload was discarded. That would help Zeitwerk's use case too.
Updated by Eregon (Benoit Daloze) 9 days ago
· Edited
Another way to see it is that autoload and require are kind of bidirectional at least in the current CRuby behavior:
autoload :Foo, "foo.rb"
- If I
p Foo
, that requiresfoo.rb
, the usual direction - But also if I
require "foo"
before any access toFoo
then that also "starts" the autoload for all autoloads with path "foo.rb".
So in your example in https://bugs.ruby-lang.org/issues/21035#note-3, the mere fact that there was a require "foo"
means any autoload with path "foo.rb" (present or (!) even future on CRuby as we see) are considered active while foo.rb is being required.
On TruffleRuby the autoload is considered active only if the constant is being autoloaded (through triggering an autoload or through require), i.e. it doesn't consider future autoloads.
So that script gives:
$ ruby -v -I. main.rb
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-linux]
[:Foo]
false
nil
$ ruby -v -I. main.rb
truffleruby 24.1.1, like ruby 3.2.4, Oracle GraalVM Native [x86_64-linux]
[:Foo]
true
"foo"
Could you point to the code in Zeitwerk which has to work around these subtle semantics?
Is that code then incorrect due to the current semantics and there is no way or only a hacky way to address it?
Maybe we could only change autoload?
to not consider ongoing autoloads/requires, I think that should be compatible enough.
Changing defined?
or const_defined?
OTOH seems much more risky for compatibility.
Updated by fxn (Xavier Noria) 9 days ago
· Edited
But also if I require "foo" before any access to Foo then that also "starts" the autoload for all autoloads with path "foo.rb".
I did not know this!
OK, I don't understand the mental model where all these behaviors fit, but I can see it is in any case intricate and deliberate.
My mental model was: I set an autoload, and that is an entry in a table that exists until constant lookup hits it. Then, extra logic (maybe circularity detection) plus Kernel#require
semantics (idempotency) apply. In that mental model, the predicates should be instant and trivial.
But that is not the mental model of the Ruby team.
I'd love to understand that mental model to learn how to think about this better, but in any case, I think the discussion about changing this can be closed.
I think Module#constants
and Module#const_defined?
should match. I believe we agree on that one, right?
Then, we are missing a good deal of documentation. For example,
Returns filename to be loaded if name is registered as autoload in the namespace of mod or one of its ancestors.
That is incomplete, or not clear enough. According to those docs
autoload :Foo, 'foo'
puts autoload?(:Foo)
should print "foo" unconditionally, I just registered the autoload!
I learned this is not the case in 2019 after hitting my head against the wall for some time trying to understand an edge case Zeitwerk needs to support.
Updated by Eregon (Benoit Daloze) 9 days ago
fxn (Xavier Noria) wrote in #note-15:
I think
Module#constants
andModule#const_defined?
should match. I believe we agree on that one, right?
I'm not sure 😅
IMO an autoload which does define the constant correctly should stay in Module#constants
at all times from the autoload :Foo, "path"
, on all threads.
It seems too weird for a constant to disappear from Module#constants
because it's being autoloaded by the current thread, and could cause subtle problems.
IOW, I think Module#constants
is correct as it is.
I do realize that with Module#const_defined?
and defined?
and autoload?
the constant looks like it disappears while it's being autoloaded by the current thread.
(that's the inconsistency you mention in the description)
I think for Module#const_defined?
and defined?
there might be valid reasons for it, notably for that pattern in https://bugs.ruby-lang.org/issues/21035#note-9.
For autoload?
I don't see any reason to make it disappear and so complicated behavior.
I think autoload?
should just query what's in the constant table: if it's an autoload constant and it's not resolved yet it should return the autoload path, whether it's currently loading or not, on all threads.
should print "foo" unconditionally, I just registered the autoload!
Yes, I think we can agree this is surprising behavior.
Maybe we can agree it is undesired behavior, WDYT @mame (Yusuke Endoh)?
It is unfortunately rather brittle when e.g. defining autoload :Foo, "foo"
that if there is any foo.rb
under $LOAD_PATH
that is being required currently it's considered to be the file defining constant Foo
, which it may or may not be.
Although in theory such conflicts should not happen much, because gems/apps are supposed to have different top-level namespace and e.g. module A; autoload :Foo, "a/foo"; end
does not conflict with module B; autoload :Foo, "b/foo"; end
.
@fxn Do you have an issue documenting how the conflict in autoload paths can happen in the real world from that 2019 issue or more recent?
@fxn How do you workaround that strange behavior in Zeitwerk, could you point to some code?
Updated by fxn (Xavier Noria) 9 days ago
· Edited
@Eregon (Benoit Daloze) thanks for your thoughtful response.
Let me reply to a detail first, because it may change the conversation. Later I'll address your other questions/remarks.
I realize the autoload of Foo
in my original example hides a bit my point. My point is simpler, if I registered an autoload, I expect querying for that autoload in the next line to be fast and return the metadata.
So, let me address Module#constants
vs Module#const_defined?
cat <<EOS > /tmp/foo.rb
require '/tmp/bar.rb'
EOS
cat <<EOS > /tmp/bar.rb
module M
autoload :Foo, '/tmp/foo.rb'
p constants
p const_defined?(:Foo)
end
EOS
ruby -r/tmp/foo.rb -e1
rm /tmp/foo.rb /tmp/bar.rb
That shell script prints
[:Foo]
false
and no autoloading is in place. I would expect a constant listed by Module#constants
to be defined, basically.
Updated by fxn (Xavier Noria) 9 days ago
That expectation can be solved in two ways:
- You remove the autoload-circularity validation from
const_defined?
- You add the autoload-circularity validation to
constants
.
Updated by fxn (Xavier Noria) 6 days ago
Hey, I have pending to send examples, but I'd like to suggest that we close this ticket (could followup with examples through a different channel).
The purpose of this ticket was "clarify or redefine". Since I did not know that issuing a require
may unregister autoloads, the behavior of autoload?
and const_defined?
without triggering an autoload was confusing to me. Now, the mental model is clear: autoloads can be unregistered/ignored, and autoload?
and const_defined?
act accordingly. Now it clicks!
Still, we have Module#constants
and we don't have documentation covering all this. But the approach to propose that has to be worded in a different way and I believe it would more clear to start a new ticket (I'd do it myself).
Please, feel free to close!
Updated by Eregon (Benoit Daloze) 6 days ago
- Status changed from Open to Closed
Updated by Eregon (Benoit Daloze) 6 days ago
FWIW I do think it would make sense to changed autoload?
to be unaffected by ongoing autoloads/requires as mentioned in https://bugs.ruby-lang.org/issues/21035#note-16 (i.e. an autoload is still an autoload until it's fully resolved)
But yeah, it's likely better to be its own ticket.
Updated by fxn (Xavier Noria) 6 days ago
· Edited
@Eregon (Benoit Daloze) but, if we introduce the concept of autoloads being unregistered or ignored, isn't the current autoload?
consistent with that concept? autoload?
returns a path if there is a registered autoload for the argument?
Updated by Eregon (Benoit Daloze) 6 days ago
· Edited
My POV is there is no autoloads being unregistered or ignored (the autoload exists until it's resolved, and importantly other threads must never see any difference for C between "before autoloading C" and "during autoloading C").
Rather it's just that defined?
and const_defined?
pretend the constant doesn't exist for the autoloading thread because lots of code can run in an autoload, and the code there should see there is no such constant yet since code during that autoload is supposed to define that constant.
If the code there would refer to the autoload constant it would be a sort of cycle and in fact IIRC it will throw a NameError in that case, which is consistent with "the constant looks not defined for the autoloading thread, until it defines that constant" whether asking though defined?
or const_defined?
or accessing the constant.
For example that allows C ||= some_value
to work inside file
for autoload :C, file
.
IOW, from my POV the special behavior of defined?
and const_defined?
(and currently for autoload?
too but I don't think it should be) is only there to "break this cycle" of the autoloaded constant being accessed during its autoload.
Updated by fxn (Xavier Noria) 6 days ago
· Edited
@Eregon (Benoit Daloze) Understood.
So then we are back to "clarify or redefine" perhaps?
If the current behavior of autoload?
and const_defined?
is expected in the case of ongoing autoloads, or autoloads that are not ongoing but whose argument is a feature being loaded, I'd like to understand how it makes sense and document it.
If it does not make sense for Ruby core once evaluated, then (same), which is the mental model and documente it, regardless of whether it gets redefined.
And Module#constants
would be part of the discussion.
Updated by Eregon (Benoit Daloze) 6 days ago
Right, I think that would be a good question to ask for the next dev meeting, can you add an item there?
I posted my guess above but I'm not sure that's the reason and I don't know the initial rationale for making these predicates behave differently during an ongoing autoload (for the constant being autoloaded).
Regarding require "a_feature_associated_with_an_autoload"
basically behaving the same as triggering the autoload via accessing the constant I think that's something which is ready to be documented because the intention seems clear.
Updated by fxn (Xavier Noria) about 10 hours ago
I have tried to formulate a more narrowed question in https://bugs.ruby-lang.org/issues/21154.
@Eregon (Benoit Daloze) How to add that to the agenda of the next dev meeting?
Updated by byroot (Jean Boussier) about 10 hours ago
Updated by fxn (Xavier Noria) about 10 hours ago
Thanks @byroot (Jean Boussier)! I see there is only one comment with a series of tickets, but I cannot edit that comment. Should I write a second comment?
Updated by byroot (Jean Boussier) about 10 hours ago
Yes, you are not meant to edit others comments, it's just that each comment can contain one or more tickets.