Bug #11762
closedArray#dig can raise TypeError: no implicit conversion of Symbol/String into Integer
Description
If you try to dig
in an Array using a symbol or string, a TypeError
exception will be raised:
irb> ['zero', 'one', 'two'].dig(:first)
TypeError: no implicit conversion of Symbol into Integer
from (irb):1:in `dig'
from (irb):1
I think it should return nil
in this case. The most typical use case for dig
is to dig through parsed JSON and either find the result we expected or else nil
. Wouldn't it defeat the purpose of dig
if we had to wrap calls to it in a rescue
to handle the case that an Array was present where we expected a Hash?
Can we clarify the desired behavior for this case, then update the documentation and tests to reflect that?
Files
Updated by colindkelley (Colin Kelley) almost 9 years ago
I think it should return nil in this case.
Anyone else have an opinion?
Updated by marcandre (Marc-Andre Lafortune) almost 9 years ago
- Assignee set to matz (Yukihiro Matsumoto)
I feel that either dig
should be safe only against nil
somewhere in the digging path (a bit like &.
), or it's should always safe, even when digging through unexpected objects or indices.
Currently:
{a: 'hello'}.dig(:a, :b) # => nil
{a: []}.dig(:a, :b) # => TypeError
I believe they should either both raise an error, or both return nil
.
I'm unsure as to the best solution. I'd guess, like Colin, that returning nil
is probably the best. Makes debugging harder when writing new code, but makes backward compatibility easier, since old code using dig
wouldn't bomb if the data layout changes in the future.
Matz, what do you think about this?
diff --git a/object.c b/object.c
index ff2db0b..c863fbe 100644
--- a/object.c
+++ b/object.c
@@ -3184,7 +3184,12 @@ rb_obj_dig(int argc, VALUE *argv, VALUE obj, VALUE notfound)
break;
case T_ARRAY:
if (dig_basic_p(obj, &ary)) {
- obj = rb_ary_at(obj, *argv);
+ VALUE index = rb_check_to_int(*argv);
+ if (NIL_P(index)) {
+ obj = Qnil;
+ } else {
+ obj = rb_ary_at(obj, index);
+ }
continue;
}
break;
diff --git a/test/ruby/test_array.rb b/test/ruby/test_array.rb
index 0922cb4..b117a7e 100644
--- a/test/ruby/test_array.rb
+++ b/test/ruby/test_array.rb
@@ -2655,6 +2655,7 @@ def test_dig
h = @cls[@cls[{a: 1}], 0]
assert_equal(1, h.dig(0, 0, :a))
assert_nil(h.dig(1, 0))
+ assert_nil(h.dig(0, :foo))
end
private
Updated by matz (Yukihiro Matsumoto) almost 9 years ago
- Status changed from Open to Rejected
I believe dig
should only ignore nil receiver as its description.
Hiding argument/type error is not a good idea, I think.
Matz.
Updated by marcandre (Marc-Andre Lafortune) almost 9 years ago
- Status changed from Rejected to Open
Reopening, since \the following should raise an error:
{a: 'hello'}.dig(:a, :b) # => nil, should raise an Error
Right?
Updated by colindkelley (Colin Kelley) almost 9 years ago
I'd guess, like Colin, that returning nil is probably the best. Makes debugging harder when writing new code, but makes backward compatibility easier, since old code using dig wouldn't bomb if the data layout changes in the future.
Yes, exactly. If we have to wrap the navigation with rescue
, then we don't really need dig
at all; we always had the option to wrap our collection navigating code with a rescue.
hash = {a: []}
result = hash.dig(:a, :b) rescue nil
would be about the same as what we can do now without dig
:
result = hash[:a][:b] rescue nil
I think we need dig
to navigate into a nested collection whose layout is not certain and return nil
if the value wasn't found. This is essentially extending the Hash#[]
behavior to nested collections that are so common now.
Updated by marcandre (Marc-Andre Lafortune) almost 9 years ago
The current doc also gives examples that should return raise an error, if I understand correctly:
a = [[1, [2, 3]]]
a.dig(0, 0, 0) #=> nil
Since 1.dig(0)
is invalid (and 1 is not nil
), that should raise a NoMethodError, undefined method 'dig' for 1:Fixnum
, right?
Matz, could you confirm this?
Note: I'm updating the documentation a bit, so I've changed this example for now. I'll change it back if need be.
Updated by colindkelley (Colin Kelley) almost 9 years ago
- File 11762.patch 11762.patch added
Here is my suggested documentation on how Hash#dig
should behave as part of a general dig
protocol. The patch includes equivalent changes to the documentation for Array#dig
.
* Extracts a nested value by navigating the given +key+
* (and keys, recursively). This is useful for navigating
* parsed JSON or other nested data. Nested data may be any
* object that implements the +dig+ method.
*
* Calls the +[]+ operator to look up the key so that
* subclasses may provide their own implementation.
*
* Returns the nested value, or nil if a key is not found
* at any level, or if a nested key navigates to an object that
* does not implement +dig+. Does not raise exceptions.
*
* == The dig Protocol
*
* The +dig+ method behaves like this at each level:
*
* def dig(key, *keys)
* value = self[key] rescue nil
* if keys.empty? || value.nil?
* value
* elsif value.respond_to?(:dig)
* value.dig(*keys)
* else
* nil
* end
* end
*
* == Example Usage
*
* Address = Struct.new(:street, :city, :state, :country)
*
* hash = {ceo: {name: "Pat", email: "pat@example.com"},
* managers: [
* {name: "Jose", email: "jose@example.com"},
* {name: "Sally", email: "sally@example.com"}
* {name: "Bob", email: "bob@example.com"}
* ],
* office: Address.new("9 Oak St", "New York", "NY", "USA")
* }
*
* hash.dig(:ceo, :name) #=> "Pat"
* hash.dig(:ceo, 0, :email) #=> nil
* hash.dig(:managers, 1, :name) #=> "Sally"
* hash.dig(:managers, :name) #=> nil
* hash.dig(:office, :city) #=> "New York"
This example is designed to call attention to the common case where you intended to access a hash but an array was there:
* hash.dig(:managers, :name) #=> nil
I feel the dig
method would be much less useful if a surprise as shown above were to raise an exception. (That would force a rescue wrapper. Furthermore, note that the exception would be of limited value since it would be ambiguous as to what level it occurred at.)
Updated by marcandre (Marc-Andre Lafortune) almost 9 years ago
- ruby -v changed from 2.3.0-preview1 to 2.3.0-preview2
Matz, please confirm we should change {a: 'hello'}.dig(:a, :b)
from returning nil
to raising an error, since you said only nil
should be protected against.
Updated by matz (Yukihiro Matsumoto) almost 9 years ago
Hmm,
{a: 'hello'}.dig(:a, :b)
should raise TypeError since a string do not have method dig
.
Matz.
Updated by nobu (Nobuyoshi Nakada) almost 9 years ago
- Status changed from Open to Closed
Applied in changeset r53058.
object.c: raise TypeError
- object.c (rb_obj_dig): raise TypeError if an element does not
have #dig method. [ruby-core:71798] [Bug #11762]
Updated by marcandre (Marc-Andre Lafortune) almost 9 years ago
Yukihiro Matsumoto wrote:
Hmm,
{a: 'hello'}.dig(:a, :b)
should raise TypeError since a string do not have methoddig
.
Thanks Matz
I don't understand why it's would be a TypeError
. I thought it would be a NoMethodError
(or alternatively an ArgumentError
, as reducing the number of arguments could make that error disappear).
My understanding was that a TypeError
occurs iff an argument of an unexpected type occurs, but the error won't go away by changing the type of any argument.
Are you sure that TypeError
is the best error to raise in this case?
Thanks
Updated by colindkelley (Colin Kelley) almost 9 years ago
As described by Matz, it sounds like the implementation would be equivalent to
def dig(key, *keys)
value = self[key] # may raise TypeError
if keys.empty? || value.nil?
value
else
value.respond_to?(:dig) or raise TypeError, "object does not respond to dig"
value.dig(*keys)
end
end
But I agree with Marc-Andre that a NoMethodError
contract would be more natural as it removes the special case:
def dig(key, *keys)
value = self[key] # may raise TypeError
if keys.empty? || value.nil?
value
else
value.dig(*keys)
end
end
That is,
hash.dig(a, b, c...)
would be exactly equivalent to
hash[a]&.[](b)&.[](c)...
However I must say I am disappointed that dig
will raise exceptions. Given that definition it won't be useful for digging in JSON responses whose format has not already been validated. We can already write
hash[a][b][c] rescue nil
So is dig
worth adding to Ruby at all, given the definition that raises exceptions? It doesn't seem so to me.
Updated by matz (Yukihiro Matsumoto) almost 9 years ago
Suppose you have a Hash/Array tree from JSON, with some attributes optional, you can expect three types of results;
(1) a value from valid tree structure
(2) nil from optional attributes
(3) exception from invalid tree
If #dig returns nil instead of exception, as you want, we cannot distinguish case 2 and case 3.
Matz.
Updated by colindkelley (Colin Kelley) almost 9 years ago
If #dig returns nil instead of exception, as you want, we cannot distinguish case 2 and case 3.
I've looked at a lot of JSON parsing code in Ruby and haven't found examples that were looking to draw that distinction. (Those that do would probably use a gem like ruby-json-schema to validate the precise format and get a precise exception.)
What I have found is code that navigates the nested JSON response and ignores the possibility of the exception NoMethodError: undefined method '[]' for nil:NilClass
or TypeError: no implicit conversion of Symbol into Integer
being raised during the navigation. I looked at several examples in my company's codebase and the first two pages of results from the Google search "ruby json parse example" and all of them suffered from the problem.
Here is the first example from Stackoverflow:
parsed = JSON.parse(string) # returns a hash
p parsed["desc"]["someKey"]
p parsed["main_item"]["stats"]["a"]
and one from Temboo:
title = data["items"][0]["snippet"]["title"]
description = data["items"][0]["snippet"]["description"]
and one from richonrails:
h["results"][0]["name"] # => "ABC Widget 2.0"
and one from developer.yahoo:
irb(main):003:0> news['ResultSet']['Result'].each { |result|
irb(main):004:1* print "#{result['Title']} => #{result['Url']}\n"
irb(main):005:1> }
and one from ruby-forum:
adbreaks = parsed['TOD']['AdBreaks']['AdBreak']
lengths_of_breaks = adbreaks.map {|ad| ad['duration'] }
list_of_30s_breaks = adbreaks.select {|ad| ad['duration'] == 30 }
... [and later from the original poster]
I am attempting to do the following:
uri = parsed['TOD']['AdBreaks']['AdBreak'][0]['Ad'][0]['MediaFile']['Impression']['Log'][0]['uri']
but it does not work. I get the following: NoMethodError: undefined method `[]' for nil:NilClass
I couldn't find a single example in my sample that navigates nested JSON without stepping into the undefined method '[]' for nil:NilClass
trap. We dread that exception because it never has enough detail to debug the problem.
That's why I'm so eager to see dig
unify the navigation to a single contract. It could raise an explicit exception. But I think it would be more idiomatic Ruby to return nil
if the value is not found. This makes it natural for the typical calling case where the data is considered optional:
if a = parsed.dig("main_item", "stats", "a")
... # use a
end
or to raise an exception itself:
a = parsed.dig("main_item", "stats", "a") or raise "unexpected format #{parsed.inspect}"
Updated by colindkelley (Colin Kelley) almost 9 years ago
Hi Matz, do you have any reactions to the above?
Updated by avit (Andrew Vit) almost 9 years ago
Yukihiro Matsumoto wrote:
If #dig returns nil instead of exception, as you want, we cannot distinguish case 2 and case 3.
In that case, how about dig
vs. dig!
?
Updated by matz (Yukihiro Matsumoto) almost 9 years ago
Andrew,
I don't think dig!
is a good name, because !
usually denotes dangerous version of a method in Ruby naming convention.
Colin,
Thank you for the investigation. Your survey means most code does not consider exceptional cases. So dig
should be, I think.
If you really want nil
from corrupted tree, just add rescue nil
after dig
call. It's much better, I think, because we can't distinguish optional value and corrupted tree once we give nil
from dig
for both cases.
Matz.