Project

General

Profile

Actions

Bug #11762

closed

Array#dig can raise TypeError: no implicit conversion of Symbol/String into Integer

Added by colindkelley (Colin Kelley) over 8 years ago. Updated about 8 years ago.

Status:
Closed
Target version:
-
ruby -v:
2.3.0-preview2
[ruby-core:71798]

Description

If you try to dig in an Array using a symbol or string, a TypeError exception will be raised:

irb> ['zero', 'one', 'two'].dig(:first)
TypeError: no implicit conversion of Symbol into Integer
from (irb):1:in `dig'
from (irb):1

I think it should return nil in this case. The most typical use case for dig is to dig through parsed JSON and either find the result we expected or else nil. Wouldn't it defeat the purpose of dig if we had to wrap calls to it in a rescue to handle the case that an Array was present where we expected a Hash?

Can we clarify the desired behavior for this case, then update the documentation and tests to reflect that?


Files

11762.patch (3.19 KB) 11762.patch colindkelley (Colin Kelley), 12/08/2015 04:02 AM

Updated by colindkelley (Colin Kelley) over 8 years ago

I think it should return nil in this case.

Anyone else have an opinion?

Updated by marcandre (Marc-Andre Lafortune) over 8 years ago

  • Assignee set to matz (Yukihiro Matsumoto)

I feel that either dig should be safe only against nil somewhere in the digging path (a bit like &.), or it's should always safe, even when digging through unexpected objects or indices.

Currently:

{a: 'hello'}.dig(:a, :b) # => nil
{a: []}.dig(:a, :b) # => TypeError

I believe they should either both raise an error, or both return nil.

I'm unsure as to the best solution. I'd guess, like Colin, that returning nil is probably the best. Makes debugging harder when writing new code, but makes backward compatibility easier, since old code using dig wouldn't bomb if the data layout changes in the future.

Matz, what do you think about this?

diff --git a/object.c b/object.c
index ff2db0b..c863fbe 100644
--- a/object.c
+++ b/object.c
@@ -3184,7 +3184,12 @@ rb_obj_dig(int argc, VALUE *argv, VALUE obj, VALUE notfound)
                break;
              case T_ARRAY:
                if (dig_basic_p(obj, &ary)) {
-                   obj = rb_ary_at(obj, *argv);
+                   VALUE index = rb_check_to_int(*argv);
+                   if (NIL_P(index)) {
+                       obj = Qnil;
+                   } else {
+                       obj = rb_ary_at(obj, index);
+                   }
                    continue;
                }
                break;
diff --git a/test/ruby/test_array.rb b/test/ruby/test_array.rb
index 0922cb4..b117a7e 100644
--- a/test/ruby/test_array.rb
+++ b/test/ruby/test_array.rb
@@ -2655,6 +2655,7 @@ def test_dig
     h = @cls[@cls[{a: 1}], 0]
     assert_equal(1, h.dig(0, 0, :a))
     assert_nil(h.dig(1, 0))
+    assert_nil(h.dig(0, :foo))
   end
 
   private

Updated by matz (Yukihiro Matsumoto) over 8 years ago

  • Status changed from Open to Rejected

I believe dig should only ignore nil receiver as its description.
Hiding argument/type error is not a good idea, I think.

Matz.

Updated by marcandre (Marc-Andre Lafortune) over 8 years ago

  • Status changed from Rejected to Open

Reopening, since \the following should raise an error:

{a: 'hello'}.dig(:a, :b) # => nil, should raise an Error

Right?

Updated by colindkelley (Colin Kelley) over 8 years ago

I'd guess, like Colin, that returning nil is probably the best. Makes debugging harder when writing new code, but makes backward compatibility easier, since old code using dig wouldn't bomb if the data layout changes in the future.

Yes, exactly. If we have to wrap the navigation with rescue, then we don't really need dig at all; we always had the option to wrap our collection navigating code with a rescue.

hash = {a: []}
result = hash.dig(:a, :b) rescue nil

would be about the same as what we can do now without dig:

result = hash[:a][:b] rescue nil

I think we need dig to navigate into a nested collection whose layout is not certain and return nil if the value wasn't found. This is essentially extending the Hash#[] behavior to nested collections that are so common now.

Updated by marcandre (Marc-Andre Lafortune) over 8 years ago

The current doc also gives examples that should return raise an error, if I understand correctly:

a = [[1, [2, 3]]]
a.dig(0, 0, 0)                    #=> nil 

Since 1.dig(0) is invalid (and 1 is not nil), that should raise a NoMethodError, undefined method 'dig' for 1:Fixnum, right?

Matz, could you confirm this?

Note: I'm updating the documentation a bit, so I've changed this example for now. I'll change it back if need be.

Updated by colindkelley (Colin Kelley) over 8 years ago

Here is my suggested documentation on how Hash#dig should behave as part of a general dig protocol. The patch includes equivalent changes to the documentation for Array#dig.

 * Extracts a nested value by navigating the given +key+
 * (and keys, recursively).  This is useful for navigating
 * parsed JSON or other nested data.  Nested data may be any
 * object that implements the +dig+ method.
 *
 * Calls the +[]+ operator to look up the key so that
 * subclasses may provide their own implementation.
 *
 * Returns the nested value, or nil if a key is not found
 * at any level, or if a nested key navigates to an object that
 * does not implement +dig+.  Does not raise exceptions.
 *
 *  == The dig Protocol
 *
 * The +dig+ method behaves like this at each level:
 *
 *    def dig(key, *keys)
 *      value = self[key] rescue nil
 *      if keys.empty? || value.nil?
 *        value
 *      elsif value.respond_to?(:dig)
 *        value.dig(*keys)
 *      else
 *        nil
 *      end
 *    end
 *
 *  == Example Usage
 *
 *    Address = Struct.new(:street, :city, :state, :country)
 *
 *    hash = {ceo: {name: "Pat", email: "pat@example.com"},
 *            managers: [
 *                {name: "Jose", email: "jose@example.com"},
 *                {name: "Sally", email: "sally@example.com"}
 *                {name: "Bob", email: "bob@example.com"}
 *            ],
 *            office: Address.new("9 Oak St", "New York", "NY", "USA")
 *           }
 *
 *    hash.dig(:ceo, :name)           #=> "Pat"
 *    hash.dig(:ceo, 0, :email)       #=> nil
 *    hash.dig(:managers, 1, :name)   #=> "Sally"
 *    hash.dig(:managers, :name)      #=> nil
 *    hash.dig(:office, :city)        #=> "New York"

This example is designed to call attention to the common case where you intended to access a hash but an array was there:

 *    hash.dig(:managers, :name)      #=> nil

I feel the dig method would be much less useful if a surprise as shown above were to raise an exception. (That would force a rescue wrapper. Furthermore, note that the exception would be of limited value since it would be ambiguous as to what level it occurred at.)

Updated by marcandre (Marc-Andre Lafortune) over 8 years ago

  • ruby -v changed from 2.3.0-preview1 to 2.3.0-preview2

Matz, please confirm we should change {a: 'hello'}.dig(:a, :b) from returning nil to raising an error, since you said only nil should be protected against.

Updated by matz (Yukihiro Matsumoto) over 8 years ago

Hmm,

{a: 'hello'}.dig(:a, :b) should raise TypeError since a string do not have method dig.

Matz.

Actions #10

Updated by nobu (Nobuyoshi Nakada) over 8 years ago

  • Status changed from Open to Closed

Applied in changeset r53058.


object.c: raise TypeError

Updated by marcandre (Marc-Andre Lafortune) over 8 years ago

Yukihiro Matsumoto wrote:

Hmm,

{a: 'hello'}.dig(:a, :b) should raise TypeError since a string do not have method dig.

Thanks Matz

I don't understand why it's would be a TypeError. I thought it would be a NoMethodError (or alternatively an ArgumentError, as reducing the number of arguments could make that error disappear).

My understanding was that a TypeError occurs iff an argument of an unexpected type occurs, but the error won't go away by changing the type of any argument.

Are you sure that TypeError is the best error to raise in this case?

Thanks

Updated by colindkelley (Colin Kelley) over 8 years ago

As described by Matz, it sounds like the implementation would be equivalent to

def dig(key, *keys)
  value = self[key]        # may raise TypeError
  if keys.empty? || value.nil?
    value
  else
    value.respond_to?(:dig) or raise TypeError, "object does not respond to dig"
    value.dig(*keys)
  end
end

But I agree with Marc-Andre that a NoMethodError contract would be more natural as it removes the special case:

def dig(key, *keys)
  value = self[key]        # may raise TypeError
  if keys.empty? || value.nil?
    value
  else
    value.dig(*keys)
  end
end

That is,

hash.dig(a, b, c...)

would be exactly equivalent to

hash[a]&.[](b)&.[](c)...

However I must say I am disappointed that dig will raise exceptions. Given that definition it won't be useful for digging in JSON responses whose format has not already been validated. We can already write

hash[a][b][c] rescue nil

So is dig worth adding to Ruby at all, given the definition that raises exceptions? It doesn't seem so to me.

Updated by matz (Yukihiro Matsumoto) over 8 years ago

Suppose you have a Hash/Array tree from JSON, with some attributes optional, you can expect three types of results;

(1) a value from valid tree structure
(2) nil from optional attributes
(3) exception from invalid tree

If #dig returns nil instead of exception, as you want, we cannot distinguish case 2 and case 3.

Matz.

Updated by colindkelley (Colin Kelley) over 8 years ago

If #dig returns nil instead of exception, as you want, we cannot distinguish case 2 and case 3.

I've looked at a lot of JSON parsing code in Ruby and haven't found examples that were looking to draw that distinction. (Those that do would probably use a gem like ruby-json-schema to validate the precise format and get a precise exception.)

What I have found is code that navigates the nested JSON response and ignores the possibility of the exception NoMethodError: undefined method '[]' for nil:NilClass or TypeError: no implicit conversion of Symbol into Integer being raised during the navigation. I looked at several examples in my company's codebase and the first two pages of results from the Google search "ruby json parse example" and all of them suffered from the problem.

Here is the first example from Stackoverflow:

parsed = JSON.parse(string) # returns a hash

p parsed["desc"]["someKey"]
p parsed["main_item"]["stats"]["a"]

and one from Temboo:

title = data["items"][0]["snippet"]["title"]
description = data["items"][0]["snippet"]["description"]

and one from richonrails:

h["results"][0]["name"] # => "ABC Widget 2.0"

and one from developer.yahoo:

irb(main):003:0> news['ResultSet']['Result'].each { |result|
irb(main):004:1* print "#{result['Title']} => #{result['Url']}\n"
irb(main):005:1> }

and one from ruby-forum:

  adbreaks = parsed['TOD']['AdBreaks']['AdBreak']
  lengths_of_breaks = adbreaks.map {|ad| ad['duration'] }
  list_of_30s_breaks = adbreaks.select {|ad| ad['duration'] == 30 }

... [and later from the original poster]

I am attempting to do the following:
uri = parsed['TOD']['AdBreaks']['AdBreak'][0]['Ad'][0]['MediaFile']['Impression']['Log'][0]['uri']

but it does not work. I get the following:  NoMethodError: undefined method `[]' for nil:NilClass

I couldn't find a single example in my sample that navigates nested JSON without stepping into the undefined method '[]' for nil:NilClass trap. We dread that exception because it never has enough detail to debug the problem.

That's why I'm so eager to see dig unify the navigation to a single contract. It could raise an explicit exception. But I think it would be more idiomatic Ruby to return nil if the value is not found. This makes it natural for the typical calling case where the data is considered optional:

if a = parsed.dig("main_item", "stats", "a")
  ... # use a
end

or to raise an exception itself:

a = parsed.dig("main_item", "stats", "a") or raise "unexpected format #{parsed.inspect}"

Updated by colindkelley (Colin Kelley) over 8 years ago

Hi Matz, do you have any reactions to the above?

Updated by avit (Andrew Vit) about 8 years ago

Yukihiro Matsumoto wrote:

If #dig returns nil instead of exception, as you want, we cannot distinguish case 2 and case 3.

In that case, how about dig vs. dig! ?

Updated by matz (Yukihiro Matsumoto) about 8 years ago

Andrew,

I don't think dig! is a good name, because ! usually denotes dangerous version of a method in Ruby naming convention.

Colin,

Thank you for the investigation. Your survey means most code does not consider exceptional cases. So dig should be, I think.
If you really want nil from corrupted tree, just add rescue nil after dig call. It's much better, I think, because we can't distinguish optional value and corrupted tree once we give nil from dig for both cases.

Matz.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like1Like0Like0Like0Like0Like0