Feature #7854

New method Symbol[string]

Added by Nathan Zook about 1 year ago. Updated about 1 month ago.

[ruby-core:52267]
Status:Closed
Priority:Normal
Assignee:Yui NARUSE
Category:core
Target version:next minor

Description

I propose a new class method [] on Symbol. If a symbol s already exists such that s.to_s == string, then s is returned. If not, nil is returned.

The inspiration for this method is a question I was asked, and the answer I was given: "Why would you want to turn a tainted string into a symbol?" "I don't--I want to access an existing symbol with tainted data". Symbol[] accesses the symbol table like hash[] accesses the elements of a hash.

I believe that this completely addresses the problems behind tickets #7791 and #7839. I believe that it is a more intuitive solution than my proposal #7795, and I believe that this will also be useful for YAML.safe_load and similar initiatives.

symbol_lookup.patch Magnifier (1.83 KB) Matthew Kerwin, 03/07/2013 12:50 PM

symbol_lookup2.patch Magnifier (1.36 KB) Matthew Kerwin, 03/08/2013 08:48 AM

symbol_lookup3.patch Magnifier (1.07 KB) Matthew Kerwin, 03/09/2013 05:54 PM

symbol_lookup3_warn.patch Magnifier (1.16 KB) Matthew Kerwin, 03/09/2013 05:54 PM


Related issues

Related to ruby-trunk - Feature #7839: Symbol.freeze_symbols Rejected 02/13/2013

Associated revisions

Revision 45175
Added by Yui NARUSE about 1 month ago

  • string.c (sym_find): Add Symbol.find(str), which returns whether given string is defined as symbol or not. [Feature #7854]

History

#1 Updated by Matthew Kerwin about 1 year ago

Note that this is closely related to #7795 (Symbol.defined? and/or toexistingsymbol)

In existing code, Symbol.[] could be implemented as:

class Symbol
  def self.[](string)
    all_symbols.find{|sym| sym.to_s == string}
  end
end

#2 Updated by Nathan Zook about 1 year ago

It could, but it would be extraordinarily slow, as all_symbols returns an array which is quite large in many applications.

#3 Updated by Eric Hodel about 1 year ago

To make this proposal useful all existing libraries must be updated to use the method to create symbols.

Other proposals such as #7839 or #7791 allow rubyists to avoid a symbol creation DoS without forcing them to ask for new releases of a library.

#4 Updated by Nathan Zook about 1 year ago

=begin
Ticket #7839 requires the manipulation of global state. I'm not sure why I have to explain that this is a REALLY bad idea.

Ticket #7791 has two possible implementations. One is to GC symbols globally. This would require treating not just symbols like objects, but methods (whose names are in fact symbols) as well. I do not believe that methods are even currently part of the object system.

Another implementation would be to divide symbols into two kinds depending on how they are created. The theory being that symbols used for method names would be immune to GC. The first problem with this is that there is no reason to believe that method declarations are the first place that a particular symbol would be declared. The second is that dynamic method creation is an important part of ruby. If the goal is to protect against memory leaks in this fashion, it is not at all certain that the leak does not extend into the realm of method creation.

In other words, both of these implementations involve complex changes to the guts of Ruby, and lead to the likelihood of a significant behavioural fork with other rubys. (Not to mention the relatively high risk of bug introduction.) Since this is a security feature, I think that it is important to lead the way in a direction that is easy to import to other rubies (and also to backport as a security patch!) I expect Symbol[] to have a very straightforward implementation that is well-isolated from the rest of Ruby, with the possible exception of YAML.load, which might well benefit from such a feature.

As for requiring the libraries to all be updated to make use of this feature--I consider that to be a good thing. #7839 creates a change in MRI's behaviour that WILL break apparently "safe" use of existing libraries. #7791 necessarily dramatically affects Symbol's runtime performance, and thus means that any highly-tuned ruby is going to have issues--assuming that no bugs occur, and that the other rubys pick it up.

Furthermore, for most, perhaps even all, libraries, (({grep -R to_sym lib})) is going to tell you what you need to examine to make use of this feature. Certainly, it would be nice to avoid having to do such things, but because of the recent exploits, the more security-minded portion of the community (such as myself) is ALREADY nervously poking around in their libraries.

This feature gives the community a clean way to patch questionable code, which is itself relatively easy to identify in manner that makes it easy for other rubies to quickly follow. I do not believe that the other proposals do.
=end

#5 Updated by Koichi Sasada about 1 year ago

  • Assignee set to Yukihiro Matsumoto

#6 Updated by Matthew Kerwin about 1 year ago

=begin
I've attached a patch that defines ((%Symbol[str]%)). If ((|str|)) is a string and there exists a symbol such that (({symbol.to_s == str})), it returns that symbol. Otherwise it returns ((|nil|)). Raises a TypeError if ((|str|)) is not a string.

I also made a unit test, currently available as a gist: https://gist.github.com/phluid61/5105458
=end

#7 Updated by Nathan Zook about 1 year ago

:)

#8 Updated by Nobuyoshi Nakada about 1 year ago

To obtain existing symbol, rbcheckid() is already available, so you don't have to add new extern function.

#9 Updated by Matthew Kerwin about 1 year ago

nobu (Nobuyoshi Nakada) wrote:

To obtain existing symbol, rbcheckid() is already available, so you don't have to add new extern function.

Thank you for the feedback. With that in mind, I've made a less invasive version which only modifies string.c

Please let me know if my enthusiasm gets annoying. :)

#10 Updated by Nobuyoshi Nakada about 1 year ago

Why does it have -1 arity?

And I don't think it's harmful if the method allows a Smbol too.

#11 Updated by Matthew Kerwin about 1 year ago

nobu (Nobuyoshi Nakada) wrote:

Why does it have -1 arity?

And I don't think it's harmful if the method allows a Smbol too.

To the first: an oversight on my part, there's no real reason. I have rewritten it with an arity of 1.

To the second: I can easily change it to allow a Symbol as well. However since the original discussion that spawned this proposal was focused on the idea of not creating unwanted/unneeded Symbols, I wonder should it emit a warning in that case?

I see now, too, that I was rather overzealous in my original attempts. I should have realised most of the hard work has already been done. :)

Now I suppose it's up to Matz to approve it or not.

#12 Updated by Matthew Kerwin 10 months ago

=begin
In the intervening months I've created a gem ((URL:https://rubygems.org/gems/symbol_lookup)) that implements ((%Symbol.[]%)), as well as two methods inspired by #7795 :
* String#interned => gets an existing symbol, returning the symbol or nil
* String#toexistingsym => gets an existing symbol, raising an argument error if it doesn't exist

The problem is that they are written as C extensions, and I'm not familiar enough with non-MRI implementations to port (or create a multi-platform version of) the gem; so the uptake is relatively limited. If it was promoted to core the functionality would become available to everyone, and I'm certain it would be used, for example the Rails team could use it to build an alternate solution to the problem addressed in #7839.
=end

#13 Updated by Carsten Bormann 10 months ago

Let me just point out that this is the right way to solve a problem that we have in Ruby-based protocol implementations.
Right now, there is no safe way on the Ruby level to use symbols to represent strings coming in on an unchecked interface.
Of course, at the API level, I can easily use rbcheckid().

The fact that this API exists should alert you to the fact that something is missing at the language level.

Kudos to Matthew for making this available as a gem for now.
I'm not too wild about #interned as the name, but I'm used to Ruby method names being slightly idiosyncratic.

➔ This needs to go into Ruby core sooner than later.

The "alternatives" mentioned:
#7791 is a pipe dream (well, something like it could be made to work with an allocator region concept).

#7839 is serious damage (it could also be implemented on top of a global allocator region setting, which is still damage).

But why allocate at all if you know you don't want to?

#14 Updated by Nathan Zook 7 months ago

This was set to "next minor" a LONG time ago, but I don't see it in 2.1. ??? This would aid security in a couple of ways.

#15 Updated by Nathan Zook 7 months ago

Is this feature request rejected? I thought it would be in 2.1

On 10/01/2013 06:15 PM, Student (Nathan Zook) wrote:

Issue #7854 has been updated by Student (Nathan Zook).

This was set to "next minor" a LONG time ago, but I don't see it in 2.1. ??? This would aid security in a couple of ways.


Feature #7854: New method Symbol[string]
https://bugs.ruby-lang.org/issues/7854#change-42188

Author: Student (Nathan Zook)
Status: Open
Priority: Normal
Assignee: matz (Yukihiro Matsumoto)
Category: core
Target version: next minor

I propose a new class method [] on Symbol. If a symbol s already exists such that s.to_s == string, then s is returned. If not, nil is returned.

The inspiration for this method is a question I was asked, and the answer I was given: "Why would you want to turn a tainted string into a symbol?" "I don't--I want to access an existing symbol with tainted data". Symbol[] accesses the symbol table like hash[] accesses the elements of a hash.

I believe that this completely addresses the problems behind tickets #7791 and #7839. I believe that it is a more intuitive solution than my proposal #7795, and I believe that this will also be useful for YAML.safe_load and similar initiatives.

#16 Updated by Yui NARUSE about 1 month ago

#17 Updated by Yukihiro Matsumoto about 1 month ago

  • Status changed from Open to Rejected

I like the basic idea but the name Symbol[] is not descriptive.
As I replied to #7839, the method should be a variation of #intern.

Matz.

#18 Updated by Shugo Maeda about 1 month ago

Yukihiro Matsumoto wrote:

I like the basic idea but the name Symbol[] is not descriptive.
As I replied to #7839, the method should be a variation of #intern.

How about Symbol.find?

I guess String#intern came from Lisp, and Common Lisp has find-symbol, which returns a symbol only when it's found in a table.

#19 Updated by Yui NARUSE about 1 month ago

diff --git a/string.c b/string.c
index 4e30cb3..1e26a25 100644
--- a/string.c
+++ b/string.c
@@ -8231,6 +8231,27 @@ str_scrub_bang(int argc, VALUE *argv, VALUE str)
 /*
  *  call-seq:
+ *     Symbol.find(str)   -> symbol or nil
+ *
+ *  Return the related symbol if the symbol already exists.
+ *  Return nil if not.
+ */
+
+static VALUE
+sym_find(VALUE dummy, VALUE sym)
+{
+    ID id = rb_check_id(&sym);
+
+    if (id) {
+   return ID2SYM(id);
+    }
+    else {
+   return Qnil;
+    }
+}
+
+/*
+ *  call-seq:
  *     sym == obj   -> true or false
  *
  *  Equality---If <i>sym</i> and <i>obj</i> are exactly the same
@@ -8787,6 +8808,7 @@ Init_String(void)
     rb_undef_alloc_func(rb_cSymbol);
     rb_undef_method(CLASS_OF(rb_cSymbol), "new");
     rb_define_singleton_method(rb_cSymbol, "all_symbols", rb_sym_all_symbols, 0); /* in parse.y */
+    rb_define_singleton_method(rb_cSymbol, "find", sym_find, 1);
     rb_define_method(rb_cSymbol, "==", sym_equal, 1);
     rb_define_method(rb_cSymbol, "===", sym_equal, 1);
diff --git a/test/ruby/test_symbol.rb b/test/ruby/test_symbol.rb
index 7f261b6..cebaf43 100644
--- a/test/ruby/test_symbol.rb
+++ b/test/ruby/test_symbol.rb
@@ -1,4 +1,5 @@
 require 'test/unit'
+require_relative 'envutil'

 class TestSymbol < Test::Unit::TestCase
   # 
@@ -206,4 +207,12 @@ class TestSymbol < Test::Unit::TestCase
     assert_equal(true, "foo#{Time.now.to_i}".to_sym.frozen?)
     assert_equal(true, :foo.to_sym.frozen?)
   end
+
+  def test_sym_find
+    assert_separately(%w[--disable=gems], <<-"end;")
+      assert_equal :intern, Symbol.find("intern")
+      assert_equal nil, Symbol.find("hoge")
+      assert_raise(TypeError){ Symbol.find(true) }
+    end;
+  end
 end

#20 Updated by Yukihiro Matsumoto about 1 month ago

  • Status changed from Rejected to Assigned
  • Assignee changed from Yukihiro Matsumoto to Yui NARUSE

Symbol.find is OK for me.

Matz.

#21 Updated by Yui NARUSE about 1 month ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

Applied in changeset r45175.


  • string.c (sym_find): Add Symbol.find(str), which returns whether given string is defined as symbol or not. [Feature #7854]

Also available in: Atom PDF