Feature #7854
closedNew method Symbol[string]
Added by Student (Nathan Zook) almost 12 years ago. Updated over 10 years ago.
Description
I propose a new class method [] on Symbol. If a symbol s already exists such that s.to_s == string, then s is returned. If not, nil is returned.
The inspiration for this method is a question I was asked, and the answer I was given: "Why would you want to turn a tainted string into a symbol?" "I don't--I want to access an existing symbol with tainted data". Symbol[] accesses the symbol table like hash[] accesses the elements of a hash.
I believe that this completely addresses the problems behind tickets #7791 and #7839. I believe that it is a more intuitive solution than my proposal #7795, and I believe that this will also be useful for YAML.safe_load and similar initiatives.
Files
symbol_lookup.patch (1.83 KB) symbol_lookup.patch | phluid61 (Matthew Kerwin), 03/07/2013 12:50 PM | ||
symbol_lookup2.patch (1.36 KB) symbol_lookup2.patch | phluid61 (Matthew Kerwin), 03/08/2013 08:48 AM | ||
symbol_lookup3.patch (1.07 KB) symbol_lookup3.patch | phluid61 (Matthew Kerwin), 03/09/2013 05:54 PM | ||
symbol_lookup3_warn.patch (1.16 KB) symbol_lookup3_warn.patch | phluid61 (Matthew Kerwin), 03/09/2013 05:54 PM |
Updated by phluid61 (Matthew Kerwin) almost 12 years ago
Note that this is closely related to #7795 (Symbol.defined? and/or to_existing_symbol)
In existing code, Symbol.[] could be implemented as:
class Symbol
def self.[](string)
all_symbols.find{|sym| sym.to_s == string}
end
end
Updated by Student (Nathan Zook) almost 12 years ago
It could, but it would be extraordinarily slow, as all_symbols returns an array which is quite large in many applications.
Updated by drbrain (Eric Hodel) almost 12 years ago
Updated by Student (Nathan Zook) almost 12 years ago
=begin
Ticket #7839 requires the manipulation of global state. I'm not sure why I have to explain that this is a REALLY bad idea.
Ticket #7791 has two possible implementations. One is to GC symbols globally. This would require treating not just symbols like objects, but methods (whose names are in fact symbols) as well. I do not believe that methods are even currently part of the object system.
Another implementation would be to divide symbols into two kinds depending on how they are created. The theory being that symbols used for method names would be immune to GC. The first problem with this is that there is no reason to believe that method declarations are the first place that a particular symbol would be declared. The second is that dynamic method creation is an important part of ruby. If the goal is to protect against memory leaks in this fashion, it is not at all certain that the leak does not extend into the realm of method creation.
In other words, both of these implementations involve complex changes to the guts of Ruby, and lead to the likelihood of a significant behavioural fork with other rubys. (Not to mention the relatively high risk of bug introduction.) Since this is a security feature, I think that it is important to lead the way in a direction that is easy to import to other rubies (and also to backport as a security patch!) I expect Symbol[] to have a very straightforward implementation that is well-isolated from the rest of Ruby, with the possible exception of YAML.load, which might well benefit from such a feature.
As for requiring the libraries to all be updated to make use of this feature--I consider that to be a good thing. #7839 creates a change in MRI's behaviour that WILL break apparently "safe" use of existing libraries. #7791 necessarily dramatically affects Symbol's runtime performance, and thus means that any highly-tuned ruby is going to have issues--assuming that no bugs occur, and that the other rubys pick it up.
Furthermore, for most, perhaps even all, libraries, (({grep -R to_sym lib})) is going to tell you what you need to examine to make use of this feature. Certainly, it would be nice to avoid having to do such things, but because of the recent exploits, the more security-minded portion of the community (such as myself) is ALREADY nervously poking around in their libraries.
This feature gives the community a clean way to patch questionable code, which is itself relatively easy to identify in manner that makes it easy for other rubies to quickly follow. I do not believe that the other proposals do.
=end
Updated by ko1 (Koichi Sasada) almost 12 years ago
- Assignee set to matz (Yukihiro Matsumoto)
Updated by phluid61 (Matthew Kerwin) over 11 years ago
- File symbol_lookup.patch symbol_lookup.patch added
=begin
I've attached a patch that defines ((%Symbol[str]%)). If ((|str|)) is a string and there exists a symbol such that (({symbol.to_s == str})), it returns that symbol. Otherwise it returns ((|nil|)). Raises a TypeError if ((|str|)) is not a string.
I also made a unit test, currently available as a gist: https://gist.github.com/phluid61/5105458
=end
Updated by nobu (Nobuyoshi Nakada) over 11 years ago
To obtain existing symbol, rb_check_id() is already available, so you don't have to add new extern function.
Updated by phluid61 (Matthew Kerwin) over 11 years ago
- File symbol_lookup2.patch symbol_lookup2.patch added
nobu (Nobuyoshi Nakada) wrote:
To obtain existing symbol, rb_check_id() is already available, so you don't have to add new extern function.
Thank you for the feedback. With that in mind, I've made a less invasive version which only modifies string.c
Please let me know if my enthusiasm gets annoying. :)
Updated by nobu (Nobuyoshi Nakada) over 11 years ago
Why does it have -1 arity?
And I don't think it's harmful if the method allows a Smbol too.
Updated by phluid61 (Matthew Kerwin) over 11 years ago
- File symbol_lookup3.patch symbol_lookup3.patch added
- File symbol_lookup3_warn.patch symbol_lookup3_warn.patch added
nobu (Nobuyoshi Nakada) wrote:
Why does it have -1 arity?
And I don't think it's harmful if the method allows a Smbol too.
To the first: an oversight on my part, there's no real reason. I have rewritten it with an arity of 1.
To the second: I can easily change it to allow a Symbol as well. However since the original discussion that spawned this proposal was focused on the idea of not creating unwanted/unneeded Symbols, I wonder should it emit a warning in that case?
I see now, too, that I was rather overzealous in my original attempts. I should have realised most of the hard work has already been done. :)
Now I suppose it's up to Matz to approve it or not.
Updated by phluid61 (Matthew Kerwin) over 11 years ago
=begin
In the intervening months I've created a gem ((URL:https://rubygems.org/gems/symbol_lookup)) that implements ((%Symbol.[]%)), as well as two methods inspired by #7795 :
- String#interned => gets an existing symbol, returning the symbol or nil
- String#to_existing_sym => gets an existing symbol, raising an argument error if it doesn't exist
The problem is that they are written as C extensions, and I'm not familiar enough with non-MRI implementations to port (or create a multi-platform version of) the gem; so the uptake is relatively limited. If it was promoted to core the functionality would become available to everyone, and I'm certain it would be used, for example the Rails team could use it to build an alternate solution to the problem addressed in #7839.
=end
Updated by cabo (Carsten Bormann) over 11 years ago
Let me just point out that this is the right way to solve a problem that we have in Ruby-based protocol implementations.
Right now, there is no safe way on the Ruby level to use symbols to represent strings coming in on an unchecked interface.
Of course, at the API level, I can easily use rb_check_id().
The fact that this API exists should alert you to the fact that something is missing at the language level.
Kudos to Matthew for making this available as a gem for now.
I'm not too wild about #interned as the name, but I'm used to Ruby method names being slightly idiosyncratic.
➔ This needs to go into Ruby core sooner than later.
The "alternatives" mentioned:
#7791 is a pipe dream (well, something like it could be made to work with an allocator region concept).
#7839 is serious damage (it could also be implemented on top of a global allocator region setting, which is still damage).
But why allocate at all if you know you don't want to?
Updated by Student (Nathan Zook) about 11 years ago
This was set to "next minor" a LONG time ago, but I don't see it in 2.1. ??? This would aid security in a couple of ways.
Updated by Student (Nathan Zook) about 11 years ago
Is this feature request rejected? I thought it would be in 2.1
On 10/01/2013 06:15 PM, Student (Nathan Zook) wrote:
Issue #7854 has been updated by Student (Nathan Zook).
This was set to "next minor" a LONG time ago, but I don't see it in 2.1. ??? This would aid security in a couple of ways.
Feature #7854: New method Symbol[string]
https://bugs.ruby-lang.org/issues/7854#change-42188Author: Student (Nathan Zook)
Status: Open
Priority: Normal
Assignee: matz (Yukihiro Matsumoto)
Category: core
Target version: next minorI propose a new class method [] on Symbol. If a symbol s already exists such that s.to_s == string, then s is returned. If not, nil is returned.
The inspiration for this method is a question I was asked, and the answer I was given: "Why would you want to turn a tainted string into a symbol?" "I don't--I want to access an existing symbol with tainted data". Symbol[] accesses the symbol table like hash[] accesses the elements of a hash.
I believe that this completely addresses the problems behind tickets #7791 and #7839. I believe that it is a more intuitive solution than my proposal #7795, and I believe that this will also be useful for YAML.safe_load and similar initiatives.
Updated by naruse (Yui NARUSE) over 10 years ago
- Related to Feature #7839: Symbol.freeze_symbols added
Updated by matz (Yukihiro Matsumoto) over 10 years ago
- Status changed from Open to Rejected
I like the basic idea but the name Symbol[]
is not descriptive.
As I replied to #7839, the method should be a variation of #intern.
Matz.
Updated by shugo (Shugo Maeda) over 10 years ago
Yukihiro Matsumoto wrote:
I like the basic idea but the name
Symbol[]
is not descriptive.
As I replied to #7839, the method should be a variation of #intern.
How about Symbol.find?
I guess String#intern came from Lisp, and Common Lisp has find-symbol, which returns a symbol only when it's found in a table.
Updated by naruse (Yui NARUSE) over 10 years ago
diff --git a/string.c b/string.c
index 4e30cb3..1e26a25 100644
--- a/string.c
+++ b/string.c
@@ -8231,6 +8231,27 @@ str_scrub_bang(int argc, VALUE *argv, VALUE str)
/*
* call-seq:
+ * Symbol.find(str) -> symbol or nil
+ *
+ * Return the related symbol if the symbol already exists.
+ * Return nil if not.
+ */
+
+static VALUE
+sym_find(VALUE dummy, VALUE sym)
+{
+ ID id = rb_check_id(&sym);
+
+ if (id) {
+ return ID2SYM(id);
+ }
+ else {
+ return Qnil;
+ }
+}
+
+/*
+ * call-seq:
* sym == obj -> true or false
*
* Equality---If <i>sym</i> and <i>obj</i> are exactly the same
@@ -8787,6 +8808,7 @@ Init_String(void)
rb_undef_alloc_func(rb_cSymbol);
rb_undef_method(CLASS_OF(rb_cSymbol), "new");
rb_define_singleton_method(rb_cSymbol, "all_symbols", rb_sym_all_symbols, 0); /* in parse.y */
+ rb_define_singleton_method(rb_cSymbol, "find", sym_find, 1);
rb_define_method(rb_cSymbol, "==", sym_equal, 1);
rb_define_method(rb_cSymbol, "===", sym_equal, 1);
diff --git a/test/ruby/test_symbol.rb b/test/ruby/test_symbol.rb
index 7f261b6..cebaf43 100644
--- a/test/ruby/test_symbol.rb
+++ b/test/ruby/test_symbol.rb
@@ -1,4 +1,5 @@
require 'test/unit'
+require_relative 'envutil'
class TestSymbol < Test::Unit::TestCase
# [ruby-core:3573]
@@ -206,4 +207,12 @@ class TestSymbol < Test::Unit::TestCase
assert_equal(true, "foo#{Time.now.to_i}".to_sym.frozen?)
assert_equal(true, :foo.to_sym.frozen?)
end
+
+ def test_sym_find
+ assert_separately(%w[--disable=gems], <<-"end;")
+ assert_equal :intern, Symbol.find("intern")
+ assert_equal nil, Symbol.find("hoge")
+ assert_raise(TypeError){ Symbol.find(true) }
+ end;
+ end
end
Updated by matz (Yukihiro Matsumoto) over 10 years ago
- Status changed from Rejected to Assigned
- Assignee changed from matz (Yukihiro Matsumoto) to naruse (Yui NARUSE)
Symbol.find is OK for me.
Matz.
Updated by naruse (Yui NARUSE) over 10 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
Applied in changeset r45175.
- string.c (sym_find): Add Symbol.find(str), which returns whether given
string is defined as symbol or not. [Feature #7854]