Feature #3845

"in" infix operator

Added by Yusuke Endoh over 4 years ago. Updated almost 3 years ago.

[ruby-core:32454]
Status:Rejected
Priority:Normal
Assignee:Yukihiro Matsumoto

Description

=begin
Hi,

I'd propose "in" infix operator.

( in ) yields true when is included in .
Otherwise it yields false.

p "found" if 1 in 1, 2, 3 #=> found
p "not found" if 0 in 1, 2, 3 #=> not found

"in" operator is clearer to the reader than Array#include?:

p "found" if [1, 2, 3].include?(1)
p "not found" if [1, 2, 3].include?(0)

This proposal is similar to Object#in? proposed in .
But there are two differences:

  • "in" operator does not pollute name space of Object class

  • each candidate of "in" is evaluated lazily; for example,

    1 in 1, 2, foo()

    does not call the method "foo" because 1 is found before that.

    Note that this proposal ensures the syntax compatibility, since
    "in" is already a keyword for "for" statement. But "for" statement
    is rarely used. This proposal utilizes the rarely-used keyword.

    I wrote an experimental patch. It implements the operator as a
    syntactic sugar to "case" statement:

    in
    => (case ; when ; true; else false; end)

    The patch causes no parser conflict.

    One more thing. The following expression is rejected:

    foo(x in 1, 2, 3)

    This is because it is ambiguous; this expression can be interpreted
    as three ways:

    foo((x in 1), 2, 3)
    foo((x in 1, 2), 3)
    foo((x in 1, 2, 3))

    You need write parentheses explicitly.

    What do you think?

    diff --git a/parse.y b/parse.y
    index e085088..64318bd 100644
    --- a/parse.y
    +++ b/parse.y
    @@ -745,6 +745,7 @@ static void token_info_pop(struct parser_params*, const char token);
    %nonassoc modifier_if modifier_unless modifier_while modifier_until
    %left keyword_or keyword_and
    %right keyword_not
    +%nonassoc keyword_in
    %nonassoc keyword_defined
    %right '=' tOP_ASGN
    %left modifier_rescue
    @@ -1258,6 +1259,14 @@ expr : command_call
    $$ = dispatch2(unary, ripper_id2sym('!'), $2);
    %
    /
    }

    • | expr keyword_in args
    • {
    • /*%%%*/
    • $$ = NEW_CASE($1, NEW_WHEN($3, NEW_TRUE(), NEW_FALSE()));
    • /*%
    • $$ = dispatch2(in, $1, $3);
    • %*/
    • } | arg ;

    diff --git a/test/ripper/test_parser_events.rb b/test/ripper/test_parser_events.rb
    index 5d76941..6005457 100644
    --- a/test/ripper/test_parser_events.rb
    +++ b/test/ripper/test_parser_events.rb
    @@ -1107,4 +1107,8 @@ class TestRipper::ParserEvents < Test::Unit::TestCase
    parse('/', :compile_error) {|msg| compile_error = msg}
    assert_equal("unterminated regexp meets end of file", compile_error)
    end
    +
    + def test_in
    + assert_equal("[in(1,[1,2,3])]", parse('1 in 1, 2, 3'))
    + end
    end if ripper_test

    Yusuke Endoh mame@tsg.ne.jp
    =end

in.expression.diff Magnifier (698 Bytes) Michael Edgar, 07/10/2011 09:22 AM


Related issues

Duplicated by Ruby trunk - Feature #4402: Include an "in" operator Closed 02/16/2011

History

#1 Updated by Yukihiro Matsumoto over 4 years ago

=begin
Hi,

In message "Re: [Ruby 1.9-Feature#3845][Open] "in" infix operator"
on Fri, 17 Sep 2010 19:30:27 +0900, Yusuke Endoh redmine@ruby-lang.org writes:

|Hi,
|
|I'd propose "in" infix operator.
|
|( in ) yields true when is included in .
|Otherwise it yields false.

I am neutral for this proposal. But the patch uses "case" internally
thus comparison is done by "===". Is this expected behavior?

                        matz.

=end

#2 Updated by Yusuke Endoh over 4 years ago

=begin
Matz,

Thank you for your comment!

|I'd propose "in" infix operator.
|
|( in ) yields true when is included in .
|Otherwise it yields false.

I am neutral for this proposal. But the patch uses "case" internally
thus comparison is done by "===". Is this expected behavior?

The patch is just proof of concept.
I was not so particular about the implementation. But, I thought of
the following code:

if n in 5..10
# ...
end

This code works only when "===" is used (of course, unless range is
specially handled). So I prefer "===".

However, I'm happy to rewrite a patch if you say "==" should be used.

--
Yusuke Endoh mame@tsg.ne.jp
=end

#3 Updated by Benoit Daloze over 4 years ago

=begin
On 17 September 2010 12:30, Yusuke Endoh redmine@ruby-lang.org wrote:

Feature #3845: "in" infix operator

What do you think?

It is indeed more elegant than #include?, and reusing "in" is nice.

I am somehow used to the question mark of #in?, but being a keyword,
it would be weird.

 do_sth if element in collection

 do_sth if collection.include? element

Yep, that is definitely nicer.

| But, I thought of the following code:
|
| if n in 5..10
| # ...
| end
|
| This code works only when "===" is used (of course, unless range is
| specially handled). So I prefer "===".

Maybe if there is only one element to the right of "in",
it should be checked if having an #include? method and then call it ?
(or check if it is Enumerable)

Or maybe always use #include? when possible, and #== otherwise ?

I think #=== is not really appropriate as it rarely check for
inclusion, and would likely lead to unexpected results.

Regards,
Benoit Daloze

=end

#4 Updated by Nikolai Weibull over 4 years ago

=begin
On Tue, Sep 21, 2010 at 18:53, Benoit Daloze eregontp@gmail.com wrote:

   do_sth if element in collection

   do_sth if collection.include? element

Yep, that is definitely nicer.

do_sth if element in elements

do_sth if elements.include? element

=end

#5 Updated by Yusuke Endoh over 4 years ago

=begin
Hi,

2010/9/22 Benoit Daloze eregontp@gmail.com:

On 17 September 2010 12:30, Yusuke Endoh redmine@ruby-lang.org wrote:

Feature #3845: "in" infix operator

What do you think?

It is indeed more elegant than #include?, and reusing "in" is nice.

Thanks :-)

| But, I thought of the following code:
|
| if n in 5..10
| # ...
| end
|
| This code works only when "===" is used (of course, unless range is
| specially handled). So I prefer "===".

Maybe if there is only one element to the right of "in",
it should be checked if having an #include? method and then call it ?
(or check if it is Enumerable)

Hmm. I received the similar opinion (via Japanese twitter).
I think that that makes the semantics complex (for me), but it is ok
as long as the behavior of usual cases is intuitive.

Or maybe always use #include? when possible, and #== otherwise ?

I think #=== is not really appropriate as it rarely check for
inclusion, and would likely lead to unexpected results.

#=== has been used to check for inclusion in "case" statement, I think.
So it is not so inappropriate, imo.

case n
when 0... 5 then # ...
when 5...10 then # ...
when 10...20 then # ...
end

case var
when Integer then # ...
when String then # ...
end

--
Yusuke Endoh mame@tsg.ne.jp

=end

#6 Updated by Johan Holmberg over 4 years ago

=begin
On Wed, Sep 22, 2010 at 1:48 AM, Yusuke ENDOH mame@tsg.ne.jp wrote:

Maybe if there is only one element to the right of "in",
 it should be checked if having an #include? method and then call it ?
(or check if it is Enumerable)

Hmm.  I received the similar opinion (via Japanese twitter).
I think that that makes the semantics complex (for me), but it is ok
as long as the behavior of usual cases is intuitive.

I also like the "x in foo" infix syntax. I would expect it to be a
membership test, for example in the following situations:

 an_arr = [11,22,33]
 a_set = Set.new([11,22,33])

 p (10 in an_arr)            # false
 p (11 in an_arr)            # true
 p (10 in a_set)             # false
 p (11 in a_set)             # true

and probably similarly for other "collection type objects" too (I
believe it works something like that in Python).

Regards,
/Johan Holmberg

=end

#7 Updated by Shyouhei Urabe over 4 years ago

  • Status changed from Open to Assigned

=begin

=end

#8 Updated by Martin Dürst over 4 years ago

=begin

On 2010/09/23 4:41, Johan Holmberg wrote:

On Wed, Sep 22, 2010 at 1:48 AM, Yusuke ENDOHmame@tsg.ne.jp wrote:

I also like the "x in foo" infix syntax. I would expect it to be a
membership test, for example in the following situations:

 an_arr = [11,22,33]
 a_set = Set.new([11,22,33])

 p (10 in an_arr)            # false
 p (11 in an_arr)            # true
 p (10 in a_set)             # false
 p (11 in a_set)             # true

and probably similarly for other "collection type objects" too (I
believe it works something like that in Python).

Just my two cents, but I don't see why this case is important enough to
warrant deviating from the usual, object-oriented syntax (which may not
always look optimal, but is easy and straightforward).

Regards, Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp

=end

#9 Updated by Benoit Daloze over 4 years ago

=begin
On 24 September 2010 09:28, "Martin J. Dürst" duerst@it.aoyama.ac.jp wrote:

On 2010/09/23 4:41, Johan Holmberg wrote:

On Wed, Sep 22, 2010 at 1:48 AM, Yusuke ENDOHmame@tsg.ne.jp  wrote:

I also like the "x in foo" infix syntax. I would expect it to be a
membership test, for example in the following situations:

    an_arr = [11,22,33]
    a_set = Set.new([11,22,33])

    p (10 in an_arr)            # false
    p (11 in an_arr)            # true
    p (10 in a_set)             # false
    p (11 in a_set)             # true

and probably similarly for other "collection type objects" too (I
believe it works something like that in Python).

Just my two cents, but I don't see why this case is important enough to
warrant deviating from the usual, object-oriented syntax (which may not
always look optimal, but is easy and straightforward).

Regards,    Martin.

It is indeed some kind of syntactic sugar, and you are probably right
to mention it, because maybe it should be implemented it with OO
syntax before.
The main problem being here to add a method to Object (which "pollute"
almost all objects, and might be irrelevant/useless for some).

This proposition avoid that, and I believe for this reason (and the
beauty of it) it is somehow better (but unexpected for OO, and making
a new particular case).

Regards,
B.D.

=end

#10 Updated by Jorge L. Cangas over 4 years ago

=begin
I think is better separate both roles:
-for membrership operator use 'in?' as

item in? somearray

  • for enumeration use 'in' as

    for(a in [1,2,3,4])

=end

#11 Updated by Yusuke Endoh over 4 years ago

=begin
Hi,

2010/9/24 "Martin J. Dürst" duerst@it.aoyama.ac.jp:

Just my two cents, but I don't see why this case is important enough to
warrant deviating from the usual, object-oriented syntax (which may not
always look optimal, but is easy and straightforward).

Thank you for your comment.

Indeed, an idiom [a, b, c].include?(x) can be used as a substitute.
However, it has three problems:

1) the word order is weird; x should appear first (at least, many
people feel so)
2) the idiom is too long (even though it is often used)
3) it is inefficient; new array object is created every times

Also, x == a || x == b || x == c can be used. It has another problem:
it becomes very verbose when "x" is long.

 http_request.http_method == :get  ||
 http_request.http_method == :post ||
 http_request.http_method == :put  ||
 http_request.http_method == :delete

vs.

 http_request.http_method in :get, :post, :put, :delete

What is worse is that we often want to write this kind of code.
If we rarely wrote this kind of code, I would think that new syntax
was not needed.

--
Yusuke Endoh mame@tsg.ne.jp

=end

#12 Updated by Yusuke Endoh over 4 years ago

=begin
Hi,

2010/9/29 Roger Pack rogerdpack2@gmail.com:

I like it.  There's a kind of elegance in it

if a in [1,2,3]
   p ' it is in 1,2,3'
end

It works well with my head.  +1

Thanks, but it is a bit different from my original suggestion.
My suggestion does not require brackets:

if a in 1,2,3
p 'it is in 1,2,3'
end

a in [1,2,3] is slightly verbose (for me) and inefficient because
it creates new array object when evaluated.
If you want to write array there, you can use splat operator:

ary = [1,2,3]
if a in *ary
p 'it is in ary'
end

--
Yusuke Endoh mame@tsg.ne.jp

=end

#13 Updated by Martin Dürst over 4 years ago

=begin
Hello Yusuke,

On 2010/10/05 23:28, Yusuke ENDOH wrote:

Hi,

2010/9/24 "Martin J. Dürst"duerst@it.aoyama.ac.jp:

Just my two cents, but I don't see why this case is important enough to
warrant deviating from the usual, object-oriented syntax (which may not
always look optimal, but is easy and straightforward).

Thank you for your comment.

Indeed, an idiom [a, b, c].include?(x) can be used as a substitute.
However, it has three problems:

1) the word order is weird; x should appear first (at least, many
people feel so)

I would understand that if it were [a, b, c].included? x
But it's include?, so the order seems just fine. Easy to read as
"does [a, b, c] include x?". Any other order would feel strange,
wouldn't it?

Also, an 'in?' method on Object has been proposed, so that you can write
x.in? [a, b, c]
That's very short, and fully object oriented pure Ruby, no syntactic
sugar necessary.

2) the idiom is too long (even though it is often used)

I don't think Ruby method names are optimized according to usage
frequency. And where is "too long"? 'include?' is 8 characters. [If it
is really too long, what about maybe 'incl?'? (I don't like that
personally, but just in case.)]

3) it is inefficient; new array object is created every times

That's a problem for a good compiler/interpreter. There are many cases
in Ruby where similar stuff happen, and nevertheless, many people are
using Ruby. If it really needs to be fast, why not use C or so?

Also, x == a || x == b || x == c can be used. It has another problem:
it becomes very verbose when "x" is long.

 http_request.http_method == :get  ||
 http_request.http_method == :post ||
 http_request.http_method == :put  ||
 http_request.http_method == :delete

vs.

 http_request.http_method in :get, :post, :put, :delete

Well, [:get, :post, :put, :delete].include?(http_request.http_method)
is still available. Just use that.

As for the 'new array object created every time' problem, the best thing
here would be to define something like:

HTTP::REST_METHODS = [:get, :post, :put, :delete]

and then later just do:
HTTP::REST_METHODS.include?(http_request.http_method)
or so.

Another way to do it would be:
case http_request.http_method
when :get, :post, :put, :delete
...
end

What is worse is that we often want to write this kind of code.
If we rarely wrote this kind of code, I would think that new syntax
was not needed.

There are many other methods in Ruby that are used very often. Do we
want to create special syntax for all of them?

Regards, Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp

=end

#14 Updated by Yusuke Endoh over 4 years ago

=begin
Hi,

2010/10/8 "Martin J. Dürst" duerst@it.aoyama.ac.jp:

I would understand that if it were [a, b, c].included? x
But it's include?, so the order seems just fine. Easy to read as
"does [a, b, c] include x?". Any other order would feel strange, wouldn't
it?

I'm not talking about English, but a "subject" of a sentence.
Because I think that the "subject" of this sentence is "x", I want to
write "x" first, such as "is x included in [a, b, c]?".

Consider Python's join: '-'.join(["a", "b", "c"])
I think that it is awkward NOT because it is unnatural English word order,
but because an array (that is the "subject" of this sentence) appears
later.

Also, an 'in?' method on Object has been proposed, so that you can write
x.in? [a, b, c]
That's very short, and fully object oriented pure Ruby, no syntactic sugar
necessary.

Some people say that it is against OO. They say that Object class should
not have "in?" method because "in?" is not a property of Object.
Personally, I'm not against Object#in?, but I can also understand their
opinions.

  2) the idiom is too long (even though it is often used)

I don't think Ruby method names are optimized according to usage frequency.

Though there are many exceptions, Ruby certainly has a design principle
("akr theory" called in ) that encouraged methods should
have short names.

An extreme example is . matz once suggested String#sg
that is a reformed version of String#gsub. Though it was not committed.

Another way to do it would be:
   case http_request.http_method
   when :get, :post, :put, :delete
     ...
   end

I agree that case statement is a good idea. When I imformally suggested
"in?" operator (on IRC or twitter), some people also suggested me to use
case statement, and I was satisfied once.

But there is still two problems; case cannot be postpositive, and cannot
be used in else clauses (like "elsif").

An extreme example again: I heard that Sasada-san even created a patch for
postpositive case statement:

p "foo" case http_request.http_method when :get, :post, :put, :delete

I believe that this shows that many people suffer from the word order
problem, though "in" operator is much better than this syntax :-)

--
Yusuke Endoh mame@tsg.ne.jp

=end

#15 Updated by Martin Dürst over 4 years ago

=begin
Hello Yusuke,

On 2010/10/08 21:29, Yusuke ENDOH wrote:

Hi,

2010/10/8 "Martin J. Dürst"duerst@it.aoyama.ac.jp:

I would understand that if it were [a, b, c].included? x
But it's include?, so the order seems just fine. Easy to read as
"does [a, b, c] include x?". Any other order would feel strange, wouldn't
it?

I'm not talking about English, but a "subject" of a sentence.
Because I think that the "subject" of this sentence is "x", I want to
write "x" first, such as "is x included in [a, b, c]?".

[a, b, c] includes x, not the other way round. Of course, you can change
the verb to passive voice (includ*ed*) and make the former object (x) a
grammatical subject. But how do you expect people to deduce that you
think about it in the passive voice from the method name 'include'?

Consider Python's join: '-'.join(["a", "b", "c"])
I think that it is awkward NOT because it is unnatural English word order,
but because an array (that is the "subject" of this sentence) appears
later.

I think that you can both say "'-' joins the array" and "the array joins
itself with the '-'", so both Ruby and Python have a point. I think the
awkwardness (which I feel too) is mainly because we are used to Ruby,
not to Python.

Also, an 'in?' method on Object has been proposed, so that you can write
x.in? [a, b, c]
That's very short, and fully object oriented pure Ruby, no syntactic sugar
necessary.

Some people say that it is against OO. They say that Object class should
not have "in?" method because "in?" is not a property of Object.

Given that collections of various kinds are extremely important in
programming, it may not be too far-fetched to say that it's a property
of any Object in Ruby to be (potentially) included in a collection.

2) the idiom is too long (even though it is often used)

I don't think Ruby method names are optimized according to usage frequency.

Though there are many exceptions, Ruby certainly has a design principle
("akr theory" called in ) that encouraged methods should
have short names.

Yes. Everything else being equal, that's a good policy to follow. But
Ruby naming doesn't go as far as dropping vowels and such the way Unix
commands do.

An extreme example is . matz once suggested String#sg
that is a reformed version of String#gsub. Though it was not committed.

Good programmers know that program readability is important, and 'sg' is
definitely not readable.

Another way to do it would be:
case http_request.http_method
when :get, :post, :put, :delete
...
end

I agree that case statement is a good idea. When I imformally suggested
"in?" operator (on IRC or twitter), some people also suggested me to use
case statement, and I was satisfied once.

But there is still two problems; case cannot be postpositive, and cannot
be used in else clauses (like "elsif").

An extreme example again: I heard that Sasada-san even created a patch for
postpositive case statement:

p "foo" case http_request.http_method when :get, :post, :put, :delete

I believe that this shows that many people suffer from the word order
problem, though "in" operator is much better than this syntax :-)

That case statement would in my view not be as bad as the 'in' proposal.
The case statement just completes the postpositive versions of
if/unless/while. The 'in' is a totally new construction.

Regards, Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp

=end

#16 Updated by Yusuke Endoh over 4 years ago

=begin
Hi,

2010/10/9 "Martin J. Dürst" duerst@it.aoyama.ac.jp:

[a, b, c] includes x, not the other way round. Of course, you can change the
verb to passive voice (includ*ed*) and make the former object (x) a
grammatical subject. But how do you expect people to deduce that you think
about it in the passive voice from the method name 'include'?

More straightforwardly (for me), I will write:

"is x (in) a, b or c?" 「x は a, b, c のいずれか?」

When we want to check whether http_method is :get or :post, is it natural
for English speaker to say:

"do :get or :post include the http_method ?"

?

I think that you can both say "'-' joins the array" and "the array joins
itself with the '-'", so both Ruby and Python have a point. I think the
awkwardness (which I feel too) is mainly because we are used to Ruby, not to
Python.

I admit it is one of the reasons of awkwardness. But I still think the
biggest reason is the word order. I'm curious to know whether

":get == http_method or :post == http_method"

is more natural (or equal) for you than

"http_method == :get or http_method == :post"

.

An extreme example again: I heard that Sasada-san even created a patch for
postpositive case statement:

  p "foo" case http_request.http_method when :get, :post, :put, :delete

I believe that this shows that many people suffer from the word order
problem, though "in" operator is much better than this syntax :-)

That case statement would in my view not be as bad as the 'in' proposal. The
case statement just completes the postpositive versions of if/unless/while.
The 'in' is a totally new construction.

That approach requires a new keyword "elscase." And, "if" and "case" is
too exclusive; it is cumbersome to combinate normal condition and case
condition, like this:

if (x in a, b, c) && y == 1
...
end

I believe that "case" statement is not originally designed for such a use
case. Rather, the intended use case of "case" statement is to jump
execution to multiple "when" clauses.
I think it is better to introduce a new construction than to force to
extend "case" statement.

--
Yusuke Endoh mame@tsg.ne.jp

=end

#17 Updated by Michael Edgar over 3 years ago

I personally believe in belongs as an operator, it should match natural, mathematical, set-inclusion notation, and it should invoke include?.

Many have discussed how it is just as possible to write "does S include x" as well as "is x in S": especially in English, there are many ways of writing things. As there are in Ruby! This should not distract us from why the idea has been proposed in the first place.

In mathematics, we very rarely write "Set S includes x", let alone as a predicate. Instead, we write, (in LaTeX), x \in S. This is because most commonly the focus of this predicate is the element in question: is it in the set or not? This is a property of the set, but the focus of discourse is the (potential) element. The Ruby method belongs on the set, OO-speaking, because it is in charge of the information involved. But it is not unreasonable to note that writing S.includes?(x) introduces a mismatch between how computer scientists typically consider such questions.

Ruby's OO syntax does not naturally allow us to express this "foo in S" idea, in that order, without adding a method to all potential elements. I don't believe introducing a new method, .in? is a good idea. There should be no reason to introduce a misleading method name that suggests an element might know what sets it is in. Instead, I think that introducing in as an syntactic construct (much like for loop syntax) is appropriate. Since Ruby already has an idiomatic inclusion method name, include?, I believe it should invoke that, right to left (foo in bar means bar.include?(foo)). Here, the parallel with for loop syntax becomes more clear.

For loops address the same concern: it makes sense to write, in English, "array, each of your elements, x, should do this" just as it makes sense to say "for each x in the array, do this". OO-style invocation supports the former, but not the latter, and so we have a syntax which reverses the order: in for loops, the receiver comes after the variable names it uses. This supports a different, natural way to describe loops. Just like in, it works by using a convention-based method name. For loops in Ruby are maligned primarily due to potentially-surprising scoping issues, but their syntax itself is subjectively attractive.

I have attached a patch which incorporates this approach. It includes the appropriate ripper event(s).

#18 Updated by Yukihiro Matsumoto almost 3 years ago

  • Description updated (diff)
  • Status changed from Assigned to Rejected

This proposal is only for cosmetics.
I don't want a new operator that does not introduce something new.

Matz.

Also available in: Atom PDF