Feature #13712

String#start_with? with regexp

Added by naruse (Yui NARUSE) over 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
[ruby-core:81897]

Description

String#start_with? should accept a Regexp.

When I write a parser, I want to check whether a string starts with a pattern or not.
It's the same thing StringScanner#match? does.

To do the same thing with ordinary String methods, you have to write something like /\A#{re}/.match(…).
But if re is an argument, this creates a new temporary Regexp every time.

There is a workaround, as follows, but it's a bit tricky:

"foo ".rindex(/fo+./, 0)
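For comparison, here is a small sketch that puts the two workarounds and the proposed call side by side. The helper names `prefix_by_interpolation?` and `prefix_by_rindex?` are hypothetical. The last form, `String#start_with?` with a Regexp, only works on Ruby versions that include this feature (2.5 and later).

```ruby
# Hypothetical helpers contrasting the workarounds described above.

# Builds a new anchored Regexp on every call (the cost the proposal wants to avoid).
def prefix_by_interpolation?(str, re)
  /\A#{re}/.match?(str)
end

# rindex with a start position of 0 searches backwards from index 0,
# so it can only report a match that begins at the very start of str.
def prefix_by_rindex?(str, re)
  !str.rindex(re, 0).nil?
end

str = "foo "
re  = /fo+./

p prefix_by_interpolation?(str, re) # => true
p prefix_by_rindex?(str, re)        # => true
p str.start_with?(re)               # => true (Ruby 2.5+)
p "xfoo ".start_with?(re)           # => false: the match must begin at index 0
```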

The patch is as follows:

diff --git a/re.c b/re.c
index d0aa2a792e..f672ba75ec 100644
--- a/re.c
+++ b/re.c
@@ -1588,6 +1588,84 @@ rb_reg_search(VALUE re, VALUE str, long pos, int reverse)
     return rb_reg_search0(re, str, pos, reverse, 1);
 }

+bool
+rb_reg_start_with_p(VALUE re, VALUE str)
+{
+    long pos = 0;
+    long result;
+    VALUE match;
+    struct re_registers regi, *regs = &regi;
+    regex_t *reg;
+    int tmpreg;
+    onig_errmsg_buffer err = "";
+
+    reg = rb_reg_prepare_re0(re, str, err);
+    tmpreg = reg != RREGEXP_PTR(re);
+    if (!tmpreg) RREGEXP(re)->usecnt++;
+
+    match = rb_backref_get();
+    if (!NIL_P(match)) {
+   if (FL_TEST(match, MATCH_BUSY)) {
+       match = Qnil;
+   }
+   else {
+       regs = RMATCH_REGS(match);
+   }
+    }
+    if (NIL_P(match)) {
+   MEMZERO(regs, struct re_registers, 1);
+    }
+    result = onig_match(reg,
+            (UChar*)(RSTRING_PTR(str)),
+            ((UChar*)(RSTRING_PTR(str)) + RSTRING_LEN(str)),
+            (UChar*)(RSTRING_PTR(str)),
+            regs, ONIG_OPTION_NONE);
+    if (!tmpreg) RREGEXP(re)->usecnt--;
+    if (tmpreg) {
+   if (RREGEXP(re)->usecnt) {
+       onig_free(reg);
+   }
+   else {
+       onig_free(RREGEXP_PTR(re));
+       RREGEXP_PTR(re) = reg;
+   }
+    }
+    if (result < 0) {
+   if (regs == &regi)
+       onig_region_free(regs, 0);
+   if (result == ONIG_MISMATCH) {
+       rb_backref_set(Qnil);
+       return false;
+   }
+   else {
+       onig_error_code_to_str((UChar*)err, (int)result);
+       rb_reg_raise(RREGEXP_SRC_PTR(re), RREGEXP_SRC_LEN(re), err, re);
+   }
+    }
+
+    if (NIL_P(match)) {
+   int err;
+   match = match_alloc(rb_cMatch);
+   err = rb_reg_region_copy(RMATCH_REGS(match), regs);
+   onig_region_free(regs, 0);
+   if (err) rb_memerror();
+    }
+    else {
+   FL_UNSET(match, FL_TAINT);
+    }
+
+    RMATCH(match)->str = rb_str_new4(str);
+    OBJ_INFECT(match, str);
+
+    RMATCH(match)->regexp = re;
+    RMATCH(match)->rmatch->char_offset_updated = 0;
+    rb_backref_set(match);
+
+    OBJ_INFECT(match, re);
+
+    return true;
+}
+
 VALUE
 rb_reg_nth_defined(int nth, VALUE match)
 {
diff --git a/string.c b/string.c
index 072f1329ee..6542a4acb1 100644
--- a/string.c
+++ b/string.c
@@ -9126,6 +9126,7 @@ rb_str_rpartition(VALUE str, VALUE sep)
                    RSTRING_LEN(str)-pos-RSTRING_LEN(sep)));
 }

+extern bool rb_reg_start_with_p(VALUE re, VALUE str);
 /*
  *  call-seq:
  *     str.start_with?([prefixes]+)   -> true or false
@@ -9146,11 +9147,20 @@ rb_str_start_with(int argc, VALUE *argv, VALUE str)

     for (i=0; i<argc; i++) {
    VALUE tmp = argv[i];
-   StringValue(tmp);
-   rb_enc_check(str, tmp);
-   if (RSTRING_LEN(str) < RSTRING_LEN(tmp)) continue;
-   if (memcmp(RSTRING_PTR(str), RSTRING_PTR(tmp), RSTRING_LEN(tmp)) == 0)
-       return Qtrue;
+   switch (BUILTIN_TYPE(tmp)) {
+     case T_REGEXP:
+       {
+       bool r = rb_reg_start_with_p(tmp, str);
+       if (r) return Qtrue;
+       }
+       break;
+     default:
+       StringValue(tmp);
+       rb_enc_check(str, tmp);
+       if (RSTRING_LEN(str) < RSTRING_LEN(tmp)) continue;
+       if (memcmp(RSTRING_PTR(str), RSTRING_PTR(tmp), RSTRING_LEN(tmp)) == 0)
+       return Qtrue;
+   }
     }
     return Qfalse;
 }
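The following sketch shows how a Regexp argument is expected to behave, assuming Ruby 2.5 or later (the committed revision may differ in detail from the patch above). The patch calls `onig_match` at the start of the string rather than searching, so the pattern is effectively anchored at index 0 without an explicit `\A`. Because each argument is dispatched on its type, String and Regexp prefixes can be mixed in one call.

```ruby
# Regexp arguments are matched only at the beginning of the receiver.
p "hello world".start_with?(/h.l/)      # => true
p "say hello".start_with?(/hello/)      # => false, even though "hello" occurs later

# String and Regexp prefixes can be mixed; the result is true if any of them matches.
p "# comment".start_with?("//", /#\s*/) # => true

# Plain String prefixes behave as before.
p "ruby".start_with?("ru")              # => true
```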

Related issues

Related to Ruby trunk - Feature #3388: regexp support for start_with? and end_with? (Feedback)

Associated revisions

Revision 6187b000
Added by naruse (Yui NARUSE) about 1 year ago

[Feature #13712] String#start_with? supports regexp

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60234 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 60234
Added by naruse (Yui NARUSE) about 1 year ago

[Feature #13712] String#start_with? supports regexp

Revision 87ccf7e5
Added by nobu (Nobuyoshi Nakada) 6 months ago

string.c: doc for [Feature #13712]

  • string.c (rb_str_start_with): [DOC] start_with? example with regexp.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63541 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 63541
Added by nobu (Nobuyoshi Nakada) 6 months ago

string.c: doc for [Feature #13712]

  • string.c (rb_str_start_with): [DOC] start_with? example with regexp.

History

#1 [ruby-core:81898] Updated by Eregon (Benoit Daloze) over 1 year ago

Agreed, this would be great and intuitive.

I wonder, could the symmetrical String#end_with? also work with a Regexp? (having the same effect as a trailing \z in the Regexp)
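This change covers only `start_with?`; `end_with?` does not accept a Regexp (passing one raises `TypeError`). The trailing-`\z` equivalent Benoit describes can be written as a small helper. The name `ends_with_pattern?` is hypothetical. Like the `/\A#{re}/` workaround, it builds a temporary Regexp on each call.

```ruby
# Hypothetical helper: emulates a Regexp-accepting end_with? by anchoring at \z.
def ends_with_pattern?(str, re)
  # Interpolating a Regexp embeds it as (?-mix:...), so alternations such as
  # /e|f/ stay grouped and only the whole pattern is anchored to the end.
  str.match?(/#{re}\z/)
end

p ends_with_pattern?("abc def", /e|f/)      # => true  (ends with "f")
p ends_with_pattern?("abc fed", /e|f/)      # => false (ends with "d")
p ends_with_pattern?("Really?!", /\?[!.]*/) # => true
```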

#2 [ruby-core:81899] Updated by shevegen (Robert A. Heiler) over 1 year ago

I agree as well; it would be nice. More than one way to do things. It should also work the same for .start_with? and .end_with?.

#3 [ruby-core:81906] Updated by shyouhei (Shyouhei Urabe) over 1 year ago

+1 for start_with?, but I have no practical use for end_with?, so I'm a bit negative about that part. Do people really need a regexp version of .end_with?

#4 [ruby-core:81908] Updated by phluid61 (Matthew Kerwin) over 1 year ago

shyouhei (Shyouhei Urabe) wrote:

> +1 for start_with?, but I have no practical use for end_with?, so I'm a bit negative about that part. Do people really need a regexp version of .end_with?

I've used regexen at different times to match final punctuation (e.g. /\?[!.]*/) and trailing whitespace (e.g. /\s/). I think it's more readable having str.end_with? /pattern/ instead of str =~ /pattern\z/

#5 [ruby-core:81909] Updated by shyouhei (Shyouhei Urabe) over 1 year ago

phluid61 (Matthew Kerwin) wrote:

> I've used regexen at different times to match final punctuation (e.g. /\?[!.]*/) and trailing whitespace (e.g. /\s/). I think it's more readable having str.end_with? /pattern/ instead of str =~ /pattern\z/

I see. Thank you.

#6 [ruby-core:81910] Updated by duerst (Martin Dürst) over 1 year ago

shyouhei (Shyouhei Urabe) wrote:

> +1 for start_with?, but I have no practical use for end_with?, so I'm a bit negative about that part. Do people really need a regexp version of .end_with?

In addition, even if we don't have a direct use case, it's very easy for somebody to try it out and then send a bug report here if it's not available. I know we don't add functionality just because "somebody may eventually need it", but in this case it seems justified in order to streamline things.

#7 [ruby-core:81911] Updated by nobu (Nobuyoshi Nakada) over 1 year ago

Will you need $~ after start_with?(re)?

#8 [ruby-core:81912] Updated by phluid61 (Matthew Kerwin) over 1 year ago

nobu (Nobuyoshi Nakada) wrote:

> Will you need $~ after start_with?(re)?

Personally, I don't see that I'll ever need it. If people do want it, they can lodge a feature request in future?

#9 [ruby-core:81914] Updated by Eregon (Benoit Daloze) over 1 year ago

nobu (Nobuyoshi Nakada) wrote:

> Will you need $~ after start_with?(re)?

It might be quite useful when parsing, to avoid doing a second match just to get captures.
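Here is a sketch of the parsing case mentioned above. With the committed behavior (Ruby 2.5+), a successful `start_with?(re)` sets `$~`, so captures can be read without matching a second time. The `parse_assignment` function and its "name = value" input format are hypothetical.

```ruby
# Hypothetical mini-parser: reads "name = value" and returns both parts.
def parse_assignment(line)
  return nil unless line.start_with?(/(\w+)\s*=\s*/)
  # start_with? set $~ in this method's frame, so no second match is needed.
  name  = $~[1]
  value = $~.post_match
  [name, value]
end

p parse_assignment("answer = 42") # => ["answer", "42"]
p parse_assignment("= oops")      # => nil
```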

#10 [ruby-core:81915] Updated by phluid61 (Matthew Kerwin) over 1 year ago

Eregon (Benoit Daloze) wrote:

> It might be quite useful when parsing, to avoid doing a second match just to get captures.

That could depend on whether $&, $1, $2, etc. are set. I assumed nobu (Nobuyoshi Nakada) was only asking about $~ because allocating a whole MatchData object is heavier than just allocating some strings.

#11 [ruby-core:81917] Updated by Eregon (Benoit Daloze) over 1 year ago

phluid61 (Matthew Kerwin) wrote:

> That could depend on whether $&, $1, $2, etc. are set. I assumed nobu (Nobuyoshi Nakada) was only asking about $~ because allocating a whole MatchData object is heavier than just allocating some strings.

$&, $1, etc. always just read from $~, so it's the same thing.
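To illustrate the point above: `$&` and `$1` are views onto the same MatchData held in `$~` (also returned by `Regexp.last_match`), so setting `$~` is what makes all of them available. This is a minimal check, assuming Ruby 2.5+ for the `start_with?` call.

```ruby
"key: value".start_with?(/(\w+):/)

# All of these read from the same MatchData stored in $~.
p $~ == Regexp.last_match # => true
p $& == $~[0]             # => true, both "key:"
p $1 == $~[1]             # => true, both "key"
```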

#12 [ruby-core:81986] Updated by shevegen (Robert A. Heiler) over 1 year ago

shyouhei (Shyouhei Urabe) wrote:

> +1 for start_with?, but I have no practical use for end_with?, so I'm a bit negative about
> that part. Do people really need a regexp version of .end_with?

I do not know if the use case frequency is the same. Perhaps you are right that anchoring
with .start_with? is more frequent than with .end_with?, via regexes.

But even if there is a much smaller use case for .end_with? (let's just
assume so for the moment), I think that both .start_with? and .end_with? should behave
the same. Otherwise people may ask "why does .start_with? allow regex input but
.end_with? doesn't?". :)

I think it may be useful, though:

x = 'abc def'
puts 'yep, ends with either e or f' if x.end_with? /e|f/

At least to me, these seem to be mostly symmetrical use cases, even if one may
be more prevalent than the other. I guess the point may be that it just gives people
more flexibility: if they would rather use a regexp than
a string, they can do so.

#13 [ruby-core:83383] Updated by matz (Yukihiro Matsumoto) about 1 year ago

Agreed. Need to update Regexp.last_match.

Matz.

#14 Updated by naruse (Yui NARUSE) about 1 year ago

  • Status changed from Open to Closed

Applied in changeset trunk|r60234.


[Feature #13712] String#start_with? supports regexp

#15 Updated by mame (Yusuke Endoh) 12 months ago

  • Related to Feature #3388: regexp support for start_with? and end_with? added
