Feature #15899: String#before and String#after - Ruby - Ruby Issue Tracking System

Feature #15899

open

String#before and String#after

Added by kke (Kimmo Lehto) about 6 years ago. Updated over 5 years ago.

Status: Open

Assignee:

Target version:

[ruby-core:92972]

Description

There seem to be no methods for getting the substring before or after a marker.

Too often I see and have to resort to variations of:

str[/(.+?);/, 1]
str.split(';').first
substr, _ = str.split(';', 2)
str.sub(/.*;/, '')
str[0...str.index(';')]

These create intermediate objects and/or are ugly.

String#delete_suffix and String#delete_prefix do not accept regexps, so they can only be used if you first work out the full prefix or suffix.

For this reason, I suggest something like:

> str = 'application/json; charset=utf-8'
> str.before(';')
=> "application/json"
> str.after(';')
=> " charset=utf-8"

What should happen if the marker isn't found? In my opinion, before should return the full string and after an empty string.
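
To make the proposed semantics concrete, here is a minimal sketch written as a monkey patch. It is not the implementation from any patch attached to this issue. The method bodies, the Regexp handling, and the choice to return a copy of the receiver are all assumptions; only the names and the not-found behavior come from the description above.

```ruby
# Hypothetical sketch of the proposed methods; neither exists in core Ruby.
class String
  # Text before the first occurrence of marker (String or Regexp).
  # Returns a copy of the whole string when the marker is absent.
  def before(marker)
    i = index(marker)
    i ? self[0, i] : dup
  end

  # Text after the first occurrence of marker (String or Regexp).
  # Returns an empty string when the marker is absent.
  def after(marker)
    if marker.is_a?(Regexp)
      m = match(marker)
      m ? m.post_match : ""
    else
      i = index(marker)
      i ? self[(i + marker.length)..-1] : ""
    end
  end
end

str = "application/json; charset=utf-8"
str.before(";")      # => "application/json"
str.after(";")       # => " charset=utf-8"
str.after(/;\s*/)    # => "charset=utf-8"
"abc".before("x")    # => "abc"
"abc".after("x")     # => ""
```

Each call allocates only the returned substring, which is the main difference from the split and partition variants listed earlier.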

Files

test.rb (712 Bytes), edd314159 (Edd Morgan), 07/09/2019 06:33 PM
test_mem.rb (326 Bytes), edd314159 (Edd Morgan), 07/09/2019 06:33 PM
2269.diff (3.77 KB), edd314159 (Edd Morgan), 07/09/2019 06:33 PM

#1 [ruby-core:92973]

Updated by sawa (Tsuyoshi Sawada) about 6 years ago

Since you mention that String#delete_suffix and String#delete_prefix not accepting regexps is a weak point, you should use regexps in the examples illustrating your proposal.

#2 [ruby-core:92974]

Updated by sawa (Tsuyoshi Sawada) about 6 years ago

Using partition looks reasonable, and it can accept regexes.

str = 'application/json; charset=utf-8'
before, _, after = str.partition(/; /)
before # => "application/json"
after # => "charset=utf-8"

#3 [ruby-core:92976]

Updated by shevegen (Robert A. Heiler) about 6 years ago

I can see where it may be useful, since it could shorten code like this:

first_part = "hello world!".split(' ').first

To:

first_part = "hello world!".before(' ')

It is not a huge improvement in my opinion, though. (My comment here has
not yet addressed the other part about using regexes - see a bit later for
that.)

I am not a big fan of the names, though. I somehow associate #before and #after
more with time-based operations; and rack/sinatra middleware (route) filters.

I do not have a better or alternative suggestion, although since we already have
delete_prefix, perhaps we could have some methods that return the desired prefix
instead (or suffix).

As for the lack of regex support, I think sawa already pointed out that it may be
better to argue for changing delete_prefix and delete_suffix instead. That way
your demonstrated use case could be simplified as well.

#4 [ruby-core:92995]

Updated by kke (Kimmo Lehto) about 6 years ago

sawa (Tsuyoshi Sawada) wrote:

Using partition looks reasonable, and it can accept regexes.

It also has the problem of creating extra objects that you need to discard with _ or assign and just leave unused.

shevegen (Robert A. Heiler) wrote:

I am not a big fan of the names, though. I somehow associate #before and #after
more with time-based operations; and rack/sinatra middleware (route) filters.

How about str.preceding(';') and str.following(';')?

Perhaps str.prior_to(';') and str.behind(';')?

The possibility of an opposite reading direction can make these problematic.

str.left_from(';'), str.right_from(';')? Sounds a bit clunky.

Head and tail could be the unixy choice and more versatile for other use cases.

class String
  def head(count = 10, separator = "\n")
    ...
  end

  def tail(count = 10, separator = "\n")
    ...
  end
end

For my example use case, it would become:

str = "application/json; charset=utf-8"
mime = str.head(1, ';')
labels = str.tail(1, ';')

And to emulate something like $ curl xttp://x.example.com | head you would use response.body.head
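
One possible reading of the head and tail idea is sketched below. The bodies are assumptions, since the proposal above leaves them elided. This version splits on the separator and rejoins the requested number of segments, so it illustrates the semantics only and does not avoid intermediate objects.

```ruby
# Hypothetical sketch; String#head and String#tail are not core Ruby methods.
class String
  # First `count` separator-delimited segments, rejoined with the separator.
  def head(count = 10, separator = "\n")
    split(separator, -1).first(count).join(separator)
  end

  # Last `count` separator-delimited segments, rejoined with the separator.
  def tail(count = 10, separator = "\n")
    split(separator, -1).last(count).join(separator)
  end
end

str = "application/json; charset=utf-8"
mime = str.head(1, ";")     # => "application/json"
labels = str.tail(1, ";")   # => " charset=utf-8"
"a\nb\nc".head(2)           # => "a\nb"
```

Passing -1 as the split limit keeps trailing empty segments, so a string that ends with the separator yields an empty last segment. Whether that matches the intended unix-like behavior is an open design question.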

#5 [ruby-core:93132]

Updated by kke (Kimmo Lehto) about 6 years ago

How about first and last?

'hello world'.first(2)
 => 'he'
'hello world'.last(2)
 => 'ld'
'hello world'.first
 => 'h'
'hello world'.last
 => 'd'
'hello world'.first(1, ' ')
 => 'hello'
'hello world'.last(1, ' ')
 => 'world'
'application/json; charset=utf-8'.first(1, ';')
 => 'application/json'

#6 [ruby-core:93143]

Updated by marcandre (Marc-Andre Lafortune) about 6 years ago

sawa is right. Just use partition and rpartition.
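
For reference, a short example of how the two existing core methods behave, including the case where the separator is missing. The sample strings and variable names are illustrative only.

```ruby
str = "application/json; charset=utf-8"

# partition splits at the first occurrence of the separator.
before, _sep, after = str.partition(";")
before  # => "application/json"
after   # => " charset=utf-8"

# rpartition splits at the last occurrence instead.
stem, _dot, ext = "archive.tar.gz".rpartition(".")
stem    # => "archive.tar"
ext     # => "gz"

# When the separator is missing, the whole string lands in a different slot.
"abc".partition("x")    # => ["abc", "", ""]
"abc".rpartition("x")   # => ["", "", "abc"]
```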

#7 [ruby-core:93645]

Updated by edd314159 (Edd Morgan) about 6 years ago

File test_mem.rb added
File test.rb added
File 2269.diff added

I'd like to add my +1 to this idea. Splitting a string by a substring (and only caring about the first result) is a use case I run into all the time. In fact, the example given by @kke of splitting a Content-Type HTTP header by the semicolon is the one I needed it for most recently.

It's true, partition and rpartition can absolutely achieve the same thing. But they have the side effect of returning (and, of course, allocating) extra String objects that are frequently discarded. This not only negatively impacts performance, but results in less readable code: we have to resort to the convention of prefixing the throwaway variable name with an underscore. This underscore is a convention agreed upon, informally, by humans to indicate the irrelevance of the variable, and I'm sure many Ruby programmers are unaware of the convention, or simply forget about it.

I have suggested an implementation in PR #2269 on Github: https://github.com/ruby/ruby/pull/2269

I also attach the following benchmark to show that when these new methods are used for this use case, performance is ~30% improved for splitting by a String (and more so when splitting by Regex):

eddmorgan@eddbook ~/Projects/rubydev/build → make run

../ruby/revision.h unchanged
./miniruby -I../ruby/lib -I. -I.ext/common   ../ruby/test.rb
                       user     system      total        real
String#before      0.182367   0.000587   0.182954 (  0.183625)
String#partition   0.303105   0.000877   0.303982 (  0.304961)
                       user     system      total        real
String#after       0.199295   0.000672   0.199967 (  0.200794)
String#partition   0.302300   0.001409   0.303709 (  0.305278)
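
The allocation side of the argument can be checked on a stock Ruby without the patch. The sketch below is not the attached test_mem.rb. It is an assumed stand-in that compares partition against an index-and-slice equivalent of before, counting allocated objects with GC.stat. The constant names and the `allocations` helper are made up for the example.

```ruby
SEP = ";".freeze
STR = "application/json; charset=utf-8".freeze
N = 10_000

# Returns the number of objects allocated while the block runs.
def allocations
  start = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - start
end

# partition builds an array plus the substrings on every call.
via_partition = allocations { N.times { STR.partition(SEP).first } }

# index + slice builds only the one substring that is actually wanted.
via_index = allocations do
  N.times do
    i = STR.index(SEP)
    i ? STR[0, i] : STR
  end
end

puts "partition:     #{via_partition} objects"
puts "index + slice: #{via_index} objects"
```

The exact counts depend on the Ruby version, so none are quoted here; the point is only the relative difference between the two approaches.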

#8 [ruby-core:95677]

Updated by jonathanhefner (Jonathan Hefner) over 5 years ago

I use monkey-patched versions of these in many of my Ruby scripts. They have a few benefits vs. the alternatives:

vs. split + first / last
  - using split can cause an unintended result when the delimiter is not present, e.g. "abc".split("x", 2).last == "abc"

vs. partition
  - before and after can be chained, and can result in fewer object allocations

vs. regex + capture group
  - before and after are easier to read (and write)

I've also found before_last and after_last helpful for similar reasons.
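
The monkey patches themselves are not shown in this thread, so the following is only a guess at what before_last and after_last might look like, built on String#rindex. It handles String markers only, and the nil return from after_last when the marker is missing is an assumption.

```ruby
# Hypothetical sketch; these methods are not part of core Ruby.
class String
  # Text before the last occurrence of marker, or a copy of self if absent.
  def before_last(marker)
    i = rindex(marker)
    i ? self[0, i] : dup
  end

  # Text after the last occurrence of marker, or nil if absent.
  def after_last(marker)
    i = rindex(marker)
    i ? self[(i + marker.length)..-1] : nil
  end
end

path = "/var/log/app.2019.log"
path.after_last("/")                    # => "app.2019.log"
path.after_last("/").before_last(".")   # => "app.2019"
```

The second call shows the chaining mentioned above. With a nil not-found result, a chain like this needs the safe navigation operator (&.) whenever the marker may be missing.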

kke (Kimmo Lehto) wrote:

What should happen if the marker isn't found? In my opinion, before should return the full string and after an empty string.

Regarding before, I agree.

Regarding after, I originally wrote my monkey-patched after to return an empty string, but eventually changed it to return nil. I was hesitant because a nil result can be an unexpected "gotcha", but an empty string seems wrong because it throws away information. For example, if str.after("x") == "", it might be because the delimiter wasn't found, or because the delimiter was at the end of the string. (Compared to str.before("x") == str, which always means the delimiter wasn't found.)
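
The ambiguity described here is easy to see side by side. The two helper functions below are hypothetical and differ only in what they return when the delimiter is not found.

```ruby
# Variant that returns an empty string when the marker is missing.
def after_or_empty(str, marker)
  i = str.index(marker)
  i ? str[(i + marker.length)..-1] : ""
end

# Variant that returns nil when the marker is missing.
def after_or_nil(str, marker)
  i = str.index(marker)
  i ? str[(i + marker.length)..-1] : nil
end

after_or_empty("abx", "x")  # => ""   (delimiter at the end)
after_or_empty("abc", "x")  # => ""   (delimiter missing: same result)
after_or_nil("abx", "x")    # => ""
after_or_nil("abc", "x")    # => nil  (the two cases are distinguishable)
```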
