Feature #15899

open

String#before and String#after

Added by kke (Kimmo Lehto) over 5 years ago. Updated about 5 years ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:92972]

Description

There seem to be no methods for getting the substring before or after a marker.

Too often I see and have to resort to variations of:

str[/(.+?);/, 1]
str.split(';').first
substr, _ = str.split(';', 2)
str.sub(/.*;/, '')
str[0...str.index(';')]

These create intermediate objects and/or are ugly.

String#delete_suffix and String#delete_prefix do not accept regexps, so they can only be used if you first work out the full prefix or suffix.

For this reason, I suggest something like:

> str = 'application/json; charset=utf-8'
> str.before(';')
=> "application/json"
> str.after(';')
=> " charset=utf-8"

What should happen if the marker isn't found? In my opinion, before should return the full string and after an empty string.
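
To make the proposed semantics concrete, here is a minimal sketch written as plain module functions rather than as a patch to String. The module name StringMarkers and both method bodies are assumptions for illustration only; this is not the implementation from any patch. It follows the behaviour suggested above: when the marker is missing, before returns the full string and after returns an empty string.

```ruby
# Hypothetical helper module; not part of Ruby or of any attached patch.
module StringMarkers
  module_function

  # Text preceding the first occurrence of marker (String or Regexp).
  # Returns a copy of the whole string when the marker is absent.
  def before(str, marker)
    idx = str.index(marker)
    idx ? str[0, idx] : str.dup
  end

  # Text following the first occurrence of marker (String or Regexp).
  # Returns "" when the marker is absent.
  def after(str, marker)
    if marker.is_a?(Regexp)
      match = marker.match(str)
      match ? match.post_match : ""
    else
      idx = str.index(marker)
      idx ? str[(idx + marker.length)..-1] : ""
    end
  end
end

str = 'application/json; charset=utf-8'
StringMarkers.before(str, ';')     # => "application/json"
StringMarkers.after(str, ';')      # => " charset=utf-8"
StringMarkers.after(str, /;\s*/)   # => "charset=utf-8"
StringMarkers.before(str, '|')     # => "application/json; charset=utf-8"
StringMarkers.after(str, '|')      # => ""
```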


Files

test.rb (712 Bytes), edd314159 (Edd Morgan), 07/09/2019 06:33 PM
test_mem.rb (326 Bytes), edd314159 (Edd Morgan), 07/09/2019 06:33 PM
2269.diff (3.77 KB), edd314159 (Edd Morgan), 07/09/2019 06:33 PM

Updated by sawa (Tsuyoshi Sawada) over 5 years ago

Since you mention that String#delete_suffix and String#delete_prefix do not accept regexps and that this is a weak point, you should use regexps in the examples illustrating your proposal.

Updated by sawa (Tsuyoshi Sawada) over 5 years ago

Using partition looks reasonable, and it can accept regexes.

str = 'application/json; charset=utf-8'
before, _, after = str.partition(/; /)
before # => "application/json"
after # => "charset=utf-8"

Updated by shevegen (Robert A. Heiler) over 5 years ago

I can see where it may be useful, since it could shorten code like this:

first_part = "hello world!".split(' ').first

To:

first_part = "hello world!".before(' ')

It is not a huge improvement in my opinion, though. (My comment here has
not yet addressed the other part about using regexes - see a bit later for
that.)

I am not a big fan of the names, though. I somehow associate #before and #after
more with time-based operations; and rack/sinatra middleware (route) filters.

I do not have a better or alternative suggestion, although since we already have
delete_prefix, perhaps we could have some methods that return the desired prefix
instead (or suffix).

As for the lack of regex support, I think sawa already pointed out that it may be
better to argue for changing delete_prefix and delete_suffix instead. That way
your demonstrated use case could be simplified as well.

Updated by kke (Kimmo Lehto) over 5 years ago

sawa (Tsuyoshi Sawada) wrote:

Using partition looks reasonable, and it can accept regexes.

It also has the problem of creating extra objects that you need to discard with _ or assign and just leave unused.

shevegen (Robert A. Heiler) wrote:

I am not a big fan of the names, though. I somehow associate #before and #after
more with time-based operations; and rack/sinatra middleware (route) filters.

How about str.preceding(';') and str.following(';')?

Perhaps str.prior_to(';') and str.behind(';')?

The possibility of the opposite (right-to-left) reading direction can make these problematic.

str.left_from(';'), str.right_from(';')? Sounds a bit clunky.

Head and tail could be the unixy choice and more versatile for other use cases.

class String
  def head(count = 10, separator = "\n")
    ...
  end

  def tail(count = 10, separator = "\n")
    ...
  end
end

For my example use case, it would become:

str = "application/json; charset=utf-8"
mime = str.head(1, ';')
labels = str.tail(1, ';')

And to emulate something like $ curl xttp://x.example.com | head, you would use response.body.head.

Updated by kke (Kimmo Lehto) over 5 years ago

How about first and last?

'hello world'.first(2)
 => 'he'
'hello world'.last(2)
 => 'ld'
'hello world'.first
 => 'h'
'hello world'.last
 => 'd'
'hello world'.first(1, ' ')
 => 'hello'
'hello world'.last(1, ' ')
 => 'world'
'application/json; charset=utf-8'.first(1, ';')
 => 'application/json'

Updated by marcandre (Marc-Andre Lafortune) over 5 years ago

sawa is right. Just use partition and rpartition.
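
For reference, a short sketch of what partition and rpartition return, including the case where the separator is missing. The sample string and variable names are made up for illustration; the methods are the standard String#partition and String#rpartition.

```ruby
str = 'application/json; charset=utf-8; q=0.9'

# partition splits at the FIRST occurrence of the separator.
head, _sep, rest = str.partition(';')
head  # => "application/json"
rest  # => " charset=utf-8; q=0.9"

# rpartition splits at the LAST occurrence of the separator.
init, _sep, last = str.rpartition(';')
init  # => "application/json; charset=utf-8"
last  # => " q=0.9"

# When the separator is absent, the whole string lands on opposite sides.
miss_first = 'abc'.partition(';')   # => ["abc", "", ""]
miss_last  = 'abc'.rpartition(';')  # => ["", "", "abc"]
```

Note that the not-found results differ between the two methods, so code that only reads the last element has to account for which one was used.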

Updated by edd314159 (Edd Morgan) over 5 years ago

I'd like to add my +1 to this idea. Splitting a string by a substring (and only caring about the first result) is a use case I run into all the time. In fact, the example given by @kke (Kimmo Lehto) of splitting a Content-Type HTTP header by the semicolon is the one I needed it for most recently.

It's true, partition and rpartition can absolutely achieve the same thing. But they have the side effect of returning (and, of course, allocating) extra String objects that are frequently discarded. This not only negatively impacts performance, but results in less readable code: we have to resort to the convention of prefixing the throwaway variable name with an underscore. This underscore is a convention agreed upon, informally, by humans to indicate the irrelevance of the variable, and I'm sure many Ruby programmers are unaware of the convention, or simply forget about it.

I have suggested an implementation in PR #2269 on GitHub: https://github.com/ruby/ruby/pull/2269

I also attach the following benchmark to show that when these new methods are used for this use case, performance is ~30% improved for splitting by a String (and more so when splitting by a Regexp):

eddmorgan@eddbook ~/Projects/rubydev/build  make run

../ruby/revision.h unchanged
./miniruby -I../ruby/lib -I. -I.ext/common   ../ruby/test.rb
                       user     system      total        real
String#before      0.182367   0.000587   0.182954 (  0.183625)
String#partition   0.303105   0.000877   0.303982 (  0.304961)
                       user     system      total        real
String#after       0.199295   0.000672   0.199967 (  0.200794)
String#partition   0.302300   0.001409   0.303709 (  0.305278)
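
The attached test.rb and test_mem.rb are not reproduced here. As a separate, rough illustration of the allocation argument, the sketch below counts allocated objects for partition versus a plain index-and-slice on stock Ruby. The helper name allocated_objects and the iteration count are assumptions, and exact counts may vary between Ruby versions, so no specific numbers are claimed.

```ruby
str = 'application/json; charset=utf-8'.freeze
sep = ';'.freeze
iterations = 1_000

# Hypothetical helper: number of objects allocated while the block runs.
def allocated_objects
  start = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - start
end

# partition builds a three-element Array plus the substrings around the separator.
via_partition = allocated_objects do
  iterations.times { str.partition(sep).first }
end

# index returns an Integer, so only the resulting substring is allocated.
via_slice = allocated_objects do
  iterations.times { str[0, str.index(sep)] }
end

puts "partition: #{via_partition} objects, index + slice: #{via_slice} objects"
```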

Updated by jonathanhefner (Jonathan Hefner) about 5 years ago

I use monkey-patched versions of these in many of my Ruby scripts. They have a few benefits vs. the alternatives:

  • vs. split + first / last
    • using split can cause an unintended result when the delimiter is not present, e.g. "abc".split("x", 2).last == "abc"
  • vs. partition
    • before and after can be chained, and can result in fewer object allocations
  • vs. regex + capture group
    • before and after are easier to read (and write)

I've also found before_last and after_last helpful for similar reasons.

kke (Kimmo Lehto) wrote:

What should happen if the marker isn't found? In my opinion, before should return the full string and after an empty string.

Regarding before, I agree.

Regarding after, I originally wrote my monkey-patched after to return an empty string, but eventually changed it to return nil. I was hesitant because a nil result can be an unexpected "gotcha", but an empty string seems wrong because it throws away information. For example, if str.after("x") == "", it might be because the delimiter wasn't found, or because the delimiter was at the end of the string. (Compared to str.before("x") == str, which always means the delimiter wasn't found.)
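
The ambiguity described above, and the split pitfall from the earlier list, can be shown with a small sketch. The helper name after_or_nil is hypothetical and handles String delimiters only; it is not the monkey patch mentioned in this comment.

```ruby
# Hypothetical helper: returns nil when the delimiter is not found.
def after_or_nil(str, delimiter)
  idx = str.index(delimiter)
  idx && str[(idx + delimiter.length)..-1]
end

after_or_nil("abcx", "x")  # => ""   (delimiter found at the very end)
after_or_nil("abc", "x")   # => nil  (delimiter not found)

# With an empty-string fallback the two cases above would be indistinguishable.

# The split pitfall: with no delimiter present, last returns the whole string.
"abc".split("x", 2).last   # => "abc"
```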
