Feature #21552

open

allow String.strip and similar to take a parameter similar to String.delete


Added by MSP-Greg (Greg L) 3 months ago. Updated 1 day ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:123063]

Description

Regarding String.strip (and lstrip, rstrip, and the ! versions)

Some text data representations differentiate between what one might call vertical and horizontal white space, and the 'strip' methods currently strip both.

It would be helpful if they had an optional parameter similar to String.delete with one multi-character selector, so one could do:

t = str.strip " \t"

One can use a regex for this, but this is much simpler.
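
For comparison, here is a minimal sketch of the regex approach available in current Ruby. The helper name `strip_horizontal` is hypothetical and only for illustration; it is not part of the proposal.

```ruby
# Hypothetical helper: strip only spaces and tabs from both ends,
# leaving vertical whitespace (newlines etc.) untouched.
HORIZONTAL_WS = /\A[ \t]+|[ \t]+\z/

def strip_horizontal(str)
  str.gsub(HORIZONTAL_WS, "")
end

strip_horizontal(" \t text\n \t")  # => "text\n"
strip_horizontal("\n a \n")        # => "\n a \n" (newlines at the ends block the match)
```

The proposed `str.strip " \t"` would express the same intent without the anchors and alternation.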


Related issues 1 (1 open, 0 closed)

Related to Ruby - Feature #7845: Strip doesn't handle unicode space characters in ruby 1.9.2 & 1.9.3 (does in 1.9.1) - Open

Updated by Dan0042 (Daniel DeLorme) 3 months ago #1 [ruby-core:123233]

Agreed. I tend to use str.sub(/[\ \t]+\z/,'') for this, but an end-anchored regexp has pretty bad worst-case performance. Try benchmarking it with str = " "*1000+"a" 😦
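
One way to sidestep the end-anchored pattern in current Ruby is to search backward for the last character that should be kept. This is only a sketch: the helper name `rstrip_horizontal` is hypothetical, and it has not been benchmarked here.

```ruby
# Hypothetical helper: remove trailing spaces and tabs without an
# end-anchored regexp. String#rindex searches backward from the end,
# so it stops at the last character that is not a space or tab.
def rstrip_horizontal(str)
  last = str.rindex(/[^ \t]/)
  last ? str[0..last] : ""
end

rstrip_horizontal("a \t ")          # => "a"
rstrip_horizontal(" " * 1000 + "a") # unchanged: nothing trails the "a"
rstrip_horizontal(" \t")            # => ""
```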

Updated by mame (Yusuke Endoh) 24 days ago #2

  • Related to Feature #7845: Strip doesn't handle unicode space characters in ruby 1.9.2 & 1.9.3 (does in 1.9.1) added

Updated by shugo (Shugo Maeda) 3 days ago #3 [ruby-core:124019]

I just heard someone ask for a strip function that doesn't remove NUL characters.
Since Python's str.strip takes an optional argument, it might be a good idea to introduce a similar feature.

I've created a pull request at https://github.com/ruby/ruby/pull/15400 and here's a benchmark result:

voyager:ruby$ cat benchmark_strip.rb
require "benchmark"

TARGET = " \t\r\n\f\v\0" + "x" * 1024 + "\0 \t\r\n\f\v"

Benchmark.bmbm do |x|
  x.report("strip") do
    10000.times do
      TARGET.strip
    end
  end

  x.report("gsub") do
    10000.times do
      TARGET.gsub(/\A\s+|\s+\z/, "")
    end
  end

  x.report('strip(" \t\r\n\f\v")') do
    10000.times do
      TARGET.strip(" \t\r\n\f\v")
    end
  end
end
voyager:ruby$ ./tool/runruby.rb benchmark_strip.rb                            (git)-[feature/allow-strip-to-take-chars]
Rehearsal --------------------------------------------------------
strip                  0.005475   0.000065   0.005540 (  0.005546)
gsub                   0.022467   0.000000   0.022467 (  0.022470)
strip(" \t\r\n\f\v")   0.004772   0.000000   0.004772 (  0.004773)
----------------------------------------------- total: 0.032779sec

                           user     system      total        real
strip                  0.000759   0.000961   0.001720 (  0.001720)
gsub                   0.019911   0.000000   0.019911 (  0.019912)
strip(" \t\r\n\f\v")   0.004958   0.000000   0.004958 (  0.004961)

Updated by shugo (Shugo Maeda) 3 days ago #4 [ruby-core:124021]

As suggested by nobu, I've added documentation and tests for character selectors: https://github.com/ruby/ruby/pull/15400/commits/a9ad44007dbb0ea543ce1eb8748edd4213083c5f

Examples:

"012abc345".strip("0-9") # "abc"
"012abc345".strip("^a-z") # "abc"

Unlike String#delete, the current implementation doesn't take multiple arguments.
I'm not sure whether there's a use case for it.

Updated by shugo (Shugo Maeda) 2 days ago #5 [ruby-core:124031]

shugo (Shugo Maeda) wrote in #note-4:

Unlike String#delete, the current implementation doesn't take multiple arguments.
I'm not sure whether there's a use case for it.

I've noticed that String#count also takes multiple selectors, so I've applied the same changes to String#strip etc. for consistency.
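
For reference, this is how the existing String#delete and String#count treat selectors in current Ruby: ranges and `^` negation work within one selector, and multiple selectors are intersected. Whether the strip family ends up with identical semantics depends on the pull request; the lines below only show the existing methods.

```ruby
# A single selector supports ranges and negation.
"012abc345".delete("0-9")       # => "abc"
"012abc345".delete("^a-z")      # => "abc"

# Multiple selectors are intersected: only characters present in
# every selector are affected.
"hello".delete("l", "lo")       # => "heo"  (intersection is "l")
"hello world".count("lo", "o")  # => 2      (intersection is "o")
```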

Updated by mame (Yusuke Endoh) 2 days ago #6 [ruby-core:124035]

I'm not strongly opposed, but this kind of API that uses a string to represent a collection of characters feels outdated. It is sometimes convenient, though.

Updated by KitaitiMakoto (真 北市) 1 day ago · Edited #7 [ruby-core:124039]

Thank you, shugo.

The "someone" he mentions is me. Here is my use case.

I want to extract chunks from a file and pass them to a neural network model to detect the file type. The model requires two chunks: the lstripped beginning portion and the rstripped ending portion, except that null characters must not be stripped. It would be useful if I could call:

beg_portion.lstrip("\t\n\v\f\r ") # or would ["\t", "\n", "\v", "\f", "\r", " "] or `/\s/` be preferred?
end_portion.rstrip("\t\n\v\f\r ")

I'm not sure why the model requires such chunks, but I guess it was trained in a Python framework, and Python's strip family doesn't strip null characters by default.

As an aside, I was surprised to see that null characters were stripped by lstrip and rstrip, because I'm used to Regexp's \s as the definition of "whitespace", even though the String documentation explains what it means by "whitespace". If the methods accepted an argument, that might prompt users to notice which characters are actually stripped.
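
The difference can be seen in a short sketch, assuming a recent Ruby in which both lstrip and rstrip treat NUL as whitespace, as the String documentation describes:

```ruby
sample = " \0data\0 "

# String#strip treats NUL as whitespace, so it is removed with the spaces.
stripped = sample.strip                     # => "data"

# Regexp \s does not match NUL, so only the outer spaces are removed.
regex_only = sample.gsub(/\A\s+|\s+\z/, "") # => "\0data\0"

"\0".match?(/\s/)                           # => false
```

The regexp form is what one has to write today to keep the null characters.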

Tips:
For the case of str = " "*1000+"a", reversing the string is faster than using \s+\z:

str.sub(/\A\s+/, "").reverse.sub(/\A\s+/, "").reverse

But I would not want to see many people relying on this trick just for speed.
