Feature #21552
Open
allow String.strip and similar to take a parameter similar to String.delete
Description
Regarding String.strip (and lstrip, rstrip, and the ! versions)
Some text data representations differentiate between what one might call vertical and horizontal white space, and the 'strip' methods currently strip both.
It would be helpful if they had an optional parameter similar to String.delete, taking one multi-character selector, so one could do:
t = str.strip " \t"
One can use a regex for this, but this is much simpler.
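For comparison, the regex route mentioned above might look like this on current Ruby. It is only a sketch: the helper name strip_horizontal is made up for illustration and is not part of the proposal.

```ruby
# Hypothetical helper: strip only spaces and tabs from both ends,
# leaving newlines and other vertical whitespace in place.
def strip_horizontal(str)
  str.sub(/\A[ \t]+/, "").sub(/[ \t]+\z/, "")
end

strip_horizontal(" \tfoo\n \t")  # => "foo\n"
```

With the proposed parameter, the same call would presumably shrink to str.strip(" \t").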
Updated by Dan0042 (Daniel DeLorme) 3 months ago
Agreed. I tend to use str.sub(/[\ \t]+\z/,'') for this, but an end-anchored regexp has pretty bad worst-case performance. Try to benchmark the previous when str = " "*1000+"a"
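One way to sidestep the regexp on current Ruby is to walk back from the end of the string by hand. The sketch below is an illustration only: rstrip_chars and its default argument are hypothetical names, and it assumes mostly single-byte text, where indexing by position is cheap.

```ruby
# Hypothetical helper: remove trailing characters found in `chars`
# by walking back from the end, so each trailing character is checked once.
def rstrip_chars(str, chars = " \t")
  last = str.length
  last -= 1 while last > 0 && chars.include?(str[last - 1])
  str[0, last]
end

rstrip_chars("a \t\n \t")         # => "a \t\n"
rstrip_chars(" " * 1000 + "a")    # returns the string unchanged after one check
```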
Updated by mame (Yusuke Endoh) 24 days ago
- Related to Feature #7845: Strip doesn't handle unicode space characters in ruby 1.9.2 & 1.9.3 (does in 1.9.1) added
Updated by shugo (Shugo Maeda) 3 days ago
I just heard someone ask for a strip function that doesn't remove NUL characters.
Since Python's str.strip takes an optional argument, it might be a good idea to introduce a similar feature.
I've created a pull request at https://github.com/ruby/ruby/pull/15400 and here's a benchmark result:
voyager:ruby$ cat benchmark_strip.rb
require "benchmark"

TARGET = " \t\r\n\f\v\0" + "x" * 1024 + "\0 \t\r\n\f\v"

Benchmark.bmbm do |x|
  x.report("strip") do
    10000.times do
      TARGET.strip
    end
  end
  x.report("gsub") do
    10000.times do
      TARGET.gsub(/\A\s+|\s+\z/, "")
    end
  end
  x.report('strip(" \t\r\n\f\v")') do
    10000.times do
      TARGET.strip(" \t\r\n\f\v")
    end
  end
end
voyager:ruby$ ./tool/runruby.rb benchmark_strip.rb (git)-[feature/allow-strip-to-take-chars]
Rehearsal --------------------------------------------------------
strip                  0.005475   0.000065   0.005540 (  0.005546)
gsub                   0.022467   0.000000   0.022467 (  0.022470)
strip(" \t\r\n\f\v")   0.004772   0.000000   0.004772 (  0.004773)
----------------------------------------------- total: 0.032779sec

                           user     system      total        real
strip                  0.000759   0.000961   0.001720 (  0.001720)
gsub                   0.019911   0.000000   0.019911 (  0.019912)
strip(" \t\r\n\f\v")   0.004958   0.000000   0.004958 (  0.004961)
Updated by shugo (Shugo Maeda) 3 days ago
Suggested by nobu, I've added documentation and tests for character selectors: https://github.com/ruby/ruby/pull/15400/commits/a9ad44007dbb0ea543ce1eb8748edd4213083c5f
Examples:
"012abc345".strip("0-9") # "abc"
"012abc345".strip("^a-z") # "abc"
Unlike String#delete, the current implementation doesn't take multiple arguments.
I'm not sure whether there's a use case for it.
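For readers who want to try the selector behaviour without building the pull request, here is a rough emulation on current Ruby. It leans on String#count, which already understands selectors such as "0-9" and "^a-z". The helper name strip_selector is hypothetical, and this is not how the pull request implements it.

```ruby
# Hypothetical emulation of the proposed strip(selector).
# String#count is called on one character at a time to ask
# "does this character match the selector?".
def strip_selector(str, selector)
  first = 0
  first += 1 while first < str.length && str[first].count(selector) > 0
  last = str.length
  last -= 1 while last > first && str[last - 1].count(selector) > 0
  str[first...last]
end

strip_selector("012abc345", "0-9")   # => "abc"
strip_selector("012abc345", "^a-z")  # => "abc"
```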
Updated by shugo (Shugo Maeda) 2 days ago
shugo (Shugo Maeda) wrote in #note-4:
Unlike String#delete, the current implementation doesn't take multiple arguments.
I'm not sure whether there's a use case for it.
I've noticed that String#count also takes multiple selectors, so I've applied the same changes to String#strip etc. for consistency.
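As a reminder of what multiple selectors mean in the existing methods, the following runs on current Ruby. The variable names are only for illustration.

```ruby
# Existing behaviour: several selectors are intersected, so only
# characters matched by every selector are affected.
count_o   = "hello world".count("lo", "o")  # => 2 (only "o" is in both sets)
without_l = "hello".delete("l", "lo")       # => "heo" (only "l" is in both sets)
```

If strip follows the same rule, a second selector would narrow the set of stripped characters, not widen it. That is an inference from String#count and String#delete, not something spelled out in this thread.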
Updated by mame (Yusuke Endoh) 2 days ago
I'm not strongly opposed, but this kind of API that uses a string to represent a collection of characters feels outdated. It is sometimes convenient, though.
Updated by KitaitiMakoto (真 北市) 1 day ago
· Edited
Thank you, shugo.
The "someone" he mentions is me. Here is my use case.
I want to extract chunks from a file and pass them to a neural network model to detect the file type. The model requires two chunks: the lstripped beginning portion and the rstripped ending portion, except that null characters must not be stripped. It would be useful if I could call:
beg_portion.lstrip("\t\n\v\f\r ") # ["\t", "\n", "\v", "\f", "\r", " "] or `/\s/` is preferred?
end_portion.rstrip("\t\n\v\f\r ")
I'm not sure why the model requires such chunks, but I guess it was trained with a Python framework, and Python's strip family doesn't strip null characters by default.
As an aside, I was surprised to see null characters stripped by lstrip and rstrip, because I'm used to Regexp's \s as the meaning of "whitespace", although String's documentation does explain what it counts as "whitespace". If the methods accepted an argument, it might prompt users to notice which characters are actually stripped.
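The difference between the two notions of whitespace can be seen on recent Ruby versions with a short sketch. The helper lstrip_keep_nul is a hypothetical workaround for the use case above, not part of the proposal.

```ruby
s = "\0 abc \0"

# String#strip treats NUL as whitespace.
stripped = s.strip                                        # => "abc"

# Regexp \s does not match NUL, so nothing is removed here.
regexp_stripped = s.sub(/\A\s+/, "").sub(/\s+\z/, "")     # => "\0 abc \0"

# Hypothetical workaround: strip an explicit character set
# that leaves NUL alone.
def lstrip_keep_nul(str)
  str.sub(/\A[\t\n\v\f\r ]+/, "")
end

lstrip_keep_nul(" \t\0 abc")                              # => "\0 abc"
```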
Tips:
For the case of str = " "*1000+"a", reversing the string is faster than using \s+\z:
str.sub(/\A\s+/, "").reverse.sub(/\A\s+/, "").reverse
But I would not want to end up in a situation where many people use this trick just for speed.