Feature #15771

Add `String#split` option to set `split_type string` with a single space separator

Added by 284km (kazuma furuhashi) over 1 year ago. Updated 5 months ago.

Status: Feedback
Priority: Normal
Assignee: -
Target version: -
[ruby-core:92301]

Description

When the separator passed to String#split is a single space character, the method runs under split_type: awk, which skips leading whitespace and treats each run of whitespace as one separator.

When you want to split literally on a single space " ", and not on a sequence of whitespace characters, you need to take special care. For example, the CSV library works around this behavior like this:

if @column_separator == " ".encode(@encoding)
  @split_column_separator = Regexp.new(@escaped_column_separator)
else
  @split_column_separator = @column_separator
end
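To see the effect of this workaround outside the CSV library, here is a minimal standalone sketch. The method names `split_separator_for` and `split_columns` are hypothetical and are not part of CSV, and the sketch ignores the encoding handling in the original snippet.

```ruby
# Hypothetical helper mirroring the CSV snippet above: a lone space is
# wrapped in a Regexp so that String#split does not switch to awk-style
# splitting. Any other separator is passed through unchanged.
def split_separator_for(column_separator)
  if column_separator == " "
    Regexp.new(Regexp.escape(column_separator))
  else
    column_separator
  end
end

# Splits a line into columns using the separator chosen above.
def split_columns(line, column_separator)
  line.split(split_separator_for(column_separator))
end

p "a  b".split(" ")          # awk-style: the run of spaces is one separator
p split_columns("a  b", " ") # literal: the empty field between the spaces is kept
```

The Regexp branch is only needed for the single-space case; other string separators are already split literally.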

Unfortunately, splitting with a regexp is slower than splitting with a string. The following benchmark shows that it is about nine times slower:

$ be benchmark-driver string_split_string-regexp.yml --rbenv '2.6.2'
Comparison:
              string:   3161117.6 i/s
              regexp:    344448.0 i/s - 9.18x  slower
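The benchmark definition file itself is not shown here. As a rough way to reproduce the comparison without benchmark-driver, the following sketch times both variants with a monotonic clock. The method names, the input line, and the iteration count are assumptions, and the absolute numbers will differ from the ones above.

```ruby
# Runs the given block `iterations` times and returns the elapsed seconds.
def measure(iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Times String#split with a string separator and with a regexp separator.
# The input should contain only single spaces so both variants return the
# same fields and the comparison is fair.
def compare_split_speed(line, iterations)
  regexp = / /
  {
    string: measure(iterations) { line.split(" ") },
    regexp: measure(iterations) { line.split(regexp) },
  }
end

timings = compare_split_speed("aaa bbb ccc ddd eee", 200_000)
timings.each { |name, seconds| puts format("%-7s %.4f s", name, seconds) }
```

A dedicated benchmarking tool gives more reliable numbers than a hand-rolled loop, so treat this only as a quick sanity check.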

I would like to add a :literal option that makes the method run under split_type: string, as follows:

" a  b   c    ".split(" ")                    # => ["a", "b", "c"]
" a  b   c    ".split(" ", literal: true)     # => ["", "a", "", "b", "", "", "c"]
" a  b   c    ".split(" ", -1)                # => ["a", "b", "c", ""]
" a  b   c    ".split(" ", -1, literal: true) # => ["", "a", "", "b", "", "", "c", "", "", "", ""]
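Until such an option exists, the results expected from `literal: true` can be reproduced with the regexp approach. The sketch below is only an emulation: `split_literal` is a hypothetical name, and it carries the regexp cost described above rather than the speed of split_type: string.

```ruby
# Hypothetical emulation of the proposed `literal: true` behavior.
# The separator is escaped and compiled to a Regexp, so a single space is
# matched literally and awk-style splitting is not triggered.
def split_literal(string, separator, limit = 0)
  string.split(Regexp.new(Regexp.escape(separator)), limit)
end

line = " a  b   c    "
p split_literal(line, " ")     # same fields as the proposed split(" ", literal: true)
p split_literal(line, " ", -1) # same fields as the proposed split(" ", -1, literal: true)
```

As with the plain method, a limit of -1 keeps trailing empty fields, while the default limit drops them.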

Implementation
