
Feature #15771

Updated by sawa (Tsuyoshi Sawada) 6 months ago

When `String#split`'s separator is a single space character, it executes under `split_type: awk`. In that mode leading whitespace is ignored and runs of whitespace count as a single separator.

 The CSV library works around this behavior [like this](https://github.com/ruby/csv/blob/7ff57a50e81c368029fa9b664700bec4a456b81b/lib/csv/parser.rb#L508-L512):

 ```ruby
 if @column_separator == " ".encode(@encoding) 
   @split_column_separator = Regexp.new(@escaped_column_separator) 
 else 
   @split_column_separator = @column_separator 
 end 
 ``` 
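Here is a minimal sketch of the same idea outside the csv library, to show what the workaround changes. The helpers `column_splitter` and `split_columns` are hypothetical names. The sketch also makes two assumptions: it passes `-1` as the limit so trailing empty fields survive, and it omits csv's encoding check.

```ruby
# Hypothetical helpers (not part of the csv library) mirroring the snippet above.

# Choose the object to pass to String#split for a given column separator.
def column_splitter(col_sep)
  if col_sep == " "
    # A bare " " would make String#split use awk mode, so wrap it in a
    # Regexp that matches exactly one space instead.
    Regexp.new(Regexp.escape(col_sep))
  else
    col_sep
  end
end

# Split one line into fields; -1 keeps trailing empty fields (an assumption here).
def split_columns(line, col_sep)
  line.split(column_splitter(col_sep), -1)
end

line = "a  b "
p line.split(" ", -1)       # => ["a", "b", ""]       (awk mode: the run of spaces collapses)
p split_columns(line, " ")  # => ["a", "", "b", ""]   (every single space is a boundary)
```

The empty field between `a` and `b` is exactly what awk mode discards and what a space-separated CSV line needs to keep.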

 Unfortunately, using a regexp here makes splitting slower than using a string. [The following benchmark result](https://github.com/284km/benchmarks_no_yatu#stringsplitstring-or-regexp) shows it is about 9 times slower:

 ```sh
 $ be benchmark-driver string_split_string-regexp.yml --rbenv '2.6.2' 
 Comparison: 
               string:     3161117.6 i/s 
               regexp:      344448.0 i/s - 9.18x    slower 
 ``` 
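The linked YAML file is not reproduced here, so the sketch below only approximates that measurement. It uses core Ruby's monotonic clock instead of benchmark-driver. The sample line, the comma separator, and the iteration count are all assumptions, and absolute numbers will vary by machine and Ruby version.

```ruby
# Rough string-vs-regexp comparison for String#split using only core Ruby.
# The input line and separator are assumptions, not the original benchmark.
LINE = (["field"] * 20).join(",")
STRING_SEP = ","
REGEXP_SEP = Regexp.new(Regexp.escape(STRING_SEP))

# Run the block n times and return iterations per second.
def ips(n)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  n.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  n / elapsed
end

if __FILE__ == $PROGRAM_NAME
  n = 100_000
  string_ips = ips(n) { LINE.split(STRING_SEP) }
  regexp_ips = ips(n) { LINE.split(REGEXP_SEP) }
  puts format("string: %12.1f i/s", string_ips)
  puts format("regexp: %12.1f i/s - %.2fx slower", regexp_ips, string_ips / regexp_ips)
end
```

Both separators yield identical fields for this input. Only the speed differs, and that speed gap is the cost csv pays for its workaround.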

 So I want to add a `:literal` option that makes the method run under `split_type: string`.
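A plain-Ruby sketch can pin down the intended semantics. `literal_split` is a hypothetical helper, not the proposed C implementation. It treats the separator as an exact substring and follows `String#split`'s limit rules: zero drops trailing empty fields, a negative limit keeps them, and a positive limit caps the number of fields. It is meant as a semantic reference only and is not expected to beat `split(/ /)` on speed.

```ruby
# Hypothetical reference for the proposed `literal: true` behavior:
# split on the exact substring `sep`, never switching to awk mode.
def literal_split(str, sep, limit = 0)
  raise ArgumentError, "separator must not be empty" if sep.empty?
  return [] if str.empty? # String#split returns [] for an empty receiver

  fields = []
  start = 0
  # With a positive limit, stop once limit - 1 fields have been cut off.
  while (limit <= 0 || fields.size < limit - 1) && (idx = str.index(sep, start))
    fields << str[start...idx]
    start = idx + sep.size
  end
  fields << str[start..-1] # remainder after the last separator

  # A zero limit drops trailing empty fields, as String#split does.
  fields.pop while limit.zero? && fields.last == ""
  fields
end

p literal_split(" a  b   c    ", " ")     # => ["", "a", "", "b", "", "", "c"]
p literal_split(" a  b   c    ", " ", -1) # => ["", "a", "", "b", "", "", "c", "", "", "", ""]
```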

 ### Implementation

 - https://github.com/284km/ruby/tree/split_space 
     - test code: https://github.com/284km/ruby/blob/split_space/test/ruby/test_string.rb#L1708-L1713 

 This change will result in the following: 

 ```ruby
 " a  b   c    ".split(" ")
 # => ["a", "b", "c"]
 " a  b   c    ".split(" ", -1)
 # => ["a", "b", "c", ""]
 " a  b   c    ".split(" ", literal: true)
 # => ["", "a", "", "b", "", "", "c"]
 " a  b   c    ".split(" ", -1, literal: true)
 # => ["", "a", "", "b", "", "", "c", "", "", "", ""]
 ``` 
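Without such an option, the literal results can already be obtained on current Ruby by passing a one-space regexp, at the speed cost noted earlier. The short check below uses the string `" a  b   c    "` (one, two, three, and four spaces). It confirms that `split(/ /)` matches the outputs proposed for `literal: true`, while `split(" ")` keeps its awk behavior.

```ruby
s = " a  b   c    "

# Current behavior: a " " separator triggers awk mode.
awk      = s.split(" ")
awk_keep = s.split(" ", -1)

# Today's workaround: a one-space Regexp splits literally.
literal      = s.split(/ /)
literal_keep = s.split(/ /, -1)

p awk          # => ["a", "b", "c"]
p awk_keep     # => ["a", "b", "c", ""]
p literal      # => ["", "a", "", "b", "", "", "c"]
p literal_keep # => ["", "a", "", "b", "", "", "c", "", "", "", ""]
```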
