Bug #21870

Updated by jneen (Jeanine Adkisson) 2 months ago

```ruby
$VERBOSE = true
# warning: character class has duplicated range: /[\p{Word}\p{S}]/
regex = /[\p{Word}\p{S}]/
```

As far as I can tell, this is a perfectly valid and non-redundant set of Unicode properties, but I am still being spammed with warnings. Using `/(?:\p{Word}|\p{S})/` works around the warning, but it is slower (see benchmarks below) and less clear.
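A rough way to compare the two forms is sketched here. These are not the measurements referred to above: the sample text and the `time_scan` helper are made up for illustration, and timings will vary with Ruby version and input.

```ruby
CLASS_FORM = /[\p{Word}\p{S}]/      # warns under $VERBOSE = true
ALT_FORM   = /(?:\p{Word}|\p{S})/   # workaround, no warning

# Hypothetical sample text mixing word characters, symbols, and spaces.
SAMPLE = ("h\u00E9llo_w\u00F6rld + 42 \u20AC \u2260 \u2211 " * 1_000).freeze

# Hypothetical helper: wall-clock seconds spent scanning `text` `rounds` times.
def time_scan(re, text, rounds = 20)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  rounds.times { text.scan(re) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Both forms should find exactly the same single-character matches.
puts "same matches: #{SAMPLE.scan(CLASS_FORM) == SAMPLE.scan(ALT_FORM)}"
printf("class form:  %.3fs\n", time_scan(CLASS_FORM, SAMPLE))
printf("alternation: %.3fs\n", time_scan(ALT_FORM, SAMPLE))
```

The equality check matters more than the timings: it confirms the workaround is a drop-in replacement before any speed comparison is made.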

The two properties do overlap somewhat, but I think the deeper issue is that there is no convenient way to express this union without falling back to raw Unicode ranges.

For a similar example, consider `/[\p{Word}\p{Cf}]/`, whose two properties overlap precisely on ZWJ and ZWNJ. Even with this very small overlap, Ruby issues a warning, even though neither property can be removed without changing the meaning of the regexp. The regexp is valid and, as far as I can tell, has no practical issues: Onigmo seems to be capable of handling overlapping codepoint ranges.
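The overlap between two properties can be checked empirically with a brute-force scan. This is a minimal sketch; `overlap` is a hypothetical helper, and the result depends on the Unicode data shipped with the Ruby version in use.

```ruby
# Hypothetical helper: codepoints in `range` matched by both patterns.
def overlap(a, b, range = 0..0x10FFFF)
  range.filter_map do |cp|
    next if (0xD800..0xDFFF).cover?(cp) # surrogates cannot be encoded as UTF-8 characters
    ch = cp.chr(Encoding::UTF_8)
    cp if a.match?(ch) && b.match?(ch)
  end
end

shared = overlap(/\p{Word}/, /\p{Cf}/)
puts shared.map { |cp| format('U+%04X', cp) }.join(' ')
```

The same helper applied to `/\p{Word}/` and `/\p{S}/` shows how large the overlap is in the first example.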

 This warning was introduced back in 2009 2011 with #1831, to help surface instances of things like `/[:lower:]/` instead of `/[[:lower:]]/`, but even then the reporter suggested only warning if the class both begins and ends with `:`. 
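For reference, the mistake that #1831 was about looks like this. In `/[:lower:]/` the colon appears twice in an ordinary character class, which is what triggers the duplicate warning. The variable names are only for illustration.

```ruby
typo  = /[:lower:]/    # an ordinary class containing the characters : l o w e r
posix = /[[:lower:]]/  # the POSIX bracket expression for lowercase letters

typo_matches_a      = typo.match?('a')   # => false ('a' is not one of : l o w e r)
typo_matches_colon  = typo.match?(':')   # => true
posix_matches_a     = posix.match?('a')  # => true
posix_matches_colon = posix.match?(':')  # => false

puts [typo_matches_a, typo_matches_colon, posix_matches_a, posix_matches_colon].inspect
```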

Is it appropriate to warn here? Is this a job better left to a static linter like RuboCop, which didn't exist when #1831 was opened? Or would it be better to warn only in the very specific case that #1831 was opened to address?
