In the example below, adding a vertical bar to the end of a regular expression changes what is being matched by the preceding regular expression.
```
irb(main):001:0> /(|a){3}b/.match("aab")
=> #<MatchData "aab" 1:"">
irb(main):002...
```
jirkamarsik (Jirka Marsik)
In TruffleRuby, we implement null checks which take into account both matching position and captures, in the spirit of Ruby's regular expressions. Both of the examples from your original issue description evaluate to `0`, and `/((?=(a)))...
jirkamarsik (Jirka Marsik)
As I understand it, the idea behind the null check is for the regex matcher to be able to identify unproductive branches in the regex execution, branches which are guaranteed to never terminate. When executing the expression `X*`, where ...
jirkamarsik (Jirka Marsik)
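
The idea of a null check on `X*` can be illustrated with a toy loop. This is only a sketch under simplifying assumptions: `star_with_null_check` and `A_OR_EMPTY` are hypothetical names introduced here, the check compares matching positions only (it ignores captures), and it is not how any real engine is implemented.

```ruby
# Hypothetical helper: repeat `step` as in X*, where `step` takes the input
# and a position and returns the new position, or nil if X does not match.
# Returns the final position and the number of productive iterations.
def star_with_null_check(step, input, pos)
  iterations = 0
  loop do
    next_pos = step.call(input, pos)
    break if next_pos.nil?     # X failed to match: leave the loop
    break if next_pos == pos   # null check: X consumed nothing, so further
                               # iterations from here could never make progress
    pos = next_pos
    iterations += 1
  end
  [pos, iterations]
end

# A stand-in for X that consumes an "a" when it can and otherwise matches
# the empty string (it never fails).
A_OR_EMPTY = ->(input, pos) { input[pos] == "a" ? pos + 1 : pos }

final_pos, iterations = star_with_null_check(A_OR_EMPTY, "aab", 0)
puts "stopped at position #{final_pos} after #{iterations} iterations"
```

Without the `next_pos == pos` test, this loop would spin forever at position 2, because the stand-in for `X` keeps succeeding without consuming input.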
The regular expression engine can sometimes produce wrong results when using multiplex backreferences near the end of the input string.
``` ruby
irb(main):001:0> /(?<x>a)(?<x>aa)\k<x>/.match("aaaaa")
=> #<MatchData "aaaaa" x:"a" x:"...
```
jirkamarsik (Jirka Marsik)
duerst (Martin Dürst) wrote in #note-2:
> Just a question: What's the purpose of nested character classes?

They are useful in combination with the set intersection operator `&&`. They let you, e.g., exclude characters from some chara...
jirkamarsik (Jirka Marsik)
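
As a small illustration of combining a nested character class with `&&`, the sketch below excludes vowels from the range `a-z`. The constant name `CONSONANT` and the sample string are chosen here for illustration only.

```ruby
# Intersect a-z with a nested negated class: lowercase ASCII letters
# that are not vowels.
CONSONANT = /[a-z&&[^aeiou]]/

consonants = "hello world".scan(CONSONANT)
puts consonants.join(" ")
```

Without the nested class, the same set would have to be spelled out as explicit ranges such as `[b-df-hj-np-tv-z]`.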
```
irb(main):001:0> /[^a-c]/i.match("A")
=> nil
irb(main):002:0> /[[^a-c]]/i.match("A")
=> #<MatchData "A">
```
The two regular expressions above match different strings, because the character classes denote different sets of ch...
jirkamarsik (Jirka Marsik)
Some Unicode characters case-fold to strings of multiple code points, e.g. the ligature `\ufb00` can match the string `ff`.
```
irb(main):001:0> /\A[\ufb00]\z/i.match("\ufb00")
=> #<MatchData "ff">
irb(main):002:0> /\A[\ufb00]\z/i.m...
```
jirkamarsik (Jirka Marsik)
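
The multi-code-point case folding mentioned above can be observed outside the regex engine with `String#downcase(:fold)`. This is a sketch only: the helper name `fold_lengths` is hypothetical, and it shows string case folding, not how the regex matcher itself handles such characters.

```ruby
# Hypothetical helper: return the code point counts of a string
# before and after Unicode case folding.
def fold_lengths(str)
  [str.length, str.downcase(:fold).length]
end

ligature = "\ufb00"  # LATIN SMALL LIGATURE FF
before, after = fold_lengths(ligature)
puts "#{before} code point(s) before folding, #{after} after"
puts ligature.downcase(:fold) == "ff"
```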