Bug #18337


Ruby allows zero-width characters in identifiers

Added by duerst (Martin Dürst) 2 months ago. Updated 2 months ago.



Ruby allows zero-width characters in identifiers, which can be shown with the following small test:

irb(main):001:0> script = "ab = 20; a\u200Bb = 30; puts ab;"
=> "ab = 20; a​b = 30; puts ab;"
irb(main):002:0> eval(script)
20
=> nil

The first line creates the script. It contains a zero-width space (ZWSP) between the "a" and the "b" of the second assignment, but that is not visible in most contexts (see the echoed string on the next line). Looking at the script, one expects 30 as the output, but the output is 20 because two variables are involved, one with a ZWSP and one without. I propose we fix this by disallowing such characters in identifiers. I'll give more details in a follow-up.
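
To make the hidden character in the script above visible, something like the following can be used. This is only a minimal sketch: `reveal_format_chars` is a hypothetical helper name, not an existing Ruby method, and it assumes that escaping every character of general category Cf (which includes U+200B) is enough for inspection purposes.

```ruby
# Hypothetical helper (not part of Ruby): replace every format character
# (Unicode general category Cf, which includes U+200B ZERO WIDTH SPACE)
# with a visible \u{...} escape.
def reveal_format_chars(source)
  source.gsub(/\p{Cf}/) { |ch| format("\\u{%04X}", ch.ord) }
end

script = "ab = 20; a\u200Bb = 30; puts ab;"
puts reveal_format_chars(script)
# prints: ab = 20; a\u{200B}b = 30; puts ab;

# The two names render identically but are different strings.
puts "ab" == "a\u200Bb"
# prints: false
```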

Related issues

Related to Ruby master - Feature #18336: How to deal with Trojan Source vulnerability (Feedback)

Updated by duerst (Martin Dürst) 2 months ago

  • Related to Feature #18336: How to deal with Trojan Source vulnerability added

Updated by duerst (Martin Dürst) 2 months ago

The "Trojan source" paper, in section VII.D, says the following:
"That said, our experimental evidence suggests that this theoretical attack already has defenses employed against it by most modern compilers, and thus is unlikely to work in practice."

My suspicion is that this is because most languages that extend identifier syntax to Unicode do this following Unicode® Standard Annex #31,
Unicode Identifier and Pattern Syntax. Written in Ruby, that document defines identifiers essentially as anything matching /^\p{id_start}\p{id_continue}*$/. It shouldn't be too difficult to do that in Ruby.
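
As a rough illustration of that pattern, here is a sketch of a standalone check. The names `UAX31_IDENTIFIER` and `uax31_identifier?` are made up for this example. It uses \A and \z rather than ^ and $, because in Ruby the latter match at line boundaries instead of string boundaries. This is not how the parser decides what an identifier is; it only applies the regular expression to a string.

```ruby
# Sketch of a UAX #31-style default identifier check.
# \A and \z anchor the whole string (^ and $ would anchor lines in Ruby).
UAX31_IDENTIFIER = /\A\p{ID_Start}\p{ID_Continue}*\z/

# Hypothetical helper: true if +name+ matches the default identifier pattern.
def uax31_identifier?(name)
  UAX31_IDENTIFIER.match?(name)
end

puts uax31_identifier?("ab")          # true
puts uax31_identifier?("a\u200Bb")    # false: U+200B is not ID_Continue
puts uax31_identifier?("d\u00FCrst")  # true: non-ASCII letters are fine
puts uax31_identifier?("_foo")        # false: "_" is ID_Continue but not ID_Start
```

The last case shows that the plain pattern is stricter than current Ruby in at least one respect: a leading underscore would have to be allowed separately.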

Updated by duerst (Martin Dürst) 2 months ago

  • Assignee set to duerst (Martin Dürst)
  • Status changed from Open to Assigned

As far as I remember the discussion at the recent developers' meeting, we discussed the fact that Ruby currently allows unassigned code points in identifiers, and that this is probably too loose. Also, the fact that Ruby, in contrast to other languages, allows multiple encodings for the source code makes implementing this feature somewhat more difficult. I'll try to create a patch to improve the situation. When such a patch is available, we can discuss again.
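
To illustrate the two points above outside the parser, the following sketch transcodes a name to UTF-8 first and then looks for unassigned code points (general category Cn). `unassigned_chars` is a hypothetical helper, and transcoding is just one possible way to handle non-UTF-8 sources; an actual patch would probably have to work on each source encoding directly.

```ruby
# Hypothetical helper: return the characters of +name+ that are unassigned
# code points (general category Cn), after transcoding the name to UTF-8.
def unassigned_chars(name)
  name.encode(Encoding::UTF_8).scan(/\p{Cn}/)
end

# A name in a non-UTF-8 source encoding (EUC-JP here), using only assigned characters.
euc_name = "\u3042b".encode(Encoding::EUC_JP)
p unassigned_chars(euc_name).empty?

# U+0378 is an unassigned code point in the Greek block.
p unassigned_chars("a\u0378").length
```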

