Bug #18337: Ruby allows zero-width characters in identifiers - Ruby - Ruby Issue Tracking System

Bug #18337

open

Ruby allows zero-width characters in identifiers

Added by duerst (Martin Dürst) over 3 years ago. Updated over 3 years ago.

Status: Assigned
Assignee: duerst (Martin Dürst)
Target version:
ruby -v:
Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN

[ruby-core:106056]

Description

Ruby allows zero-width characters in identifiers, which can be shown with the following small test:

irb(main):001:0> script = "ab = 20; a\u200Bb = 30; puts ab;"
=> "ab = 20; ab = 30; puts ab;"
irb(main):002:0> eval(script)
20
=> nil

The first line creates the script. It contains a zero-width space (ZWSP, U+200B) between a and b in the second assignment, but that character is invisible in most contexts, including irb's echo of the string on the next line. Reading the script, one expects the output to be 30, but it is 20 because two variables are involved, one with a ZWSP in its name and one without. I propose we fix this by disallowing such characters in identifiers. I'll give more details in a follow-up.
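One way to make the hidden character visible is to scan the source for format characters (General_Category Cf, which includes U+200B). The following is only a minimal sketch; the helper name invisible_chars is hypothetical and not part of Ruby or of any proposed patch.

```ruby
# Hypothetical helper: report every format character (General_Category=Cf),
# such as U+200B ZERO WIDTH SPACE, together with its character offset.
def invisible_chars(source)
  source.each_char.with_index.filter_map do |ch, i|
    [i, format("U+%04X", ch.ord)] if ch.match?(/\p{Cf}/)
  end
end

script = "ab = 20; a\u200Bb = 30; puts ab;"

p invisible_chars(script)   # => [[10, "U+200B"]]
puts script.dump            # String#dump escapes non-ASCII characters
```

Unlike the plain echo in irb, both the offset list and the dumped string expose the ZWSP.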

Related issues 1 (0 open — 1 closed)

Updated by duerst (Martin Dürst) over 3 years ago

Related to Feature #18336: How to deal with Trojan Source vulnerability added

#2 [ruby-core:106062]

Updated by duerst (Martin Dürst) over 3 years ago

The "Trojan source" paper (https://www.trojansource.codes/trojan-source.pdf), in section VII.D, says the following:
"That said, our experimental evidence suggests that this theoretical attack already has defenses employed against it by most modern compilers, and thus is unlikely to work in practice."

My suspicion is that this is because most languages that extend identifier syntax to Unicode do so following Unicode® Standard Annex #31, Unicode Identifier and Pattern Syntax (https://www.unicode.org/reports/tr31/). Expressed as a Ruby regular expression, that document defines identifiers essentially as anything matching /^\p{id_start}\p{id_continue}*$/. It shouldn't be too difficult to do that in Ruby.
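As a rough illustration of that pattern, the check can be tried directly on strings. This is a sketch only: the constant and method names are hypothetical, it operates on UTF-8 strings rather than inside the parser, and it uses \A and \z instead of ^ and $ so that a multi-line string cannot pass by matching a single line.

```ruby
# Hypothetical names; this is not how Ruby's parser classifies identifiers.
UAX31_IDENTIFIER = /\A\p{ID_Start}\p{ID_Continue}*\z/

def uax31_identifier?(name)
  name.match?(UAX31_IDENTIFIER)
end

p uax31_identifier?("ab")         # => true
p uax31_identifier?("a\u200Bb")   # => false (ZWSP is not ID_Continue)
p uax31_identifier?("a_b")        # => true  (underscore is ID_Continue)
p uax31_identifier?("_ab")        # => false (underscore is not ID_Start)
```

The last case shows that the bare UAX #31 default would need at least one Ruby-specific extension, since Ruby identifiers may start with an underscore.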

#3 [ruby-core:106250]

Updated by duerst (Martin Dürst) over 3 years ago

Status changed from Open to Assigned
Assignee set to duerst (Martin Dürst)

As far as I remember the discussion at the recent developers' meeting, we discussed the fact that Ruby currently allows unassigned code points in identifiers, and that this is probably too loose. Also, the fact that Ruby, in contrast to other languages, allows multiple encodings for the source code makes implementing this feature somewhat more difficult. I'll try to create a patch to improve the situation. When such a patch is available, we can discuss again.
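To illustrate the two concerns on strings rather than in the parser, one conceivable approach is to transcode a candidate name to UTF-8 first and then reject unassigned code points (General_Category Cn). This is a sketch under those assumptions, not the planned patch; the method name is hypothetical, and transcoding is only one possible way to handle non-Unicode source encodings.

```ruby
# Hypothetical check: normalize the encoding, then apply Unicode properties.
def identifier_ok?(name)
  utf8 = name.encode(Encoding::UTF_8)       # e.g. from Shift_JIS or EUC-JP
  return false if utf8.match?(/\p{Cn}/)     # unassigned code point
  utf8.match?(/\A\p{ID_Start}\p{ID_Continue}*\z/)
rescue EncodingError
  false                                     # name cannot be transcoded
end

sjis_name = "\u3042".encode("Shift_JIS")    # HIRAGANA LETTER A in Shift_JIS
p identifier_ok?(sjis_name)
p identifier_ok?("a\u0378")                 # U+0378 is currently unassigned
```

Which code points count as unassigned depends on the Unicode version that the Ruby build ships with, so results for recently assigned characters can differ between Ruby versions.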
