Bug #19728

closed

Automate (checking of) Regexp character property documentation

Added by duerst (Martin Dürst) 11 months ago. Updated 10 months ago.

Status:
Closed
Target version:
-
[ruby-core:113895]

Description

This came up in a discussion at https://github.com/ruby/ruby/pull/7923.

The documentation at doc/regexp.rdoc currently contains a list of character properties that can be used in regular expressions. But there is no guarantee that this list is updated when the Unicode version is updated.

One idea is to create a Ruby equivalent of https://github.com/k-takata/Onigmo/blob/master/tool/update-doc.py. Another idea is to write a test that checks enc/unicode/$UNICODE_VERSION/name2ctype.h against the relevant part of the documentation file. A test would leave the documentation free to be rewritten while still guaranteeing that no property is forgotten.
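The test idea could look roughly like the sketch below. It is not the actual test: the helper names are hypothetical, the list of property names is passed in directly rather than parsed from name2ctype.h, and it assumes the documentation mentions each property in the form `\p{Name}`. It also assumes that names compare equal after lowercasing and dropping spaces, underscores, and hyphens.

```ruby
# Sketch only: helper names and the `\p{Name}` doc convention are assumptions.

# Normalize a property name so that "Lowercase_Letter" and
# "lowercaseletter" compare equal (assumed matching rule).
def normalize_property(name)
  name.downcase.gsub(/[ _-]/, "")
end

# Collect every property mentioned in the doc text as \p{Name} or \p{^Name}.
def documented_properties(doc_text)
  doc_text.scan(/\\p\{\^?([^}]+)\}/).flatten.map { |n| normalize_property(n) }.uniq
end

# Return the property names that the documentation never mentions.
def undocumented_properties(property_names, doc_text)
  documented = documented_properties(doc_text)
  property_names.reject { |name| documented.include?(normalize_property(name)) }
end

doc = <<~'DOC'
  - <tt>\p{Alnum}</tt>: alphabetic and numeric character
  - <tt>\p{Lowercase_Letter}</tt>, <tt>\p{Ll}</tt>
DOC

names = %w[alnum lowercaseletter ll greek]
p undocumented_properties(names, doc)  # → ["greek"]
```

A real test would assert that the returned list is empty, so a Unicode update that adds properties fails until the documentation catches up.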

Updated by janosch-x (Janosch Müller) 11 months ago

How about doing it in enc-unicode.rb?

On the one hand, this script is a bit convoluted as it is, and does not need another responsibility.

On the other hand, it already passes a (quote) "human-friendly name for the group" to its #make_const method for every property that it creates, and the sections of the document could be based on that. It also has the abbreviations (e.g. LL for lowercase letter) available in its aliases variable. Generating the doc here would ensure an exact match of docs and code, whereas a test would probably not ensure e.g. that properties are in the correct section of the doc.
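To illustrate the generation idea, here is a minimal sketch. The method `render_property_docs` and the `sections` and `aliases` hashes are hypothetical stand-ins, not the actual data structures of enc-unicode.rb, and the RDoc layout is only one plausible choice.

```ruby
# Sketch only: `render_property_docs`, `sections`, and `aliases` are
# hypothetical, not taken from enc-unicode.rb.

# sections: { "Section title" => ["Property_Name", ...] }
# aliases:  { "Property_Name" => ["Abbrev", ...] }
def render_property_docs(sections, aliases)
  sections.map do |title, names|
    items = names.sort.map do |name|
      # List the long name first, followed by any abbreviations.
      forms = [name, *aliases.fetch(name, [])]
      "- " + forms.map { |f| "<tt>\\p{#{f}}</tt>" }.join(", ")
    end
    ["=== #{title}", "", *items].join("\n")
  end.join("\n\n") + "\n"
end

sections = { "Letters" => ["Lowercase_Letter", "Letter"] }
aliases  = { "Lowercase_Letter" => ["Ll"], "Letter" => ["L"] }
puts render_property_docs(sections, aliases)
```

Because the sections and entries are derived from the same data that defines the properties, a property cannot end up in the wrong section or be left out.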

Updated by janosch-x (Janosch Müller) 11 months ago

I found that enc-unicode.rb deals with some inconsistent Unicode data (some of it uses short property names and some uses long names), so it doesn't provide much useful context. I've made a PR that creates the documentation from the result instead: https://github.com/ruby/ruby/pull/7944

Updated by janosch-x (Janosch Müller) 10 months ago

  • Status changed from Open to Closed

Applied in changeset git|08b3fb11524e6cde453476f24ac80fd60457dfef.


[Bug #19728] Auto-generate unicode property docs

https://bugs.ruby-lang.org/issues/19728
