Bug #19728
closed
Automate (checking of) Regexp character property documentation
Added by duerst (Martin Dürst) over 1 year ago.
Updated over 1 year ago.
Description
This came up in a discussion at https://github.com/ruby/ruby/pull/7923.
The documentation at doc/regexp.rdoc currently contains a list of character properties that can be used in regular expressions. But there is no guarantee that this list is updated when the Unicode version is updated.
One idea is to create a Ruby equivalent of https://github.com/k-takata/Onigmo/blob/master/tool/update-doc.py. Another idea is to just write a test that checks enc/unicode/$UNICODE_VERSION/name2ctype.h against the relevant part of the documentation file. This would leave the documentation free to be rewritten while still guaranteeing that no properties get forgotten.
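A minimal sketch of the test idea, to make the comparison concrete. It assumes that property names can be pulled from name2ctype.h via comment lines of the form `/* 'Name': ... */` and that the documentation mentions each property as `\p{Name}`; both patterns are assumptions, and the helper names and sample inputs are hypothetical.

```ruby
# Sketch only. The comment format matched here is an assumption about
# name2ctype.h, not a verified description of the generated file.
PROPERTY_COMMENT = %r{^/\* '(\w+)':}

# Returns the property names found in the header text.
def header_properties(header_text)
  header_text.scan(PROPERTY_COMMENT).flatten.uniq
end

# Returns the property names the doc text mentions as \p{Name}
# (assumed documentation convention).
def documented_properties(doc_text)
  doc_text.scan(/\\p\{(\w+)\}/).flatten.uniq
end

# Returns header properties with no matching \p{Name} in the doc.
# The comparison ignores case.
def undocumented_properties(header_text, doc_text)
  documented = documented_properties(doc_text).map(&:downcase)
  header_properties(header_text).reject do |name|
    documented.include?(name.downcase)
  end
end

# Hypothetical sample inputs, not copied from the real files.
header = <<~'HEADER'
  /* 'Alpha': [[:Alpha:]] */
  /* 'Ll': General Category */
  /* 'Hiragana': Script */
HEADER

doc = <<~'DOC'
  * <tt>/\p{Alpha}/</tt> - Alphabetic character
  * <tt>/\p{Ll}/</tt> - Lowercase letter
DOC

p undocumented_properties(header, doc) # prints ["Hiragana"]
```

A real test would read both files from disk and assert that the returned list is empty.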
How about doing it in enc-unicode.rb?
On the one hand, this script is a bit convoluted as it is, and does not need another responsibility.
On the other hand, it already passes a (quote) "human-friendly name for the group" to its #make_const method for every property that it creates, and the sections of the document could be based on that. It also has the abbreviations (e.g. LL for lowercase letter) available in its aliases variable. Generating the doc here would ensure an exact match of docs and code, whereas a test would probably not ensure e.g. that properties are in the correct section of the doc.
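As a rough illustration of what generating the doc could look like: the sketch below assumes the group names and abbreviations are available as plain hashes. The data shapes, the sample values, and the render_property_doc helper are all hypothetical and are not taken from enc-unicode.rb.

```ruby
# Hypothetical data. In the proposal, the group names would come from the
# "human-friendly name" passed to make_const, and the abbreviations from
# the aliases variable; their real shapes may differ.
groups = {
  "General Category" => ["Lowercase_Letter", "Uppercase_Letter"],
  "Script" => ["Hiragana"],
}
aliases = { "Lowercase_Letter" => "Ll", "Uppercase_Letter" => "Lu" }

# Renders one RDoc section per group, listing each property name
# followed by its abbreviation when one is known.
def render_property_doc(groups, aliases)
  sections = groups.map do |group, names|
    lines = names.sort.map do |name|
      short = aliases[name]
      short ? "* #{name} (#{short})" : "* #{name}"
    end
    (["== #{group}", ""] + lines).join("\n")
  end
  sections.join("\n\n") + "\n"
end

puts render_property_doc(groups, aliases)
```

Because the section headings come from the same data that defines the properties, each property lands in the section of its group by construction.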
I found that enc-unicode.rb deals with some inconsistent Unicode data (i.e. some data which uses short property names and some data which uses long names), so it doesn't provide much useful context. I've made a PR to create documentation from the result instead: https://github.com/ruby/ruby/pull/7944
- Status changed from Open to Closed