Create mechanism for updating Unicode data files downstream when we want
The current mechanism for updating Unicode data files will create the following problem:
Downstream compilers/packagers will download Unicode data files ONE time (they may already have done so).
However, if they don't activate ALWAYS_UPDATE_UNICODE = yes, these files will never get updated, and they will stay on Unicode version 7.0 even if in five years Unicode is e.g. on version 12.0.
On the other hand, if they activate ALWAYS_UPDATE_UNICODE = yes (and assuming issue #10415 gets fixed), they constantly update to the latest version of Unicode. That's good for those who actually want this, but not what our current policy is.
What's missing is a way for us (Ruby core) to make sure downstream checkouts update to a new Unicode version when we want them to do so (as we can e.g. do for other parts that are based on Unicode data, see e.g. https://bugs.ruby-lang.org/issues/9092), without sending an email to everybody and hoping they read and follow it.
[Currently, the only solution I know will work is the one pointed out by Yui Naruse in https://bugs.ruby-lang.org/issues/10084#note-17, but I'm okay with any other solution.]
#3 [ruby-core:65932] Updated by duerst (Martin Dürst) almost 4 years ago
Yui NARUSE wrote:
For years, file structures of Unicode Data was changed some times.
Therefore there's no guarantee that Unicode 12 can work with the current script.
I agree (but see last paragraph of this comment). But that's not what this issue is about.
What I'm talking about is that next year, at some point in time, we decide that ruby trunk is upgraded to Unicode 8.0 (and so on probably every year). This was the case this year for Unicode 7.0, see issue #9092.
We do this after checking that the new Unicode data files work with the current script (first the beta files and then the final releases), and if they don't work, then we upgrade the script. Then we commit, and everybody on trunk gets the changes when they update. But currently, this is not the case for the Unicode data files, and people on trunk will have to make a special effort to upgrade.
Besides committing lib/unicode_normalize/tables.rb (nobu reverted it but didn't give any reason why), there's another way to achieve this goal:
Note in a file the versions or timestamps of the 'official' version of the Ruby trunk Unicode data files. This could be part of a .mk file, or a new file. Of the three files we currently download, two have a header (first two lines) like this:
Date: 2013-11-27, 09:54:41 GMT [MD]
So we could note the version and/or date we want people on trunk to use, and check against it. But one file, UnicodeData.txt, doesn't contain this information, so for that file we have to rely on the date in the Last-Modified HTTP header (which we already use to avoid repeated downloads of the same file).
The reason why UnicodeData.txt doesn't contain these header lines is that this is a very old file and the Unicode Consortium is actually quite careful to not make any changes that could affect the users of a file. If data of a different type is needed, then it is provided in a separate file.
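To make the check described above more concrete, here is a rough sketch in Ruby. It is only an illustration, not what the current download script does: the method names (header_date, last_modified_date, pinned_version?) are made up for this comment, and the exact header layout is assumed from the single Date line quoted above.

```ruby
require "time"

# Extract the timestamp from a "Date: YYYY-MM-DD, HH:MM:SS GMT" header line.
# Only the first two lines of the file are inspected (assumed header position).
def header_date(text)
  text.each_line.first(2).each do |line|
    m = line.match(/Date:\s*(\d{4})-(\d{2})-(\d{2}),\s*(\d{2}):(\d{2}):(\d{2})\s*GMT/)
    return Time.utc(*m.captures.map(&:to_i)) if m
  end
  nil # no Date header found, e.g. for UnicodeData.txt
end

# Fallback for a file without such a header: parse the value of an
# HTTP Last-Modified header (hypothetical helper name).
def last_modified_date(http_header_value)
  Time.httpdate(http_header_value).utc
end

# True when the local copy carries exactly the timestamp pinned for trunk.
def pinned_version?(actual, expected)
  !actual.nil? && actual == expected
end
```

The pinned timestamps themselves would live in a .mk file or a new file, as suggested above; a mismatch would then trigger a fresh download.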
#4 [ruby-core:66013] Updated by duerst (Martin Dürst) almost 4 years ago
I committed r48194, switching the download location to http://www.unicode.org/Public/7.0.0/ucd/ (i.e. Unicode Version 7.0.0), as discussed at the meeting yesterday. This does not yet address this bug, because when we change this to http://www.unicode.org/Public/8.0.0/ucd/ next year, the new files won't automatically be downloaded.
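One possible way to close that gap, sketched in Ruby under some assumptions: the version used for the last download is recorded in a stamp file next to the data files, and a download is triggered whenever the version wanted by trunk differs from it. The stamp file name (.unicode-version) and all method names here are hypothetical; only the URL layout is taken from the commit mentioned above.

```ruby
require "fileutils"

UNICODE_BASE = "http://www.unicode.org/Public"

# Build the download URL for one data file of a given Unicode version.
def ucd_url(version, name)
  "#{UNICODE_BASE}/#{version}/ucd/#{name}"
end

# Hypothetical stamp file recording which version the local files came from.
def stamp_path(dir)
  File.join(dir, ".unicode-version")
end

# A download is needed if nothing was recorded yet, or if the recorded
# version differs from the one trunk wants.
def download_needed?(dir, wanted_version)
  stamp = stamp_path(dir)
  !File.exist?(stamp) || File.read(stamp).strip != wanted_version
end

# Record the version after a successful download.
def record_version(dir, version)
  FileUtils.mkdir_p(dir)
  File.write(stamp_path(dir), "#{version}\n")
end
```

With something like this, changing the wanted version from 7.0.0 to 8.0.0 in a committed file would make download_needed? return true on the next build, without anybody having to set ALWAYS_UPDATE_UNICODE.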