Bug #10416

open

Create mechanism for updating of Unicode data files downstreams when we want

Added by duerst (Martin Dürst) almost 10 years ago. Updated 5 months ago.

Status:
Assigned
Target version:
-
ruby -v:
ruby 2.2.0dev (2014-10-22 trunk 48092) [x86_64-cygwin]
[ruby-core:65843]

Description

The current mechanism for updating Unicode data files will create the following problem:
Downstream compilers/packagers will download Unicode data files ONE time (they may already have done so).

However, if they don't activate ALWAYS_UPDATE_UNICODE = yes, these files will never get updated, and they will stay on Unicode version 7.0 even if in five years Unicode is e.g. on version 12.0.
On the other hand, if they activate ALWAYS_UPDATE_UNICODE = yes (and assuming issue #10415 gets fixed), they constantly update to the latest version of Unicode. That's good for those who actually want this, but not what our current policy is.
What's missing is a way for us (Ruby core) to make sure downstream checkouts update to a new Unicode version when we want them to do so (as we can do for other parts that are based on Unicode data, see e.g. https://bugs.ruby-lang.org/issues/9092), without sending an email to everybody and hoping they read and follow it.

[Currently, the only solution I know will work is the one pointed out by Yui Naruse in https://bugs.ruby-lang.org/issues/10084#note-17, but I'm okay with any other solution.]


Related issues 3 (2 open, 1 closed)

Related to Ruby master - Bug #10458: After r48196, make cannot complete because of Unicode file download problem - Closed - nobu (Nobuyoshi Nakada)
Related to Ruby master - Feature #19171: Update Unicode data to Unicode Version 15.1 - Assigned - duerst (Martin Dürst)
Related to Ruby master - Feature #19908: Update to Unicode 15.1 - Assigned - duerst (Martin Dürst)

Updated by nobu (Nobuyoshi Nakada) almost 10 years ago

It affects only developers who build from the repository.
Released packages should have the latest (and fixed) version at the release time.

Updated by naruse (Yui NARUSE) almost 10 years ago

Over the years, the file structure of the Unicode data has changed several times.
Therefore there's no guarantee that Unicode 12 will work with the current script.

Updated by duerst (Martin Dürst) almost 10 years ago

Yui NARUSE wrote:

Over the years, the file structure of the Unicode data has changed several times.
Therefore there's no guarantee that Unicode 12 will work with the current script.

I agree (but see last paragraph of this comment). But that's not what this issue is about.

What I'm talking about is that next year, at some point in time, we decide that ruby trunk is upgraded to Unicode 8.0 (and so on probably every year). This was the case this year for Unicode 7.0, see issue #9092.

We do this after checking that the new Unicode data files work with the current script (first the beta files and then the final releases), and if they don't work, then we upgrade the script. Then we commit, and everybody on trunk gets the changes when they update. But currently, this is not the case for the Unicode data files, and people on trunk have to make a special effort to upgrade.

Besides committing lib/unicode_normalize/tables.rb (nobu reverted it but didn't give any reason why), there's another way to achieve this goal:

Note in a file the versions or timestamps of the 'official' version of the Ruby trunk Unicode data files. This could be part of a .mk file, or a new file. Of the three files we currently download, two have a header (first two lines) like this:

# NormalizationTest-7.0.0.txt
# Date: 2013-11-27, 09:54:41 GMT [MD]

So we could note the version and/or date we want people on trunk to use, and check against it. But one file, UnicodeData.txt, doesn't contain this information in the file itself, so we have to rely on the date of the Last-Modified HTTP header (which we already use to avoid repeated downloads of the same file).
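
As a rough illustration of the header check described above, here is a minimal sketch. It assumes the two-line header format shown in the example; `PINNED`, `header_info` and `matches_pinned?` are hypothetical names, not part of Ruby's build scripts.

```ruby
require 'date'

# Hypothetical record of the version and date that trunk is expected to use.
PINNED = { version: "7.0.0", date: Date.new(2013, 11, 27) }

# Returns { version:, date: } parsed from the first two lines of the file
# text, or nil if the text has no such header (e.g. UnicodeData.txt).
def header_info(text)
  first, second = text.lines.first(2)
  version = first && first[/\A# \S+-(\d+\.\d+\.\d+)\.txt/, 1]
  date = second && second[/\A# Date: (\d{4}-\d{2}-\d{2})/, 1]
  return nil unless version && date
  { version: version, date: Date.parse(date) }
end

# True only if the file carries a header and it matches the pinned values.
def matches_pinned?(text, pinned = PINNED)
  info = header_info(text)
  !info.nil? && info[:version] == pinned[:version] && info[:date] == pinned[:date]
end
```

A build step could call `matches_pinned?(File.read(path))` and trigger a fresh download when it returns false.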

The reason why UnicodeData.txt doesn't contain these header lines is that this is a very old file and the Unicode Consortium is actually quite careful to not make any changes that could affect the users of a file. If data of a different type is needed, then it is provided in a separate file.
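
For a file without header lines, the comparison against the Last-Modified value could look roughly like the sketch below. The pinned timestamp is a made-up value for illustration, and `PINNED_LAST_MODIFIED` and `pinned_release?` are hypothetical names.

```ruby
require 'time'

# Hypothetical timestamp noted by Ruby core for UnicodeData.txt.
# The value is illustrative only, not the real server timestamp.
PINNED_LAST_MODIFIED = Time.utc(2014, 6, 1, 0, 0, 0)

# True if the Last-Modified header value names exactly the pinned timestamp.
# Malformed header values are treated as "not the pinned release".
def pinned_release?(last_modified, pinned = PINNED_LAST_MODIFIED)
  Time.httpdate(last_modified) == pinned
rescue ArgumentError
  false
end
```

The header string would come from the HTTP response for the file; how it is fetched is left out here.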

Updated by duerst (Martin Dürst) almost 10 years ago

I committed r48194, switching the download location to http://www.unicode.org/Public/7.0.0/ucd/ (i.e. Unicode Version 7.0.0), as discussed at the meeting yesterday. This does not yet address this bug, because when we change this to http://www.unicode.org/Public/8.0.0/ucd/ next year, the new files won't automatically be downloaded.
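
One possible way to make a version bump trigger new downloads is to keep the files in a per-version directory, so that files for a new version are simply missing locally. This is only a sketch of that idea, not what r48194 does; the directory layout and the helper names `ucd_url`, `local_path` and `missing_files` are assumptions.

```ruby
# Builds the download URL for one data file of a given Unicode version.
def ucd_url(version, file)
  "http://www.unicode.org/Public/#{version}/ucd/#{file}"
end

# Hypothetical local layout: <data_dir>/<version>/<file>
def local_path(data_dir, version, file)
  File.join(data_dir, version, file)
end

# Files that still have to be downloaded for this version.
def missing_files(data_dir, version, files)
  files.reject { |f| File.exist?(local_path(data_dir, version, f)) }
end
```

With this layout, changing the version from "7.0.0" to "8.0.0" makes every file show up in `missing_files`, while the old files stay untouched.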

Updated by duerst (Martin Dürst) almost 10 years ago

  • Related to Bug #10458: After r48196, make cannot complete because of Unicode file download problem added

Updated by naruse (Yui NARUSE) over 6 years ago

  • Target version deleted (2.2.0)

Updated by jeremyevans0 (Jeremy Evans) almost 3 years ago

@duerst (Martin Dürst) Do you know if this is still an issue in the master branch?

Updated by duerst (Martin Dürst) almost 3 years ago

jeremyevans0 (Jeremy Evans) wrote in #note-7:

@duerst (Martin Dürst) Do you know if this is still an issue in the master branch?

  • I suspect it is still "an issue", i.e. it still happens.
  • Nobody has complained about it, and so it may be that it's an irrelevant issue.
  • I will update Ruby to Unicode 14.0.0 in a couple weeks or so, and will look out for this, and then either close it or push it forward.

Updated by jeremyevans0 (Jeremy Evans) about 1 year ago

@duerst (Martin Dürst) Do you think this can be closed?


Updated by duerst (Martin Dürst) about 1 year ago

  • Related to Feature #19171: Update Unicode data to Unicode Version 15.1 added

Updated by duerst (Martin Dürst) about 1 year ago

The next version of Unicode (15.1) will be released in about 3 weeks. I'll check at that point, and close if no longer relevant.

Updated by nobu (Nobuyoshi Nakada) 11 months ago

The current enc-unicode.rb seems to fail because of the Indic_Conjunct_Break properties, which have values.

I'm not sure how these properties should be handled well.
/\p{InCB_Linker}/ or /\p{InCB=Linker}/, as in the comments in that file?
https://github.com/nobu/ruby/tree/unicode-15.1 is the former.
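
To make the two naming options concrete, here is a small sketch that derives both candidate names from a data line with a value field. It assumes lines of the form "code points ; property ; value # comment"; `property_with_value` and `candidate_names` are hypothetical helpers, not code from enc-unicode.rb.

```ruby
# Parses "range ; property ; value # comment" and returns [property, value],
# or nil for lines that have no value field.
def property_with_value(line)
  data = line.sub(/#.*/, "").strip
  fields = data.split(";").map(&:strip)
  return nil unless fields.size == 3
  fields[1, 2]
end

# The two spellings under discussion for a property that carries a value.
def candidate_names(property, value)
  { underscore: "#{property}_#{value}", equals: "#{property}=#{value}" }
end
```

Only the underscore form yields a name without an "=" sign; whether the regexp engine should accept the other form is the open question here.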


Updated by nobu (Nobuyoshi Nakada) 11 months ago


Updated by hsbt (Hiroshi SHIBATA) 5 months ago

  • Status changed from Open to Assigned