Project

General

Profile

Actions

Backport #4489

closed

[PATCH] Encodings with /-(unix|dos|mac)\Z/

Added by quix (James M. Lawrence) about 13 years ago. Updated over 12 years ago.

Status:
Closed
[ruby-core:35476]

Description

=begin
The *-dos and *-mac meta-encodings were not being recognized. utf-8-mac is a special case since it corresponds to a real encoding; its meta-encoding is utf-8-mac-mac.
=end


Files

parse.y.patch (716 Bytes) parse.y.patch quix (James M. Lawrence), 03/10/2011 01:52 PM

Updated by naruse (Yui NARUSE) about 13 years ago

  • Status changed from Open to Rejected

=begin
Emacs can't recognize the name "utf-8-mac", it uses utf-8-hfs.
So this patch won't work well.

When people want to write "coding: utf-8-mac-mac", they should write "coding: utf-8-hfs-mac".
=end

Updated by quix (James M. Lawrence) about 13 years ago

=begin
Notice the patch corrects a typo (5 instead of 4) which has been there since 2007.
Ruby accidentally recognizes utf-8-mac as UTF8-MAC encoding because of the typo.
That accident is, fortunately, correct behavior.

However you think it should work, the code below the typo is not being executed.

  • if (len > 4 && name[nlen = len - 5] == '-') {
  • if (len > 4 && name[nlen = len - 4] == '-') {

=end

Updated by naruse (Yui NARUSE) about 13 years ago

James M. Lawrence wrote:

Notice the patch corrects a typo (5 instead of 4) which has been there since 2007.
Ruby accidentally recognizes utf-8-mac as UTF8-MAC encoding because of the typo.
That accident is, fortunately, correct behavior.

However you think it should work, the code below the typo is not being executed.

  • if (len > 4 && name[nlen = len - 5] == '-') {
  • if (len > 4 && name[nlen = len - 4] == '-') {

I fixed it at r31085, thank you for reporting.

Actions #4

Updated by naruse (Yui NARUSE) about 13 years ago

  • Tracker changed from Bug to Backport
  • Project changed from Ruby master to Backport192
  • Status changed from Rejected to Assigned
  • Assignee set to yugui (Yuki Sonoda)

=begin

=end

Updated by quix (James M. Lawrence) about 13 years ago

=begin
I assume you realize that you have broken backwards compatibility by
rejecting the second part of the patch.

% cat mac.rb

encoding: utf-8-mac

puts "foo".encoding

% ruby19dev -v mac.rb
ruby 1.9.3dev (2011-03-10 trunk 31080) [i386-darwin9.8.0]
UTF8-MAC

% ./ruby -v mac.rb
ruby 1.9.3dev (2011-03-10 trunk 31085) [i386-darwin9.8.0]
UTF-8

The full patch keeps compatibility. You may wish to break it, which is
fine as long as nobody blames me :)

=end

Updated by naruse (Yui NARUSE) about 13 years ago

James M. Lawrence wrote:

I assume you realize that you have broken backwards compatibility by
rejecting the second part of the patch.

% cat mac.rb

encoding: utf-8-mac

puts "foo".encoding

% ruby19dev -v mac.rb
ruby 1.9.3dev (2011-03-10 trunk 31080) [i386-darwin9.8.0]
UTF8-MAC

% ./ruby -v mac.rb
ruby 1.9.3dev (2011-03-10 trunk 31085) [i386-darwin9.8.0]
UTF-8

The full patch keeps compatibility. You may wish to break it, which is
fine as long as nobody blames me :)

Yeas, your point is correct; I fixed it in r31086.
In the commit, I exclude only UTF8-MAC.
It is to allow to specify UTF-8 with CR-newline as UTF-8-MAC.

Updated by quix (James M. Lawrence) about 13 years ago

=begin
Yui NARUSE wrote:

James M. Lawrence wrote:

The full patch keeps compatibility. You may wish to break it, which is
fine as long as nobody blames me :)

Yeas, your point is correct; I fixed it in r31086.
In the commit, I exclude only UTF8-MAC.
It is to allow to specify UTF-8 with CR-newline as UTF-8-MAC.

OK, you want to keep compatibility for utf8-mac but break
compatibility for utf-8-mac because you consider the original behavior
to be a bug. In that case you need to remove the alias.

Since emacs-22 treats utf-8-mac as utf-8, it would seem that ruby
indeed had an unrelated bug.

diff --git a/enc/utf_8.c b/enc/utf_8.c
index 0c44a3e..63e5f8f 100644
--- a/enc/utf_8.c
+++ b/enc/utf_8.c
@@ -452,6 +452,5 @@ ENC_ALIAS("CP65001", "UTF-8")
* Link: http://www.gnu.org/software/emacs/NEWS.23.2
/
ENC_REPLICATE("UTF8-MAC", "UTF-8")
-ENC_ALIAS("UTF-8-MAC", "UTF8-MAC")
ENC_ALIAS("UTF-8-HFS", "UTF8-MAC") /
Emacs 23.2 */

diff --git a/ext/nkf/nkf-utf8/nkf.c b/ext/nkf/nkf-utf8/nkf.c
index 6877afe..28f879e 100644
--- a/ext/nkf/nkf-utf8/nkf.c
+++ b/ext/nkf/nkf-utf8/nkf.c
@@ -247,7 +247,6 @@ struct {
{"UTF-8N", UTF_8N},
{"UTF-8-BOM", UTF_8_BOM},
{"UTF8-MAC", UTF8_MAC},

  • {"UTF-8-MAC", UTF8_MAC},
    {"UTF-16", UTF_16},
    {"UTF-16BE", UTF_16BE},
    {"UTF-16BE-BOM", UTF_16BE_BOM},

=end

Updated by quix (James M. Lawrence) about 13 years ago

=begin
This should not be filed under backports. Trunk has a bug. UTF-8-MAC and UTF8-MAC are no longer aliases. Apply patch http://redmine.ruby-lang.org/issues/4489#note-7
=end

Actions #9

Updated by yugui (Yuki Sonoda) almost 13 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r31774.
James, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merges r31085 from trunk into ruby_1_9_2.

  • parse.y (parser_encode_length): fix typo: the length of
    "-dos" and "-mac" is not 5 but 4.
    patched by James M. Lawrence [ruby-core:35476] fixes #4489

Updated by quix (James M. Lawrence) over 12 years ago

As a reminder, my last message said: This should not be filed under backports. Trunk has a bug. UTF-8-MAC and UTF8-MAC are no longer aliases. Apply patch http://redmine.ruby-lang.org/issues/4489#note-7

Trunk still has this bug, and the patch still applies cleanly.

Updated by naruse (Yui NARUSE) over 12 years ago

James M. Lawrence wrote:

As a reminder, my last message said: This should not be filed under backports. Trunk has a bug. UTF-8-MAC and UTF8-MAC are no longer aliases. Apply patch http://redmine.ruby-lang.org/issues/4489#note-7

Trunk still has this bug, and the patch still applies cleanly.

It is not accepted because iconv uses the name.

Updated by quix (James M. Lawrence) over 12 years ago

It is not accepted because iconv uses the name.

Huh? UTF-8-MAC and UTF8-MAC are not aliases. As you said, "UTF-8 with CR-newline as UTF-8-MAC". Emacs agrees with you -- it recognizes UTF-8-MAC as UTF-8.

UTF8-MAC is not the same encoding as UTF-8. They should not be aliased. http://redmine.ruby-lang.org/issues/4489#note-7 simply removes the alias.

Updated by naruse (Yui NARUSE) over 12 years ago

James M. Lawrence wrote:

It is not accepted because iconv uses the name.

Huh? UTF-8-MAC and UTF8-MAC are not aliases. As you said, "UTF-8 with CR-newline as UTF-8-MAC". Emacs agrees with you -- it recognizes UTF-8-MAC as UTF-8.

UTF8-MAC is not the same encoding as UTF-8. They should not be aliased. http://redmine.ruby-lang.org/issues/4489#note-7 simply removes the alias.

Ruby's encoding is not for Emacs.

Updated by quix (James M. Lawrence) over 12 years ago

Ruby's encoding is not for Emacs.

But you already established that UTF-8-MAC is just UTF-8: "UTF-8 with CR-newline as UTF-8-MAC". That Emacs happens to agree with you is just an extra bonus. You may ignore that fact, if you prefer. The Emacs connection doesn't matter at all.

Will someone please intercede to close the communication gap? This is quite straightforward, and my four attempts to explain it have not elicited a comprehensible response.

Actions #15

Updated by shyouhei (Shyouhei Urabe) over 12 years ago

Yui: I can translate you. Tell me your opinion in Japanese.
ちょっと、とりあえず、日本語で書いてみませんか。

James: don't get upset. Seems Yui is just not making himself understood in English.

Updated by naruse (Yui NARUSE) over 12 years ago

Shyouhei Urabe wrote:

Yui: I can translate you. Tell me your opinion in Japanese.
ちょっと、とりあえず、日本語で書いてみませんか。

この問題は Emacs が encoding名 -unix/mac/dos で改行コードを指定するのと、
UTF-8-MAC というエンコーディング名が衝突しているという話で、
magic comment については utf-8-mac を UTF-8 の指定としてみなすという処理を入れたんですが、
UTF-8-MAC という alias 自体を廃止しようぜと言っているので、Emacs にそこまでしてやる義理はないよ、という話です。

Updated by shyouhei (Shyouhei Urabe) over 12 years ago

OK, so the point is, Yui is talking about "magic comments" while James
is about names of Encodings. This is the gap.

Yui says he cannot take "UTF-8-MAC" as a name of an Encoding because
of iconv compatibility, while he can agree with Emacs on magic
comments. So the current situation; UTF-8-MAC means CR if and only if
you load a file.

Updated by quix (James M. Lawrence) over 12 years ago

Shyouhei, thanks for the intercession. I'm not at all upset, I just wanted to break the pattern I see in this bug report.

You have described the reason that UTF-8-MAC should not be aliased to UTF8-MAC. The patch in http://redmine.ruby-lang.org/issues/4489#note-7 removes the alias.

Yui says he cannot take "UTF-8-MAC" as a name of an Encoding

Good, so shouldn't we remove "UTF-8-MAC" as a name of an Encoding? I don't intend to sound sarcastic, but that seems to be the issue.

Updated by shyouhei (Shyouhei Urabe) over 12 years ago

James sounds more natural to me. So removing UTF-8-MAC has nothing (or at least very little) to do with Emacs, right? It's about ruby's internal consistency.

Updated by naruse (Yui NARUSE) over 12 years ago

Shyouhei Urabe wrote:

James sounds more natural to me. So removing UTF-8-MAC has nothing (or at least very little) to do with Emacs, right? It's about ruby's internal consistency.

This is confliction around iconv, emacs, and ruby's internal consistency.
So this is priority issue.
My priority is iconv > Emacs > ruby's internal consistency.

Updated by quix (James M. Lawrence) over 12 years ago

Yes, but it's not only about internal consistency -- the current behavior causes real problems.

The patch in http://redmine.ruby-lang.org/issues/4489#note-7 should be self-explanatory (no English required) given that UTF-8-MAC doesn't exist and that UTF-8 and UTF8-MAC are different encodings. What is Yui's objection to the patch? I don't see any downside.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0