Backport #4489

[PATCH] Encodings with /-(unix|dos|mac)\Z/

Added by James M. Lawrence about 3 years ago. Updated over 2 years ago.

[ruby-core:35476]
Status:Closed
Priority:Normal
Assignee:Yuki Sonoda

Description

=begin
The *-dos and *-mac meta-encodings were not being recognized. utf-8-mac is a special case since it corresponds to a real encoding; its meta-encoding is utf-8-mac-mac.
=end

parse.y.patch Magnifier (716 Bytes) James M. Lawrence, 03/10/2011 01:52 PM

Associated revisions

Revision 31774
Added by Yuki Sonoda almost 3 years ago

merges r31085 from trunk into ruby19_2.

  • parse.y (parserencodelength): fix typo: the length of "-dos" and "-mac" is not 5 but 4. patched by James M. Lawrence fixes #4489

Revision 31775
Added by Yuki Sonoda almost 3 years ago

merges r31086 from trunk into ruby19_2.

  • parse.y (parserencodelength): add exception as UTF8-MAC for magic comment's emacs newline specifier patched by James M. Lawrence fixes #4489

History

#1 Updated by Yui NARUSE about 3 years ago

  • Status changed from Open to Rejected

=begin
Emacs can't recognize the name "utf-8-mac", it uses utf-8-hfs.
So this patch won't work well.

When people want to write "coding: utf-8-mac-mac", they should write "coding: utf-8-hfs-mac".
=end

#2 Updated by James M. Lawrence about 3 years ago

=begin
Notice the patch corrects a typo (5 instead of 4) which has been there since 2007.
Ruby accidentally recognizes utf-8-mac as UTF8-MAC encoding because of the typo.
That accident is, fortunately, correct behavior.

However you think it should work, the code below the typo is not being executed.

  • if (len > 4 && name[nlen = len - 5] == '-') {
  • if (len > 4 && name[nlen = len - 4] == '-') {

=end

#3 Updated by Yui NARUSE about 3 years ago

James M. Lawrence wrote:

Notice the patch corrects a typo (5 instead of 4) which has been there since 2007.
Ruby accidentally recognizes utf-8-mac as UTF8-MAC encoding because of the typo.
That accident is, fortunately, correct behavior.

However you think it should work, the code below the typo is not being executed.

  • if (len > 4 && name[nlen = len - 5] == '-') {
  • if (len > 4 && name[nlen = len - 4] == '-') {

I fixed it at r31085, thank you for reporting.

#4 Updated by Yui NARUSE about 3 years ago

  • Tracker changed from Bug to Backport
  • Project changed from ruby-trunk to Backport92
  • Status changed from Rejected to Assigned
  • Assignee set to Yuki Sonoda

=begin

=end

#5 Updated by James M. Lawrence about 3 years ago

=begin
I assume you realize that you have broken backwards compatibility by
rejecting the second part of the patch.

% cat mac.rb
# encoding: utf-8-mac
puts "foo".encoding

% ruby19dev -v mac.rb
ruby 1.9.3dev (2011-03-10 trunk 31080) [i386-darwin9.8.0]
UTF8-MAC

% ./ruby -v mac.rb
ruby 1.9.3dev (2011-03-10 trunk 31085) [i386-darwin9.8.0]
UTF-8

The full patch keeps compatibility. You may wish to break it, which is
fine as long as nobody blames me :)

=end

#6 Updated by Yui NARUSE about 3 years ago

James M. Lawrence wrote:

I assume you realize that you have broken backwards compatibility by
rejecting the second part of the patch.

% cat mac.rb
# encoding: utf-8-mac
puts "foo".encoding

% ruby19dev -v mac.rb
ruby 1.9.3dev (2011-03-10 trunk 31080) [i386-darwin9.8.0]
UTF8-MAC

% ./ruby -v mac.rb
ruby 1.9.3dev (2011-03-10 trunk 31085) [i386-darwin9.8.0]
UTF-8

The full patch keeps compatibility. You may wish to break it, which is
fine as long as nobody blames me :)

Yeas, your point is correct; I fixed it in r31086.
In the commit, I exclude only UTF8-MAC.
It is to allow to specify UTF-8 with CR-newline as UTF-8-MAC.

#7 Updated by James M. Lawrence about 3 years ago

=begin
Yui NARUSE wrote:

James M. Lawrence wrote:

The full patch keeps compatibility. You may wish to break it, which is
fine as long as nobody blames me :)

Yeas, your point is correct; I fixed it in r31086.
In the commit, I exclude only UTF8-MAC.
It is to allow to specify UTF-8 with CR-newline as UTF-8-MAC.

OK, you want to keep compatibility for utf8-mac but break
compatibility for utf-8-mac because you consider the original behavior
to be a bug. In that case you need to remove the alias.

Since emacs-22 treats utf-8-mac as utf-8, it would seem that ruby
indeed had an unrelated bug.

diff --git a/enc/utf8.c b/enc/utf8.c
index 0c44a3e..63e5f8f 100644
--- a/enc/utf8.c
+++ b/enc/utf
8.c
@@ -452,6 +452,5 @@ ENCALIAS("CP65001", "UTF-8")
* Link: http://www.gnu.org/software/emacs/NEWS.23.2
*/
ENC
REPLICATE("UTF8-MAC", "UTF-8")
-ENCALIAS("UTF-8-MAC", "UTF8-MAC")
ENC
ALIAS("UTF-8-HFS", "UTF8-MAC") /* Emacs 23.2 */

diff --git a/ext/nkf/nkf-utf8/nkf.c b/ext/nkf/nkf-utf8/nkf.c
index 6877afe..28f879e 100644
--- a/ext/nkf/nkf-utf8/nkf.c
+++ b/ext/nkf/nkf-utf8/nkf.c
@@ -247,7 +247,6 @@ struct {
{"UTF-8N", UTF8N},
{"UTF-8-BOM", UTF
8BOM},
{"UTF8-MAC", UTF8
MAC},
- {"UTF-8-MAC", UTF8MAC},
{"UTF-16", UTF
16},
{"UTF-16BE", UTF16BE},
{"UTF-16BE-BOM", UTF
16BE_BOM},

=end

#8 Updated by James M. Lawrence about 3 years ago

=begin
This should not be filed under backports. Trunk has a bug. UTF-8-MAC and UTF8-MAC are no longer aliases. Apply patch http://redmine.ruby-lang.org/issues/4489#note-7
=end

#9 Updated by Yuki Sonoda almost 3 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r31774.
James, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merges r31085 from trunk into ruby19_2.

  • parse.y (parserencodelength): fix typo: the length of "-dos" and "-mac" is not 5 but 4. patched by James M. Lawrence fixes #4489

#10 Updated by James M. Lawrence over 2 years ago

As a reminder, my last message said: This should not be filed under backports. Trunk has a bug. UTF-8-MAC and UTF8-MAC are no longer aliases. Apply patch http://redmine.ruby-lang.org/issues/4489#note-7

Trunk still has this bug, and the patch still applies cleanly.

#11 Updated by Yui NARUSE over 2 years ago

James M. Lawrence wrote:

As a reminder, my last message said: This should not be filed under backports. Trunk has a bug. UTF-8-MAC and UTF8-MAC are no longer aliases. Apply patch http://redmine.ruby-lang.org/issues/4489#note-7

Trunk still has this bug, and the patch still applies cleanly.

It is not accepted because iconv uses the name.

#12 Updated by James M. Lawrence over 2 years ago

It is not accepted because iconv uses the name.

Huh? UTF-8-MAC and UTF8-MAC are not aliases. As you said, "UTF-8 with CR-newline as UTF-8-MAC". Emacs agrees with you -- it recognizes UTF-8-MAC as UTF-8.

UTF8-MAC is not the same encoding as UTF-8. They should not be aliased. http://redmine.ruby-lang.org/issues/4489#note-7 simply removes the alias.

#13 Updated by Yui NARUSE over 2 years ago

James M. Lawrence wrote:

It is not accepted because iconv uses the name.

Huh? UTF-8-MAC and UTF8-MAC are not aliases. As you said, "UTF-8 with CR-newline as UTF-8-MAC". Emacs agrees with you -- it recognizes UTF-8-MAC as UTF-8.

UTF8-MAC is not the same encoding as UTF-8. They should not be aliased. http://redmine.ruby-lang.org/issues/4489#note-7 simply removes the alias.

Ruby's encoding is not for Emacs.

#14 Updated by James M. Lawrence over 2 years ago

Ruby's encoding is not for Emacs.

But you already established that UTF-8-MAC is just UTF-8: "UTF-8 with CR-newline as UTF-8-MAC". That Emacs happens to agree with you is just an extra bonus. You may ignore that fact, if you prefer. The Emacs connection doesn't matter at all.

Will someone please intercede to close the communication gap? This is quite straightforward, and my four attempts to explain it have not elicited a comprehensible response.

#15 Updated by Shyouhei Urabe over 2 years ago

Yui: I can translate you. Tell me your opinion in Japanese.
ちょっと、とりあえず、日本語で書いてみませんか。

James: don't get upset. Seems Yui is just not making himself understood in English.

#16 Updated by Yui NARUSE over 2 years ago

Shyouhei Urabe wrote:

Yui: I can translate you. Tell me your opinion in Japanese.
ちょっと、とりあえず、日本語で書いてみませんか。

この問題は Emacs が encoding名 -unix/mac/dos で改行コードを指定するのと、
UTF-8-MAC というエンコーディング名が衝突しているという話で、
magic comment については utf-8-mac を UTF-8 の指定としてみなすという処理を入れたんですが、
UTF-8-MAC という alias 自体を廃止しようぜと言っているので、Emacs にそこまでしてやる義理はないよ、という話です。

#17 Updated by Shyouhei Urabe over 2 years ago

OK, so the point is, Yui is talking about "magic comments" while James
is about names of Encodings. This is the gap.

Yui says he cannot take "UTF-8-MAC" as a name of an Encoding because
of iconv compatibility, while he can agree with Emacs on magic
comments. So the current situation; UTF-8-MAC means CR if and only if
you load a file.

#18 Updated by James M. Lawrence over 2 years ago

Shyouhei, thanks for the intercession. I'm not at all upset, I just wanted to break the pattern I see in this bug report.

You have described the reason that UTF-8-MAC should not be aliased to UTF8-MAC. The patch in http://redmine.ruby-lang.org/issues/4489#note-7 removes the alias.

Yui says he cannot take "UTF-8-MAC" as a name of an Encoding

Good, so shouldn't we remove "UTF-8-MAC" as a name of an Encoding? I don't intend to sound sarcastic, but that seems to be the issue.

#19 Updated by Shyouhei Urabe over 2 years ago

James sounds more natural to me. So removing UTF-8-MAC has nothing (or at least very little) to do with Emacs, right? It's about ruby's internal consistency.

#20 Updated by Yui NARUSE over 2 years ago

Shyouhei Urabe wrote:

James sounds more natural to me. So removing UTF-8-MAC has nothing (or at least very little) to do with Emacs, right? It's about ruby's internal consistency.

This is confliction around iconv, emacs, and ruby's internal consistency.
So this is priority issue.
My priority is iconv > Emacs > ruby's internal consistency.

#21 Updated by James M. Lawrence over 2 years ago

Yes, but it's not only about internal consistency -- the current behavior causes real problems.

The patch in http://redmine.ruby-lang.org/issues/4489#note-7 should be self-explanatory (no English required) given that UTF-8-MAC doesn't exist and that UTF-8 and UTF8-MAC are different encodings. What is Yui's objection to the patch? I don't see any downside.

Also available in: Atom PDF