Maybe we can also on this occasion abolish the -U option and make its
action the default? Matz proposed that quite a long time ago, when we
were moving to 1.9.

Regards, Martin.

On 2012/07/02 3:15, mame (Yusuke Endoh) wrote:

Issue #6679 has been updated by mame (Yusuke Endoh).

Status changed from Open to Assigned
Assignee set to naruse (Yui NARUSE)

Received. Thank you!

Naruse-san, what do you think?


#6 [ruby-core:46111]

Updated by claytrump (Clay Trump) about 13 years ago

File utf.pdf added

claytrump (Clay Trump) wrote:

• Ruby 1.9 forced encoding for code that was not pure ASCII,

Could you elaborate?

Sure. Ruby 1.9 forced us to specify the encoding for code that was not pure
ASCII.

I'm no expert, but I think that in Ruby 1.8, you could write code using an
encoding compatible with ASCII like 8859-1. Things would kind of work, it
would output the expected sequence of bytes, etc., at least as long as
you're using and expecting that encoding everywhere.

If Ruby 1.9 had assumed utf-8, that legacy code would now output the wrong
stuff, and you might not notice right away. Subtle errors, etc. So it's
cool that in Ruby 1.9 it produces an error; you need to put the encoding.

So any code like that has the right # coding comment by now.

Attached a slide with a clearer sentence.
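
As an editorial illustration of the point above (not part of the original post), here is a small sketch that loads the same one-line script with and without a magic comment and reports `__ENCODING__`. The helper name `source_encoding_of` and the global `$demo_encoding` are invented for this demo. On Ruby 2.0 or later the file without a comment should report UTF-8, which is what this proposal asks for; Ruby 1.9 would have reported US-ASCII.

```ruby
require "tempfile"

# Hypothetical helper: writes `source` to a temporary .rb file, loads it,
# and returns whatever the script stored in $demo_encoding.
def source_encoding_of(source)
  Tempfile.create(["enc_demo", ".rb"]) do |f|
    f.binmode
    f.write(source)
    f.flush
    $demo_encoding = nil
    load f.path
    $demo_encoding
  end
end

body = "$demo_encoding = __ENCODING__\n"

with_comment    = source_encoding_of("# coding: iso-8859-1\n" + body)
without_comment = source_encoding_of(body)

puts with_comment     # source encoding named by the magic comment
puts without_comment  # the interpreter's default source encoding
```

Legacy files that already carry a `# coding:` line keep their declared encoding, so only files without a comment are affected by a change of default.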


#7 [ruby-core:46112]

Updated by claytrump (Clay Trump) about 13 years ago

On Mon, Jul 2, 2012 at 2:34 AM, "Martin J. Dürst" duerst@it.aoyama.ac.jp wrote:

I think this is the right direction to go, and doing it for a major
version (2.0) is the right timing.

Maybe we can also on this occasion abolish the -U option and make its
action the default? Matz proposed that quite a long time ago, when we were
moving to 1.9.

Cool, sounds like a plan.


#8 [ruby-core:46120]

Updated by drbrain (Eric Hodel) about 13 years ago

duerst (Martin Dürst) wrote:

I think this is the right direction to go, and doing it for a major
version (2.0) is the right timing.

Maybe we can also on this occasion abolish the -U option and make its
action the default? Matz proposed that quite a long time ago, when we
were moving to 1.9.

#5206 (make -K warn) may be relevant to removing -U


#9 [ruby-core:46123]

Updated by naruse (Yui NARUSE) about 13 years ago

= Default Ruby source file encoding to utf-8

It can almost keep compatibility, but it breaks:

• escaped bytes in a string literal like "a\xff": its encoding changes from ASCII-8BIT to UTF-8.
• escaped bytes in a regexp literal, in the same way.
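
To see the string-literal case in practice, the following sketch (an editorial addition, not from the thread) loads a script containing the literal "a\xff" once under an explicit `# coding: US-ASCII` comment and once with no comment at all. The helper `literal_under` and the global `$demo_literal` are made-up names. It assumes Ruby 2.0 or later, where the no-comment case uses the UTF-8 default.

```ruby
require "tempfile"

# Hypothetical helper: loads a one-line script that assigns the literal
# "a\xff" to $demo_literal, optionally preceded by a magic comment.
def literal_under(magic_comment)
  # Single quotes keep \xff as four source characters in the generated file.
  lines = [magic_comment, '$demo_literal = "a\xff"'].compact
  Tempfile.create(["escape_demo", ".rb"]) do |f|
    f.write(lines.join("\n") + "\n")
    f.flush
    load f.path
  end
  $demo_literal
end

old_style = literal_under("# coding: US-ASCII")
new_style = literal_under(nil)

p old_style.encoding                  # binary (ASCII-8BIT) under a US-ASCII source
p new_style.encoding                  # follows the UTF-8 source encoding
p new_style.valid_encoding?           # 0xFF is not valid UTF-8
p old_style.bytes == new_style.bytes  # same bytes, different encoding tag
```

The bytes are identical in both cases; what changes is the encoding tag, which is why code that relied on such literals being binary needs an explicit `# coding: US-ASCII` line or an explicit `force_encoding`.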

= -U as default

What is the expected merit of this?
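
For readers unfamiliar with the flag: `-U` sets `Encoding.default_internal` to UTF-8, so data read with a different external encoding is transcoded to UTF-8 on input. The sketch below (an editorial addition) approximates that effect by setting the default from inside the script instead of passing `-U`; the helper `read_latin1`, the file name, and the sample bytes are invented for the demo.

```ruby
require "tempfile"

# Hypothetical helper: reads an ISO-8859-1 file while Encoding.default_internal
# is temporarily set to the given value, then restores the previous setting.
def read_latin1(path, default_internal)
  saved = Encoding.default_internal
  Encoding.default_internal = default_internal
  File.open(path, "r:ISO-8859-1") { |io| io.read }
ensure
  Encoding.default_internal = saved
end

raw, transcoded = Tempfile.create("latin1_demo") do |f|
  f.binmode
  f.write([0x63, 0x61, 0x66, 0xE9].pack("C*"))  # "café" in ISO-8859-1
  f.flush
  [read_latin1(f.path, nil), read_latin1(f.path, Encoding::UTF_8)]
end

p raw.encoding         # left in the external encoding
p transcoded.encoding  # converted to the default internal encoding
p transcoded == "caf\u00e9"
```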


#10 [ruby-core:46141]

Updated by rosenfeld (Rodrigo Rosenfeld Rosas) about 13 years ago

You could at least consider it for 3.0 and emit a deprecation warning for such strings in 2.0... Although I think many more people are currently complaining about UTF-8 not being the default than would complain because they were using ASCII-8BIT-encoded escaped chars in strings.


#11 [ruby-core:46171]

Updated by duerst (Martin Dürst) about 13 years ago

On 2012/07/03 10:33, naruse (Yui NARUSE) wrote:

Issue #6679 has been updated by naruse (Yui NARUSE).

= Default Ruby source file encoding to utf-8

It can almost keep compatibility, but it breaks:

• escaped bytes in a string literal like "a\xff": its encoding changes from ASCII-8BIT to UTF-8.
• escaped bytes in a regexp literal, in the same way.

Good point. Thinking about it, the rule that \x in strings means these
strings are in the source encoding seems to work well for non-UTF-8
strings. For UTF-8, because we have \u, we could make strings containing
\x be ASCII-8BIT.

But maybe that's too complicated.

Regards, Martin.
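
The distinction drawn in the post above, \u for characters and \x for raw bytes, can be illustrated with a short sketch (an editorial addition; it shows how current Ruby treats a code-point escape versus an explicitly binary string, not the rule suggested in the post, and the variable names are arbitrary).

```ruby
by_codepoint = "\u00e9"                   # \u names a Unicode code point
by_bytes     = [0xC3, 0xA9].pack("C*")    # the same two bytes as a binary string

p by_codepoint.encoding
p by_bytes.encoding
p by_codepoint.bytes == by_bytes.bytes    # identical byte sequences
p by_codepoint == by_bytes                # compared as strings, encodings matter
p by_codepoint == by_bytes.dup.force_encoding(Encoding::UTF_8)
```

A string built from a code point is always UTF-8, while the packed bytes stay binary until they are explicitly retagged.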


#12 [ruby-core:46653]

Updated by mame (Yusuke Endoh) about 13 years ago

Clay Trump,

I'm happy to inform you that matz has (basically) accepted your
proposal.

But note that the decision may be cancelled if the compatibility
impact is considered serious.
Naruse-san will implement it and experiment with it.

--
Yusuke Endoh mame@tsg.ne.jp


#13 [ruby-core:46655]

Updated by Anonymous about 13 years ago

If this seems to be too large a performance impact, please consider making it a configuration option.

Making it a configuration option may be nice anyway.


#14 [ruby-core:46672]

Updated by rosenfeld (Rodrigo Rosenfeld Rosas) about 13 years ago

You mean the default would be UTF-8, right?

In Ruby I believe happiness > performance :)

On 23-07-2012 10:57, Perry Smith wrote:

If this seems to be too large a performance impact, please consider making it a configuration option.

Making it a configuration option may be nice anyway.

On Jul 23, 2012, at 8:44 AM, mame (Yusuke Endoh) wrote:

Issue #6679 has been updated by mame (Yusuke Endoh).

Clay Trump,

I'm happy to inform you that matz has (basically) accepted your
proposal.

But note that the decision may be cancelled if the compatibility
impact is considered serious.
Naruse-san will implement it and experiment with it.

--
Yusuke Endoh mame@tsg.ne.jp

Feature #6679: Default Ruby source file encoding to utf-8
https://bugs.ruby-lang.org/issues/6679#change-28316

Author: claytrump (Clay Trump)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category:
Target version:

Let's change the default encoding for Ruby source files from US-ASCII
to UTF-8 in Ruby 2.0

• Convention over Configuration
• Ruby 1.9 forced encoding for code that was not pure ASCII, so
existing codebase already has magic comments.

In Ruby 2.0, "# encoding: utf-8" can be the default.

--
http://bugs.ruby-lang.org/


#15 [ruby-core:46681]

Updated by naruse (Yui NARUSE) about 13 years ago

Anonymous user wrote:

If this seems to be too large a performance impact, please consider making it a configuration option.

Making it a configuration option may be nice anyway.

Benchmark it yourself, and if it shows a performance impact, please report it.


#16 [ruby-core:46698]

Updated by ko1 (Koichi Sasada) almost 13 years ago

(2012/07/23 22:57), Perry Smith wrote:

Making it a configuration option may be nice anyway.

--
// SASADA Koichi at atdot dot net


#17 [ruby-core:46703]

Updated by naruse (Yui NARUSE) almost 13 years ago

mame (Yusuke Endoh) wrote:

I'm happy to inform you that matz has (basically) accepted your
proposal.

But note that the decision may be cancelled if the compatibility
impact is considered serious.
Naruse-san will implement it and experiment with it.

diff --git a/lib/rexml/encoding.rb b/lib/rexml/encoding.rb
index d1d5172..23e912f 100644
--- a/lib/rexml/encoding.rb
+++ b/lib/rexml/encoding.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
module REXML
module Encoding
# ID ---> Encoding name
diff --git a/lib/rexml/source.rb b/lib/rexml/source.rb
index 112393c..7ecb98f 100644
--- a/lib/rexml/source.rb
+++ b/lib/rexml/source.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require 'rexml/encoding'

module REXML
diff --git a/parse.y b/parse.y
index 049e356..00e80a2 100644
--- a/parse.y
+++ b/parse.y
@@ -10558,7 +10558,7 @@ parser_initialize(struct parser_params *parser)
#ifdef YYMALLOC
parser->heap = NULL;
#endif
-    parser->enc = rb_usascii_encoding();
+    parser->enc = rb_utf8_encoding();
}

#ifdef RIPPER
diff --git a/ruby.c b/ruby.c
index ab4b674..5ab5ca2 100644
--- a/ruby.c
+++ b/ruby.c
@@ -1630,7 +1630,7 @@ load_file_internal(VALUE arg)
enc = rb_locale_encoding();
}
else {
-    enc = rb_usascii_encoding();
+    enc = rb_utf8_encoding();
}
if (NIL_P(f)) {
f = rb_str_new(0, 0);
diff --git a/test/base64/test_base64.rb b/test/base64/test_base64.rb
index 9ae54cb..c5e61b3 100644
--- a/test/base64/test_base64.rb
+++ b/test/base64/test_base64.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require "test/unit"
require "base64"

diff --git a/test/dl/test_import.rb b/test/dl/test_import.rb
index 26b9f3c..41def7c 100644
--- a/test/dl/test_import.rb
+++ b/test/dl/test_import.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require_relative 'test_base'
require 'dl/import'

diff --git a/test/logger/test_logger.rb b/test/logger/test_logger.rb
index 8fc02f8..100c1ea 100644
--- a/test/logger/test_logger.rb
+++ b/test/logger/test_logger.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require 'test/unit'
require 'logger'
require 'tempfile'
diff --git a/test/net/http/test_http.rb b/test/net/http/test_http.rb
index fc7bfa9..cb8bf44 100644
--- a/test/net/http/test_http.rb
+++ b/test/net/http/test_http.rb
@@ -1,5 +1,4 @@
-# $Id$
-
+# coding: US-ASCII
require 'test/unit'
require 'net/http'
require 'stringio'
diff --git a/test/net/http/test_httpresponse.rb b/test/net/http/test_httpresponse.rb
index d57614b..ccff224 100644
--- a/test/net/http/test_httpresponse.rb
+++ b/test/net/http/test_httpresponse.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require 'net/http'
require 'test/unit'
require 'stringio'
diff --git a/test/openssl/test_x509name.rb b/test/openssl/test_x509name.rb
index 90c0992..968ad97 100644
--- a/test/openssl/test_x509name.rb
+++ b/test/openssl/test_x509name.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require_relative 'utils'

if defined?(OpenSSL)
diff --git a/test/psych/test_yaml.rb b/test/psych/test_yaml.rb
index 807c058..796a44f 100644
--- a/test/psych/test_yaml.rb
+++ b/test/psych/test_yaml.rb
@@ -1,4 +1,4 @@
-# -*- mode: ruby; ruby-indent-level: 4; tab-width: 4 -*-
+# -*- coding: us-ascii; mode: ruby; ruby-indent-level: 4; tab-width: 4 -*-
# vim:sw=4:ts=4
# $Id$

diff --git a/test/psych/visitors/test_to_ruby.rb b/test/psych/visitors/test_to_ruby.rb
index 5b0702c..ee473c9 100644
--- a/test/psych/visitors/test_to_ruby.rb
+++ b/test/psych/visitors/test_to_ruby.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require 'psych/helper'

module Psych
diff --git a/test/ripper/test_ripper.rb b/test/ripper/test_ripper.rb
index 72dc52d..1d6e893 100644
--- a/test/ripper/test_ripper.rb
+++ b/test/ripper/test_ripper.rb
@@ -17,7 +17,7 @@ class TestRipper::Ripper < Test::Unit::TestCase
end

def test_encoding
-    assert_equal Encoding::US_ASCII, @ripper.encoding
+    assert_equal Encoding::UTF_8, @ripper.encoding
end

def test_end_seen_eh
diff --git a/test/ruby/test_array.rb b/test/ruby/test_array.rb
index fff55e1..856a994 100644
--- a/test/ruby/test_array.rb
+++ b/test/ruby/test_array.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require 'test/unit'
require_relative 'envutil'

diff --git a/test/ruby/test_io.rb b/test/ruby/test_io.rb
index d1edaaf..93967c6 100644
--- a/test/ruby/test_io.rb
+++ b/test/ruby/test_io.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require 'test/unit'
require 'tmpdir'
require "fcntl"
diff --git a/test/ruby/test_io_m17n.rb b/test/ruby/test_io_m17n.rb
index b6358e0..3cc8437 100644
--- a/test/ruby/test_io_m17n.rb
+++ b/test/ruby/test_io_m17n.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require 'test/unit'
require 'tmpdir'
require 'timeout'
diff --git a/test/ruby/test_m17n.rb b/test/ruby/test_m17n.rb
index dfcaa94..ce94886 100644
--- a/test/ruby/test_m17n.rb
+++ b/test/ruby/test_m17n.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require 'test/unit'
require_relative 'envutil'

diff --git a/test/ruby/test_pack.rb b/test/ruby/test_pack.rb
index c72035c..4810c6e 100644
--- a/test/ruby/test_pack.rb
+++ b/test/ruby/test_pack.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require 'test/unit'

class TestPack < Test::Unit::TestCase
diff --git a/test/ruby/test_parse.rb b/test/ruby/test_parse.rb
index 563e2ce..b5d31db 100644
--- a/test/ruby/test_parse.rb
+++ b/test/ruby/test_parse.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require 'test/unit'
require 'stringio'

diff --git a/test/ruby/test_regexp.rb b/test/ruby/test_regexp.rb
index 7e31e99..781af50 100644
--- a/test/ruby/test_regexp.rb
+++ b/test/ruby/test_regexp.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require 'test/unit'
require 'envutil'

diff --git a/test/syck/test_yaml.rb b/test/syck/test_yaml.rb
index 132bc92..c286b03 100644
--- a/test/syck/test_yaml.rb
+++ b/test/syck/test_yaml.rb
@@ -1,4 +1,4 @@
-# -*- mode: ruby; ruby-indent-level: 4; tab-width: 4; indent-tabs-mode: t -*-
+# -*- coding: us-ascii; mode: ruby; ruby-indent-level: 4; tab-width: 4; indent-tabs-mode: t -*-
# vim:sw=4:ts=4
# $Id$

diff --git a/test/syslog/test_syslog_logger.rb b/test/syslog/test_syslog_logger.rb
index 9224296..d382b4a 100644
--- a/test/syslog/test_syslog_logger.rb
+++ b/test/syslog/test_syslog_logger.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require 'test/unit'
require 'tempfile'
require 'syslog/logger'
diff --git a/test/webrick/test_cgi.rb b/test/webrick/test_cgi.rb
index d930c26..282183e 100644
--- a/test/webrick/test_cgi.rb
+++ b/test/webrick/test_cgi.rb
@@ -1,3 +1,4 @@
+# coding: US-ASCII
require_relative "utils"
require "webrick"
require "test/unit"


#18 [ruby-core:46709]

Updated by duerst (Martin Dürst) almost 13 years ago

On 2012/07/24 3:27, naruse (Yui NARUSE) wrote:

Issue #6679 has been updated by naruse (Yui NARUSE).

Anonymous user wrote:

If this seems to be too large a performance impact, please consider making it a configuration option.

Making it a configuration option may be nice anyway.

Benchmark it yourself, and if it shows a performance impact, please report it.

I agree. For a file that's ASCII only, I can't imagine that performance
decreases much (but of course I might be wrong). For a file that's
UTF-8, there's no change. Same for a file that's in another encoding
(because that can't use the default).

Regards, Martin.


#19 [ruby-core:46712]

Updated by naruse (Yui NARUSE) almost 13 years ago

ko1 (Koichi Sasada) wrote:

(2012/07/23 22:57), Perry Smith wrote:

Making it a configuration option may be nice anyway.

+1

diff --git a/ruby.c b/ruby.c
index ab4b674..d6a8a91 100644
--- a/ruby.c
+++ b/ruby.c
@@ -702,6 +702,7 @@ static long
proc_options(long argc, char **argv, struct cmdline_options *opt, int envopt)
{
long n, argc0 = argc;
+    int opt_K_p = FALSE;
const char *s;

if (argc == 0)
@@ -909,6 +910,7 @@ proc_options(long argc, char **argv, struct cmdline_options *opt, int envopt)
break;
}
if (enc_name) {
+      opt_K_p = TRUE;
      opt->src.enc.name = rb_str_new2(enc_name);
      if (!opt->ext.enc.name)
  	opt->ext.enc.name = opt->src.enc.name;

@@ -1013,10 +1015,8 @@ proc_options(long argc, char **argv, struct cmdline_options *opt, int envopt)
if (!*(s = ++p)) break;
set_encoding_part(internal);
if (!*(s = ++p)) break;
-#if defined ALLOW_DEFAULT_SOURCE_ENCODING && ALLOW_DEFAULT_SOURCE_ENCODING
set_encoding_part(source);
if (!*(s = ++p)) break;
-#endif
rb_raise(rb_eRuntimeError, "extra argument for %s: %s",
(arg[1] == '-' ? "--encoding" : "-E"), s);

# undef set_encoding_part

@@ -1028,11 +1028,9 @@ proc_options(long argc, char **argv, struct cmdline_options *opt, int envopt)
else if (is_option_with_arg("external-encoding", Qfalse, Qtrue)) {
set_external_encoding_once(opt, s, 0);
}
-#if defined ALLOW_DEFAULT_SOURCE_ENCODING && ALLOW_DEFAULT_SOURCE_ENCODING
else if (is_option_with_arg("source-encoding", Qfalse, Qtrue)) {
set_source_encoding_once(opt, s, 0);
}
-#endif
else if (strcmp("version", s) == 0) {
if (envopt) goto noenvopt_long;
opt->dump |= DUMP_BIT(version);
@@ -1097,6 +1095,9 @@ proc_options(long argc, char **argv, struct cmdline_options *opt, int envopt)
}

switch_end:
+    if (opt_K_p)
+	rb_warning("-K is specified; it is for 1.8 compatibility and may cause odd behavior");
return argc0 - argc;
}

@@ -1268,9 +1269,6 @@ process_options(int argc, char **argv, struct cmdline_options *opt)
opt->intern.enc.name = int_enc_name;
}

-    if (opt->src.enc.name)
-	rb_warning("-K is specified; it is for 1.8 compatibility and may cause odd behavior");
if (opt->dump & DUMP_BIT(version)) {
ruby_show_version();
return Qtrue;


#20

Updated by naruse (Yui NARUSE) over 12 years ago

Status changed from Assigned to Closed
% Done changed from 0 to 100

This issue was solved with changeset r37485.
Clay, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

ruby.c (load_file_internal): set default source encoding as
UTF-8 instead of US-ASCII. [ruby-core:46021] [Feature #6679]
parse.y (parser_initialize): set default parser encoding as
UTF-8 instead of US-ASCII.


#21

Updated by mame (Yusuke Endoh) over 12 years ago

Target version set to 2.0.0
