Feature #11785

add `encoding:` optional argument to `String.new`

Added by usa (Usaku NAKAMURA) over 3 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:71927]

Description

I propose adding an optional encoding: argument to String.new.

Ruby has no syntax for specifying the encoding of a string literal.
So when writing m17n scripts we use String#force_encoding for that purpose, like this:

str = "\xA4\xA2".force_encoding('euc-jp')

But with frozen-string-literal: true, calling force_encoding on a literal raises RuntimeError.
So we must write:

str = "\xA4\xA2".dup.force_encoding('euc-jp')

or, if you would rather avoid dup,

str = String.new("\xA4\xA2").force_encoding('euc-jp')

but both of these are ugly.
The root cause of the ugliness is having to use force_encoding in the first place.
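
The failure and the two workarounds can be reproduced in a few lines. This is a minimal sketch: it calls freeze explicitly to stand in for a literal under frozen-string-literal: true, and the variable names are only illustrative. On Ruby 2.5 and later the error is reported as FrozenError, which is a subclass of RuntimeError.

```ruby
# Stand-in for a literal under `frozen-string-literal: true`.
literal = "\xA4\xA2".freeze

# force_encoding mutates its receiver, so a frozen string rejects it.
begin
  literal.force_encoding('euc-jp')
  raised = false
rescue RuntimeError # FrozenError (Ruby 2.5+) is a RuntimeError subclass
  raised = true
end

# Both workarounds make an unfrozen copy first, then retag the copy.
via_dup = literal.dup.force_encoding('euc-jp')
via_new = String.new(literal).force_encoding('euc-jp')
```

Either way the encoding is applied in a second step, after the copy has been made.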

Therefore, I propose an optional encoding: argument for String.new.
With it, we could write:

str = String.new("\xA4\xA2", encoding: 'euc-jp')
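
A short sketch of how the proposed call behaves, assuming a Ruby build that includes the change applied for this feature. String.new copies its argument, so the result is unfrozen even when the source string is frozen, and the copy carries the requested encoding without a separate force_encoding call. The variable names are illustrative.

```ruby
frozen = "\xA4\xA2".freeze # stand-in for a frozen string literal

# One call: copy the bytes and tag the copy as EUC-JP.
str = String.new(frozen, encoding: 'euc-jp')

# The older spelling, for comparison.
old = frozen.dup.force_encoding('euc-jp')

puts str.encoding # EUC-JP
puts str.frozen?  # false
puts str == old   # true
```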

This was proposed at the developer meeting last August and, in my impression, was generally well received, but it was forgotten afterwards.

Associated revisions

Revision 4466d4ba
Added by usa (Usaku NAKAMURA) over 3 years ago

  • string.c (rb_str_init): now accepts new option parameter `encoding'. [Feature #11785]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52976 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 52976
Added by usa (Usaku NAKAMURA) over 3 years ago

  • string.c (rb_str_init): now accepts new option parameter `encoding'. [Feature #11785]

History

Updated by nobu (Nobuyoshi Nakada) over 3 years ago

Another idea, Encoding#string, came to my mind.

diff --git a/encoding.c b/encoding.c
index eb777c9..f0001b3 100644
--- a/encoding.c
+++ b/encoding.c
@@ -1171,6 +1171,38 @@ enc_names(VALUE self)

 /*
  * call-seq:
+ *   enc.string(str, ...) -> string
+ *
+ * Returns a string in this encoding.  The arguments are binary data
+ * or codepoints.
+ *
+ *  Encoding::EUC_JP.string("\xA4\xA2", 0xa4a1) #=> "\x{A4A2}\x{A4A1}"
+ */
+static VALUE
+enc_string(int argc, VALUE *argv, VALUE self)
+{
+    rb_encoding *enc = DATA_PTR(self);
+    VALUE str = rb_enc_str_new(0, 0, enc);
+    int i;
+
+    for (i = 0; i < argc; ++i) {
+        VALUE s = argv[i];
+        if (RB_TYPE_P(s, T_STRING) || !NIL_P(s = rb_check_string_type(s))) {
+            const char *ptr;
+            long len;
+            RSTRING_GETMEM(s, ptr, len);
+            rb_str_cat(str, ptr, len);
+        }
+        else {
+            rb_str_concat(str, argv[i]);
+        }
+    }
+
+    return str;
+}
+
+/*
+ * call-seq:
  *   Encoding.list -> [enc1, enc2, ...]
  *
  * Returns the list of loaded encodings.
@@ -1924,6 +1956,7 @@ Init_Encoding(void)
     rb_define_method(rb_cEncoding, "dummy?", enc_dummy_p, 0);
     rb_define_method(rb_cEncoding, "ascii_compatible?", enc_ascii_compatible_p, 0);
     rb_define_method(rb_cEncoding, "replicate", enc_replicate, 1);
+    rb_define_method(rb_cEncoding, "string", enc_string, -1);
     rb_define_singleton_method(rb_cEncoding, "list", enc_list, 0);
     rb_define_singleton_method(rb_cEncoding, "name_list", rb_enc_name_list, 0);
     rb_define_singleton_method(rb_cEncoding, "aliases", rb_enc_aliases, 0);
diff --git a/test/ruby/test_encoding.rb b/test/ruby/test_encoding.rb
index abe4317..8399f1a 100644
--- a/test/ruby/test_encoding.rb
+++ b/test/ruby/test_encoding.rb
@@ -123,4 +123,14 @@
       }
     end;
   end
+
+  def test_string
+    Encoding.list.each do |e|
+      if e.ascii_compatible?
+        s = e.string("abc")
+        assert_equal(e, s.encoding)
+        assert_equal("abc", s)
+      end
+    end
+  end
 end
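
The patch's behaviour can be approximated in plain Ruby. This is only a sketch of the idea, not the patch itself: enc_string is a hypothetical helper that takes the encoding as its first argument instead of being a method on Encoding, and it assumes a Ruby where String.new accepts encoding:. String-like arguments are appended byte for byte, and anything else is handed to String#<<, which treats an Integer as a codepoint in the receiver's encoding.

```ruby
# Hypothetical stand-in for the proposed Encoding#string.
def enc_string(enc, *args)
  str = String.new(encoding: enc) # empty string tagged with enc
  args.each do |arg|
    if (s = String.try_convert(arg))
      # Append the raw bytes, relabelled as enc, like rb_str_cat in the patch.
      str << s.b.force_encoding(enc)
    else
      # Integers are appended as codepoints; other types raise TypeError.
      str << arg
    end
  end
  str
end

s = enc_string(Encoding::EUC_JP, "\xA4\xA2", 0xa4a1)
```

Here s is an EUC-JP string holding the four bytes A4 A2 A4 A1, matching the example in the patch's call-seq comment.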

Updated by matz (Yukihiro Matsumoto) over 3 years ago

Agreed. I prefer String.new("", encoding: 'utf-8').

Matz.

Updated by usa (Usaku NAKAMURA) over 3 years ago

  • Status changed from Open to Closed

Applied in changeset r52976.


  • string.c (rb_str_init): now accepts new option parameter `encoding'. [Feature #11785]
