Feature #11785

add `encoding:` optional argument to `String.new`

Added by usa (Usaku NAKAMURA) over 6 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
[ruby-core:71927]

Description

I propose adding an optional `encoding:` argument to `String.new`.

Ruby has no syntax for specifying the encoding of an individual string literal.
So when writing m17n (multilingualization) scripts, we use `String#force_encoding` for this purpose, like:

str = "\xA4\xA2".force_encoding('euc-jp')
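`force_encoding` does not convert any bytes; it only changes the encoding label on the string. The following is a minimal sketch of that behavior. It assumes a UTF-8 source file (the default since Ruby 2.0). `dup` is used only so the original literal stays unmodified for comparison.

```ruby
raw = "\xA4\xA2"                        # the EUC-JP bytes for HIRAGANA LETTER A
str = raw.dup.force_encoding('euc-jp')  # copy, then relabel the copy's encoding

p raw.encoding            # the source file's encoding, typically UTF-8
p raw.valid_encoding?     # => false: 0xA4 0xA2 is not valid UTF-8
p str.encoding            # => #<Encoding:EUC-JP>
p str.valid_encoding?     # => true
p str.bytes == raw.bytes  # => true: the bytes themselves are untouched
p str.encode('UTF-8') == "\u3042"  # => true: a real conversion yields "あ"
```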

But with the `frozen-string-literal: true` magic comment, calling `force_encoding` on a literal raises RuntimeError.
So we must write:

str = "\xA4\xA2".dup.force_encoding('euc-jp')

or, if we prefer not to use `dup`,

str = String.new("\xA4\xA2").force_encoding('euc-jp')

but both are clumsy.
The root of the clumsiness is having to call `force_encoding` at all.
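The failure and both workarounds described above can be checked with a short sketch. A literal under the magic comment is simulated here with an explicit `freeze`, because the magic comment itself only takes effect at the top of a file, before any code. Since Ruby 2.5 the error raised is FrozenError, which is a subclass of RuntimeError. Rescuing RuntimeError therefore covers both older and newer versions.

```ruby
# Stand-in for a literal under `frozen-string-literal: true`.
literal = "\xA4\xA2".freeze

begin
  literal.force_encoding('euc-jp')   # mutating a frozen string
  raised = false
rescue RuntimeError                  # FrozenError (Ruby 2.5+) inherits from RuntimeError
  raised = true
end

# Workaround 1: copy with dup, then relabel the copy.
a = literal.dup.force_encoding('euc-jp')

# Workaround 2: copy with String.new, then relabel the copy.
b = String.new(literal).force_encoding('euc-jp')

puts raised      # true
puts a == b      # true
puts a.encoding  # EUC-JP
```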

Therefore, I propose an optional `encoding:` argument for `String.new`.
With it, we could write:

str = String.new("\xA4\xA2", encoding: 'euc-jp')
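With the option, copying and relabeling collapse into a single call. The sketch below assumes a Ruby that includes this change (2.3 or later, to my understanding). It checks that `String.new(..., encoding:)` behaves like `force_encoding` on a copy: the bytes are copied as-is and relabeled, not transcoded.

```ruby
literal = "\xA4\xA2".freeze                    # stands in for a frozen string literal
str = String.new(literal, encoding: 'euc-jp')  # copy and set the encoding in one step

p str.encoding   # => #<Encoding:EUC-JP>
p str.frozen?    # => false: String.new returns a fresh copy
p str.bytes      # => [164, 162]: bytes copied unchanged
p str == literal.dup.force_encoding('euc-jp')  # => true
```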

This was proposed at the developer meeting last August and, in my impression, was generally well received, but it was forgotten afterwards.

Updated by nobu (Nobuyoshi Nakada) over 6 years ago

Another idea came to mind: `Encoding#string`.

diff --git a/encoding.c b/encoding.c
index eb777c9..f0001b3 100644
--- a/encoding.c
+++ b/encoding.c
@@ -1171,6 +1171,38 @@ enc_names(VALUE self)
 
 /*
  * call-seq:
+ *   enc.string(str, ...) -> string
+ *
+ * Returns a string in this encoding.  The arguments are binary data
+ * or codepoints.
+ *
+ *  Encoding::EUC_JP.string("\xA4\xA2", 0xa4a1) #=> "\x{A4A2}\x{A4A1}"
+ */
+static VALUE
+enc_string(int argc, VALUE *argv, VALUE self)
+{
+    rb_encoding *enc = DATA_PTR(self);
+    VALUE str = rb_enc_str_new(0, 0, enc);
+    int i;
+
+    for (i = 0; i < argc; ++i) {
+	VALUE s = argv[i];
+	if (RB_TYPE_P(s, T_STRING) || !NIL_P(s = rb_check_string_type(s))) {
+	    const char *ptr;
+	    long len;
+	    RSTRING_GETMEM(s, ptr, len);
+	    rb_str_cat(str, ptr, len);
+	}
+	else {
+	    rb_str_concat(str, argv[i]);
+	}
+    }
+
+    return str;
+}
+
+/*
+ * call-seq:
  *   Encoding.list -> [enc1, enc2, ...]
  *
  * Returns the list of loaded encodings.
@@ -1924,6 +1956,7 @@ Init_Encoding(void)
     rb_define_method(rb_cEncoding, "dummy?", enc_dummy_p, 0);
     rb_define_method(rb_cEncoding, "ascii_compatible?", enc_ascii_compatible_p, 0);
     rb_define_method(rb_cEncoding, "replicate", enc_replicate, 1);
+    rb_define_method(rb_cEncoding, "string", enc_string, -1);
     rb_define_singleton_method(rb_cEncoding, "list", enc_list, 0);
     rb_define_singleton_method(rb_cEncoding, "name_list", rb_enc_name_list, 0);
     rb_define_singleton_method(rb_cEncoding, "aliases", rb_enc_aliases, 0);
diff --git a/test/ruby/test_encoding.rb b/test/ruby/test_encoding.rb
index abe4317..8399f1a 100644
--- a/test/ruby/test_encoding.rb
+++ b/test/ruby/test_encoding.rb
@@ -123,4 +123,14 @@
       }
     end;
   end
+
+  def test_string
+    Encoding.list.each do |e|
+      if e.ascii_compatible?
+        s = e.string("abc")
+        assert_equal(e, s.encoding)
+        assert_equal("abc", s)
+      end
+    end
+  end
 end
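For illustration only, the `Encoding#string` idea from the patch can be approximated in plain Ruby. `encoding_string` is a hypothetical helper, not a Ruby API; the proposal adopted in the end was `String.new(..., encoding:)`. String arguments are appended as raw bytes reinterpreted in the target encoding, mirroring `rb_str_cat` in the patch. Anything else is passed to `String#<<`, which appends an Integer as a codepoint of the receiver's encoding, mirroring `rb_str_concat`. The sketch assumes Ruby 2.3+ for `String.new(encoding:)`.

```ruby
# Hypothetical helper approximating the proposed Encoding#string in pure Ruby.
def encoding_string(enc, *args)
  result = String.new(encoding: enc)  # empty, mutable string tagged with enc
  args.each do |arg|
    if arg.respond_to?(:to_str)
      # Like rb_str_cat: append the raw bytes, reinterpreted as enc.
      result << arg.to_str.b.force_encoding(enc)
    else
      # Like rb_str_concat: an Integer is appended as a codepoint of enc;
      # other objects make String#<< raise TypeError.
      result << arg
    end
  end
  result
end

s = encoding_string(Encoding::EUC_JP, "\xA4\xA2", 0xa4a1)
p s.encoding  # => #<Encoding:EUC-JP>
p s.bytes     # => [164, 162, 164, 161]
```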

Updated by matz (Yukihiro Matsumoto) over 6 years ago

Agreed. I prefer String.new("", encoding: 'utf-8').

Matz.
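The `String.new("", encoding: 'utf-8')` spelling is handy for creating mutable buffers. The `encoding:` option fixes the encoding explicitly, independent of the source file's encoding. Here is a sketch contrasting it with the other empty-string constructors (assumes Ruby 2.3+).

```ruby
p String.new.encoding                         # ASCII-8BIT (shown as BINARY in newer Rubies)
p String.new("").encoding                     # copies the literal's (source) encoding
p String.new("", encoding: 'utf-8').encoding  # => #<Encoding:UTF-8>

buf = String.new("", encoding: 'utf-8')       # a fresh, mutable copy even if "" is frozen
buf << "abc" << "\u00e9"
p buf.length    # => 4
p buf.bytesize  # => 5: "é" takes two bytes in UTF-8
```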

Updated by usa (Usaku NAKAMURA) over 6 years ago

  • Status changed from Open to Closed

Applied in changeset r52976.


  • string.c (rb_str_init): now accepts new option parameter `encoding'.
    [Feature #11785]