Feature #11785

add `encoding:` optional argument to `String.new`

Added by usa (Usaku NAKAMURA) almost 10 years ago. Updated almost 10 years ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:71927]

Description

I propose adding an optional encoding: argument to String.new.

Ruby has no syntax for specifying the encoding of a string literal.
So when writing m17n scripts, we use String#force_encoding for that purpose, like this:

str = "\xA4\xA2".force_encoding('euc-jp')

But with frozen-string-literal: true, calling force_encoding on a literal raises RuntimeError.
So we must write:

str = "\xA4\xA2".dup.force_encoding('euc-jp')

or, if you prefer not to use dup,

str = String.new("\xA4\xA2").force_encoding('euc-jp')

but both of these are quite ugly.
The root cause of the ugliness is having to use force_encoding in the first place.

Therefore, I propose an optional encoding: argument for String.new.
If it were available, we could write:

str = String.new("\xA4\xA2", encoding: 'euc-jp')
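For illustration, here is a minimal sketch of the problem and the proposed form side by side. It assumes a Ruby version in which the encoding: option exists, and it uses an explicit .freeze to stand in for a literal under frozen-string-literal: true. The variable names are only illustrative.

```ruby
# Stand-in for a literal under `frozen-string-literal: true`.
literal = "\xA4\xA2".freeze

# force_encoding mutates its receiver, so it fails on a frozen string.
error = begin
  literal.force_encoding('euc-jp')
  nil
rescue RuntimeError => e # FrozenError (Ruby 2.5+) is a RuntimeError subclass
  e
end

# String.new copies the bytes into a new, unfrozen string and tags it
# with the requested encoding; no transcoding takes place.
str = String.new(literal, encoding: 'euc-jp')

puts error.class
puts str.encoding
puts str.frozen?
puts str.valid_encoding?
```

The bytes are left untouched; only the encoding attached to the new string differs from the literal's.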

This was proposed at the developer meeting last August and was generally well received (in my impression), but it was forgotten afterwards.

Updated by nobu (Nobuyoshi Nakada) almost 10 years ago

Another idea, Encoding#string, came to mind.

diff --git a/encoding.c b/encoding.c
index eb777c9..f0001b3 100644
--- a/encoding.c
+++ b/encoding.c
@@ -1171,6 +1171,38 @@ enc_names(VALUE self)
 
 /*
  * call-seq:
+ *   enc.string(str, ...) -> string
+ *
+ * Returns a string in this encoding.  The arguments are binary data
+ * or codepoints.
+ *
+ *  Encoding::EUC_JP.string("\xA4\xA2", 0xa4a1) #=> "\x{A4A2}\x{A4A1}"
+ */
+static VALUE
+enc_string(int argc, VALUE *argv, VALUE self)
+{
+    rb_encoding *enc = DATA_PTR(self);
+    VALUE str = rb_enc_str_new(0, 0, enc);
+    int i;
+
+    for (i = 0; i < argc; ++i) {
+	VALUE s = argv[i];
+	if (RB_TYPE_P(s, T_STRING) || !NIL_P(s = rb_check_string_type(s))) {
+	    const char *ptr;
+	    long len;
+	    RSTRING_GETMEM(s, ptr, len);
+	    rb_str_cat(str, ptr, len);
+	}
+	else {
+	    rb_str_concat(str, argv[i]);
+	}
+    }
+
+    return str;
+}
+
+/*
+ * call-seq:
  *   Encoding.list -> [enc1, enc2, ...]
  *
  * Returns the list of loaded encodings.
@@ -1924,6 +1956,7 @@ Init_Encoding(void)
     rb_define_method(rb_cEncoding, "dummy?", enc_dummy_p, 0);
     rb_define_method(rb_cEncoding, "ascii_compatible?", enc_ascii_compatible_p, 0);
     rb_define_method(rb_cEncoding, "replicate", enc_replicate, 1);
+    rb_define_method(rb_cEncoding, "string", enc_string, -1);
     rb_define_singleton_method(rb_cEncoding, "list", enc_list, 0);
     rb_define_singleton_method(rb_cEncoding, "name_list", rb_enc_name_list, 0);
     rb_define_singleton_method(rb_cEncoding, "aliases", rb_enc_aliases, 0);
diff --git a/test/ruby/test_encoding.rb b/test/ruby/test_encoding.rb
index abe4317..8399f1a 100644
--- a/test/ruby/test_encoding.rb
+++ b/test/ruby/test_encoding.rb
@@ -123,4 +123,14 @@
       }
     end;
   end
+
+  def test_string
+    Encoding.list.each do |e|
+      if e.ascii_compatible?
+        s = e.string("abc")
+        assert_equal(e, s.encoding)
+        assert_equal("abc", s)
+      end
+    end
+  end
 end
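The behaviour of the patch above can be approximated in plain Ruby. This is only a rough sketch: encoding_string is a hypothetical helper name, not an existing method, and it handles only String and Integer arguments (the C version also accepts objects that convert via to_str).

```ruby
# Hypothetical pure-Ruby stand-in for the proposed Encoding#string.
def encoding_string(enc, *parts)
  result = String.new(encoding: enc)
  parts.each do |part|
    if part.is_a?(String)
      # Reinterpret the bytes in the target encoding, without transcoding.
      result << String.new(part, encoding: enc)
    else
      # String#<< with an Integer appends that codepoint in the
      # receiver's encoding.
      result << part
    end
  end
  result
end

s = encoding_string(Encoding::EUC_JP, "\xA4\xA2", 0xa4a1)
puts s.encoding
puts s.bytes.map { |b| format('%02X', b) }.join(' ')
```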

Updated by matz (Yukihiro Matsumoto) almost 10 years ago

Agreed. I prefer String.new("", encoding: 'utf-8').

Matz.

Updated by usa (Usaku NAKAMURA) almost 10 years ago

  • Status changed from Open to Closed

Applied in changeset r52976.


  • string.c (rb_str_init): now accepts new option parameter `encoding'.
    [Feature #11785]
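As a short sketch of the accepted option in use, assuming a Ruby build that includes this change, the form matz mentions creates an empty buffer in a chosen encoding. Without arguments, String.new returns a binary (ASCII-8BIT) string. The variable names here are only illustrative.

```ruby
# Without arguments, String.new yields a binary (ASCII-8BIT) string.
default_buf = String.new

# With the new option, the empty buffer starts out in the chosen encoding.
# The option accepts either an encoding name or an Encoding object.
utf8_buf  = String.new("", encoding: 'utf-8')
by_object = String.new("", encoding: Encoding::UTF_8)

utf8_buf << "abc"

puts default_buf.encoding
puts utf8_buf.encoding
puts by_object.encoding
```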