Feature #11785
Add `encoding:` optional argument to `String.new` (Closed)
Description
I propose adding an `encoding:` optional argument to `String.new`.
Ruby has no syntax for specifying the encoding of a string literal.
So when writing m17n (multilingualization) scripts, we use `String#force_encoding` for this purpose, like this:
str = "\xA4\xA2".force_encoding('euc-jp')
But with the `frozen-string-literal: true` magic comment, calling `force_encoding` on a literal raises a RuntimeError.
So we must write:
str = "\xA4\xA2".dup.force_encoding('euc-jp')
or, if we would rather not use `dup`:
str = String.new("\xA4\xA2").force_encoding('euc-jp')
but both are quite clumsy.
The clumsiness comes from having to call `force_encoding` in the first place.
Therefore, I propose an `encoding:` optional argument for `String.new`.
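For comparison, a small sketch showing that the two workarounds above produce the same string: each copies the (frozen) literal first and then relabels the copy, leaving the literal itself untouched. The variable names are illustrative only.

```ruby
src = "\xA4\xA2".freeze  # stands in for a frozen literal

# Workaround 1: copy with dup (the copy is not frozen), then relabel it.
via_dup = src.dup.force_encoding('euc-jp')

# Workaround 2: copy with String.new, then relabel it.
via_new = String.new(src).force_encoding('euc-jp')

p via_dup.encoding    # => #<Encoding:EUC-JP>
p via_dup == via_new  # => true
p src.frozen?         # => true (the literal itself is never modified)
```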
With it, we could write:
str = String.new("\xA4\xA2", encoding: 'euc-jp')
This was proposed at the developer meeting last August and, in my impression, was generally well received, but it was forgotten afterwards.
Updated by nobu (Nobuyoshi Nakada) almost 9 years ago
Another idea came to mind: `Encoding#string`.
diff --git a/encoding.c b/encoding.c
index eb777c9..f0001b3 100644
--- a/encoding.c
+++ b/encoding.c
@@ -1171,6 +1171,38 @@ enc_names(VALUE self)
/*
* call-seq:
+ * enc.string(str, ...) -> string
+ *
+ * Returns a string in this encoding. The arguments are binary data
+ * or codepoints.
+ *
+ * Encoding::EUC_JP.string("\xA4\xA2", 0xa4a1) #=> "\x{A4A2}\x{A4A1}"
+ */
+static VALUE
+enc_string(int argc, VALUE *argv, VALUE self)
+{
+ rb_encoding *enc = DATA_PTR(self);
+ VALUE str = rb_enc_str_new(0, 0, enc);
+ int i;
+
+ for (i = 0; i < argc; ++i) {
+ VALUE s = argv[i];
+ if (RB_TYPE_P(s, T_STRING) || !NIL_P(s = rb_check_string_type(s))) {
+ const char *ptr;
+ long len;
+ RSTRING_GETMEM(s, ptr, len);
+ rb_str_cat(str, ptr, len);
+ }
+ else {
+ rb_str_concat(str, argv[i]);
+ }
+ }
+
+ return str;
+}
+
+/*
+ * call-seq:
* Encoding.list -> [enc1, enc2, ...]
*
* Returns the list of loaded encodings.
@@ -1924,6 +1956,7 @@ Init_Encoding(void)
rb_define_method(rb_cEncoding, "dummy?", enc_dummy_p, 0);
rb_define_method(rb_cEncoding, "ascii_compatible?", enc_ascii_compatible_p, 0);
rb_define_method(rb_cEncoding, "replicate", enc_replicate, 1);
+ rb_define_method(rb_cEncoding, "string", enc_string, -1);
rb_define_singleton_method(rb_cEncoding, "list", enc_list, 0);
rb_define_singleton_method(rb_cEncoding, "name_list", rb_enc_name_list, 0);
rb_define_singleton_method(rb_cEncoding, "aliases", rb_enc_aliases, 0);
diff --git a/test/ruby/test_encoding.rb b/test/ruby/test_encoding.rb
index abe4317..8399f1a 100644
--- a/test/ruby/test_encoding.rb
+++ b/test/ruby/test_encoding.rb
@@ -123,4 +123,14 @@
}
end;
end
+
+ def test_string
+ Encoding.list.each do |e|
+ if e.ascii_compatible?
+ s = e.string("abc")
+ assert_equal(e, s.encoding)
+ assert_equal("abc", s)
+ end
+ end
+ end
end
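`Encoding#string` is not a method Ruby provides. To make the patch's semantics concrete, here is a hypothetical pure-Ruby approximation; the name `encoding_string` is invented for this sketch. As in the C code, string arguments are appended as raw bytes (like `rb_str_cat`) and integers are appended as codepoints in the target encoding (like `rb_str_concat`).

```ruby
# Hypothetical helper, not part of Ruby: approximates the proposed
# Encoding#string from the patch above.
def encoding_string(enc, *args)
  enc = Encoding.find(enc) unless enc.is_a?(Encoding)
  buf = String.new  # empty ASCII-8BIT buffer: bytes are appended verbatim
  args.each do |arg|
    if arg.is_a?(Integer)
      buf << arg.chr(enc).b  # codepoint encoded in the target encoding
    elsif (s = String.try_convert(arg))
      buf << s.b             # raw bytes, no transcoding (like rb_str_cat)
    else
      raise TypeError, "expected String or Integer, got #{arg.class}"
    end
  end
  buf.force_encoding(enc)    # relabel the accumulated bytes
end

s = encoding_string(Encoding::EUC_JP, "\xA4\xA2", 0xa4a1)
p s.encoding  # => #<Encoding:EUC-JP>
p s.bytes     # => [164, 162, 164, 161]
```

Building in a binary buffer and relabeling once at the end avoids encoding-compatibility errors that appending non-ASCII bytes directly to an EUC-JP string would raise.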
Updated by matz (Yukihiro Matsumoto) almost 9 years ago
Agreed. I prefer `String.new("", encoding: 'utf-8')`.
Matz.
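The preferred form is also handy for creating an empty, mutable buffer in a chosen encoding. A brief sketch, assuming Ruby 2.3 or later: `String.new` with no arguments returns a binary (ASCII-8BIT) string, whereas passing `encoding:` sets the buffer's encoding up front.

```ruby
# No arguments: an empty binary (ASCII-8BIT) string.
raw = String.new
p raw.encoding == Encoding::BINARY  # => true

# The preferred form: an empty, unfrozen UTF-8 buffer.
buf = String.new("", encoding: 'utf-8')
p buf.encoding  # => #<Encoding:UTF-8>

buf << "caf" << "\u00e9"
p buf.length    # => 4 (characters)
p buf.bytesize  # => 5 (U+00E9 takes two bytes in UTF-8)
```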
Updated by usa (Usaku NAKAMURA) almost 9 years ago
- Status changed from Open to Closed
Applied in changeset r52976.
- string.c (rb_str_init): now accepts new option parameter `encoding'.
[Feature #11785]