Backport #6377
Closed
$LOADED_FEATURES entry via YAML is binary data?
Description
=begin
While working with $LOADED_FEATURES, came across this odd result:
trans@logisys:courtier$ irb
irb(main):001:0> require 'yaml'
irb(main):002:0> puts $LOADED_FEATURES.join("\n")
enumerator.so
/home/trans/.local/lib/ry/rubies/1.9.3-p125/lib/ruby/1.9.1/x86_64-linux/enc/encdb.so
/home/trans/.local/lib/ry/rubies/1.9.3-p125/lib/ruby/1.9.1/x86_64-linux/enc/trans/transdb.so
/home/trans/.local/lib/ry/rubies/1.9.3-p125/lib/ruby/1.9.1/rubygems/defaults.rb
...
irb(main):003:0> y $LOADED_FEATURES
- enumerator.so
- !binary |-
L2hvbWUvdHJhbnMvLmxvY2FsL2xpYi9yeS9ydWJpZXMvMS45LjMtcDEyNS9s
aWIvcnVieS8xLjkuMS94ODZfNjQtbGludXgvZW5jL2VuY2RiLnNv
- /home/trans/.local/lib/ry/rubies/1.9.3-p125/lib/ruby/1.9.1/x86_64-linux/enc/trans/transdb.so
- /home/trans/.local/lib/ry/rubies/1.9.3-p125/lib/ruby/1.9.1/rubygems/defaults.rb
...
=end
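One way to see which entries are affected is to group the feature paths by their encoding. This is only a minimal sketch: the helper name `features_by_encoding` is made up for illustration, and the result depends on the Ruby version and platform.

```ruby
require 'yaml'

# Hypothetical helper (not part of Ruby): group feature paths by the
# Encoding object each string is tagged with.
def features_by_encoding(features = $LOADED_FEATURES)
  features.group_by { |path| path.encoding }
end

# Print how many loaded features carry each encoding tag.
features_by_encoding.each do |encoding, paths|
  puts "#{encoding}: #{paths.size} entries"
end
```

Any entry listed under ASCII-8BIT is a candidate for the `!binary` output shown above.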
Updated by nobu (Nobuyoshi Nakada) almost 12 years ago
- Status changed from Open to Assigned
- Assignee set to tenderlovemaking (Aaron Patterson)
Sounds like psych deals with ASCII-8BIT strings as binary data always, even if 7bit only.
Updated by tenderlovemaking (Aaron Patterson) almost 12 years ago
I'm not sure how, or whether, I should fix this. There are two problems: 1) we lose encoding information, and 2) it is unclear how to decide what counts as "binary".
For #1, if we treat 7bit only ascii strings as "non-binary", it means that when we load the data back in, the string will be tagged as UTF-8 (since "raw" YAML strings are unicode). e.g. today this test passes, but if we treat 7bit ascii strings as non-binary, it will fail:
s = "hello".encode('ASCII-8BIT')
assert_equal s.encoding, YAML.load(YAML.dump(s)).encoding
For #2, I'm not sure how we decide what is binary and what is not. Should strings that contain null bytes be considered binary? If so, we can't use the ascii_only? method:
"\0".ascii_only? # => true
Given the data loss from #1, and the hardship of #2, I don't think Psych should change. I'm open to suggestions for dealing with these problems.
Nobu: Why are the paths on LOADED_FEATURES encoded as ASCII-8BIT? Shouldn't those paths be tagged with the filesystem encoding?
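The second problem can be tried out directly. The following is only a sketch of the trade-off, not Psych's actual logic: `looks_binary?` is a hypothetical helper that treats a string as binary when it contains a NUL byte or any byte outside 7-bit ASCII, which is just one possible answer to the question.

```ruby
# Hypothetical heuristic, not Psych's implementation: a string counts as
# "binary" here if it has a NUL byte or any byte outside 7-bit ASCII.
def looks_binary?(string)
  string.include?("\0") || !string.ascii_only?
end

seven_bit = "hello".dup.force_encoding(Encoding::ASCII_8BIT)
high_byte = "\xFF".dup.force_encoding(Encoding::ASCII_8BIT)

puts seven_bit.ascii_only?     # true: only 7-bit bytes
puts "\0".ascii_only?          # true: NUL still counts as ASCII
puts looks_binary?(seven_bit)  # false
puts looks_binary?("\0")       # true, because of the explicit NUL check
puts looks_binary?(high_byte)  # true
```

Even with such a heuristic, the first problem remains: a 7-bit string dumped as plain YAML text no longer records that it was tagged ASCII-8BIT.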
Updated by tenderlovemaking (Aaron Patterson) over 11 years ago
- Assignee changed from tenderlovemaking (Aaron Patterson) to nobu (Nobuyoshi Nakada)
Nobu, do you know why the paths on LOADED_FEATURES are encoded as ASCII-8BIT? Shouldn't they be tagged with the filesystem encoding? Thanks.
Updated by nobu (Nobuyoshi Nakada) over 11 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r36800.
Thomas, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
load.c: keep encoding of feature name
- file.c (rb_find_file_ext_safe, rb_find_file_safe): default to
  US-ASCII for encdb and transdb.
- load.c (search_required): keep encoding of feature name. set
  loading path to filesystem encoding. [Bug #6377][ruby-core:44750]
- ruby.c (add_modules, require_libraries): assume default external
  encoding as well as ARGV.
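As a rough Ruby-level illustration of what "set loading path to filesystem encoding" means, the sketch below retags a path string with the filesystem encoding. The actual fix lives in C (load.c); the example path and the helper name `tag_with_filesystem_encoding` are made up for this example.

```ruby
# Hypothetical helper: retag a path with the filesystem encoding.
# force_encoding only changes the tag; the bytes are left untouched.
def tag_with_filesystem_encoding(path)
  path.dup.force_encoding(Encoding.find('filesystem'))
end

# Illustrative path, tagged ASCII-8BIT like the entry in the report.
raw = '/tmp/example/feature.so'.dup.force_encoding(Encoding::ASCII_8BIT)
tagged = tag_with_filesystem_encoding(raw)

puts raw.encoding
puts tagged.encoding
puts tagged.bytes == raw.bytes
```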
Updated by nobu (Nobuyoshi Nakada) over 11 years ago
- Tracker changed from Bug to Backport
- Project changed from Ruby master to Backport193
- Category deleted (lib)
- Status changed from Closed to Assigned
- Assignee changed from nobu (Nobuyoshi Nakada) to usa (Usaku NAKAMURA)
- Target version deleted (1.9.3)
Updated by usa (Usaku NAKAMURA) over 11 years ago
- Status changed from Assigned to Closed
This issue was solved with changeset r37209.
Thomas, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
merge revision(s) 36800: [Backport #6377]
- file.c (rb_find_file_ext_safe, rb_find_file_safe): default to
  US-ASCII for encdb and transdb.
- load.c (search_required): keep encoding of feature name. set
  loading path to filesystem encoding. [Bug #6377][ruby-core:44750]
- ruby.c (add_modules, require_libraries): assume default external
  encoding as well as ARGV.