Backport #6377

$LOADED_FEATURES entry via YAML is binary data?

Added by trans (Thomas Sawyer) almost 12 years ago. Updated over 11 years ago.

Status: Closed
[ruby-core:44750]

Description

=begin
While working with $LOADED_FEATURES, I came across this odd result:

trans@logisys:courtier$ irb

irb(main):001:0> require 'yaml'

irb(main):002:0> puts $LOADED_FEATURES.join("\n")
enumerator.so
/home/trans/.local/lib/ry/rubies/1.9.3-p125/lib/ruby/1.9.1/x86_64-linux/enc/encdb.so
/home/trans/.local/lib/ry/rubies/1.9.3-p125/lib/ruby/1.9.1/x86_64-linux/enc/trans/transdb.so
/home/trans/.local/lib/ry/rubies/1.9.3-p125/lib/ruby/1.9.1/rubygems/defaults.rb
...

irb(main):003:0> y $LOADED_FEATURES
- enumerator.so
- !binary |-
  L2hvbWUvdHJhbnMvLmxvY2FsL2xpYi9yeS9ydWJpZXMvMS45LjMtcDEyNS9s
  aWIvcnVieS8xLjkuMS94ODZfNjQtbGludXgvZW5jL2VuY2RiLnNv
- /home/trans/.local/lib/ry/rubies/1.9.3-p125/lib/ruby/1.9.1/x86_64-linux/enc/trans/transdb.so
- /home/trans/.local/lib/ry/rubies/1.9.3-p125/lib/ruby/1.9.1/rubygems/defaults.rb
...

=end
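
The `!binary` node is only the base64 encoding of the path's bytes. One quick way to confirm this is to decode the two lines from the output above. The sketch uses `String#unpack1("m")`, Ruby's built-in base64 decoder, and assumes Ruby 2.4 or later:

```ruby
# The two base64 lines from the !binary node above, joined together.
payload = "L2hvbWUvdHJhbnMvLmxvY2FsL2xpYi9yeS9ydWJpZXMvMS45LjMtcDEyNS9s" \
          "aWIvcnVieS8xLjkuMS94ODZfNjQtbGludXgvZW5jL2VuY2RiLnNv"

# The "m" directive decodes base64; the result is tagged ASCII-8BIT.
decoded = payload.unpack1("m")

puts decoded
# prints: /home/trans/.local/lib/ry/rubies/1.9.3-p125/lib/ruby/1.9.1/x86_64-linux/enc/encdb.so
```

The decoded bytes match the second entry printed by `puts`, so no data is lost. The path was emitted as binary because of how the string was tagged, not because of what it contains.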

Updated by nobu (Nobuyoshi Nakada) almost 12 years ago

  • Status changed from Open to Assigned
  • Assignee set to tenderlovemaking (Aaron Patterson)

It sounds like Psych always treats ASCII-8BIT strings as binary data, even when they contain only 7-bit characters.
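
The distinction here is between a string's encoding tag and its bytes. The minimal sketch below uses a made-up path, not one taken from the report, and `String#b`, which requires Ruby 2.0 or later. It shows that both checks can be true at once:

```ruby
# Hypothetical load path, tagged ASCII-8BIT like the encdb.so entry above.
feature = "/usr/lib/ruby/enc/encdb.so".b

tagged_binary = feature.encoding == Encoding::ASCII_8BIT  # the tag says "binary"
seven_bit     = feature.ascii_only?                       # every byte is below 0x80

puts "tagged binary: #{tagged_binary}, 7-bit only: #{seven_bit}"
```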

Updated by tenderlovemaking (Aaron Patterson) almost 12 years ago

I'm not sure whether, or how, I should fix this. There are two problems: 1) we lose encoding information, and 2) it's unclear how to decide what counts as "binary".

For #1: if we treat 7-bit-only ASCII strings as non-binary, then when we load the data back in, the string will be tagged as UTF-8 (since plain YAML strings are Unicode). For example, this test passes today, but it would fail if we treated 7-bit ASCII strings as non-binary:

s = "hello".encode('ASCII-8BIT')
assert_equal s.encoding, YAML.load(YAML.dump(s)).encoding
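
The round-trip loss from #1 can be reproduced without depending on the installed Psych version. The sketch below relabels the string as UTF-8 before dumping it. That relabelling is an assumption made for illustration: it stands in for "emit 7-bit strings as plain scalars" and is not something Psych does itself.

```ruby
require 'yaml'

s = "hello".b  # 7-bit content, tagged ASCII-8BIT

# Emit it as a plain YAML scalar by relabelling it as UTF-8 first
# (illustrative stand-in for "treat 7-bit strings as non-binary").
yaml = YAML.dump(s.dup.force_encoding(Encoding::UTF_8))
back = YAML.load(yaml)

puts back.bytes == s.bytes  # the bytes survive the round trip
puts back.encoding          # but the ASCII-8BIT tag does not
```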

For #2, I'm not sure how to decide what is binary and what is not. Should strings that contain null bytes be considered binary? If so, we can't rely on the ascii_only? method alone:

"\0".ascii_only? # => true
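
One possible rule is sketched below as a hypothetical `looks_binary?` helper. It is not Psych's actual implementation. It checks for NUL bytes separately because `ascii_only?` ignores them, and it only treats ASCII-8BIT strings as binary when they contain bytes above 0x7F. It assumes ASCII-compatible encodings and does not address problem #1.

```ruby
# Hypothetical predicate, not Psych's code: should this string be emitted as !binary?
# Assumes an ASCII-compatible encoding (UTF-8, US-ASCII, ASCII-8BIT, ...).
def looks_binary?(str)
  # "\0".ascii_only? is true, so NUL bytes need their own check.
  return true if str.include?("\0")

  # Only ASCII-8BIT strings with high-bit bytes count as binary; a 7-bit
  # ASCII-8BIT string such as a load path would be emitted as plain text.
  str.encoding == Encoding::ASCII_8BIT && !str.ascii_only?
end

p looks_binary?("hello".b)      # false
p looks_binary?("\xFF\xFE".b)   # true
p looks_binary?("a\0b")         # true
p looks_binary?("h\u00e9llo")   # false: valid UTF-8 text, not ASCII-8BIT
```

Under this rule the $LOADED_FEATURES entries would be dumped as plain strings. However, they would still be loaded back tagged as UTF-8, which is the trade-off described in #1.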

Given the data loss in #1 and the difficulty of #2, I don't think Psych should change. I'm open to suggestions for dealing with these problems.

Nobu: Why are the paths in $LOADED_FEATURES encoded as ASCII-8BIT? Shouldn't those paths be tagged with the filesystem encoding?
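
To see which encodings the entries carry on a given Ruby, a sketch like the following tallies them and compares the result against `Encoding.find("filesystem")`. It assumes Ruby 2.7 or later for `Array#tally`. The output depends on the Ruby version and the locale, so none is shown here.

```ruby
require 'yaml'

# Count the encodings carried by the entries in $LOADED_FEATURES and
# mark the one that matches the filesystem encoding.
fs_enc = Encoding.find("filesystem")
tally  = $LOADED_FEATURES.map(&:encoding).tally

tally.each do |enc, count|
  marker = enc == fs_enc ? " (filesystem encoding)" : ""
  puts "#{enc}: #{count}#{marker}"
end
```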

Updated by tenderlovemaking (Aaron Patterson) over 11 years ago

  • Assignee changed from tenderlovemaking (Aaron Patterson) to nobu (Nobuyoshi Nakada)

Nobu, do you know why the paths in $LOADED_FEATURES are encoded as ASCII-8BIT? Shouldn't they be tagged with the filesystem encoding? Thanks.

Updated by nobu (Nobuyoshi Nakada) over 11 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r36800.
Thomas, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


load.c: keep encoding of feature name

  • file.c (rb_find_file_ext_safe, rb_find_file_safe): default to
    US-ASCII for encdb and transdb.
  • load.c (search_required): keep encoding of feature name. set
    loading path to filesystem encoding. [Bug #6377][ruby-core:44750]
  • ruby.c (add_modules, require_libraries): assume default external
    encoding as well as ARGV.
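
The Ruby-level idea behind the load.c change, which tags a loaded path with the filesystem encoding instead of leaving it as ASCII-8BIT, can be sketched with a hypothetical helper. This is an illustration only. The actual fix is written in C and also keeps the encoding of the feature name.

```ruby
# Hypothetical helper, not part of Ruby: relabel a path's bytes with the
# filesystem encoding, falling back to the original string when the bytes
# are not valid in that encoding.
def with_filesystem_encoding(path)
  fs = Encoding.find("filesystem")
  tagged = path.dup.force_encoding(fs)
  tagged.valid_encoding? ? tagged : path
end

retagged = with_filesystem_encoding("/usr/lib/ruby/enc/encdb.so".b)
p retagged.encoding  # the filesystem encoding, e.g. UTF-8 under a UTF-8 locale
```
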

Updated by nobu (Nobuyoshi Nakada) over 11 years ago

  • Tracker changed from Bug to Backport
  • Project changed from Ruby master to Backport193
  • Category deleted (lib)
  • Status changed from Closed to Assigned
  • Assignee changed from nobu (Nobuyoshi Nakada) to usa (Usaku NAKAMURA)
  • Target version deleted (1.9.3)

Updated by usa (Usaku NAKAMURA) over 11 years ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r37209.
Thomas, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merge revision(s) 36800: [Backport #6377]

  • file.c (rb_find_file_ext_safe, rb_find_file_safe): default to
    US-ASCII for encdb and transdb.

  • load.c (search_required): keep encoding of feature name. set
    loading path to filesystem encoding. [Bug #6377][ruby-core:44750]

  • ruby.c (add_modules, require_libraries): assume default external
    encoding as well as ARGV.
