Backport #8165

Problems with require

Added by Alexandr Kruglov about 1 year ago. Updated 12 months ago.

[ruby-core:53733]
Status:Closed
Priority:Normal
Assignee:Tomoyuki Chikanaga

Description

Require doesn't work correctly if path contains cyrillic.

require '/home/mak/test.rb' #-> true
require '/home/mak/test.rb' #-> false

require '/home/mak/Проекты/test.rb' #-> true
require '/home/mak/Проекты/test.rb' #-> true

0001-load.c-fix-require-with-non-ascii-path.patch Magnifier (3.43 KB) Hiroshi Shirosaki, 03/29/2013 07:26 PM

Associated revisions

Revision 40398
Added by Tomoyuki Chikanaga 12 months ago

merge revision(s) 40135,40148,40173: [Backport #8165]

* load.c (features_index_add): use rb_str_subseq() to specify C string
  position properly to fix require non ascii path.
   [Bug #8165]

* test/ruby/test_require.rb (TestRequire#test_require_nonascii_path):
  a test for the above.

* test/ruby/test_require.rb (TestRequire#test_require_nonascii_path):
  fix load path for encoding to run the test as stand-alone.

* test/ruby/test_require.rb (TestRequire#test_require_nonascii_path):
  RUBY_PLATFORM should escape as Regexp,
  because RUBY_PLATFORM may contain '.'.

History

#1 Updated by Luis Lavena about 1 year ago

  • Status changed from Open to Feedback

Hello,

Please provide more details of the platform/OS, version and filesystem used.

Thank you.

#2 Updated by Yusuke Endoh about 1 year ago

  • Status changed from Feedback to Assigned
  • Assignee set to Masaya Tarui
  • Target version set to 2.1.0

It reproduces on my Ubuntu 12.10 with ext4 filesystem.

$ cat t.rb

coding: UTF-8

p require("/home/mame/あいうえお/test.rb")
p require("/home/mame/あいうえお/test.rb")
p require("/home/mame/あいうえお/test.rb")
p(*$")

$ ruby -v t.rb
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-linux]
true
true
true
"enumerator.so"
snip
"/home/mame/あいうえお/test.rb"
"/home/mame/あいうえお/test.rb"
"/home/mame/あいうえお/test.rb"

$ ruby-1.9.3-p392 -v t.rb
ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-linux]
true
false
false
"enumerator.so"
snip
"/home/mame/あいうえお/test.rb"

I guess this is related to the performance improvement of require in 2.0.0.
Tarui-san or Shirosaki-san, do you have any idea?

Yusuke Endoh mame@tsg.ne.jp

#3 Updated by Yura Sokolov about 1 year ago

This is occasionally fixed in trunk by https://bugs.ruby-lang.org/issues/8048 ,
by de-abusing Ruby Strings and using plain C strings.

Also backported version of https://bugs.ruby-lang.org/issues/8158 fixes this too,
cause it removes string comparison at all.

#4 Updated by Hiroshi Shirosaki about 1 year ago

It seems short_feature is not proper if the path contains non-ascii characters.
rbstrsubstr() would look up string position per character, not per byte. rbstrsubseq() seems to work for non-ascii.

diff --git a/load.c b/load.c
index 7c69461..dab493d 100644
--- a/load.c
+++ b/load.c
@@ -247,16 +247,16 @@ featuresindexadd(VALUE feature, VALUE offset)
if (p < featurestr)
break;
/* Now *p == '/'. We reach this point for every '/' in feature. */
- short
feature = rbstrsubstr(feature, p + 1 - featurestr, featureend - p - 1);
+ shortfeature = rbstrsubseq(feature, p + 1 - featurestr, featureend - p - 1);
features
indexaddsingle(shortfeature, offset);
if (ext) {
- short
feature = rbstrsubstr(feature, p + 1 - featurestr, ext - p - 1);
+ short
feature = rbstrsubseq(feature, p + 1 - featurestr, ext - p - 1);
features
indexaddsingle(shortfeature, offset);
}
}
features
indexaddsingle(feature, offset);
if (ext) {
- shortfeature = rbstrsubstr(feature, 0, ext - featurestr);
+ shortfeature = rbstrsubseq(feature, 0, ext - featurestr);
featuresindexaddsingle(shortfeature, offset);
}
}

#5 Updated by Hiroshi Shirosaki about 1 year ago

One more issue seems string encoding.
When looking up a feature name in rbfeaturep(), encoding information lacks.
So `short_feature' should not have encoding.
If encoding of the two non-ascii strings is different, hash value would be different.

load.c: rbfeaturep()

feature_val = rb_str_new(feature, len); // not have encoding

this_feature_index = rb_hash_lookup(features_index, feature_val);

I've attached a patch. Tested on both trunk and 2.0.0 branch.

ruby 2.1.0dev (2013-03-29 trunk 39991) [x8664-linux]
ruby 2.0.0p100 (2013-03-27 revision 39954) [x86
64-linux]

#6 Updated by Yura Sokolov about 1 year ago

May be it will be simpler to accept https://bugs.ruby-lang.org/issues/8158 ?
It has less memory requirement, avoids string comparisons, etc

(yes, i'm annoying a bit, sorry)

2013/3/29 h.shirosaki (Hiroshi Shirosaki) h.shirosaki@gmail.com

Issue #8165 has been updated by h.shirosaki (Hiroshi Shirosaki).

File 0001-load.c-fix-require-with-non-ascii-path.patch added

One more issue seems string encoding.
When looking up a feature name in rbfeaturep(), encoding information
lacks.
So `short_feature' should not have encoding.
If encoding of the two non-ascii strings is different, hash value would be
different.

load.c: rbfeaturep()

feature_val = rb_str_new(feature, len); // not have encoding

this_feature_index = rb_hash_lookup(features_index, feature_val);

I've attached a patch. Tested on both trunk and 2.0.0 branch.

ruby 2.1.0dev (2013-03-29 trunk 39991) [x86_64-linux]

ruby 2.0.0p100 (2013-03-27 revision 39954) [x86_64-linux]

Bug #8165: Problems with require
https://bugs.ruby-lang.org/issues/8165#change-38028

Author: Krugloff (Alexandr Kruglov)
Status: Assigned
Priority: Normal
Assignee: tarui (Masaya Tarui)
Category:
Target version: current: 2.1.0
ruby -v: ruby 2.0.0-p0

Require doesn't work correctly if path contains cyrillic.

require '/home/mak/test.rb' #-> true
require '/home/mak/test.rb' #-> false

require '/home/mak/Проекты/test.rb' #-> true
require '/home/mak/Проекты/test.rb' #-> true

http://bugs.ruby-lang.org/

#7 Updated by Hiroshi Shirosaki about 1 year ago

funny_falcon (Yura Sokolov) wrote:

May be it will be simpler to accept https://bugs.ruby-lang.org/issues/8158 ?
It has less memory requirement, avoids string comparisons, etc

Indeed your patch would not have this bug since using C string instead of string object.
I think smaller patch or splitting patch might be better to be reviewed.

#8 Updated by Yura Sokolov about 1 year ago

My patch doesn't store any string value, only hashes of them. It relies on
loadedfeaturepath for string comparison.

Nor it uses ruby arrays for index storage - it organize them in linked
lists which are stored in a single C array.
29.03.2013 20:14 пользователь "h.shirosaki (Hiroshi Shirosaki)" <
h.shirosaki@gmail.com> написал:

Issue #8165 has been updated by h.shirosaki (Hiroshi Shirosaki).

funny_falcon (Yura Sokolov) wrote:

May be it will be simpler to accept
https://bugs.ruby-lang.org/issues/8158 ?
It has less memory requirement, avoids string comparisons, etc

Indeed your patch would not have this bug since using C string instead of
string object.
I think smaller patch or splitting patch might be better to be reviewed.


Bug #8165: Problems with require
https://bugs.ruby-lang.org/issues/8165#change-38032

Author: Krugloff (Alexandr Kruglov)
Status: Assigned
Priority: Normal
Assignee: tarui (Masaya Tarui)
Category:
Target version: current: 2.1.0
ruby -v: ruby 2.0.0-p0

Require doesn't work correctly if path contains cyrillic.

require '/home/mak/test.rb' #-> true
require '/home/mak/test.rb' #-> false

require '/home/mak/Проекты/test.rb' #-> true
require '/home/mak/Проекты/test.rb' #-> true

http://bugs.ruby-lang.org/

#9 Updated by Anonymous about 1 year ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r40135.
Alexandr, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • load.c (featuresindexadd): use rbstrsubseq() to specify C string
    position properly to fix require non ascii path.
    [Bug #8165]

  • test/ruby/testrequire.rb (TestRequire#testrequirenonasciipath):
    a test for the above.

#10 Updated by Tomoyuki Chikanaga about 1 year ago

  • Tracker changed from Bug to Backport
  • Project changed from ruby-trunk to Backport200
  • Status changed from Closed to Assigned
  • Assignee changed from Masaya Tarui to Tomoyuki Chikanaga
  • Target version deleted (2.1.0)

#11 Updated by Tomoyuki Chikanaga about 1 year ago

working memo: backport 40135,40148,40173 but ruby/testrequire.rb TestRequire#testrequirenonasciipath fails.

#12 Updated by Hiroshi Shirosaki about 1 year ago

nagachika (Tomoyuki Chikanaga) wrote:

working memo: backport 40135,40148,40173 but ruby/testrequire.rb TestRequire#testrequirenonasciipath fails.

Thank you for backport. #8048 would also be needed for this issue.
Hash lookup of feature_index fails due to String encoding of the key. r39874 changes hash key from ruby String to C string.

#13 Updated by Tomoyuki Chikanaga about 1 year ago

shirosaki san
Thank you for your advice! I'll try to backport with r39874 later.

#14 Updated by Tomoyuki Chikanaga 12 months ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r40398.
Alexandr, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merge revision(s) 40135,40148,40173: [Backport #8165]

* load.c (features_index_add): use rb_str_subseq() to specify C string
  position properly to fix require non ascii path.
   [Bug #8165]

* test/ruby/test_require.rb (TestRequire#test_require_nonascii_path):
  a test for the above.

* test/ruby/test_require.rb (TestRequire#test_require_nonascii_path):
  fix load path for encoding to run the test as stand-alone.

* test/ruby/test_require.rb (TestRequire#test_require_nonascii_path):
  RUBY_PLATFORM should escape as Regexp,
  because RUBY_PLATFORM may contain '.'.

Also available in: Atom PDF