Bug #10111
Closed
gdbm truncated UTF-8 data problem
Description
Here is a script that reproduces the problem.
# coding: utf-8
require 'gdbm'
data = "\xEA\xB0\x80ABCDEF"
db = GDBM.new( 'test.db', 0666 )
db['key'] = data
throw 'data truncated!!' if db['key'] != data
Updated by nobu (Nobuyoshi Nakada) almost 10 years ago
gdbm doesn't preserve encodings now.
Updated by testors (KiHyun Kang) almost 10 years ago
Nobuyoshi Nakada wrote:
gdbm doesn't preserve encodings now.
gdbm doesn't have to preserve encodings.
ext/dbm works well but ext/gdbm does not, because ext/gdbm uses 'length' to get the size.
'length' is not suitable for determining the actual size.
Use 'bytesize' instead of 'length'.
Updated by nobu (Nobuyoshi Nakada) almost 10 years ago
KiHyun Kang wrote:
Nobuyoshi Nakada wrote:
gdbm doesn't preserve encodings now.
gdbm doesn't have to preserve encodings.
$ ./ruby -v -rgdbm -e 'data = "\xEA\xB0\x80ABCDEF"' -e 'db = GDBM.new("test.db", 0666)' -e 'db["key"] = data' -e 'p db["key"] == data.b'
ruby 2.1.2p195 (2014-08-04 revision 47056) [x86_64-darwin13.0]
true
ext/dbm works well but ext/gdbm does not, because ext/gdbm uses 'length' to get the size.
'length' is not suitable for determining the actual size.
Use 'bytesize' instead of 'length'.
I can't understand what you mean at all.
Updated by akr (Akira Tanaka) almost 10 years ago
The data is not truncated, but it comes back with a different encoding (as nobu pointed out at first).
% cat t.gdbm.rb
# coding: utf-8
require 'gdbm'
data = "\xEA\xB0\x80ABCDEF"
db = GDBM.new( 'test.db', 0666 )
db['key'] = data
p [db['key'].b, db['key'].encoding]
p [data.b, data.encoding]
throw 'data truncated!!' if db['key'] != data
% ./ruby -v t.gdbm.rb
ruby 2.2.0dev (2014-08-15 trunk 47187) [x86_64-linux]
["\xEA\xB0\x80ABCDEF", #<Encoding:ASCII-8BIT>]
["\xEA\xB0\x80ABCDEF", #<Encoding:UTF-8>]
t.gdbm.rb:10:in `throw': uncaught throw "data truncated!!" (ArgumentError)
from t.gdbm.rb:10:in `<main>'
dbm behaves the same as gdbm.
% cat t.dbm.rb
# coding: utf-8
require 'dbm'
data = "\xEA\xB0\x80ABCDEF"
db = DBM.new( 'test.db', 0666 )
db['key'] = data
p [db['key'].b, db['key'].encoding]
p [data.b, data.encoding]
throw 'data truncated!!' if db['key'] != data
% ./ruby -v t.dbm.rb
ruby 2.2.0dev (2014-08-15 trunk 47187) [x86_64-linux]
["\xEA\xB0\x80ABCDEF", #<Encoding:ASCII-8BIT>]
["\xEA\xB0\x80ABCDEF", #<Encoding:UTF-8>]
t.dbm.rb:10:in `throw': uncaught throw "data truncated!!" (ArgumentError)
from t.dbm.rb:10:in `<main>'
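The comparison result can be reproduced without any database. The following is a minimal sketch, assuming only what the output above shows: the value read back holds the same bytes but is tagged ASCII-8BIT. Here `stored` is a stand-in for `db['key']`, not a real gdbm or dbm read.

```ruby
# UTF-8 string: one 3-byte character followed by six ASCII letters.
data = "\xEA\xB0\x80ABCDEF".dup.force_encoding(Encoding::UTF_8)

# Stand-in for db['key'] (assumption): same bytes, ASCII-8BIT encoding.
stored = data.b

p data.encoding                     # => #<Encoding:UTF-8>
p stored.encoding                   # => #<Encoding:ASCII-8BIT>
p data.bytes == stored.bytes        # => true, no bytes are lost
p data == stored                    # => false, non-ASCII content with different encodings
p [data.length, stored.length]      # => [7, 9], characters vs. bytes
p [data.bytesize, stored.bytesize]  # => [9, 9]
```

The differing `length` values are one way the result can look like truncation even though `bytesize` is unchanged.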
Updated by akr (Akira Tanaka) about 9 years ago
- Status changed from Open to Rejected
gdbm (and dbm) don't record the encoding.
So I think the current behavior is natural and not a bug.
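Since the stored value carries no encoding, a caller who knows what was written can tag it again after reading. A minimal sketch, assuming the application only ever stores UTF-8; `restore_utf8` is a hypothetical helper, not part of gdbm or dbm, and `raw` stands in for a value returned by `db['key']`:

```ruby
# Hypothetical helper: re-tag bytes read from the database as UTF-8.
# Assumes the application only ever stores UTF-8 strings.
def restore_utf8(raw)
  str = raw.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "stored bytes are not valid UTF-8" unless str.valid_encoding?
  str
end

data = "\xEA\xB0\x80ABCDEF".dup.force_encoding(Encoding::UTF_8)
raw  = data.b                  # stand-in for db['key'] (ASCII-8BIT)

p restore_utf8(raw) == data    # => true
p raw == data.b                # => true, comparing as binary also works
```

With a real database the call would be `restore_utf8(db['key'])`. Comparing against `data.b`, as in nobu's check, avoids re-tagging when only byte equality matters.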