Bug #10111

closed

gdbm truncated UTF-8 data problem

Added by testors (KiHyun Kang) almost 10 years ago. Updated about 9 years ago.

Status:
Rejected
Target version:
-
ruby -v:
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux]
[ruby-core:64198]

Description

Here is a script that reproduces the problem.

# coding: utf-8
require 'gdbm'

data = "\xEA\xB0\x80ABCDEF"
db = GDBM.new( 'test.db', 0666 )
db['key'] = data

throw 'data truncated!!' if db['key'] != data

Updated by nobu (Nobuyoshi Nakada) almost 10 years ago

gdbm doesn't preserve encodings now.

Updated by testors (KiHyun Kang) almost 10 years ago

Nobuyoshi Nakada wrote:

gdbm doesn't preserve encodings now.

gdbm doesn't have to preserve encodings.

ext/dbm works well but ext/gdbm does not, because ext/gdbm uses 'length' to get the size.

'length' is not suitable for determining the actual size in bytes.

Use 'bytesize' instead of 'length'.
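
For reference, the difference between the two methods can be sketched in plain Ruby. This only illustrates `String#length` versus `String#bytesize` on the string from the report; it does not show what ext/gdbm does internally, and the explicit `force_encoding` call is an assumption added so the snippet does not depend on the source encoding.

```ruby
# The string from the report: one 3-byte UTF-8 character followed by 6 ASCII letters.
# force_encoding is applied explicitly so the result does not depend on the file's encoding.
data = "\xEA\xB0\x80ABCDEF".dup.force_encoding(Encoding::UTF_8)

char_count = data.length    # number of characters
byte_count = data.bytesize  # number of bytes

p [char_count, byte_count]  # => [7, 9]
```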

Updated by nobu (Nobuyoshi Nakada) almost 10 years ago

KiHyun Kang wrote:

Nobuyoshi Nakada wrote:

gdbm doesn't preserve encodings now.

gdbm doesn't have to preserve encodings.

$ ./ruby -v -rgdbm -e 'data = "\xEA\xB0\x80ABCDEF"' -e 'db = GDBM.new("test.db", 0666)' -e 'db["key"] = data' -e 'p db["key"] == data.b'
ruby 2.1.2p195 (2014-08-04 revision 47056) [x86_64-darwin13.0]
true

ext/dbm works well but ext/gdbm does not, because ext/gdbm uses 'length' to get the size.

'length' is not suitable for determining the actual size in bytes.

Use 'bytesize' instead of 'length'.

I can't understand what you mean at all.

Updated by akr (Akira Tanaka) almost 10 years ago

The data is not truncated but has a different encoding (as nobu pointed at first).

% cat t.gdbm.rb
# coding: utf-8
require 'gdbm'

data = "\xEA\xB0\x80ABCDEF"
db = GDBM.new( 'test.db', 0666 )
db['key'] = data

p [db['key'].b, db['key'].encoding]
p [data.b, data.encoding]
throw 'data truncated!!' if db['key'] != data
% ./ruby -v t.gdbm.rb
ruby 2.2.0dev (2014-08-15 trunk 47187) [x86_64-linux]
["\xEA\xB0\x80ABCDEF", #<Encoding:ASCII-8BIT>]
["\xEA\xB0\x80ABCDEF", #<Encoding:UTF-8>]
t.gdbm.rb:10:in `throw': uncaught throw "data truncated!!" (ArgumentError)
	from t.gdbm.rb:10:in `<main>'

dbm behaves the same as gdbm.

% cat t.dbm.rb 
# coding: utf-8
require 'dbm'

data = "\xEA\xB0\x80ABCDEF"
db = DBM.new( 'test.db', 0666 )
db['key'] = data

p [db['key'].b, db['key'].encoding]
p [data.b, data.encoding]
throw 'data truncated!!' if db['key'] != data
% ./ruby -v t.dbm.rb 
ruby 2.2.0dev (2014-08-15 trunk 47187) [x86_64-linux]
["\xEA\xB0\x80ABCDEF", #<Encoding:ASCII-8BIT>]
["\xEA\xB0\x80ABCDEF", #<Encoding:UTF-8>]
t.dbm.rb:10:in `throw': uncaught throw "data truncated!!" (ArgumentError)
	from t.dbm.rb:10:in `<main>'

Updated by akr (Akira Tanaka) about 9 years ago

  • Status changed from Open to Rejected

gdbm (and dbm) don't record the encoding of stored strings.
So I think the current behavior is natural and not a bug.
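
A minimal sketch of how a caller might work with this behavior, assuming the stored values are known to be UTF-8. Here `fetched` is a hypothetical stand-in for `db['key']`, built with `String#b` so the snippet runs without the gdbm extension; the bytes are the same as what was stored, but the encoding is ASCII-8BIT.

```ruby
# Original value, explicitly tagged as UTF-8.
data = "\xEA\xB0\x80ABCDEF".dup.force_encoding(Encoding::UTF_8)

# Hypothetical stand-in for db['key']: identical bytes, encoding ASCII-8BIT.
fetched = data.b

# Strings with non-ASCII bytes and different encodings compare as unequal,
# even though no byte was lost.
same_before = (fetched == data)
same_bytes  = (fetched.bytes == data.bytes)

# Re-tag the fetched bytes with the encoding the caller knows they were written in.
restored   = fetched.dup.force_encoding(Encoding::UTF_8)
same_after = (restored == data)

p [same_before, same_bytes, same_after]  # => [false, true, true]
```

Comparing against `data.b`, as in the one-liner earlier in the thread, is the other option when the caller only cares about the raw bytes.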
