Bug #10111

closed

gdbm truncated UTF-8 data problem

Added by testors (KiHyun Kang) over 9 years ago. Updated about 9 years ago.

Status: Rejected
Target version: -
ruby -v: ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux]
[ruby-core:64198]

Description

The following script reproduces the problem.

# coding: utf-8
require 'gdbm'

data = "\xEA\xB0\x80ABCDEF"
db = GDBM.new( 'test.db', 0666 )
db['key'] = data

throw 'data truncated!!' if db['key'] != data

Updated by nobu (Nobuyoshi Nakada) over 9 years ago

gdbm doesn't preserve encodings now.

Updated by testors (KiHyun Kang) over 9 years ago

Nobuyoshi Nakada wrote:

gdbm doesn't preserve encodings now.

gdbm doesn't have to preserve encodings.

ext/dbm works well but ext/gdbm does not, because ext/gdbm uses 'length' to get the size.

'length' is not suitable for determining the actual size.

Use 'bytesize' instead of 'length'.

Updated by nobu (Nobuyoshi Nakada) over 9 years ago

KiHyun Kang wrote:

Nobuyoshi Nakada wrote:

gdbm doesn't preserve encodings now.

gdbm doesn't have to preserve encodings.

$ ./ruby -v -rgdbm -e 'data = "\xEA\xB0\x80ABCDEF"' -e 'db = GDBM.new("test.db", 0666)' -e 'db["key"] = data' -e 'p db["key"] == data.b'
ruby 2.1.2p195 (2014-08-04 revision 47056) [x86_64-darwin13.0]
true

ext/dbm works well but ext/gdbm does not, because ext/gdbm uses 'length' to get the size.

'length' is not suitable for determining the actual size.

Use 'bytesize' instead of 'length'.

I can't understand what you mean at all.

Updated by akr (Akira Tanaka) over 9 years ago

The data is not truncated but has a different encoding (as nobu pointed out at first).

% cat t.gdbm.rb
# coding: utf-8
require 'gdbm'

data = "\xEA\xB0\x80ABCDEF"
db = GDBM.new( 'test.db', 0666 )
db['key'] = data

p [db['key'].b, db['key'].encoding]
p [data.b, data.encoding]
throw 'data truncated!!' if db['key'] != data
% ./ruby -v t.gdbm.rb
ruby 2.2.0dev (2014-08-15 trunk 47187) [x86_64-linux]
["\xEA\xB0\x80ABCDEF", #<Encoding:ASCII-8BIT>]
["\xEA\xB0\x80ABCDEF", #<Encoding:UTF-8>]
t.gdbm.rb:10:in `throw': uncaught throw "data truncated!!" (ArgumentError)
	from t.gdbm.rb:10:in `<main>'

dbm behaves the same as gdbm.

% cat t.dbm.rb 
# coding: utf-8
require 'dbm'

data = "\xEA\xB0\x80ABCDEF"
db = DBM.new( 'test.db', 0666 )
db['key'] = data

p [db['key'].b, db['key'].encoding]
p [data.b, data.encoding]
throw 'data truncated!!' if db['key'] != data
% ./ruby -v t.dbm.rb 
ruby 2.2.0dev (2014-08-15 trunk 47187) [x86_64-linux]
["\xEA\xB0\x80ABCDEF", #<Encoding:ASCII-8BIT>]
["\xEA\xB0\x80ABCDEF", #<Encoding:UTF-8>]
t.dbm.rb:10:in `throw': uncaught throw "data truncated!!" (ArgumentError)
	from t.dbm.rb:10:in `<main>'

Updated by akr (Akira Tanaka) about 9 years ago

  • Status changed from Open to Rejected

gdbm (and dbm) do not record encodings.
So I think the current behavior is natural and not a bug.
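
For readers who hit the same comparison failure, here is a minimal sketch of the behavior discussed above, written without the gdbm extension so it runs on any Ruby. It assumes, based on the output shown earlier in this thread, that a value read back from the database is an ASCII-8BIT string with the same bytes; `data.b` stands in for that value, and the variable names `stored` and `restored` are illustrative only.

```ruby
# coding: utf-8
data = "\xEA\xB0\x80ABCDEF"   # UTF-8 string: one 3-byte character plus 6 ASCII characters

# Assumption: stand-in for db['key'], which the thread shows comes back as ASCII-8BIT.
stored = data.b

# The bytes are identical, so nothing has been truncated.
p stored.bytes == data.bytes

# String#== is false here because the encodings differ and the data is not ASCII-only.
p stored == data

# Re-tagging a copy of the bytes as UTF-8 makes the comparison succeed.
restored = stored.dup.force_encoding(Encoding::UTF_8)
p restored == data

# Character count and byte count differ for the UTF-8 string,
# while the binary string counts one "character" per byte.
p [data.length, data.bytesize, stored.length]
```

Whether to re-tag values with `force_encoding` on read is a decision for the calling code, since the database itself carries no encoding information.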
