Bug #18417
closedIO::Buffer problems
Description
Hello. While working for documentation for IO::Buffer
(still WIP), I uncovered these problems. I believe at least some of them should be fixed before the release, though I understand not much time left.
The list is not final, probably (still on it, but wanted to raise awareness ASAP).
These should be probably fixed/discussed before the release
1. I believe to_str
is an unfortunate name for "read string from buffer, maybe entire buffer. It is Ruby's convention that to_str
is an implicit String-conversion, and a) I am not sure the Buffer is "string-like" enough to be implicitly converted, and b) it is really unexpected that #to_s
and #to_str
do different things:
buf = IO::Buffer.for('test')
"buf: #{buf}"
# => "buf: #<IO::Buffer 0x00007fdb7f4e3a20+4 SLICE>"
"buf: " + buf
# => "buf: test"
puts buf
# #<IO::Buffer 0x00007fdb7f4e3a20+4 SLICE>
puts '' + buf
# test
Another concern about #to_str
naming is it is one of the main parts of the buffer interface (alongside #copy
), but doesn't look this way, and looks like some utility method instead. Maybe copy/to_str pair should be symmetrical, like write
/read
, or something like that?
2. I am not sure that type names (:U8
, :S8
, :U16
, :u16
and so on) introduced "from scratch" (?) is a good thing. From one point, they are kinda nice, and consistent. From another, we already have abbreviated names for the types in Array#pack, and having two completely unrelated sets of designation for the same thing in the core of the language seems weird to me. I understand that not all #pack
s designations can be represented by Symbol, and their consistency is questionable (it is S
for 16-bit unsigned LE, s
for 16-bit signed LE, and S>
/s>
for the same in BE), but the discrepancy is questionable too.
3. About flags
as a creation parameter:
-
::new
: it seems that the only acceptable value isMAPPED
; of others,INTERNAL
is the default,EXTERNAL
,IMMUTABLE
andLOCKED
raise on creation; -
::for
: no flags accepted (though it seems that a method to make such buffer immutable might be of some utility?) -
::map
:INTERNAL
,LOCKED
,EXTERNAL
are ignored,MAPPED
is the default; seems like the only usage of the parameter is not to passIMMUTABLE
which is the default: and for that, you'll need to also passsize
andoffset
.
It seems, that we shouldn't probably expose "flags" concept to the user at all, and instead make APIs like ::new(size, mapped: false)
and ::map(io, ..., immutable: true)
?
Probably bugs
- It seems that
#clear
behaves weirdly whenoffset
is provided butlength
is not:
buf = IO::Buffer.new(4)
buf.clear(1, 2) # it seems obvious that I want to clear from byte 2 to the end of the buffer
# in `clear': Offset + length out of bounds! (RuntimeError)
-
IO::Buffer.map('README.md')
(just a string, I experimented "whether it is smart enough") core dumps. - I suspect we might want to set buffer to
IMMUTABLE
if it is created from the frozen string. Currently, this works:
str = 'test'.freeze
buf = IO::Buffer.for(str)
buf.set(:U8, 1, 111)
str
#=> "tost"
- It seems
to_str
always returns a string inUS-ASCII
encoding, I am not sure it is the right behavior. For a low-level buffer, I'd rather expectASCII-8BIT
(what file read in binary mode produces in Ruby). - (Not sure if a bug or just not very useful behavior?)
#resize
applied to a buffer mapped from file seems to "break" the link between the buffer and file; no furthercopy
orset
would be reflected on file. buf = IO::Buffer.new(4); buf.free; buf.inspect
gives a core dumpbuf = IO::Buffer.for("test"); buf.free
— after that,buf.get(:U8, 0)
gives "Buffer has been invalidated! (RuntimeError)", butbuf.to_str
works and produces"\x00\x00\x00\x00"
;buf.to_str(2)
does a core dump
Inconsistencies and other problems
- The buffer raises
RuntimeError
as the only type of the error. I believe that in some places, it should be other standard errors (for example, "Offset + length out of bounds" could be at leastArumentError
?..), and in other occurrences, special error classes: for example, "Buffer is locked" would probably be helpful to have as someIO::Buffer::LockedError
Current#inspect
is kinda nice and useful, but it also might be really annoying when used by Ruby itself. TryIO::Buffer.map(File.open("big file")).unexisting_method
, and you'll get REALLY long error message.Can we maybe have for#resize
a signatureresize(new_size, preserve = self.size)
? It seems a good default to allow justresize(123)
, with preserving everything by default?
Updated by zverok (Victor Shepelev) about 3 years ago
- Description updated (diff)
Updated by zverok (Victor Shepelev) about 3 years ago
- Description updated (diff)
Updated by zverok (Victor Shepelev) about 3 years ago
- Description updated (diff)
Updated by mame (Yusuke Endoh) about 3 years ago
- Assignee set to ioquatix (Samuel Williams)
Updated by zverok (Victor Shepelev) about 3 years ago
- Description updated (diff)
Updated by zverok (Victor Shepelev) about 3 years ago
- Description updated (diff)
Updated by zverok (Victor Shepelev) about 3 years ago
- Description updated (diff)
Updated by zverok (Victor Shepelev) about 3 years ago
- Description updated (diff)
Updated by zverok (Victor Shepelev) about 3 years ago
- Description updated (diff)
(Got the latest master
, updated the problems list :))
Updated by zverok (Victor Shepelev) about 3 years ago
- Description updated (diff)
Updated by ioquatix (Samuel Williams) about 3 years ago
Thanks so much for this awesome list of things we should try to address.
https://github.com/ruby/ruby/pull/5303
We are considering how to_str
should work. Implicit conversion is useful but it could also introduce issues.
Another concern about #to_str naming is it is one of the main parts of the buffer interface (alongside #copy), but doesn't look this way, and looks like some utility method instead. Maybe copy/to_str pair should be symmetrical, like write/read, or something like that?
We need to be a little bit careful because string operations are not the primary point, and we have to also consider that such names clash with IO-like behaviour, e.g. we might use buffer.write(io, ...)
or buffer.write_to(io, ...)
...
Let me first try to address all the bugs, then with the remaining time let's work on features/changes. This interface is marked experimental for a reason and I expect there will be some big changes between 3.1. release and 3.2 release in terms of the user facing interface so while I'd prefer we get this all sorted, I'm not too concerned if we don't have all the answers before the release of 3.1.
Updated by ioquatix (Samuel Williams) about 3 years ago
Just for reference, link to the latest draft of the absolutely amazing documentation: https://github.com/ruby/ruby/pull/5302
Updated by ioquatix (Samuel Williams) about 3 years ago
(2) I am not sure that type names (:U8, :S8, :U16, :u16 and so on) introduced "from scratch" (?) is a good thing. From one point, they are kinda nice, and consistent. From another, we already have abbreviated names for the types in Array#pack, and having two completely unrelated sets of designation for the same thing in the core of the language seems weird to me. I understand that not all #packs designations can be represented by Symbol, and their consistency is questionable (it is S for 16-bit unsigned LE, s for 16-bit signed LE, and S>/s> for the same in BE), but the discrepancy is questionable too.
These type names are based on a common convention which I've seen is used across different languages/environments. Linux kernel follows the exact convention to use __s32
and __u32
which is where these names came from, since I'm used to seeing them frequently in source code for precise bit widths and signed/unsigned. Rust has some similar conventions.pack
and unpack
are not obvious and I end up having to look up the documentation every time.
It's also entirely possible we will introduce packed formats, e.g. :u1
, etc for efficient unpacking of data, so specifying the exact bit width which corresponds to say, an RFC's definition for a binary structure.
# +---------------+
# |Pad Length? (8)|
# +-+-------------+-----------------------------------------------+
# |E| Stream Dependency? (31) |
# +-+-------------+-----------------------------------------------+
# | Weight? (8) |
# +-+-------------+-----------------------------------------------+
# | Header Block Fragment (*) ...
# +---------------------------------------------------------------+
# | Padding (*) ...
# +---------------------------------------------------------------+
FrameHeader = IO::Buffer::Packed[
pad: :u8,
exclusive: :u1,
stream_dependency: :u31,
weight: :u8,
]
Hopefully this gives you some more insights into what I'm thinking and then we can see if it still makes sense.
Updated by zverok (Victor Shepelev) about 3 years ago
We need to be a little bit careful because string operations are not the primary point
Interesting! Because I started to document most of the usage from the point of view of some code mostly using get
and set
, but then I looked into your examples of its intended here, and for what I can get from there, it is:
- uring-based C example uses
rb_io_buffer_get_mutable
/rb_io_buffer_get_immutable
to have a direct access to a memory represented by the buffer - epoll-based C example does the same
- select-based Ruby example uses
copy
/to_str
—that's where I got the idea that it is the main interface (at least from Ruby PoV)
As rb_io_buffer_get_mutable
/rb_io_buffer_get_immutable
aren't exposed on Ruby level at all (IDK, maybe we do want to mention the "additional" API available for C extensions? It is kinda important), that leaves the copy
/to_str` as the "main" ones at this level.
we have to also consider that such names clash with IO-like behaviour
I agree read
/write
are not the best choice of words probably, but I still believe the choice should be symmetrical... Maybe something "dumb" (acceptable for rarely used low-level APIs?) like get_chunk
and put_chunk
, or something?..
These type names are based on a common convention which I've seen is used across different languages/environments.
I agree these names are better! I am just spotting a discrepancy and reporting it, if it is considered OK by core devs (which I am not even one of :)), so be it. I don't have a useful proposal here, because changing Array#pack
convention is impossible, and using that old convention for IO::Buffer
is not convenient either.
Updated by ioquatix (Samuel Williams) about 3 years ago
Yes that all makes sense.
After spending some time today working on it, I decided I don't like the mutable/immutable terminology. The problem is, even if the buffer is "immutable", the underlying data can still change. This is the unfortunate reality of how C/von Neumann computer works.
So, I'm going to rename it to readonly
which In think better conveys the meaning of "A view of a buffer which cannot be used to modify it."
Hopefully this will be merged in the linked PR shortly.
Updated by ioquatix (Samuel Williams) about 3 years ago
- Status changed from Open to Closed
All the issues here have been addressed, except for:
(1) get_value
/ set_value
type names have not changed, but the method name has changed.
(2) Using flags to indicate allocation strategy. I will review it and document the expected behaviour from Ruby.
Because this interface is marked as experimental, I'm not too concerned about either.