Feature #21796: unpack variant that returns the final offset

Added by nobu (Nobuyoshi Nakada) 2 months ago. Updated 11 days ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:124312]

Description

mame (Yusuke Endoh) wrote in #note-4:

It's a shame unpack doesn't tell you how many bytes it read. You'd probably want an unpack variant that returns the final offset too, or a specifier that returns the current offset (like o?).

bytes = "\x01\x02\x03"
offset = 0
leb128_value1, offset = bytes.unpack("Ro", offset: offset) #=> 1
leb128_value2, offset = bytes.unpack("Ro", offset: offset) #=> 2
leb128_value3, offset = bytes.unpack("Ro", offset: offset) #=> 3

mame (Yusuke Endoh) wrote in #note-6:

You could tell how many bytes you read based on the size of the leb128_value returned.

That approach is unreliable because LEB128 is redundant: the same value can have encodings of different lengths. For example, both "\x03" and "\x83\x00" are valid LEB128 encodings of the value 3.
See the note in the section "Values - Integers" of the Wasm spec:
https://webassembly.github.io/spec/core/binary/values.html#integers
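To make the redundancy concrete, here is a minimal pure-Ruby sketch of an unsigned LEB128 decoder that returns the final offset along with the value. The method name decode_uleb128 is hypothetical and is not the implementation behind the proposed directive; it only illustrates why the offset has to be reported by the decoder itself.

```ruby
# Hypothetical helper: decode one unsigned LEB128 integer from +bytes+,
# starting at byte +offset+. Returns [value, next_offset].
def decode_uleb128(bytes, offset = 0)
  value = 0
  shift = 0
  loop do
    byte = bytes.getbyte(offset)
    raise ArgumentError, "truncated LEB128 at offset #{offset}" if byte.nil?

    offset += 1
    value |= (byte & 0x7f) << shift           # low 7 bits carry the payload
    return [value, offset] if (byte & 0x80).zero? # high bit clear: last byte

    shift += 7
  end
end

short  = decode_uleb128("\x03".b)     # value 3, one byte consumed
padded = decode_uleb128("\x83\x00".b) # value 3, two bytes consumed

# Iterating over a buffer by feeding the returned offset back in.
bytes = "\x01\x02\x03".b
offset = 0
values = []
while offset < bytes.bytesize
  value, offset = decode_uleb128(bytes, offset)
  values << value
end
```

Both inputs decode to 3, but the returned offsets differ, so the number of bytes read cannot be derived from the value alone.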


Related issues 1 (0 open, 1 closed)

Related to Ruby - Feature #21785: Add signed and unsigned LEB128 support to pack / unpack (Closed)

Updated by byroot (Jean Boussier) 2 months ago #1

  • Description updated (diff)

Updated by byroot (Jean Boussier) 2 months ago #2

  • Related to Feature #21785: Add signed and unsigned LEB128 support to pack / unpack added

Updated by byroot (Jean Boussier) 2 months ago #3 [ruby-core:124314]

It would be useful indeed, but I'm not sure a new method is the best way?

I think the simplest would be a new keyword parameter:

offset, *values = bytes.unpack("Ro", offset: offset, return_offset: true)

Another possibility would be to add an unpack like method to StringScanner, for the case where you want to iteratively deserialize a binary string.
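As a rough illustration of the iterative-deserialization idea, here is a sketch of a small cursor object that tracks the offset across reads. BinaryCursor and its read, pos, and eos? methods are hypothetical names invented for this example; they are not part of StringScanner. The sketch only handles fixed-width directives, where the caller already knows the byte size.

```ruby
# Hypothetical cursor over a binary string; not a StringScanner API.
class BinaryCursor
  attr_reader :pos

  def initialize(data)
    @data = data.b # work on a binary copy
    @pos = 0
  end

  # Decode +size+ bytes at the cursor with a single fixed-width unpack
  # directive (for example "C", "v", "V"), then advance the cursor.
  def read(format, size)
    chunk = @data.byteslice(@pos, size)
    if chunk.nil? || chunk.bytesize < size
      raise EOFError, "need #{size} bytes at offset #{@pos}"
    end

    @pos += size
    chunk.unpack1(format)
  end

  def eos?
    @pos >= @data.bytesize
  end
end

cursor = BinaryCursor.new("\x01\x02\x00\x03\x00\x00\x00")
u8  = cursor.read("C", 1) # 8-bit unsigned
u16 = cursor.read("v", 2) # 16-bit unsigned, little-endian
u32 = cursor.read("V", 4) # 32-bit unsigned, little-endian
```

This approach breaks down for variable-length directives, because the caller cannot know the size in advance; that is the gap an offset-reporting unpack would fill.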

Updated by tenderlovemaking (Aaron Patterson) 2 months ago #4 [ruby-core:124325]

I really like this idea. @jhawthorn (John Hawthorn) suggested ^ instead of o though, and I really like it.

bytes = "\x01\x02\x03"
offset = 0
leb128_value1, offset = bytes.unpack("R^", offset: offset) #=> 1
leb128_value2, offset = bytes.unpack("R^", offset: offset) #=> 2
leb128_value3, offset = bytes.unpack("R^", offset: offset) #=> 3

byroot (Jean Boussier) wrote:

I think the simplest would be a new keyword parameter

Why a new parameter? You might be interested in more than one location. We already have pack directives for skipping bytes (@, X, and x). It seems natural to add a directive to return the current offset.
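For reference, the existing positioning directives mentioned here can be seen in a short example using only current String#unpack behavior. The variable names are just for illustration.

```ruby
data = "\x01\x02\x03\x04".b

# x skips one byte forward: read byte 0, skip byte 1, read byte 2.
skip_forward = data.unpack("CxC")

# @2 jumps to absolute offset 2 before reading.
absolute = data.unpack("@2C")

# X backs up one byte: read bytes 0 and 1, step back, read byte 1 again.
step_back = data.unpack("CCXC")

p skip_forward # [1, 3]
p absolute     # [3]
p step_back    # [1, 2, 2]
```

These directives move the position without producing a value, so a directive that reports the position fits alongside them.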

byroot (Jean Boussier) wrote:

Another possibility would be to add an unpack like method to StringScanner, for the case where you want to iteratively deserialize a binary string.

I think this would be very useful in general, but I think maybe a separate Redmine ticket?

Updated by byroot (Jean Boussier) 2 months ago #5 [ruby-core:124328]

tenderlovemaking (Aaron Patterson) wrote:

Why a new parameter?

Because I misread the ticket; I didn't notice the o.

I do think ^ for offset is pure genius though.

Updated by matz (Yukihiro Matsumoto) 2 months ago #6 [ruby-core:124347]

I like the ^ specifier too.

Matz.

Updated by nobu (Nobuyoshi Nakada) about 2 months ago · Edited #7 [ruby-core:124389]

This might be useful for A, a, and Z as well.
Updated the PR GH-15647 to use ^ with the tests.
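A short example, using only existing unpack behavior, of why the string directives have the same problem as LEB128: the length of the returned string does not tell you how many bytes were consumed. The variable names are illustrative.

```ruby
padded         = "ab   ".b    # "ab" followed by three spaces
nul_terminated = "ab\0cd\0".b # two NUL-terminated strings

# A5 consumes five bytes but strips the trailing spaces from the result.
a_result = padded.unpack("A5")

# Each Z* stops at a NUL and also consumes it, so the second Z* starts at "cd".
z_result = nul_terminated.unpack("Z*Z*")

p a_result # ["ab"]
p z_result # ["ab", "cd"]
```

In both cases the first result is 2 bytes long, while 5 and 3 bytes respectively were consumed.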

Updated by matz (Yukihiro Matsumoto) 11 days ago #8 [ruby-core:124777]

Go ahead.

Matz.

Updated by nobu (Nobuyoshi Nakada) 11 days ago #9

  • Status changed from Open to Closed

Applied in changeset git|98269b6d64f26d1e8f22f3d8fddd30393f009e17.


[Feature #21796] unpack variant ^ that returns the final offset (#15647)

[Feature #21796] unpack variant ^ that returns the current offset
