Feature #21785: Add signed and unsigned LEB128 support to pack / unpack

Added by tenderlovemaking (Aaron Patterson) 2 months ago. Updated 11 days ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:124258]

Description

Hi,

I'd like to add signed and unsigned LEB128 support to the pack and unpack methods. LEB128 is a variable-length encoding scheme for integers. You can read the Wikipedia entry about it here: https://en.wikipedia.org/wiki/LEB128

LEB128 is used in DWARF, WebAssembly, MQTT, and Protobuf. I'm sure there are other formats, but these are the ones I'm familiar with.
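To make the scheme concrete, here is a minimal pure-Ruby sketch of unsigned LEB128. It assumes the textbook definition: 7 payload bits per byte, least significant group first, and the high bit set on every byte except the last. The helper names `uleb128_encode` and `uleb128_decode` are hypothetical and not part of Ruby.

```ruby
# Hypothetical helper: encode a non-negative Integer as unsigned LEB128.
def uleb128_encode(value)
  raise ArgumentError, "negative value" if value.negative?
  bytes = []
  loop do
    byte = value & 0x7F        # low 7 bits of what remains
    value >>= 7
    if value.zero?
      bytes << byte            # last byte: continuation bit clear
      break
    end
    bytes << (byte | 0x80)     # more bytes follow: set continuation bit
  end
  bytes.pack("C*")             # binary (ASCII-8BIT) string
end

# Hypothetical helper: decode one unsigned LEB128 value from the start of str.
def uleb128_decode(str)
  result = 0
  shift = 0
  str.each_byte do |byte|
    result |= (byte & 0x7F) << shift
    return result if (byte & 0x80).zero?  # continuation bit clear: done
    shift += 7
  end
  raise ArgumentError, "truncated LEB128 sequence"
end

p uleb128_encode(0xFFF)            # prints "\xFF\x1F"
p uleb128_decode("\xFF\x1F".b)     # prints 4095
```

Because Ruby integers are arbitrary precision, this sketch places no upper bound on the value; real formats such as Wasm cap the number of bytes per integer.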

I sent a pull request here: https://github.com/ruby/ruby/pull/15589

I'm proposing K for the unsigned version and k for the signed version. I picked k simply because it was available; I'm open to other format strings.

Thanks for consideration!


Related issues: 1 (0 open, 1 closed)

Related to Ruby - Feature #21796: unpack variant that returns the final offset (Closed)

Updated by tenderlovemaking (Aaron Patterson) 2 months ago #1 [ruby-core:124259]

Sorry, I probably should have put an example in the original post. Here is a sample of the usage:

irb(main):003> [0xFFF].pack("K")
=> "\xFF\x1F"
irb(main):004> [0xFFF].pack("K").unpack1("K")
=> 4095
irb(main):005> [-123].pack("k")
=> "\x85\x7F"
irb(main):006> [-123].pack("k").unpack1("k")
=> -123
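The signed form differs from the unsigned one only in how the value is split and when encoding stops. The sketch below assumes standard two's-complement SLEB128; `sleb128_encode` and `sleb128_decode` are hypothetical helpers, not Ruby API. It relies on `Integer#>>` being an arithmetic shift and on `&` treating negative integers as infinitely sign-extended, which is why `-123 & 0x7F` is `5`.

```ruby
# Hypothetical helper: encode any Integer as signed LEB128.
def sleb128_encode(value)
  bytes = []
  loop do
    byte = value & 0x7F   # low 7 bits (two's complement for negatives)
    value >>= 7           # arithmetic shift: negatives stay negative
    # Stop once the remaining bits are pure sign extension of bit 6 (0x40).
    done = (value.zero? && (byte & 0x40).zero?) ||
           (value == -1 && (byte & 0x40) != 0)
    if done
      bytes << byte
      break
    end
    bytes << (byte | 0x80)
  end
  bytes.pack("C*")
end

# Hypothetical helper: decode one signed LEB128 value from the start of str.
def sleb128_decode(str)
  result = 0
  shift = 0
  str.each_byte do |byte|
    result |= (byte & 0x7F) << shift
    shift += 7
    if (byte & 0x80).zero?
      # Sign-extend when bit 6 of the final byte is set.
      result -= (1 << shift) if (byte & 0x40) != 0
      return result
    end
  end
  raise ArgumentError, "truncated LEB128 sequence"
end

p sleb128_encode(-123)            # prints "\x85\x7F"
p sleb128_decode("\x85\x7F".b)    # prints -123
```

Under these assumptions the sketch reproduces the signed bytes from the irb session above. Note that 64 needs two bytes (`"\xC0\x00"`) while -64 fits in one (`"\x40"`), because bit 6 of the last byte doubles as the sign bit.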

Updated by matz (Yukihiro Matsumoto) 2 months ago #2 [ruby-core:124268]

I am positive about the addition of LEB128. But I don't really like K/k because it doesn't remind me of LEB128 at all (though I know we've used L, E, B already).

Given that the only case pairs not yet used are k, r, and y, either R (vaRiable length) or Y (next to W, i.e. BER) would be better than K/k.

Matz.
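The existing `w` directive (BER-compressed integer) is a close relative of unsigned LEB128. Both split a non-negative integer into 7-bit groups and set the high bit on every byte except the last; the difference is group order. BER writes the most significant group first, while LEB128 writes the least significant first. The sketch below illustrates this with `Integer#digits(128)`; `set_continuation_bits` is a hypothetical helper.

```ruby
# Hypothetical helper: set the high (continuation) bit on every
# 7-bit group except the last one.
def set_continuation_bits(groups)
  groups.each_with_index.map { |g, i| i < groups.size - 1 ? g | 0x80 : g }
end

n = 300
groups = n.digits(128)  # 7-bit groups, least significant first: [44, 2]
                        # (digits raises for negative n, so unsigned only)

leb128 = set_continuation_bits(groups).pack("C*")          # little-endian groups
ber    = set_continuation_bits(groups.reverse).pack("C*")  # big-endian groups

p leb128.bytes          # prints [172, 2]
p ber.bytes             # prints [130, 44]
p ber == [n].pack("w")  # prints true
```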

Updated by tenderlovemaking (Aaron Patterson) 2 months ago #3 [ruby-core:124272]

Thanks for the feedback. I've updated the patch to use R/r!

Updated by mame (Yusuke Endoh) 2 months ago #4 [ruby-core:124287]

It's a shame unpack doesn't tell you how many bytes it read. You'd probably want an unpack variant that returns the final offset too, or a specifier that returns the current offset (like o?).

bytes = "\x01\x02\x03"
offset = 0
leb128_value1, offset = bytes.unpack("Ro", offset: offset) #=> 1
leb128_value2, offset = bytes.unpack("Ro", offset: offset) #=> 2
leb128_value3, offset = bytes.unpack("Ro", offset: offset) #=> 3
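The `o` specifier in the snippet above is only a suggestion, not an existing `unpack` directive. Until something like it exists, a plain-Ruby decoder can return the next offset itself. This is a minimal sketch for unsigned values only; `read_uleb128` is a hypothetical helper.

```ruby
# Hypothetical helper: decode one unsigned LEB128 value starting at +offset+
# and return [value, offset_of_next_byte].
def read_uleb128(bytes, offset)
  result = 0
  shift = 0
  loop do
    byte = bytes.getbyte(offset) or raise ArgumentError, "truncated LEB128 at #{offset}"
    offset += 1
    result |= (byte & 0x7F) << shift
    return [result, offset] if (byte & 0x80).zero?
    shift += 7
  end
end

bytes = "\x01\x02\x03".b
offset = 0
values = []
while offset < bytes.bytesize
  value, offset = read_uleb128(bytes, offset)
  values << value
end
p values  # prints [1, 2, 3]
```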

Updated by tenderlovemaking (Aaron Patterson) 2 months ago #5 [ruby-core:124294]

You could tell how many bytes you read based on the size of the leb128_value returned. But I agree, getting the information directly from unpack would be nice.

Updated by mame (Yusuke Endoh) 2 months ago #6 [ruby-core:124298]

That approach (inferring the byte count from the decoded value) is unreliable because LEB128 encodings are redundant. For example, both "\x03" and "\x83\x00" are valid LEB128 encodings of the value 3.
See the note in the section "Values - Integers" of the Wasm spec:
https://webassembly.github.io/spec/core/binary/values.html#integers
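The redundancy is easy to see with a decoder that also reports how many bytes it consumed. In the hypothetical helper below, "\x03" and "\x83\x00" both decode to 3 but consume 1 and 2 bytes respectively, while the shortest encoding of 3 is always one byte, so the consumed length cannot be derived from the value.

```ruby
# Hypothetical helper: decode one unsigned LEB128 value from the start of
# +str+ and return [value, number_of_bytes_consumed].
def uleb128_decode_with_length(str)
  result = 0
  str.each_byte.with_index do |byte, i|
    result |= (byte & 0x7F) << (7 * i)
    return [result, i + 1] if (byte & 0x80).zero?
  end
  raise ArgumentError, "truncated LEB128 sequence"
end

["\x03".b, "\x83\x00".b].each do |encoded|
  value, consumed = uleb128_decode_with_length(encoded)
  minimal = value.digits(128).size  # byte count of the shortest encoding
  puts "#{encoded.inspect}: value=#{value} consumed=#{consumed} minimal=#{minimal}"
end
```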

Updated by tenderlovemaking (Aaron Patterson) 2 months ago #7 [ruby-core:124304]

Ah, of course. I didn't think about that. 🤦‍♀️

Updated by tenderlovemaking (Aaron Patterson) 2 months ago #8

  • Status changed from Open to Closed

Applied in changeset git|d0b72429a93e54f1f956b4aedfc25c57dc7001aa.


Add support for signed and unsigned LEB128 to pack/unpack.

This commit adds a new pack format command R and r for unsigned and
signed LEB128 encoding. The "r" mnemonic is because this is a
"vaRiable" length encoding scheme.

LEB128 is used in various formats including DWARF, WebAssembly, MQTT,
and Protobuf.

[Feature #21785]

Updated by byroot (Jean Boussier) 2 months ago #9

  • Related to Feature #21796: unpack variant that returns the final offset added

Updated by matz (Yukihiro Matsumoto) 2 months ago #10 [ruby-core:124334]

It is too late to introduce it in Ruby 4.0; let's aim for 4.1.

Matz.

Updated by byroot (Jean Boussier) 2 months ago #11

  • Status changed from Closed to Open

Updated by tenderlovemaking (Aaron Patterson) 19 days ago #12 [ruby-core:124676]

Is it OK if I merge this again?

Thanks

Updated by tenderlovemaking (Aaron Patterson) 11 days ago #14

  • Status changed from Open to Closed

Applied in changeset git|c61f52a012f0a390a869db4825143187ea468d21.


[Feature #21785] Add LEB128 again (#16123)

  • Revert "Revert pack/unpack support for LEB128"

This reverts commit 77c3a9e447ec477be39e00072e1ce3348d0f4533.

  • Update specs for LEB128