Feature #20196
Proposal: Binary data literal
Status:
Open
Assignee:
-
Target version:
-
Description
I sometimes find myself needing to write some bytes in a Ruby string literal, and this experience leaves a lot to be desired:
- Bare strings don't work (potential for encoding corruption unless you remember to .force_encoding and never copy-paste just the literal into somewhere else) and are not particularly pleasant given all of the backslashes
- Wrapping this in String.new("\x89PNG\r\n\x1A\n\x00\x00\x00\rIHDR\x00\x00\x00`\x00\x00\x00`\b\x06\x00\x00\x00\xE2\x98w8\x00\x000%IDAT", encoding: 'BINARY') is better, but many tools explode with this because they expect all strings to be valid UTF-8 even if they're an argument to String.new, and it still doesn't have the "beauty" one might expect from Ruby (also it's not frozen unless you also freeze it)
- ["9805e474d0a0a1a0000000d094844425000000060000000680600000002e8977830000035294441445"].pack("h*") parses in all tools and is less harsh to look at, but if you're writing binary data, you probably want to annotate it
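To make the first point concrete, here is a minimal sketch of what retagging a bare literal looks like today. The helper name concat_or_nil is made up for illustration; String#b, Encoding::BINARY and Encoding::CompatibilityError are existing Ruby APIs.

```ruby
# A bare literal is tagged with the source file's encoding, so it has to be
# retagged as binary by hand. String#b returns a copy tagged ASCII-8BIT (BINARY).
bare   = "\x89PNG\r\n\x1A\n"
header = bare.b.freeze

# Hypothetical helper: returns nil when Ruby refuses to mix the encodings.
def concat_or_nil(text, bytes)
  text + bytes
rescue Encoding::CompatibilityError
  nil
end

# Non-ASCII UTF-8 text and non-ASCII binary data cannot be concatenated.
mixed = concat_or_nil("\u00e9", header)
puts mixed.nil? ? "incompatible encodings" : "concatenated"
```

Forgetting the .b (or losing it when copy-pasting just the literal) is the failure mode described in the first bullet.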
Here's my basic syntax proposal:
%b[
  89504e470d0a1a0a # PNG header
  0000000d         # Length = 13 bytes
  49484452         # IHDR chunk
  00000060         # Width = 96px
  00000060         # Height = 96px
  08 06            # 8bpp RGBA
  00 00 00         # deflate / no filter / non-interlaced
]
# => "\x89PNG\r\n\x1A\n\x00\x00\x00\rIHDR\x00\x00\x00`\x00\x00\x00`\b\x06\x00\x00\x00"
More formally:
- To match the nibble ordering of a regular string escape, the hex characters are high nibble first (the same as the H unpack character).
- It follows the same rules as other percent literals, and I am flexible on what character is used. I chose b because h could be confusing paired with the h/H unpack characters and the inverted meaning.
- We could say that high-nibble-first is capitalized and the lower-case version is low-nibble-first, but I imagine most people will want high-nibble-first. We could also say that %b[] returns a String but %B[] returns an IO::Buffer, which has greater utility than having the capability of writing low-nibble-first literals
- Whitespace is ignored
- Comments are allowed
- The encoding is always Encoding::BINARY
- The resulting string is always frozen (and if %B[] means buffer then that is read-only as well)
- a-f can also be written A-F
 
Things to consider:
- Interpolation could be allowed like normal strings
- Embedding strings could be allowed (example below)
- ? literals (characters) should be handled identically to how other kinds of strings are embedded if that is allowed
- If interpolation is allowed and you interpolate a number, this should either interpolate .to_s as you would expect in a string or raise an error, because there is no unsurprising way to take a number and convert it to one or more bytes
- Strings encoded as Encoding::BINARY could have their .inspect output use this literal
- When dealing with bitmasks, it's often convenient to write them out in binary instead of hex so the on bits are easier to identify, but there is no syntax for that here that I am fond of... but someone might have an idea. I thought about .00001111 or !00001111 with mandatory whitespace before resuming hex characters, but those didn't feel right to me
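To illustrate why interpolating a number has no unsurprising result, the sketch below lists a few byte sequences that the same integer could plausibly become. The method name candidate_encodings and the choice of candidates are mine, not part of the proposal; the pack directives are existing ones.

```ruby
# Hypothetical helper: several ways one integer could turn into bytes.
def candidate_encodings(number)
  {
    to_s_as_hex: [number.to_s].pack("H*"), # decimal digits reread as hex digits
    uint8:       [number].pack("C"),       # one unsigned byte
    uint32_be:   [number].pack("N"),       # 32-bit big-endian
    uint32_le:   [number].pack("V")        # 32-bit little-endian
  }
end

candidate_encodings(96).each do |name, bytes|
  puts "#{name}: #{bytes.unpack1('H*')}"
end
# to_s_as_hex: 96
# uint8: 60
# uint32_be: 00000060
# uint32_le: 60000000
```

A width of 96 only yields the 00000060 from the PNG example under one of these readings, which is the argument for either plain .to_s or an error.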
Example with embedded strings:
%b[
  89 "PNG" 0d0a1a0a # PNG header
  0000000d          # Length = 13 bytes
  "IHDR"            # IHDR chunk
  00000060          # Width = 96px
  00000060          # Height = 96px
  08 06             # 8bpp RGBA
  00 00 00          # deflate / no filter / non-interlaced
]
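The embedded-string variant can be prototyped with a small tokenizer. This is a sketch under assumptions: parse_binary is a hypothetical name, quoted strings are taken verbatim with no escape handling, and each run of hex digits must contain whole bytes. None of that is specified in the proposal.

```ruby
require "strscan"

# Hypothetical parser: hex runs, "quoted strings", # comments, whitespace.
def parse_binary(source)
  scanner = StringScanner.new(source)
  out = String.new(encoding: Encoding::BINARY)

  until scanner.eos?
    if scanner.skip(/\s+/) || scanner.skip(/#[^\n]*/)
      next # whitespace and comments contribute no bytes
    elsif (hex = scanner.scan(/\h+/))
      # Assumption: a run of hex digits may not end on a half byte.
      raise ArgumentError, "odd number of hex digits: #{hex}" if hex.length.odd?
      out << [hex].pack("H*")
    elsif scanner.scan(/"([^"]*)"/)
      out << scanner[1].b # embedded string, copied byte for byte
    else
      raise ArgumentError, "unexpected input at offset #{scanner.pos}"
    end
  end

  out.freeze
end

header = parse_binary(<<~'SRC')
  89 "PNG" 0d0a1a0a # PNG header
  0000000d          # Length = 13 bytes
  "IHDR"            # IHDR chunk
SRC

p header # "\x89PNG\r\n\x1A\n\x00\x00\x00\rIHDR"
```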
Example with interpolation:
%b[
  #{png_header}
  #{ihdr = chunk(:ihdr, width: 96, height: 96, bpp: 8, format: :rgba)}
  #{png_crc(ihdr)} # I didn't include this in the other examples but I needed something to demonstrate here
]
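For readers wondering what helpers like the ones interpolated above might return, here is one possible shape. The names ihdr_chunk and png_crc and their signatures are guesses for illustration, not the helpers from the example; the sketch assumes interpolated values are raw binary strings. Zlib.crc32 and the pack directives are existing APIs, and the chunk layout (length, type, data, CRC over type plus data) follows the PNG format.

```ruby
require "zlib"

# Hypothetical: CRC-32 over chunk type + data, as a 4-byte big-endian string.
def png_crc(type_and_data)
  [Zlib.crc32(type_and_data)].pack("N")
end

# Hypothetical: build a complete IHDR chunk (color_type 6 is RGBA).
def ihdr_chunk(width:, height:, bit_depth: 8, color_type: 6)
  # Trailing zeros: deflate compression, no filter, non-interlaced.
  data = [width, height, bit_depth, color_type, 0, 0, 0].pack("NNCCCCC")
  body = "IHDR".b + data
  [data.bytesize].pack("N") + body + png_crc(body)
end

chunk = ihdr_chunk(width: 96, height: 96)
puts chunk.unpack1("H*")
```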
Other possible alternatives:
- A library (possibly standard library/gem) could have a function like binary take a string (potentially a heredoc) and parse it according to the same rules I wrote above. You would have to make the parser strip whitespace and comments, and only hex bytes could be interpolated.
- A new pack/unpack symbol could be created that does the same thing as above, so you could ["hex #comments\netc"].pack("...")
- You could probably do a lot of this with an array of hex strings and pack but it doesn't allow for freeform whitespace and the way you do it is not obvious without reading the docs for pack... and also you allocate a bunch of strings you don't need
- A Data-like object more closely related to IO::Buffer could be defined that declares the size of things contained within a buffer and then a constant could be written to create an instance of the Data-subclass containing the actual data you want to write out ... but this is a lot of work
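The array-of-hex-strings alternative looks roughly like this today. pack_hex is a hypothetical wrapper name; the rest is existing Array#pack behaviour. It shows the trade-offs mentioned above: each fragment is its own quoted string, and every fragment plus the joined string is allocated before the final result.

```ruby
# Hypothetical wrapper around the existing Array#pack "H*" directive.
def pack_hex(*fragments)
  [fragments.join].pack("H*").freeze
end

png_start = pack_hex(
  "89504e470d0a1a0a", # PNG header
  "0000000d",         # Length = 13 bytes
  "49484452"          # IHDR chunk
)

p png_start          # "\x89PNG\r\n\x1A\n\x00\x00\x00\rIHDR"
p png_start.encoding # binary (ASCII-8BIT)
```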
Potential users:
- People writing protocol code in Ruby
- People who need to write out magic constants (in my case: the RDB encoding of a Redis value)
- People using something like Metasploit Framework to reverse engineer something
- Tools could e.g. disassemble x86 into a literal with comments showing the assembly mnemonics
 