Project

General

Profile

Actions

Backport #8795

closed

"Null byte in string error" on Marshal.load

Added by mml (McClain Looney) over 10 years ago. Updated over 10 years ago.


Description

I have about 2M serialized ruby objects in a database (don't ask). The objects are serialized via Marshal.dump, then zipped, then base64 encoded before being saved. After upgrading to 2.0 (built from source), a tiny minority (3-4) of objects thusly stored will fail to Marshal.load, with "ArgumentError: Null byte in string"

Given that the other 1.9M objects load just fine, and the issue never manifested in 1.8.7 MRI, and further, that the zip CRC's & such were not corrupted, I suspect there may be some subtle bug in the Marshal.dump code.

Please see attached file for a sample. I'd be happy if there'd even be any way to "fix" said broken string, or even any insight into what might be going on, but I'm more worried that there may be a Marshal bug lurking..


Files

badblob (20 KB) badblob mml (McClain Looney), 08/16/2013 10:44 PM
badblob.raw (14.5 KB) badblob.raw mml (McClain Looney), 08/17/2013 10:30 AM

Updated by mml (McClain Looney) over 10 years ago

Doh! Sorry, wrote down the wrong error message "Null byte in string" should read "string contains null byte (ArgumentError)"

Updated by mml (McClain Looney) over 10 years ago

working on a recipe to demonstrate the error..

Updated by mml (McClain Looney) over 10 years ago

require 'bigdecimal'
class Model< Hash;end
class Product< Hash;end
puts "well, that worked just fine! #{Marshal.load eval(File.read("badblob"))}"

->show.rb:4:in _load': string contains null byte (ArgumentError) from show.rb:4:in load'
from show.rb:4:in `'

Interestingly, jruby manages to deserialize the object without issue

Updated by mml (McClain Looney) over 10 years ago

seems the call to memchr on line 1496 of string.c is returning truthy. The string passed to it is 11 bytes of null (0x00). You can see these bytes in the input starting at offset 19592. not sure what they are for, or where they came from..

Updated by mml (McClain Looney) over 10 years ago

tracing the load progress by passing a proc to load yields:

loading true
...
loading {nil=>nil}
./show.rb:7:in `load': dump format error (user class) (ArgumentError)

I've hacked my ruby to not call memchr, an it happily loads my test string, unless i pass a proc as above.

This leads me to wonder if the bug might not be in Marshal.dump? It doesn't seem like that null string should be there in the first place.

Updated by drbrain (Eric Hodel) over 10 years ago

Can you upload a bad blob that doesn't require eval to load? I'm nervous about evalling a 20KB blob of text.

Updated by mml (McClain Looney) over 10 years ago

i wish i had something smaller, but i'm unable to reproduce the issue any other way, unfortunately.

FWIW, it's basically a Hash with hashes & stuff in it.

Updated by mml (McClain Looney) over 10 years ago

also, after hacking string.c, i'm able to Marshal.load this string, but Marshal.dump'ing the result doesn't produce the same evil string of nulls.

Updated by mml (McClain Looney) over 10 years ago

just verified that this tickles the 2.1 nightly snapshot as well.

Updated by drbrain (Eric Hodel) over 10 years ago

Size is not the issue, I don't really want to run eval on a random string. It's too hard to check for exploits.

Can you run this:

ruby -e 'File.write "badblob.raw", eval(File.read("badblob"))'

and attach that file?

Updated by mml (McClain Looney) over 10 years ago

No problem, sorry I didn't think of that first.

Updated by drbrain (Eric Hodel) over 10 years ago

Here is a short Marshal string (a Time) that reproduces the issue:

"\x04\bIu:\tTime\r&^\x1C\x80\x0E\xDCF\xA6\a:\voffseti\xFE\x90\x9D:\tzoneI"\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x06:\x06ET"

Updated by drbrain (Eric Hodel) over 10 years ago

  • Status changed from Open to Assigned
  • Assignee set to naruse (Yui NARUSE)

I'm unsure how you got a time zone full of NULL bytes. This feature was added in r38247, but I can't see how the zone could contain the NULL bytes based on the commit.

Since this feature was added by Yui NARUSE so I will assign it to him. I can't think of a good patch to fix it.

Updated by nobu (Nobuyoshi Nakada) over 10 years ago

  • Status changed from Assigned to Feedback

How was this data generated?
Using 3rd party's library or something?

Actions #15

Updated by nobu (Nobuyoshi Nakada) over 10 years ago

  • Status changed from Feedback to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r42596.
McClain, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


time.c: ignore invalid data

Updated by mml (McClain Looney) over 10 years ago

nobu (Nobuyoshi Nakada) wrote:

How was this data generated?
Using 3rd party's library or something?

The inputs consist of very badly malformed xml generally, though there's some malformed csv and json in there as well :) Other than that, the marshal strings were generated in the traditional way (Marshal.dump(obj)). There were a few other objects that triggered this issue as well, i'll verify that all of them were Time objects.

Updated by drbrain (Eric Hodel) over 10 years ago

  • Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: DONTNEED, 2.0.0: REQUIRED

By the way, I used the marshal-structure gem to discover the problem, along with adding a breakpoint in ruby to determine where in the stack the error occurred.

https://rubygems.org/gems/marshal-structure

Actions #18

Updated by nagachika (Tomoyuki Chikanaga) over 10 years ago

  • Tracker changed from Bug to Backport
  • Project changed from Ruby master to Backport200
  • Category deleted (core)
  • Status changed from Closed to Assigned
  • Assignee changed from naruse (Yui NARUSE) to nagachika (Tomoyuki Chikanaga)
Actions #19

Updated by nagachika (Tomoyuki Chikanaga) over 10 years ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r42912.
McClain, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merge revision(s) 42596,42597,42598,42599: [Backport #8795]

* time.c (time_mload): ignore invalid offset and zone.
  [ruby-core:56648] [Bug #8795]

* time.c (time_mload): ignore auxiliary data, offset and zone, if
  invalid.  [ruby-core:56648] [Bug #8795]

* test/ruby/test_time.rb: use the in_timezone() helper
  and define it at the top with other helpers.
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0