Bug #9473
Closed
Corruption and Segmentation faults all over
Description
We're in the process of moving from Rails 2.3 to 3.2 (both running on Ruby 1.9.3-p484).
In this process we've run into a snag: errors crop up within 2-3 hours of taking production traffic (or replays thereof with siege). We cannot be certain that these errors would not occur with Rails 2.3, but they appear more quickly and pervasively in the 3.2 branch.
These corruptions sometimes appear as the following errors, in places where they are highly improbable if not impossible:
"string contains null byte"
ActiveModel::MissingAttributeError "missing attribute: ..."
"undefined method `table_name' for false:FalseClass"
For example, this error doesn't make any sense at this location:
string contains null byte
activesupport (3.2.16) lib/active_support/core_ext/class/attribute.rb:97:in `block in class_attribute'
As a result we've tried:
- Upgrading to Ruby 1.9.3 HEAD
- Removing our garbage collection tweaks
- Turning on/off different areas of our codebase
- Upgrading gems with C extensions
We've run independent tests on most of these variables but haven't been able to isolate the cause.
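When testing with and without GC tweaks, it can help to confirm from inside the worker process that the tweaks really are gone. The following is a minimal sketch, assuming the tweaks were applied through the environment variables Ruby 1.9.3 reads at startup (such as RUBY_GC_MALLOC_LIMIT, RUBY_HEAP_MIN_SLOTS and RUBY_FREE_MIN); the report does not say how they were set. The helper name `gc_tuning_env` and the matching pattern are hypothetical.

```ruby
# Hypothetical helper, not part of the original report.
# Assumption: GC tweaks are passed as RUBY_GC_*, RUBY_HEAP_* or RUBY_FREE_*
# environment variables.
GC_TUNING_PATTERN = /\ARUBY_(GC_|HEAP_|FREE_)/

# Returns a Hash of the GC-related tuning variables found in +env+.
# +env+ defaults to the real process environment; pass a plain Hash to test.
def gc_tuning_env(env = ENV)
  env.to_hash.select { |name, _value| name =~ GC_TUNING_PATTERN }
end

if __FILE__ == $PROGRAM_NAME
  # Record the interpreter alongside the settings so test runs can be compared.
  puts "ruby #{RUBY_VERSION}-p#{RUBY_PATCHLEVEL}"
  settings = gc_tuning_env
  if settings.empty?
    puts "no GC tuning variables set"
  else
    settings.sort.each { |name, value| puts "#{name}=#{value}" }
  end
end
```

Logging this once per worker at boot makes it easier to tell which combination of variables a given crash belongs to.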
We're assuming these spurious errors are also related to the segmentation faults we've been seeing. I've attached some examples.
The segfaults have happened all over the place including GC, compile, str_replace.
We've tried running under valgrind to identify a root cause, and on several reproductions it reports the first error at st.c:330, in st_lookup.
Updated by drasch (David Rasch) over 10 years ago
And we've also gotten from valgrind:
==13233== Thread 5:
==13233== Invalid read of size 8
==13233== at 0x3F2B4326A6: __sigsetjmp (in /lib64/libc-2.12.so)
==13233== Address 0xcef8730 is not stack'd, malloc'd or (recently) free'd
==13233==
==13233== Invalid read of size 8
==13233== at 0x3F2B4326CC: __sigsetjmp (in /lib64/libc-2.12.so)
==13233== Address 0xcef8730 is not stack'd, malloc'd or (recently) free'd
==13233==
==13233== Invalid read of size 8
==13233== at 0x3F2B4326E1: __sigsetjmp (in /lib64/libc-2.12.so)
==13233== Address 0xcef8730 is not stack'd, malloc'd or (recently) free'd
==13233==
==13233== Invalid read of size 4
==13233== at 0x3F2AC0DF98: _dl_fixup (in /lib64/ld-2.12.so)
==13233== Address 0xcef8718 is not stack'd, malloc'd or (recently) free'd
==13233==
==13233== Invalid write of size 4
==13233== at 0x3F2AC0E09C: _dl_fixup (in /lib64/ld-2.12.so)
==13233== Address 0xcef871c is not stack'd, malloc'd or (recently) free'd
==13233==
==13233== Invalid read of size 4
==13233== at 0x3F2AC0DFAD: _dl_fixup (in /lib64/ld-2.12.so)
==13233== Address 0xcef874c is not stack'd, malloc'd or (recently) free'd
Updated by drbrain (Eric Hodel) over 10 years ago
- Tracker changed from Backport to Bug
- Project changed from Backport193 to Ruby master
- Priority changed from 5 to Normal
Fixed project, tracker and priority
Updated by normalperson (Eric Wong) over 10 years ago
rasch@raschnet.com wrote:
> - upgrading gems with C extensions
Can you reproduce this without C extensions?
Which C extensions do you run? Likely one of them is corrupting
memory, so it could be an odd/strange one somewhere..
It looks like one of them (Pool2/Implementation.cpp) is passenger,
so maybe try reproducing the error with unicorn?
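One way to answer the "which C extensions do you run?" question is to ask RubyGems from inside the running application. This is a rough sketch: the helper name `native_extension_gems` is hypothetical, and it relies only on `Gem.loaded_specs` and `Gem::Specification#extensions`. It reports only gems that RubyGems has activated, so a component installed another way (for example as a web server module) will not appear.

```ruby
require "rubygems"

# Hypothetical helper: lists activated gems that build a native extension.
# +specs+ defaults to the gems loaded in the current process.
def native_extension_gems(specs = Gem.loaded_specs.values)
  specs.reject { |spec| spec.extensions.empty? }   # keep gems with ext/ build scripts
       .map { |spec| "#{spec.name} #{spec.version}" }
       .sort
end

if __FILE__ == $PROGRAM_NAME
  # Run after the app has loaded its gems (e.g. from a Rails console).
  puts native_extension_gems
end
```

Running this from a console with the application loaded gives a candidate list to bisect: disable or replace one extension at a time and replay traffic.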
Updated by drasch (David Rasch) over 10 years ago
We've been running further tests and when running our app under Unicorn instead of Passenger the problem hasn't occurred yet.
Updated by drasch (David Rasch) over 10 years ago
We've continued to see no crashes under Unicorn. We've done further testing but aren't certain if this is a systemic issue w/ Passenger and our setup.
Updated by normalperson (Eric Wong) over 10 years ago
Interesting. Have you contacted the Passenger developers about this?
Anyways I'm happy unicorn is working well for you :)
Updated by hsbt (Hiroshi SHIBATA) over 8 years ago
- Status changed from Open to Third Party's Issue