Bug #13851

closed

getting "can't modify string; temporarily locked" on non-frozen instances

Added by chucke (Tiago Cardoso) over 6 years ago. Updated over 4 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:82533]

Description

I'm doing some nonblocking IO, and using a local string for buffering to avoid allocations. Roughly the following

class A

  def initialize
    @buffer = String.new("", encoding: Encoding::BINARY)
  end

  def read(io)
    io.read_nonblock(16_384, @buffer, exception: false)
    # do stuff...
    @buffer.clear
  end

end

It all goes smoothly, mostly... until I start getting an arbitrary error when calling #read_nonblock. Here's an example backtrace:

DEBUG: can't modify string; temporarily locked- - - [30/Aug/2017:13:15:09 +0300] "GET / 1.0" 200 - 0.0000
DEBUG: <internal:prelude>:76:in `__read_nonblock'
DEBUG: <internal:prelude>:76:in `read_nonblock'
DEBUG: /Users/user/Projects/project:NN in `read'
....

Unfortunately, I can't reproduce this in a deterministic way. I can only say that the read routine is called from multiple threads (on different instances of that class, each with its own io). The errors come seldom, however.

I'm running ruby 2.3.4, and the frozen string literal pragma is activated for the file where the class is defined. I first assumed the problem was the buffer, but due to how it's initialized, it should be marked as not frozen.
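For illustration, a short sketch of the frozen/not-frozen distinction at play here; `.freeze` stands in for a literal under the `frozen_string_literal` pragma, since the pragma only takes effect file-wide:

```ruby
# The frozen_string_literal pragma freezes string literals, but
# String.new always returns a new, unfrozen string.
frozen_literal = "".freeze
buffer = String.new("", encoding: Encoding::BINARY)

puts frozen_literal.frozen?  # => true
puts buffer.frozen?          # => false
puts buffer.encoding         # => ASCII-8BIT
```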

Updated by normalperson (Eric Wong) over 6 years ago

wrote:

I can't unfortunately reproduce this in a deterministic way. I
can only say that the read routine is called in multiple
threads (different io's instances of that class). The errors
come seldom, however.

That's not good, and reading prelude.rb, io.c and string.c; I'm
not seeing how it's possible unless you have accidental sharing
going on. Can you reproduce the problem with 2.4 or trunk?

Even if not deterministic, can you write a standalone infinite
loop which can eventually reproduce this?

Another option may be to try thread-local storage for @buffer,
instead, if you never need the data long-term. IME, it will
save memory for C10K situations and help avoid accidental
sharing bugs (if indeed your fault :)
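A sketch of the thread-local-storage idea, assuming the buffer is only needed for the duration of the call; method and key names here are illustrative, not from the original code:

```ruby
# Each worker thread reuses one binary string, rather than every
# object holding its own buffer.
def tls_buffer
  Thread.current[:read_buffer] ||= String.new("", encoding: Encoding::BINARY)
end

def read(io)
  buf = tls_buffer
  data = io.read_nonblock(16_384, buf, exception: false)
  # do stuff with data...
ensure
  buf.clear  # the buffer is only needed for the duration of the call
end
```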

I'm running ruby 2.3.4, and the frozen string literal is
activated for the file where the class is defined. I first
assumed that it was the buffer, but due to its initialization,
it should be marked as not-frozen.

Right, I'm not seeing this, either...

Btw, I prefer to use (since Ruby 2.0+):

@buffer = ''.b

Rather than:

@buffer = String.new("", encoding: Encoding::BINARY)

...in your situation, as it is much shorter :)
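For reference, both forms produce an equivalent buffer; `BINARY` is an alias for `ASCII-8BIT`:

```ruby
a = ''.b
b = String.new("", encoding: Encoding::BINARY)

a.encoding == b.encoding  # both ASCII-8BIT
a.frozen?                 # false: String#b returns a fresh copy,
                          # even under frozen_string_literal
```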

Updated by chucke (Tiago Cardoso) over 6 years ago

I've been struggling to create a self-contained script to reproduce the problem, but can't seem to.

The problem is that I'm doing all socket monitoring in one thread, and read/write operations on another. Is there any thread-unsafety associated with such operations happening at the same time?

In my research, I've found this: https://bugs.ruby-lang.org/issues/8669 , which establishes a precedent (I also believe that my problem is a buffer being temporarily locked somewhere).

I also firmly believe that the problem is the IO#read_nonblock(nread, buffer) API (the second argument). When I apply the same logic but let the call return a new string (IO#read_nonblock(nread)), the error doesn't happen. Passing exception: false is irrelevant (the error is raised with or without it).
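For reference, a small sketch of the two call forms; the socket pair setup is only for illustration:

```ruby
require 'socket'

r, w = UNIXSocket.pair
w.write("hello")

buffer = String.new("", encoding: Encoding::BINARY)

# With a buffer argument, read_nonblock fills and returns that same
# object instead of allocating a new string.
result = r.read_nonblock(16_384, buffer, exception: false)
result.equal?(buffer)  # => true

# With no data pending, exception: false returns :wait_readable
# instead of raising IO::EAGAINWaitReadable.
again = r.read_nonblock(16_384, buffer, exception: false)
again  # => :wait_readable
```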

Updated by normalperson (Eric Wong) over 6 years ago

wrote:

The problem is that I'm doing all socket monitoring in one
thread, and read/write operations on another. Is there any
thread-unsafety associated with such operations happening at
the same time?

What operation(s) are you doing for "socket monitoring"?
We need to know exactly which methods you're using to know
if there's thread-safety problems with them.

Updated by chucke (Tiago Cardoso) over 6 years ago

What operation(s) are you doing for "socket monitoring"?

I'm using plain IO.select. So the idea is: I have a selector thread, which pushes ready descriptors to a queue; other threads pull from that queue and read/write from the descriptors; if they must wait, they send the descriptors back to the selector thread.
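A minimal sketch of this selector/queue handoff (names are illustrative, not from the original code); note that the IO is removed from the interest set before being queued, so it cannot be selected and scheduled a second time:

```ruby
require 'socket'

queue = Queue.new
r, w  = UNIXSocket.pair
reads = { r.fileno => r }  # the IO.select interest set
w.write("ping")

# Selector thread: waits for readiness and pushes ready IOs onto the
# queue, "disarming" each one before handing it off.
selector = Thread.new do
  ready, = IO.select(reads.values, nil, nil, 1)
  Array(ready).each do |io|
    reads.delete(io.fileno)  # remove from the interest set first
    queue << io
  end
end

# Worker side: pull a ready IO from the queue and read from it.
io   = queue.pop
data = io.read_nonblock(16_384, exception: false)
selector.join
```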

In the middle of this process, under significant load, I'm seeing the aforementioned error under the aforementioned read_nonblock API usage.

Updated by normalperson (Eric Wong) over 6 years ago

wrote:

What operation(s) are you doing for "socket monitoring"?

i'm using plain IO.select. So the idea is, I have a selector
thread, which pushes ready descriptors to a queue; other
threads pull from that queue, and read/write from descriptors;
if must wait, send them back to selector thread.

Is there any chance you're calling IO.select on an IO object
which already got pushed into the queue?

In other words, you may forget to remove an IO from the
IO.select arg set once it's "armed" and pushed into the queue;
so it could be in the queue twice, and used by two threads (or
more).

I just checked again: IO.select won't touch String buffers,
it'll only check the internal Ruby buffers which are protected
by GVL, anyways.

Finally, select(2) (and thus IO.select) behavior regarding FD
aliasing/recycling is (I believe to be) undefined and subject
to race conditions...

IO#close, #accept, IO.select, and File.open all release the GVL, which opens up some nasty race conditions, especially because the underlying select(2) syscall operates on FDs while IO.select operates on Ruby IO objects.

I can provide a more detailed explanation at another time; but
in short, you REALLY need to be careful with race conditions
around close, select and FD-allocating syscalls in MT code.
Especially since POSIX stipulates FD-allocating syscalls need to
use the lowest-numbered available FD (which is often the most
recently closed FD).

Updated by chucke (Tiago Cardoso) over 6 years ago

I've finally managed to isolate the issue, and it was my error. What was happening was that I was monitoring fds for read/write events; sometimes the same fd comes back ready for both read and write, and in those loops I was scheduling the fd twice, to separate workers, which would then go through the read logic at around the same time. When not using the buffer argument, it would "just work": one read would return bytes, the other wouldn't, and both workers would continue. The same doesn't happen when passing a string as the buffer (which presumably locks the string internally).

I think this one is fixed. If I could complain about something, it would be that the error message wasn't very helpful. But I can't come up with a better one.

So, after I found that out, I noticed something really strange: Using the IO#read_nonblock(nread, buffer, exception: false) reduces the throughput of my solution significantly (it also uses up significantly more memory). When using both IO#read_nonblock(nread, exception: false) and IO#read_nonblock(nread, buffer), I get optimal performance.

This might be a bit off-topic. Should I open a new ticket with my findings?

Updated by chucke (Tiago Cardoso) over 6 years ago

Again, false alarm... The problem was that I wasn't checking the return value of IO#read_nonblock(nread, buffer, exception: false), and was therefore not detecting nil (which signals EOF) and terminating early. This was affecting the throughput significantly in my case.
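A minimal sketch of that fix, handling each possible return value (nil for EOF, :wait_readable when no data is available, otherwise the filled buffer); the socket pair setup is only for illustration:

```ruby
require 'socket'

r, w = UNIXSocket.pair
w.write("data")
w.close

buffer = String.new("", encoding: Encoding::BINARY)
chunks = []

loop do
  case ret = r.read_nonblock(16_384, buffer, exception: false)
  when nil             # EOF: the peer closed the connection
    break
  when :wait_readable  # no data yet; hand the fd back to the selector
    IO.select([r])
  else                 # ret is the buffer, holding fresh data
    chunks << ret.dup
  end
end
# chunks now holds ["data"]
```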

Thx for the support!

Updated by normalperson (Eric Wong) over 6 years ago

wrote:

Thanks for the followup, I'm glad we were able to help.

I think this one is fixed. If I could complain about
something, was that the error message wasn't very helpful. But
I can't come up with a better one.

shrug I can't think of one right now...

So, after I found that out, I noticed something really
strange: Using the IO#read_nonblock(nread, buffer, exception: false) reduces the throughput of my solution significantly
(it also uses up significantly more memory). When using both
IO#read_nonblock(nread, exception: false) and
IO#read_nonblock(nread, buffer), I get optimal performance.

It seems you've figured that out in [ruby-core:82707],
but I think my earlier note about using thread-local storage
for short-lived buffers still applies. Per-socket buffers
(which you seem to be using) would use more object slots in
common situations.

This might be a bit off-topic. Should I open a new ticket with my findings?

Not speaking for ruby-core, but I don't care for assigning
tickets/classifications/priorities/etc and prefer a more
freeform approach.

(But then again, I hate Genre tags in my music collection
and nuke them at every opportunity)

Updated by duerst (Martin Dürst) over 6 years ago

On 2017/09/08 06:11, Eric Wong wrote:

wrote:

So, after I found that out, I noticed something really
strange: Using the IO#read_nonblock(nread, buffer, exception: false) reduces the throughput of my solution significantly
(it also uses up significantly more memory). When using both
IO#read_nonblock(nread, exception: false) and
IO#read_nonblock(nread, buffer), I get optimal performance.

It seems you've figured that out in [ruby-core:82707],
but I think my earlier note about using thread-local storage
for short-lived buffers still applies. Per-socket buffers
(which you seem to be using) would use more object slots in
common situations.

This might be a bit off-topic. Should I open a new ticket with my findings?
If it's still relevant, then yes, please.

Regards, Martin.

Updated by chucke (Tiago Cardoso) over 6 years ago

but I think my earlier note about using thread-local storage for short-lived buffers still applies. Per-socket buffers (which you seem to be using) would use more object slots in common situations.

Buffers are indeed attached per socket, but I wouldn't say they are short-lived. Also, I don't have guarantees that the same thread will handle the same socket after a wait, so I can't follow your advice. But I can also say that I haven't seen a big performance degradation because of it (although I'd appreciate suggestions on tools that could help me inspect this).

Updated by normalperson (Eric Wong) over 6 years ago

wrote:

Buffers are indeed attached per socket, but I wouldn't say
they are short-lived. Also, I don't have guarantees that the
same thread will handle the same socket after a wait, so I
can't follow your advice. But I can also say that I haven't
seen a big performance degradation because of it (although I'd
appreciate suggestions on tools that could help me inspect
this).

From your original post:

def read(io)
  io.read_nonblock(16_384, @buffer, exception: false)
  # do stuff...
  @buffer.clear
end

It looks like @buffer only gets used for the duration of the
method.

For common HTTP (and other TCP-based) servers, often you can
read an entire request with one read(2) syscall, do all parsing,
and discard the buffer ASAP (before even performing expensive
request parsing or response writing).

I only need to create a longer-lived buffer when a request is
big or trickled due to slowness.

Updated by chucke (Tiago Cardoso) over 6 years ago

I see, now I understand what you meant. My approach was more to have a fixed-size buffer, read into it until it's full or the socket would block, and then pass the data to the parser. Your approach might be less resource-intensive though, thx for the heads-up.

Updated by jeremyevans0 (Jeremy Evans) over 4 years ago

  • Status changed from Open to Closed