Bug #19288

Ractor JSON parsing significantly slower than linear parsing

Added by maciej.mensfeld (Maciej Mensfeld) over 1 year ago. Updated 7 months ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux]
[ruby-core:111526]

Description

A simple benchmark:

require 'json'
require 'benchmark'

CONCURRENT = 5
RACTORS = true
ELEMENTS = 100_000

data = CONCURRENT.times.map do
  ELEMENTS.times.map do
    {
      rand => rand,
      rand => rand,
      rand => rand,
      rand => rand
    }.to_json
  end
end

ractors = CONCURRENT.times.map do
  Ractor.new do
    Ractor.receive.each { JSON.parse(_1) }
  end
end

result = Benchmark.measure do
  if RACTORS
    CONCURRENT.times do |i|
      ractors[i].send(data[i], move: false)
    end

    ractors.each(&:take)
  else
    # Linear without any threads
    data.each do |piece|
      piece.each { JSON.parse(_1) }
    end
  end
end

puts result

This gives the following results on my 8-core machine:

# without ractors:
  2.731748   0.003993   2.735741 (  2.736349)

# with ractors
12.580452   5.089802  17.670254 (  5.209755)

I would expect Ractors not to be two times slower on CPU-intensive work.

Updated by Eregon (Benoit Daloze) over 1 year ago

It would be fairer to call Ractor.make_shareable(data) first.
But even with that Ractor is slower:

no Ractor:
  2.748311   0.003002   2.751313 (  2.763541)
Ractor
  9.939530   5.816431  15.755961 (  4.289792)

This high system time seems strange.
Probably lock contention for allocations?

Updated by Eregon (Benoit Daloze) over 1 year ago

Also that script creates Ractors even in "linear" mode.
With the fixed script below:

  2.040496   0.002988   2.043484 (  2.048731)

i.e. it's also quite a bit slower if any Ractor is created.

Script:

require 'json'
require 'benchmark'

CONCURRENT = 5
RACTORS = ARGV.first == "ractor"
ELEMENTS = 100_000

data = CONCURRENT.times.map do
  ELEMENTS.times.map do
    {
      rand => rand,
      rand => rand,
      rand => rand,
      rand => rand
    }.to_json
  end
end

if RACTORS
  Ractor.make_shareable(data)

  ractors = CONCURRENT.times.map do
    Ractor.new do
      Ractor.receive.each { JSON.parse(_1) }
    end
  end
end

result = Benchmark.measure do
  if RACTORS
    CONCURRENT.times do |i|
      ractors[i].send(data[i], move: false)
    end

    ractors.each(&:take)
  else
    # Linear without any threads
    data.each do |piece|
      piece.each { JSON.parse(_1) }
    end
  end
end

puts result

Updated by luke-gru (Luke Gruber) over 1 year ago

I just took a look at this and it looks like the culprit is the C dtoa function that's called in the JSON parser, specifically a helper function, Balloc. It uses a lock for some reason (shrug).

Edit: It looks like in ruby's missing/dtoa.c, the lock function is a no-op. If that version of dtoa.c is used in your Ruby then it isn't that. My ruby is using the missing/dtoa.c and running the perf tool with this script it points to Balloc being the main issue. Something funny is going on in that Balloc function. I think it's the malloc() calls that are locking the malloc arena lock, and the lock contention is there, but that's just a guess.

Updated by maciej.mensfeld (Maciej Mensfeld) over 1 year ago

I find this issue important and if mitigated, it would allow me to release production-grade functionalities that would benefit users of the Ruby language.

I run an OSS project called Karafka (https://github.com/karafka/karafka) that allows for processing Kafka messages using multiple threads in parallel. For non-IO-bound cases, most of the processing time (> 80%) for the users whose use cases I know is spent on data deserialization. JSON is by far the most popular format, and it is conveniently supported natively by Ruby. While providing true parallelism around the whole processing may not be easy due to a ton of synchronization around the whole process, the atomicity of message deserialization makes it an ideal use case for Ractors.

  • Data can be sent there, and results can be transferred without interdependencies.
  • Each message is atomic; hence their deserialization can run in parallel.
  • All message deserialization requests can be sent to a generic queue from which Ractors could consume.

I am not an expert in the Ruby code, but if there is anything I could help with to move this forward, please just ping me.

Updated by luke-gru (Luke Gruber) over 1 year ago

I've notified the flori/json people (https://github.com/flori/json/issues/511)

So to update everyone, the dtoa function is called during JSON generation, not parsing. As this script does both, it's hard to measure it using perf tools. You have to run the generation part of the script alone and look at the perf report, then compare it against running the generation and the parsing (both with ractors and without).
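A minimal sketch of that separation, assuming a smaller element count than the benchmark above and hypothetical variable names. It times generation and parsing as two distinct phases, so a quick timing (or a perf run limited to one phase) can attribute the cost to one side or the other. It is not the script used for the measurements in this thread.

```ruby
require 'json'
require 'benchmark'

# Assumed size, smaller than the 100_000 used in the original benchmark.
elements = 10_000

# Build the input hashes up front so that neither timed phase includes rand calls.
hashes = Array.new(elements) { { rand => rand, rand => rand } }

# Phase 1: JSON generation only (Float-to-string conversion happens here).
documents = nil
generation_time = Benchmark.realtime do
  documents = hashes.map(&:to_json)
end

# Phase 2: JSON parsing only, on the documents generated above.
parsed_count = 0
parsing_time = Benchmark.realtime do
  documents.each do |doc|
    JSON.parse(doc)
    parsed_count += 1
  end
end

puts format("generation: %.3fs, parsing: %.3fs", generation_time, parsing_time)
```

Running each phase in its own process (for example by commenting one out) keeps a profiler report focused on a single code path.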

Updated by luke-gru (Luke Gruber) over 1 year ago

Here's a simple reproduction showing that the problem is not send/receive:

require 'json'

RACTORS = ARGV.first == "ractor"
J = { rand => rand }.to_json
Ractor.make_shareable(J)
if RACTORS
  rs = []
  10.times.each do
    rs << Ractor.new do
      i = 0
      while i < 100_000
        JSON.parse(J)
        i+=1
      end
    end
  end
  rs.each(&:take)
else
  1_000_000.times do
    JSON.parse(J)
  end
end

The ractor example should take less time, but it doesn't.

Updated by Eregon (Benoit Daloze) over 1 year ago

maciej.mensfeld (Maciej Mensfeld) wrote in #note-4:

I find this issue important and if mitigated, it would allow me to release production-grade functionalities that would benefit users of the Ruby language.

Note that Ractor is far from production-ready.
It has many issues as can be found on this bug tracker and when using it and as the warning says (Also there are many implementation issues.).
Also the fact that the main Ruby test suites don't run any Ractor test in the same process also seems an indication of instability.

And then of course there is the issue that Ractor is incompatible with most gems/code out there.
While JSON loading might work, any non-trivial processing after using a gem is unlikely to work well.
Other Rubies have solved this in a much more efficient, usable and reliable way, by having no GVL.

Updated by maciej.mensfeld (Maciej Mensfeld) over 1 year ago

Note that Ractor is far from production-ready.

I am well aware. I am just providing a justification, and since my case seems to fit the limited scope of this functionality, I wanted to draw attention to it.

While JSON loading might work, any non-trivial processing after using a gem is unlikely to work well.

This is exactly why I want to get a limited functionality that anyhow would allow me to parallelize the processing.

Other Rubies have solved this in a much more efficient, usable and reliable way, by having no GVL.

I am also aware of this :)

Updated by luke-gru (Luke Gruber) over 1 year ago

It has many issues as can be found on this bug tracker and when using it and as the warning says (Also there are many implementation issues.).

I think the implementation issues are solvable but the bigger picture issue of adoption is of course up in the air. IMO if we are allowed to have an API, for example, of Ractor.disable_isolation_checks! { ... } for use around thread-safe code, that would be a big win in my book.

Also about the test-suite, I do want to add in-process ractor tests. I hope the ruby core team isn't against it.

Updated by maciej.mensfeld (Maciej Mensfeld) over 1 year ago

I think the implementation issues are solvable but the bigger picture issue of adoption is of course up in the air.

The first step to adoption is to have a case for it that could be used. I believe the case I presented is viable and should be considered.

Updated by luke-gru (Luke Gruber) about 1 year ago

This PR I made to JSON repository is related: https://github.com/flori/json/pull/512

Updated by duerst (Martin Dürst) about 1 year ago

Eregon (Benoit Daloze) wrote in #note-7:

And then of course there is the issue that Ractor is incompatible with most gems/code out there.
While JSON loading might work, any non-trivial processing after using a gem is unlikely to work well.
Other Rubies have solved this in a much more efficient, usable and reliable way, by having no GVL.

But don't other Rubies rely on the programmer to know how to program with threads? That's only usable if you're used to programming with threads and avoid the related issues. The idea (where the implementation and many gems may still have to catch up) behind Ractor is that thread-related issues such as data races can be avoided at the level of the programming model.

Updated by maciej.mensfeld (Maciej Mensfeld) about 1 year ago

And then of course there is the issue that Ractor is incompatible with most gems/code out there.
While JSON loading might work, any non-trivial processing after using a gem is unlikely to work well.

We need to start somewhere. Even if trivial/isolated cases work, if they work well, they can act as the first milestone to usage of this API for commercial benefits and I am willing to take the risk ;)

Updated by Eregon (Benoit Daloze) about 1 year ago

duerst (Martin Dürst) wrote in #note-12:

But don't other Rubies rely on the programmer to know how to program with threads? That's only usable if you're used to programming with threads and avoid the related issues. The idea (where the implementation and many gems may still have to catch up) behind Ractor is that thread-related issues such as data races can be avoided at the level of the programming model.

We're getting a bit off-topic, but I believe not necessarily. And the GIL doesn't prevent most Ruby-level threading issues, so in that matter it's almost the same on CRuby.
For example, I would think many Ruby on Rails devs don't know threading well, and they don't need to, even though webservers like Puma use threads.
Deep knowledge of multithreading is needed e.g. when creating concurrent data structures, but using them OTOH doesn't require much.
I would think for most programmers, using threads is much easier and more intuitive than having the big limitations of Ractor which prevent sharing any state, especially in an imperative and stateful language like Ruby where almost everything is mutable.
IMO Ractors are way more difficult to use than threads. They can also have some sorts of race conditions due to message order, so it's not that much safer either. And it's a lot less efficient for any communication between ractors vs threads (Ractor copy or move both need an object graph walk).
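To make the copy cost concrete, here is a small sketch (the helper names ractor_result and echo_through_ractor are hypothetical, not part of any API) showing that an unshareable object is deep-copied when sent to a Ractor, while an object made shareable with Ractor.make_shareable is passed by reference. The respond_to? check is only an assumption-driven guard so the sketch runs on Rubies that have Ractor#take as well as those that have Ractor#value.

```ruby
# Hypothetical helper: fetch a Ractor's block result on Rubies that
# provide Ractor#value as well as on those that provide Ractor#take.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

# Hypothetical helper: send an object to a fresh Ractor, which
# returns whatever it received.
def echo_through_ractor(obj)
  r = Ractor.new { Ractor.receive }
  r.send(obj)
  ractor_result(r)
end

mutable = ["a", "b"]                        # not frozen, so not shareable
frozen  = Ractor.make_shareable(["a", "b"]) # deep-frozen, shareable

copied = echo_through_ractor(mutable)
shared = echo_through_ractor(frozen)

puts copied == mutable       # true:  same contents
puts copied.equal?(mutable)  # false: a different object came back (copied)
puts shared.equal?(frozen)   # true:  the very same object (passed by reference)
```

For large payloads, the copy in the first case is where the object graph walk mentioned above is paid.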

Updated by maciej.mensfeld (Maciej Mensfeld) 7 months ago

I want to revisit our discussion about leveraging Ruby Ractors for parallel JSON parsing. It appears there hasn't been much activity on this thread for a long time.

I found it pertinent to mention that during the recent RubyKaigi conference, Koichi Sasada highlighted the need for real-life/commercial use cases to showcase Ractors' potential. To that end, I wanted to point out that I do have a practical, commercial scenario. Karafka handles parsing of thousands or more JSON documents in parallel. Having Ractor support in such a context could substantially enhance performance, providing a tangible benefit to the end users.

Given this real-life use case, are there any updates or plans to continue work on allowing Ractors to operate faster in the scenario I presented? It would indeed be invaluable for many users working with Kafka in Ruby. While the end-user processing of data will still have to happen in a single Ractor, parsing seems like a great example where immutable raw payloads can be shipped to independent ractors and frozen deserialized payloads can be shipped back.
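A rough sketch of that shape, under stated assumptions: the worker count, the sample payloads, and the ractor_result helper are all hypothetical, and this is not Karafka code. Raw JSON strings are made shareable and handed to worker ractors in batches; each worker parses its batch and returns a deep-frozen result.

```ruby
require 'json'

# Hypothetical helper: fetch a Ractor's block result on Rubies that
# provide Ractor#value as well as on those that provide Ractor#take.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

worker_count = 4 # assumed

# Stand-in for raw message payloads (assumed sample data).
payloads = Array.new(20) { |i| { "id" => i, "value" => i * 1.5 }.to_json }

# Split into one batch per worker and deep-freeze, so batches are
# passed to the workers by reference instead of being copied.
batches = payloads.each_slice((payloads.size / worker_count.to_f).ceil).to_a
Ractor.make_shareable(batches)

workers = batches.map do |batch|
  Ractor.new(batch) do |messages|
    # Parse every message, then freeze the result before handing it back.
    Ractor.make_shareable(messages.map { |raw| JSON.parse(raw) })
  end
end

# Collect results in worker order; order within each batch is preserved.
parsed = workers.flat_map { |worker| ractor_result(worker) }

puts parsed.size
puts parsed.first.frozen?
```

Whether this is actually faster than parsing in a single Ractor is exactly what this issue is about, so the sketch illustrates the data flow only, not a performance claim.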
