Project

General

Profile

Feature #17261

Updated by ko1 (Koichi Sasada) 6 months ago

## Abstract 

 I propose Software transactional memory (STM) for threads and ractors. 

 Implementation is here: https://github.com/ruby/ruby/pull/3652 

 The interface is similar to concurrent-ruby, but not the same. 
 http://ruby-concurrency.github.io/concurrent-ruby/1.1.4/Concurrent/TVar.html 

 ## Basic concept 

 https://en.wikipedia.org/wiki/Software_transactional_memory 
 Transaction is popular idea on data base systems to keep state consistency. 

 STM is similar idea to implement optimistic synchronization strategy. 

 There are several advantages compare with traditional synchronization techniques like Mutex and so on: 

 * Performance: in some cases, it is faster because of optimistic nature. 
 * Composability: multiple locks can introduce dead-lock. STM allows nested transaction. In other words, (some kind of) STM can guarantee the progressiveness. 

 The disadvantages is, it can lead slow down on high-contention cases. 

 ## API 

 * `Thread::atomically do expr end`: make a new transaction and run `expr` in it. `expr` can be retried if the conflict is detected. 
 * `Thread::TVar.new(default_value)` 
 * `Thread::TVar#value`: get current value of TVar 
 * `Thread::TVar#value = val`: set TVar value `val`. 
 * `Thread::TVar#increment(n=1)`: Just same as `Thread.atomically{ tv.value += 1 }`. 

 Note that `expr` for `Thread.atomically` can retries and all `TVar#value=` (set TVar values) are reverted before retries. Another operations such as other memory modification, IO operations includes network operations etc are not reverted. 

 The very difference between `Concurrent::TVar` is: 

 * TVar only refer to shareable objects to support Ractor. 
 * `TVar#value=` should be used with `atomically`. We can define as `Thread.atomically{ tv.value = val }`, but it can lead misusing without `atomically`. 
 * `TVar#increment` is special case to allow setting without `atomically` to support typical single counter cases. 

 ## Implementation 

 https://github.com/ruby/ruby/pull/3652 

 The implementation is almost same as TL2, lock-based STM with global version clock with pthread/win32 threads. 
 We can use atomic operations but not supported yet (but only a few performance benefit on my measuremnets). 

 ## Example 

 ```ruby 
 N = 1_000_000 

 tv1 = Thread::TVar.new(0) 
 tv2 = Thread::TVar.new(0) 

 r1 = Ractor.new tv1, tv2 do |tv1, tv2| 
   loop do 
     Thread.atomically do 
       v1, v2 = tv1.value, tv2.value 
       raise if v1 != v2 
     end 
   end 
 end 

 rs = 3.times.map do 
   Ractor.new tv1, tv2 do |tv1, tv2| 
     N.times do 
       Thread.atomically do 
         tv1.value += 1 
         tv2.value += 1 
       end 
     end 
   end 
 end 

 rs.each{|r| r.take} 
 p [tv1.value, tv2.value] #=> [3000000, 3000000] 
 ``` 

 In this case,  

 * all `atomically` blocks keep consistency that `tv1.value == tv2.value`. 
 * the results `[3000000, 3000000]` shows consistency on `+=1`. 

 Here is famous bank-account example: 


 ```ruby 
 class Account 
   COUNT = Thread::TVar.new 0 

   def initialize deposit = 0 
     @i = COUNT.increment 
     @balance = Thread::TVar.new(deposit) 
   end 

   def transfer_from acc, n 
     Thread::atomically do 
       acc.withdraw n 
       self.deposit n 
     end 
   end 

   def transfer_to acc, n 
     Thread::atomically do 
       self.withdraw n 
       acc.deposit n 
     end 
   end 

   def withdraw n 
     @balance.value -= n 
   end 

   def deposit n 
     @balance.value += n 
   end 

   def balance 
     @balance.value 
   end 
 end 

 AN =    1_0000 
 N = 10_000_000 
 RN = 10 
 iter = 0 
 accs = AN.times.map{Account.new.freeze}.freeze 

 require 'benchmark' 

 # :forward 
 #     two ractors operate N times: a[i].transfer(a[i+1]) 
 #       R1: a1->a2, a2->a3, ... 
 #       R2: a1->a2, a2->a3, ... 

 # :reverse 
 #     two ractors operate N times: a[i].transfer(a[i+1]), 
 #     but the oroder of accounts are reversed. 
 #       R1: a1->a2, a2->a3, ... 
 #       R2: a1->aN-1, a2->aN-2, ... 

 # :shuffle 
 #     RN ractors operate N times: a[rand].transfer(a[rand]) 
 #     It simulates normal bank-operation 

 mode = :shuffle 

 loop do 
   iter += 1 

   btime = Time.now 

   case mode 
   when :forward 
     rs = [] 

     rs << Ractor.new(accs) do |accs| 
       N.times{|i| 
         a1, a2 = accs[i%accs.size], accs[(i+1)%accs.size] 
         a1.transfer_to(a2, 1) 
       } 
     end 

     rs << Ractor.new(accs) do |accs| 
       N.times{|i| 
         a1, a2 = accs[i%accs.size], accs[(i+1)%accs.size] 
         a1.transfer_from(a2, 1) 
       } 
     end 

     rs.each{|r| r.take} 

   when :reverse 
     rs = [] 

     rs << Ractor.new(accs) do |accs| 
       N.times{|i| 
         a1, a2 = accs[i%accs.size], accs[(i+1)%accs.size] 
         a1.transfer_to(a2, 1) 
       } 
     end 

     rs << Ractor.new(accs.reverse.freeze) do |accs| 
       N.times{|i| 
         a1, a2 = accs[i%accs.size], accs[(i+1)%accs.size] 
         a1.transfer_from(a2, 1) 
       } 
     end 

     rs.each{|r| r.take} 

   when :shuffle 
     RN.times.map{ 
       Ractor.new(accs) do |accs| 
         rnd = Random.new 
         N.times{ 
           a1 = accs.sample random: rnd 
           a2 = accs.sample random: rnd 
           redo if a1 == a2 
           a1.transfer_to(a2, rnd.rand(1000)) 
         } 
       end 
     }.each{|r| r.take} 

   else 
     raise 
   end 

   sum = accs.inject(0){|r, acc| acc.balance + r} 
   if sum != 0 
     pp accs 
     raise "iter #{iter} sum:#{sum}" 
   end 

   etime = Time.now 
   p time: etime - btime 

   # break 
 end 

 ``` 

 This program create AN bank accounts and repeat N transafer operations. 
 You can observe that huge AN reduces conflicts and the execution time is low. Small AN reduces conflicts -> many retries and the execution time is high. 

 ``` 
      AN      Execution time (s)    Retry counts 
     100                  6.914          958,969 
   1_000                  3.107          186,267 
  10_000                  2.549           26,183 
 100_000                  2.627            2,458 
 ``` 

 Now x10 retries doesn't affect execution time x10, this is because the current Ractor implementation (acquiring a global lock to raise an exception, and it reduces the retry counts). If we improve the Ractor's implementation, the result would be more worse. 


 ## Consideration 

 ### `Thread.atomically` in ractors 

 At first, I implemented this feature with `Ractor::atomically` and `Ractor::TVar`. 
 However, this STM feature will help the thread programming. 
 This is why I moved from `Ractor::atomically` to `Thread::atomically`. 

 Introduce `Concurrent` namespace what concurrent-ruby are using. However, there are small differences so that I'm not sure is is feasible. 

 Another idea is to support alias: `Thread.atomically` and `Ractor.atomically`. 

 ### `Thread::TVar` can refer only shareable objects 

 Threads can access all objects so we don't need to restrict by such rule. 
 However, to support ractors, this restriction is needed. 

 One idea is separate `Thread::TVar` and `Ractor::TVar`, but it can introduce confusion. 
 Only Also thread programs should be more thread-safe with shareable objects, thread programs become more thread-safe, so I think it is good choice to have current restriction. 

 ### Bug detection 

 Similar to locking, we can forget to use a `atomically` like that: 

 ```ruby 
 class C 
   def initialize 
     @tv1 = Thread::TVar.new(0) 
     @tv2 = Thread::TVar.new(0) 
   end 
   def tv1() = @tv1.value 
   def tv2() = @tv2.value 
   def tv1 = (v) 
     Thread.atomically{ @tv1.value = v } 
   end 
   def tv2 = (v) 
     Thread.atomically{ @tv2.value = v } 
   end 
 end 

 obj = C.new 
 obj.tv1 += 1 
 obj.tv2 += 2 
 ``` 

 It works but it can introduce inconsistency if tv1 and tv2 are tightly coupled with because tv1 and tv2 are not accessed in the same transaction. 
 If tv1 and tv2 need to be modified consistently, we need to write like the following: 

 ```ruby 
 Thread.atomically do 
   obj.tv1 += 1 
   obj.tv2 += 1 
 end 
 ``` 

 and `tv1/tv2/tv1=/tv2=` methods should not be defined. 

 I mean we can write bad programs easily. 

 It is same situation with traditional locking (we need to use `Mutex` appropriately). The duty to use it correctly is for programmer. 

 There are some advantages compared with traditional locking: 

 * We can concentrate on TVars. On traditional thread programming we need to check all memory state. 
 * We can introduce logging mechanism and we can find wrong usage (for example: tv1 and tv2 are set within independent transactions). I think we can make some checker based on the log. On traditional thread programming, there are several similar works, but it is difficult to check it because the target of state is most of memory operations. 

 ## Related works 

 * There are many STM implementation techniques. https://www.morganclaypool.com/doi/abs/10.2200/S00070ED1V01Y200611CAC002 
 * Concurrent Haskell and Clojure are famous to support STM in language (I think). 
   * The model of STM is similar to Clojure. 
     * Clojure allows to access TVar (`ref` in Clojure) value without `atomically` (`dosync` in Clojure). 
     * Clojure doesn't allow to set TVar value without `atomically`. 
   * The API is similar to Concurrent Haskell (`TVar` and `atomically`. 
 * Concurrent-ruby has `Concurrent::TVar`. 
   * But it allows to have an unshareable object. 
   * But is allows to set the value with `atomically`. 

Back