Project

General

Profile

Actions

Feature #14383

closed

Making prime_division in prime.rb Ruby 3 ready.

Added by jzakiya (Jabari Zakiya) almost 4 years ago. Updated about 10 hours ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:84971]

Description

I have been running old code in Ruby 2.5.0 (released 2017.12.25) to check for
speed and compatibility. I still see the codebase in prime.rb hardly has
changed at all (except for replacing Math.sqrt with Integer.sqrt).

To achieve the Ruby 3 goal to make it at least three times faster than Ruby 2
there are three general areas where Ruby improvements can occur.

  • increase the speed of its implementation at the machine level
  • rewrite its existing codebase in a more efficient|faster manner
  • use faster algorithms to implement routines and functions

I want to suggest how to address the later two ways to improve performance of
specifically the prime_division method in the prime.rb library.

I've raised and made suggestions to some of these issues here
ruby-issues forum and now hope to invigorate additional discussion.

Hopefully with the release of 2.5.0, and Ruby 3 conceptually closer to reality,
more consideration will be given to coding and algorithmic improvements to
increase its performance too.

Mathematical correctness

First I'd like to raise what I consider math bugs in prime_division, in how
it handles 0 and -1 inputs.

> -1.prime_division
 => [[-1,1]]

> 0.prime_division
Traceback (most recent call last):
        4: from /home/jzakiya/.rvm/rubies/ruby-2.5.0/bin/irb:11:in `<main>'
        3: from (irb):85
        2: from /home/jzakiya/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/prime.rb:30:in `prime_division'
        1: from /home/jzakiya/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/prime.rb:203:in `prime_division'
ZeroDivisionError (ZeroDivisionError)

First, 0 is a perfectly respectable integer, and is non-prime, so its output should be [],
an empty array to denote it has no prime factors. The existing behavior is solely a matter of
prime_division's' implementation, and does not take this mathematical reality into account.

The output for -1 is also mathematically wrong because 1 is also non-prime (and correctly
returns []), well then mathematically so should -1. Thus, prime_division treats -1 as
a new prime number, and factorization, that has no mathematical basis. Thus, for mathematical
correctness and consistency -1 and 0 should both return [], as none have prime factors.

> -1.prime_division
 => []

> 0.prime_division
 => []

> 1.prime_division
 => []

There's a very simple one-line fix to prime_division to do this:

# prime.rb

class Prime

  def prime_division(value, generator = Prime::Generator23.new)
    -- raise ZeroDivisionError if value == 0
    ++ return [] if (value.abs | 1) == 1

Simple Code and Algorithmic Improvements

As stated above, besides the machine implementation improvements, the other
areas of performance improvements will come from coding rewrites and better
algorithms. Below is the coding of prime_division. This coding has existed at
least since Ruby 2.0 (the farthest I've gone back).

# prime.rb

class Integer

  # Returns the factorization of +self+.
  #
  # See Prime#prime_division for more details.
  def prime_division(generator = Prime::Generator23.new)
    Prime.prime_division(self, generator)
  end

end

class Prime

  def prime_division(value, generator = Prime::Generator23.new)
    raise ZeroDivisionError if value == 0
    if value < 0
      value = -value
      pv = [[-1, 1]]
    else
      pv = []
    end
    generator.each do |prime|
      count = 0
      while (value1, mod = value.divmod(prime)
             mod) == 0
        value = value1
        count += 1
      end
      if count != 0
        pv.push [prime, count]
      end
      break if value1 <= prime
    end
    if value > 1
      pv.push [value, 1]
    end
    pv
  end

end

This can be rewritten in more modern and idiomatic Ruby, to become much shorter
and easier to understand.

require 'prime.rb'

class Integer
  def prime_division1(generator = Prime::Generator23.new)
    Prime.prime_division1(self, generator)
  end
end

class Prime

  def prime_division1(value, generator = Prime::Generator23.new)
    # raise ZeroDivisionError if value == 0
    return [] if (value.abs | 1) == 1
    pv = value < 0 ? [[-1, 1]] : []
    value = value.abs
    generator.each do |prime|
      count = 0
      while (value1, mod = value.divmod(prime); mod) == 0
        value = value1
        count += 1
      end
      pv.push [prime, count] unless count == 0
      break if prime > value1
    end
    pv.push [value, 1] if value > 1                 
    pv
  end

end

By merely rewriting it we get smaller|concise code, that's easier to understand,
which is slightly faster. A triple win! Just paste the above code into a 2.5.0
terminal session, and run the benchmarks below.

def tm; s=Time.now; yield; Time.now-s end

 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division }
[[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]]
 => 27.02951016

 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division1 }
[[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]]
 => 25.959149721

Again, we get a triple win to this old codebase by merely rewriting it. It can
be made 3x faster by leveraging the prime? method from the OpenSSL library to
perform a more efficient|faster factoring algorithm, and implementation.

require 'prime.rb'
require 'openssl'

class Integer

  def prime_division2(generator = Prime::Generator23.new)
    return [] if (self.abs | 1) == 1
    pv = self < 0 ? [-1] : []
    value = self.abs
    prime = generator.next
    until value.to_bn.prime? or value == 1
      while prime
        (pv << prime; value /= prime; break) if value % prime == 0
        prime = generator.next
      end
    end
    pv << value if value > 1
    pv.group_by {|prm| prm }.map{|prm, exp| [prm, exp.size] }
  end

end

Here we're making much better use of Ruby idioms and libraries (enumerable and
openssl), leading to a much greater performance increase. A bigger triple win.
Pasting this code into a 2.5.0 terminal session gives the following results.

# Hardware: System76 laptop; I7 cpu @ 3.5GHz, 64-bit Linux

def tm; s=Time.now; yield; Time.now-s end

 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division }
[[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]]
 => 27.02951016

 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division1 }
[[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]]
 => 25.959149721

 n = 500_000_000_000_000_000_008_244_213; tm{ pp n.prime_division2 }
[[3623, 1], [61283, 1], [352117631, 1], [6395490847, 1]]
 => 9.39650374

prime_division2 is much more usable for significantly larger numbers and use
cases than prime_division. I can even do multiple times better than this, if
you review the above cited forum thread.

My emphasis here is to show there are a lot of possible low hanging fruit
performance gains ripe for the picking to achieve Ruby 3 performance goals, if we
look (at minimum) for simpler|better code rewrites, and then algorithmic upgrades.

So the question is, are the devs willing to upgrade the codebase to provide the
demonstrated performance increases shown here for prime_division?


Files

bm.rb (1.94 KB) bm.rb mame (Yusuke Endoh), 01/25/2018 04:13 PM
Actions

Also available in: Atom PDF