Project

General

Profile

Actions

Feature #13219

closed

bug in Math.sqrt(n).to_i, to compute integer squareroot, new word to accurately fix it

Added by jzakiya (Jabari Zakiya) almost 8 years ago. Updated almost 8 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:79623]

Description

In doing a math application using Math.sqrt(n).to_i to compute integer squareroots
of integers I started noticing errors for numbers > 10**28.

I coded an algorithm that accurately computes the integer squareroot for arbirary sized numbers
but its significantly slower than the math library written in C.

Thus, I request a new method Math.intsqrt(n) be created, that is coded in C and part of the
core library, that will compute the integer squareroots of integers accurately and fast.

Here is working highlevel code to accurately compute the integer squareroot.

def intsqrt(n)
  bits_shift = (n.to_s(2).size)/2 + 1
  bitn_mask = root = 1 << bits_shift
  while true
    root ^= bitn_mask if (root * root) > n
    bitn_mask >>= 1
    return root if bitn_mask == 0
    root |= bitn_mask
  end
end

def intsqrt1(n)
  return n if n | 1 == 1   # if n is 0 or 1
  bits_shift = (Math.log2(n).ceil)/2 + 1
  bitn_mask = root = 1 << bits_shift
  while true
    root ^= bitn_mask if (root * root) > n
    bitn_mask >>= 1
    return root if bitn_mask == 0
    root |= bitn_mask
  end
end

require "benchmark/ips"

Benchmark.ips do |x|
  n = 10**40
  puts "integer squareroot tests for n = #{n}"
  x.report("intsqrt(n)"       ) { intsqrt(n)  }
  x.report("intsqrt1(n)"      ) { intsqrt1(n) }
  x.report("Math.sqrt(n).to_i") { Math.sqrt(n).to_i }
  x.compare!
end

Here's why it needs to be done in C.

2.4.0 :178 > load 'intsqrttest.rb'
integer squareroot tests for n = 10000000000000000000000000000000000000000
Warming up --------------------------------------
          intsqrt(n)     5.318k i/100ms
         intsqrt1(n)     5.445k i/100ms
   Math.sqrt(n).to_i   268.281k i/100ms
Calculating -------------------------------------
          intsqrt(n)     54.219k (± 5.5%) i/s -    271.218k in   5.017552s
         intsqrt1(n)     55.872k (± 5.4%) i/s -    283.140k in   5.082953s
   Math.sqrt(n).to_i      5.278M (± 6.1%) i/s -     26.560M in   5.050707s

Comparison:
   Math.sqrt(n).to_i:  5278477.8 i/s
         intsqrt1(n):    55871.7 i/s - 94.47x  slower
          intsqrt(n):    54219.4 i/s - 97.35x  slower

 => true 
2.4.0 :179 > 

Here are examples of math errors using Math.sqrt(n).to_i run on Ruby-2.4.0.

2.4.0 :101 > n = 10**27; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = Math.sqrt(n).to_i), c*c   
1000000000000000000000000000
31622776601683
999999999999949826038432489
31622776601683
999999999999949826038432489
31622776601683
999999999999949826038432489
 => nil 
2.4.0 :102 > n = 10**28; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = Math.sqrt(n).to_i), c*c 
10000000000000000000000000000
100000000000000
10000000000000000000000000000
100000000000000
10000000000000000000000000000
100000000000000
10000000000000000000000000000
 => nil 
2.4.0 :103 > n = 10**29; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = Math.sqrt(n).to_i), c*c 
100000000000000000000000000000
316227766016837
99999999999999409792567484569
316227766016837
99999999999999409792567484569
316227766016837
99999999999999409792567484569
 => nil 
2.4.0 :104 > n = 10**30; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = Math.sqrt(n).to_i), c*c  
1000000000000000000000000000000
1000000000000000
1000000000000000000000000000000
1000000000000000
1000000000000000000000000000000
1000000000000000
1000000000000000000000000000000
 => nil 
2.4.0 :105 > n = 10**31; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = Math.sqrt(n).to_i), c*c 
10000000000000000000000000000000
3162277660168379
9999999999999997900254631487641
3162277660168379
9999999999999997900254631487641
3162277660168379
9999999999999997900254631487641
 => nil 
2.4.0 :106 > n = 10**32; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = Math.sqrt(n).to_i), c*c 
100000000000000000000000000000000
10000000000000000
100000000000000000000000000000000
10000000000000000
100000000000000000000000000000000
10000000000000000
100000000000000000000000000000000
 => nil 
2.4.0 :107 > n = 10**33; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = Math.sqrt(n).to_i), c*c
1000000000000000000000000000000000
31622776601683793
999999999999999979762122758866849
31622776601683793
999999999999999979762122758866849
31622776601683792
999999999999999916516569555499264
 => nil 
2.4.0 :108 > n = 10**34; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = Math.sqrt(n).to_i), c*c
10000000000000000000000000000000000
100000000000000000
10000000000000000000000000000000000
100000000000000000
10000000000000000000000000000000000
100000000000000000
10000000000000000000000000000000000
 => nil 
2.4.0 :109 > n = 10**35; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = Math.sqrt(n).to_i), c*c
100000000000000000000000000000000000
316227766016837933
99999999999999999873578871987712489
316227766016837933
99999999999999999873578871987712489
316227766016837952
100000000000000011890233980627554304
 => nil 
2.4.0 :110 > n = 10**36; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = Math.sqrt(n).to_i), c*c
1000000000000000000000000000000000000
1000000000000000000
1000000000000000000000000000000000000
1000000000000000000
1000000000000000000000000000000000000
1000000000000000000
1000000000000000000000000000000000000
 => nil 
2.4.0 :111 > n = 10**37; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = Math.sqrt(n).to_i), c*c
10000000000000000000000000000000000000
3162277660168379331
9999999999999999993682442519108007561
3162277660168379331
9999999999999999993682442519108007561
3162277660168379392
10000000000000000379480317059650289664
 => nil 
2.4.0 :112 > 

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

On my 64-bit Linux OS laptop with an I7 cpu, I tested that Math.sqrt(n).to_i gives correct
answers from (0..9_999_899_999_899_999_322_536_673_279). This was also the same on JRuby-9.1.7.0.

Thus, I can create a hybrid method to maximize speed and accuracy like this.

    def sqrt_i(n)
      if n <= 9_999_899_999_899_999_322_536_673_279
        Math.sqrt(n).to_i
      else
        intsqrt1(n)
      end
    end

    def intsqrt1(n)
      return n if n | 1 == 1
      bits_shift = (Math.log2(n).ceil)/2 + 1
      bitn_mask = root = 1 << bits_shift
      while true
        root ^= bitn_mask if (root * root) > n
        bitn_mask >>= 1
        return root if bitn_mask == 0
        root |= bitn_mask
      end
    end

Test results below.

2.4.0 :415 > n = 9999899999899999322536673279; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = sqrt_i(n)), c*c, (d = Math.sqrt(n).to_i), d*d    
9999899999899999322536673279
99999499998249
9999899999899801751003066001
99999499998249
9999899999899801751003066001
99999499998249
9999899999899801751003066001
99999499998249
9999899999899801751003066001
 => nil 
2.4.0 :416 > n = 9999899999899999322536673279 - 1; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = sqrt_i(n)), c*c, (d = Math.sqrt(n).to_i), d*d  
9999899999899999322536673278
99999499998249
9999899999899801751003066001
99999499998249
9999899999899801751003066001
99999499998249
9999899999899801751003066001
99999499998249
9999899999899801751003066001
 => nil 
2.4.0 :417 > n = 9999899999899999322536673279 + 1; puts n, (a = intsqrt(n)), a*a, (b = intsqrt1(n)), b*b, (c = sqrt_i(n)), c*c, (d = Math.sqrt(n).to_i), d*d 
9999899999899999322536673280
99999499998249
9999899999899801751003066001
99999499998249
9999899999899801751003066001
99999499998249
9999899999899801751003066001
99999499998250
9999899999900001750003062500
 => nil 

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

Since this method is designed to operate on integers it would be nice to make it
a class Integer method, then you can do this: 4829391.sqrt_i
(or whatever name it's given), which allows you to easily do method chaining.

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

This is the C version of the algorithm.
I put the part before the start of the while loop of the ruby
version to make it applicable for arbitrary size integers.

http://www.codecodex.com/wiki/Calculate_an_integer_square_root

    unsigned int sqrt32(unsigned long n)  
    {  
        unsigned int c = 0x8000;  
        unsigned int g = 0x8000;  
      
        for (;;) {  
            if (g*g > n) {
                g ^= c;
            }
            
            c >>= 1;  
            
            if (c == 0) {
                return g;
            }
              
            g |= c;  
        }  
    } 

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

I just realized class Integer has a bit_length instance_method.
It makes the code a bit simpler, easier to understand, and faster.

def intsqrt(n)
  bits_shift = (n.to_s(2).size)/2 + 1
  bitn_mask = root = 1 << bits_shift
  while true
    root ^= bitn_mask if (root * root) > n
    bitn_mask >>= 1
    return root if bitn_mask == 0
    root |= bitn_mask
  end
end

def intsqrt1(n)
  return n if n | 1 == 1
  bits_shift = (Math.log2(n).ceil)/2 + 1
  bitn_mask = root = 1 << bits_shift
  while true
    root ^= bitn_mask if (root * root) > n
    bitn_mask >>= 1
    return root if bitn_mask == 0
    root |= bitn_mask
  end
end

def intsqrt2(n)
  bits_shift = (n.bit_length)/2 + 1
  bitn_mask = root = 1 << bits_shift
  while true
    root ^= bitn_mask if (root * root) > n
    bitn_mask >>= 1
    return root if bitn_mask == 0
    root |= bitn_mask
  end
end

require "benchmark/ips"

Benchmark.ips do |x|
  n = 10**40
  puts "integer squareroot tests for n = #{n}"
  x.report("intsqrt(n)"       ) { intsqrt(n)  }
  x.report("intsqrt1(n)"      ) { intsqrt1(n) }
  x.report("intsqrt2(n)"      ) { intsqrt2(n) }
  x.report("Math.sqrt(n).to_i") { Math.sqrt(n).to_i }
  x.compare!
end

It's a bit faster.

2.4.0 :299 > load 'intsqrttest.rb'
integer squareroot tests for n = 10000000000000000000000000000000000000000
Warming up --------------------------------------
          intsqrt(n)     5.495k i/100ms
         intsqrt1(n)     5.679k i/100ms
         intsqrt2(n)     5.759k i/100ms
   Math.sqrt(n).to_i   261.831k i/100ms
Calculating -------------------------------------
          intsqrt(n)     56.563k (± 5.9%) i/s -    285.740k in   5.069484s
         intsqrt1(n)     58.567k (± 6.1%) i/s -    295.308k in   5.062216s
         intsqrt2(n)     59.769k (± 6.1%) i/s -    299.468k in   5.029980s
   Math.sqrt(n).to_i      5.377M (± 6.7%) i/s -     26.969M in   5.039343s

Comparison:
   Math.sqrt(n).to_i:  5377357.4 i/s
         intsqrt2(n):    59768.9 i/s - 89.97x  slower
         intsqrt1(n):    58566.6 i/s - 91.82x  slower
          intsqrt(n):    56563.4 i/s - 95.07x  slower

 => true 

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

This works well in my code.
Easier to code and read with accurate results.

class Integer
    def sqrt_i
      self <= MAX_RANGE ? Math.sqrt(self).to_i : intsqrt(self)
    end

    private
    MAX_RANGE = 9_999_899_999_899_999_322_536_673_279

    def intsqrt(n)
      bits_shift = (n.bit_length)/2 + 1
      bitn_mask = root = 1 << bits_shift
      while true
        root ^= bitn_mask if (root * root) > n
        bitn_mask >>= 1
        return root if bitn_mask == 0
        root |= bitn_mask
      end
    end
end
2.4.0 :020 > n = 10**111; puts n, (a = n.sqrt_i), a*a, (b = Math.sqrt(n).to_i), b*b
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
31622776601683793319988935444327185337195551393252168268
999999999999999999999999999999999999999999999999999999963630737732541472910196882732895436793219183483386119824
31622776601683794595141492119969113562259413928748515328
1000000000000000080647728865639515055517417465817645783945119006952630720180836471433888096459369651964250947584

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

A new version intsqrt3 is one line shorter, and more intuitive (to me)
but there is a neglible difference in speed versus intsqrt2 in Ruby.
Maybe it's faster in C?

def intsqrt(n)
  bits_shift = (n.to_s(2).size)/2 + 1
  bitn_mask = root = 1 << bits_shift
  while true
    root ^= bitn_mask if (root * root) > n
    bitn_mask >>= 1
    return root if bitn_mask == 0
    root |= bitn_mask
  end
end

def intsqrt1(n)
  return n if n | 1 == 1
  bits_shift = (Math.log2(n).ceil)/2 + 1
  bitn_mask = root = 1 << bits_shift
  while true
    root ^= bitn_mask if (root * root) > n
    bitn_mask >>= 1
    return root if bitn_mask == 0
    root |= bitn_mask
  end
end

def intsqrt2(n)
  bits_shift = (n.bit_length)/2 + 1
  bitn_mask = root = 1 << bits_shift
  while true
    root ^= bitn_mask if (root * root) > n
    bitn_mask >>= 1
    return root if bitn_mask == 0
    root |= bitn_mask
  end
end

def intsqrt3(n)
  bits_shift = (n.bit_length)/2 + 2
  root, bitn_mask = 0, (1 << bits_shift)
  until (bitn_mask >>= 1) == 0
    root |= bitn_mask
    root ^= bitn_mask if (root * root) > n
  end
  root
end

require "benchmark/ips"

Benchmark.ips do |x|
  n = 10**40
  puts "integer squareroot tests for n = #{n}"
  x.report("intsqrt(n)"       ) { intsqrt(n)  }
  x.report("intsqrt1(n)"      ) { intsqrt1(n) }
  x.report("intsqrt2(n)"      ) { intsqrt2(n) }
  x.report("intsqrt3(n)"      ) { intsqrt3(n) }
#  x.report("Math.sqrt(n).to_i") { Math.sqrt(n).to_i }
  x.compare!
end
2.4.0 :391 > load 'intsqrttest.rb'
integer squareroot tests for n = 10000000000000000000000000000000000000000
Warming up --------------------------------------
          intsqrt(n)     5.621k i/100ms
         intsqrt1(n)     5.890k i/100ms
         intsqrt2(n)     5.879k i/100ms
         intsqrt3(n)     5.872k i/100ms
Calculating -------------------------------------
          intsqrt(n)     58.255k (± 6.9%) i/s -    292.292k in   5.041702s
         intsqrt1(n)     59.962k (± 6.8%) i/s -    300.390k in   5.033585s
         intsqrt2(n)     60.283k (± 6.4%) i/s -    305.708k in   5.092434s
         intsqrt3(n)     60.120k (± 6.7%) i/s -    305.344k in   5.102386s

Comparison:
         intsqrt2(n):    60283.1 i/s
         intsqrt3(n):    60119.7 i/s - same-ish: difference falls within error
         intsqrt1(n):    59962.4 i/s - same-ish: difference falls within error
          intsqrt(n):    58254.7 i/s - same-ish: difference falls within error

 => true 

Updated by shevegen (Robert A. Heiler) almost 8 years ago

I can not evaluate whether the above is correct or not (not saying that it is not,
I just simply do not know), but I believe that method names such as "intsqrt2" or
"intsqrt" may possibly not be an ideal choice, due to the name alone.

Math.sqrt() is already a bit shortish, adding more abbreviations, in particular
something such as "int" as part of it, would be even more peculiar.

If it would be a bug, as behaviour, why should Math.sqrt() itself not be able
to deal with it? Internally it could call any other method before it outputs
a result.

Updated by nobu (Nobuyoshi Nakada) almost 8 years ago

Robert A. Heiler wrote:

If it would be a bug, as behaviour, why should Math.sqrt() itself not be able
to deal with it? Internally it could call any other method before it outputs
a result.

Expected results differ.
Math.sqrt() is expected to return a Float with much precision as possible.
While the proposed method will return an Integer which does not exceed the real square root.

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

To be clear, I am not saying Math.sqrt has a bug in it.
As stated, it produces a floating point result, which inherently has finite (not infinite) precision.

The "bug", or more accurately, the realization of the limitations of using Math.sqrt(n).to_i to
produce correct results for integers past a certain size, led me to suggest creating a new method
which will produce correct results for arbitrary sized integers (i.e. the largest integer value for
root such that root*root <= n).

This need emanated for me in numerical applications pertaining to generating very large prime numbers.

As I previously suggested, this method should most naturally be a class Integer method, and after much
consideration I like the method name sqrt_i, versus intsqrt, isqrt, etc. This makes its usage as - n.sqrt_i
super simple, and idiomatic, and would give an error message if applied to a non-Integer number/object,
to not confuse its use with Math.sqrt(n), which inherently produces a float value.

I think this name is more ruby idiomatic, because the name of the function is first, and the x_i
extension is consistent with other naming conventions to denote class Type, like to_i, to_f, to_s.

The algorithm I provided (both C and Ruby) show the method is very simple to code, and would be very fast
if coded in C, using its native shift operators, etc. If I had any significant C chops I would try to do it myself. :-)

Also, providing a fast native implementation of this function makes Ruby more attractive for numerical applications,
especially for Number Theory and Cryptography, and also enhances the Ruby 3x3 goals of making it at least 2x faster
than Ruby 2.0.

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

After further testing I found the same errors when using Math.sqrt(n).to_i with
large number when using (n**(1.0/2)).to_i. This reinforces to me the need to provide
sqrt_i (by whatever name) so users won't fall prey to this undocumented anomaly.

2.4.0 :129 > n= 10**35; puts n, a = intsqrt4(n), a*a, b = Math.sqrt(n).to_i, b*b, c = (n**(1.0/2)).to_i, c*c
100000000000000000000000000000000000
316227766016837933
99999999999999999873578871987712489
316227766016837952
100000000000000011890233980627554304
316227766016837952
100000000000000011890233980627554304
 => nil 
2.4.0 :130 >

I also realized that the algorithm used to determine the integer squareroot can be easily
generalized to correctly generate integer nth roots for arbitrary size integers.

This capability will now allows Ruby to be used for many Number Theory and Cryptographic problems,
such as generating correct integer values for eliptic curves, real integer roots of polynomials,
public key, et al, encryption, for very large number sets.

Because of this, I am going to add this capability to my roots rubygem.

https://rubygems.org/gems/roots

It has two (2) methods root and roots which generate all the n roots of an nth root for
all Numeric types (integers, floats, complex, rational).

In order to not clash with my proposed class Integer method name of sqrt_i as a Ruby core method,
I will name two more class Integer methods iroot2 and irootn(n) to be added to roots.
Since finding squareroots is much more frequently done than other roots I give it its own method name.
Below is the code for doing this (which I may refine).

These methods will return the correct nth integer real (not complex) root for positive integers, and
for odd roots of negative integers.

class Integer

  def irootn(n)
    return nil if self < 0 && n.even?
    raise "root n is < 2 or not an Integer" unless n.is_a?(Integer) && n > 1
    sign = self < 0 ? -1 : 1
    num  = self.abs
    bits_shift = (num.bit_length)/n + 2
    root, bitn_mask = 0, (1 << bits_shift)
    until (bitn_mask >>= 1) == 0
      root |= bitn_mask
      root ^= bitn_mask if root**n > num
    end
    root*sign
  end

  def iroot2; irootn(2) end

end

As with generating integer squareroots, doing (num**(1.0/n)).to_i produces incorrect answers for
integers past some maximun value, as shown below.

2.4.0 :129 > n = 10**37; puts n, a = n.irootn(3), a**3, b = (n**(1.0/3)).to_i, b**3
10000000000000000000000000000000000000
2154434690031
9999999999987694380850132072511299791
2154434690031
9999999999987694380850132072511299791
 => nil 
2.4.0 :130 > n = 10**38; puts n, a = n.irootn(3), a**3, b = (n**(1.0/3)).to_i, b**3
100000000000000000000000000000000000000
4641588833612
99999999999949657815157877549034676928
4641588833612
99999999999949657815157877549034676928
 => nil 
2.4.0 :131 > n = 10**39; puts n, a = n.irootn(3), a**3, b = (n**(1.0/3)).to_i, b**3
1000000000000000000000000000000000000000
10000000000000
1000000000000000000000000000000000000000
9999999999999
999999999999700000000000029999999999999
 => nil 


2.4.0 :134 > n = 10**112; puts n, a = n.irootn(4), a**4, b = (n**(1.0/4)).to_i, b**4
10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
10000000000000000000000000000
10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
9999999999999999583119736832
9999999999999998332478947328000104273492291412559539763672807300805169073432752214389506884609933472640769458176
 => nil 
2.4.0 :135 >

These methods are also preferable because they work correctly for negative integers, while the exponential root
method throws errors.

2.4.0 :150 > n = (10**112); puts n, a = n.irootn(3), a**3
10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
21544346900318837217592935665193504952
9999999999999999999999999999999999999173635536969138804543654476122863556031297471619838179917400017798906449408
 => nil 
2.4.0 :151 > n = (10**112); puts n, b = (n**(1.0/3)).to_i, b**3
10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
21544346900318734919579678222552399872
9999999999999857552405189045342271937486394088257690663380520853991701500619736949767614488813401623055562702848
 => nil 
2.4.0 :152 > n = -(10**112); puts n, a = n.irootn(3), a**3
-10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
-21544346900318837217592935665193504952
-9999999999999999999999999999999999999173635536969138804543654476122863556031297471619838179917400017798906449408
 => nil 
2.4.0 :153 > n = -(10**112); puts n, b = (n**(1.0/3)).to_i, b**3
RangeError: can't convert 1.077217345015937e+37+1.865795172362055e+37i into Integer
        from (irb):153:in `to_i'
        from (irb):153
        from /home/jzakiya/.rvm/rubies/ruby-2.4.0/bin/irb:11:in `<main>'
2.4.0 :154 > 

Again, creating C coded core versions of these methods not only will be more efficient but faster.

Updated by stomar (Marcus Stollsteimer) almost 8 years ago

To clarify further the limitations of using Float here:

n = 10**35
Math.sqrt(n).to_i                # => 316227766016837952
Math.sqrt(n).next_float.to_i     # => 316227766016838016

BigDecimal / BigMath should be able to produce the correct results, but possibly slower(?) than a specialized integer method:

require "bigdecimal"
n = 10**35
r = BigDecimal(n).sqrt(20).to_i  # => 316227766016837933
r ** 2                           # => 99999999999999999873578871987712489
(r+1) ** 2                       # => 100000000000000000506034404021388356

Addendum:

These methods are also preferable because they work correctly for negative integers, while the exponential root method throws errors.

To be precise: "correct" in your sense (i.e. returning a real number); the "exponential root" itself does work correctly -- returning a complex root is just fine --, the exception is raised by trying to convert a complex number into an integer. You could easily avoid this specific problem by calculating -(|n|^(1/3)).

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

One if the really nice things about Ruby is its intent to make users/programmers happy,
and to adhere to the "principle of least surprises (pols)".

I don't think it was the intent of Matz to create math methods that give correct results
some of the time, but not tell users under what conditions they wouldn't give correct results.
Unless you had my use cases, it was probably never tested to cover giving correct results for them.

I have identified verified incorrect results that I have seen no documentation warning users of.
I have presented a verified method to always produce correct results, not only for the initial
case for integer squareroots for arbitrary size integers, but now also for any real nth integer
root for arbitrary sized integers.

As I single user, I have solved these inherent errors in using these native math functions for
my use cases. So I now know what I have to do so I will get correct answers in my applications.

I now will also include these methods in my roots gem so if others need/want them they are available.

I think it would be in the best interest of Ruby users for the language to provide them the methods
I have presented to create correct results for their use cases, which should be fairly easy to code
and include in the core language, and documented accordingly for users, versus messing with the
inherent floating point methods and figuring out how to coerce them into producing correct integer
results for every possible arbitrary sized integer, or forcing users to figure out how to do so.

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

class Integer
  def irootn(n)
    return nil if self < 0 && n.even?
    raise "root n is < 2 or not an Integer" unless n.is_a?(Integer) && n > 1
    num  = self.abs
    bits_shift = (num.bit_length)/n + 2
    root, bitn_mask = 0, (1 << bits_shift)
    until (bitn_mask >>= 1) == 0
      root |= bitn_mask
      root ^= bitn_mask if root**n > num
    end
    root *= (self < 0 ? -1 : 1)
  end

  def iroot2; irootn(2) end
end


require "bigdecimal"
require "benchmark/ips"

Benchmark.ips do |x|
  n = 10**35
  puts "integer squareroot tests for n = #{n}"
  x.report("iroot2"       ) { n.iroot2    }
  x.report("irootn(2)"    ) { n.irootn(2) }
  x.report("BigDecimal(n).sqrt(5 ).to_i") { BigDecimal(n).sqrt(5 ).to_i }
  x.report("BigDecimal(n).sqrt(10).to_i") { BigDecimal(n).sqrt(10).to_i }
  x.report("BigDecimal(n).sqrt(20).to_i") { BigDecimal(n).sqrt(20).to_i }
  x.compare!
end

Yes, its much slower, even to the highlevel Ruby versions.

2.4.0 :201 > load 'irootstest.rb'
integer squareroot tests for n = 100000000000000000000000000000000000
Warming up --------------------------------------
              iroot2     5.681k i/100ms
           irootn(2)     5.714k i/100ms
BigDecimal(n).sqrt(5 ).to_i
                         3.021k i/100ms
BigDecimal(n).sqrt(10).to_i
                         2.953k i/100ms
BigDecimal(n).sqrt(20).to_i
                         2.616k i/100ms
Calculating -------------------------------------
              iroot2     57.825k (± 3.3%) i/s -    289.731k in   5.016021s
           irootn(2)     57.462k (± 3.7%) i/s -    291.414k in   5.078940s
BigDecimal(n).sqrt(5 ).to_i
                         30.543k (± 2.8%) i/s -    154.071k in   5.048265s
BigDecimal(n).sqrt(10).to_i
                         30.709k (± 3.1%) i/s -    153.556k in   5.005239s
BigDecimal(n).sqrt(20).to_i
                         26.725k (± 3.0%) i/s -    136.032k in   5.094723s

Comparison:
              iroot2:    57825.2 i/s
           irootn(2):    57461.9 i/s - same-ish: difference falls within error
BigDecimal(n).sqrt(10).to_i:    30708.9 i/s - 1.88x  slower
BigDecimal(n).sqrt(5 ).to_i:    30543.4 i/s - 1.89x  slower
BigDecimal(n).sqrt(20).to_i:    26725.0 i/s - 2.16x  slower

 => true 
2.4.0 :202

And you need to know beforehand the needed correct precision to display the correct results.

2.4.0 :214 > n = 10**35; puts n, n.iroot2, BigDecimal(n).sqrt(5).to_i, BigDecimal(n).sqrt(10).to_i,BigDecimal(n).sqrt(20).to_i   
100000000000000000000000000000000000
316227766016837933
316227766016837933
316227766016837933
316227766016837933
 => nil 
2.4.0 :215 > n = 10**45; puts n, n.iroot2, BigDecimal(n).sqrt(5).to_i, BigDecimal(n).sqrt(10).to_i,BigDecimal(n).sqrt(20).to_i   
1000000000000000000000000000000000000000000000
31622776601683793319988
31622776601683666666666
31622776601683666666666
31622776601683793319988
 => nil 
2.4.0 :216 > n = 10**55; puts n, n.iroot2, BigDecimal(n).sqrt(5).to_i, BigDecimal(n).sqrt(10).to_i,BigDecimal(n).sqrt(20).to_i   
10000000000000000000000000000000000000000000000000000000
3162277660168379331998893544
3162277660168379331499021527
3162277660168379331499021527
3162277660168379331998893544
 => nil 
2.4.0 :217 > n = 10**65; puts n, n.iroot2, BigDecimal(n).sqrt(5).to_i, BigDecimal(n).sqrt(10).to_i,BigDecimal(n).sqrt(20).to_i   
100000000000000000000000000000000000000000000000000000000000000000
316227766016837933199889354443271
316227766016837466536394723986322
316227766016837466536394723986322
316227766016837933199889000000000
 => nil 
2.4.0 :218 > 

Updated by stomar (Marcus Stollsteimer) almost 8 years ago

@Jabari Note that my comments were not meant as arguments against your feature request, I only wanted to point out the reasons for the observed behavior. Personally, I always welcome improved math functionality; however, I'm not informed enough to know whether this should be in the stdlib or not.

There certainly is a problem with the docs for Math.sqrt, which is incomplete (return value types are not given), misleading (it promises to avoid rounding errors), and actually doesn't match the actual behavior:

$ ri Math::sqrt

(from ruby site)
------------------------------------------------------------------------------
  sqrt(a)

------------------------------------------------------------------------------

Computes the square root of a.  It makes use of Complex and Rational to
have no rounding errors if possible.

  Math.sqrt(4/9)      # => 2/3
  Math.sqrt(- 4/9)    # => Complex(0, 2/3)
  Math.sqrt(4.0/9.0)  # => 0.666666666666667

Based on this you would for instance expect the square root of an integer square number to be returned as an (exact) integer or rational; but actually it's a float:

Math.sqrt(9)  # => 3.0  ("no rounding errors if possible" ?!?)

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

No, no, I didn't take it as a negative comment per se,
I just hadn't tried to use BigDecimal before, so I just went ahead and
created tests to see for myself how they performed (accuracy and speed).

Ruby has a much nicer community compared to some others, which is why
I've tried to document the problem, and a solution for it, in great detail.

I want to see Ruby become more used in mathematical and numerical applications
like Python is now becoming. Thus, it needs to be as accurate as possible, and
then, of course, fast (enough).

For users (new and old) using Ruby for advanced mathematical/numerical applications,
and the new rave of "big data" things, I don't want it developing a reputation of
producing erroneous results, especially compared to Python, et al, which maybe don't
produce these errors (and I have no idea how inherently inaccurate Python is).

Here is a section of the README.md file for my new version of my roots gem I'll be releasing.

## Description

Starting with Roots 2.0.0 (2017-2-20) the methods **iroot2** and **irootn** were added.
They are instance_methods of **class Integer** that will accurately compute the real value 
squareroot and nth_root for arbitrary sized Integers.

If you have an application where you need the exact correct integer real value for roots,
especially for arbitrary sized (large) integers, use these methods instead or **root|s**.
These methods work strictly in the Integer domain, and do not first create floating point
approximations of these roots and then convert them to integers.

They are useful for applications in pure math, Number Theory, Cryptography, etc, where you
need to find the exact real integer roots of polynomials, encryption functions, etc.

The methods **root** and **roots** work for all the **Numeric** classes (integers, floats,
complex, rationals), and produce floating point results.  They, thus, produce floating
point approximations to the exact **Integer** values for real roots for arbitrary sized integers.

Usage

**iroot2**

Use syntax:  ival.iroot2
Return the largest Integer +root+ of Integer ival such that root**2 <= ival
A negative ival will result in 'nil' being returned.

 9.iroot2 => 3
-9.iroot2 => nil

120.iroot2 => 10
121.iroot2 => 11
121.iroot2.class => Integer

(10**10).iroot2 => 100000
(10**10).root(2) => 100000.0
(10**10).roots(2)[0] = 100000.0+0.0i

(10**11).iroot2 => 316227
(10**11).root(2) => 316227.76601684
(10**11).roots(2)[0] = 316227.76601684+0.0i

(10**110).iroot2 => 10000000000000000000000000000000000000000000000000000000
(10**110).root(2) => 1.0e+55
(10**110).roots(2)[0] = 1.0e+55+0.0i

(10**111).iroot2 => 31622776601683793319988935444327185337195551393252168268
(10**111).root(2) => 3.1622776601683795e+55
(10**111).roots(2)[0] = 3.1622776601683795e+55+0.0i

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

Here's the README.md writeup on the general integer roots algorithm.

At the cpu level, all these bit operations -- >>, <<, |, ^ -- are one clock cycle instructions,
so you see this can be very fast if done in C. Maybe for arbitrary size numbers that can't
reside in a single cpu register they take longer, but they would still be faster than high level Ruby.

For integer numbers:

For binary based (digital) computers (cpu), all integers are represented as
binary numbers. To find the root "n" of an integer "num" we can use the following
process to find the largest integer nth "root" such that root**n <= num.

For an integer composed of 'b' bits an nth root 'n' will have at most (b/n + 1) bits.

For squareroots, n = 2:
For 9 = 0b1001, it has b = 4 bits, and its squareroot has at most (4/2 + 1) = 3 bits.
The squareroot of 9 is 3 = 0b11, which is 2 bits.

For 100 = 0b1100100, b = 7, and its squareroot has at most (7/2 + 1) = 4 bits.
The squareroot of 100 is 10 = 0b1010, which is 4 bits.

For cuberoots, n = 3:
For 8 = 0b1000, it has b = 4 bits, and its cuberoot has at most (4/3 + 1) = 2 bits.
The cuberoot of 8 is 2 = 0b10, which is 2 bits.

For 125 = 0b1111101, b = 7, and its cuberoot has at most (7/3 + 1) = 3 bits.
The cuberoot of 125 is 5 = 0b101, which is 3 bits.

Algorithm:
           bits_shift = (num.bits.length)/n + 1  # determine max number of root bits
           bitn_mask = 1 << bits_shift           # set value for max bit position of root
           root = 0                              # initialize the value for root
           until bitn_mask == 0                  # step through all the bit positions for root
             root |= bitn_mask                   # set current bit position to '1' in root
             root ^= bitn_mask if root**n > num  # set it back to '0' if root too large
             bitn_mask >>= 1                     # set bitn_mask to next smaller bit position
           end
           root                                  # return exact integer value for root

Updated by akr (Akira Tanaka) almost 8 years ago

Is there many application for this method?
Concrete examples may help to persuade matz.

Another important problem is the method name.
If we use same name for same functionality in other languages/libraries,
it makes learning this method easier.
What's the name of this function in other languages/libraries?
I found isqrt in Common Lisp and Tcl, mpz_sqrt in GMP.
Others?

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

I updated my roots rubygem to 2.0.0 to include iroot2 and irootn.

https://rubygems.org/gems/roots
https://github.com/jzakiya/roots

Updated by Student (Nathan Zook) almost 8 years ago

I just noticed this thread. One bit per cycle methods are not fast by modern methods, they are quite slow. I will attempt to have something more concrete within 24 hours.

Nathan Zook

Updated by Student (Nathan Zook) almost 8 years ago

You might want to consider the following articles:
https://www.reddit.com/r/algorithms/comments/1zt63v/fast_algorithm_to_calculate_integer_square_root/
Which lead me to wikipedia:
https://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Binary_numeral_system_.28base_2.29
(You might want to start a bit higher in the article to get the context.)
and also to Zimmerman's implementation:
https://gmplib.org/repo/gmp-5.1/file/c5010c039373/mpn/generic/sqrtrem.c

The wikipedia article implements a faster bit-by-bit computation. Zimmerman implements the standard solution, which converges quadratically.

Looking at your comments, however, you seem really to be interested in a high performance multiprecision library for ruby. Is there any particular reason that we should not take an existing C library and drop a wrapper around it?

Updated by Student (Nathan Zook) almost 8 years ago

I am confused by your solution in comment #3. By your own report, there is no issue for numbers much larger than a single word. But this code is clearly appropriate only for single-word inputs.

Updated by matz (Yukihiro Matsumoto) almost 8 years ago

It's OK to add the functionality. Is Integer.sqrt(n) OK for you?

Matz.

Updated by nobu (Nobuyoshi Nakada) almost 8 years ago

Yukihiro Matsumoto wrote:

It's OK to add the functionality. Is Integer.sqrt(n) OK for you?

When n is not an Integer (but a Numeric), is it truncated first then sqrt?

Updated by tompng (tomoya ishida) almost 8 years ago

using Newton's method might be another good way to implement it.

def intsqrt_newton(n)
  raise if n<0
  return n if n<=1
  r = 1<<((n.bit_length+1)/2)
  # r*r >= n
  while true # r*r-n >= 2*r
    r2 = r-(r*r-n)/(2*r)
    break if r2 == r
    r=r2
  end
  # r*r-n < 2*r
  # (r-1)**2 < n+1
  # √n-1 < r-1 <= √n < √(n+1)
  r-1
end

Updated by stomar (Marcus Stollsteimer) almost 8 years ago

Edit: please ignore (misunderstanding)

I think Integer.sqrt or Numeric.sqrt is extremely problematic. I guess everybody would expect it to be an alias for Math::sqrt and behave like this:

# (probably) expected behavior

2.sqrt(2)       # => 1.4142135623730951
10.sqrt(2)      # => 3.1622776601683795
(6.25).sqrt(2)  # => 2.5

# proposed behavior for this feature

10.sqrt(2)      # => 3

# _not_ part of this request

(6.25).sqrt(2)  # 2 ??? 2.5 ???

Moreover, sqrt (square root) implies n = 2, so sqrt(3), sqrt(4) etc. is contradictory.

Updated by stomar (Marcus Stollsteimer) almost 8 years ago

For square roots, isqrt seems to be an option with some(?) spread, see https://en.wikipedia.org/wiki/Integer_square_root. It would look like this:

26.isqrt  # => 5
30.isqrt  # => 5

For higher roots, maybe iroot(n) could be an option:

26.iroot(3)  # => 2
30.iroot(3)  # => 3

(However, I'm not sure whether the request goes beyond square roots.)

Updated by akr (Akira Tanaka) almost 8 years ago

Marcus Stollsteimer wrote:

I think Integer.sqrt or Numeric.sqrt is extremely problematic. I guess everybody would expect it to be an alias for Math::sqrt and behave like this:

No. Integer.sqrt is a class method.

Integer.sqrt(1000) #=> 31

It is clealy different from "Math.sqrt(n)" (or "sqrt(n)" after "include Math").

The benefit of Integer.sqrt is that it doesn't introduce a new word which expands
vocabulary of ruby.
If there is common name for this function, expanding the vocabulary is not a big problem, though.

Also, Integer.sqrt(n) is square root.
So, n-th root should be a different method:
Integer.root(m, n) for n-th root, for example.

Updated by matz (Yukihiro Matsumoto) almost 8 years ago

Marcus, probably you are confusing Integer.sqrt() and Integer#sqrt().
The former, you call Integer.sqrt(4) # => 2 and the latter, you call 4.sqrt # => 2.
We are talking about the former.

Matz.

Updated by Student (Nathan Zook) almost 8 years ago

Nobuyoshi Nakada wrote:

Yukihiro Matsumoto wrote:

It's OK to add the functionality. Is Integer.sqrt(n) OK for you?

When n is not an Integer (but a Numeric), is it truncated first then sqrt?

The beauty of this particular problem is that it does not matter. ;)

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

Let me attempt to address some of the issues/questions that have recently been presented.
Excuse me if any of this redundant, at the time I am writing.

  1. Fastest algorithm

After extensive searching and testing, the fastest and most flexible algorithm
to exactly compute not only the integer squareroot, but also any root, is the
one I'll just name the "binary bit method" (bbm). See benchmarks below of code.

The methods iroot2 and irootn are now in my roots gem and use bbm.

I encourage people to take this code and include any other proposed methods to
create a reference test suite for assessing performance (assuming the other methods
correctly compute the exact root values).

class Integer
  def irootn(n)
    return nil if self < 0 && n.even?
    raise "root n is < 2 or not an Integer" unless n.is_a?(Integer) && n > 1
    num  = self.abs
    bits_shift = (num.bit_length)/n + 2
    root, bitn_mask = 0, (1 << bits_shift)
    until (bitn_mask >>= 1) == 0
      root |= bitn_mask
      root ^= bitn_mask if root**n > num
    end
    root *= (self < 0 ? -1 : 1)
  end

  def iroot2; irootn(2) end
end


def intsqrt7(n)  # Newton's method
  x = n
  y = (n + 1)/2
  while y < x
    x = y
    y = (x + n/x)/2
  end
  x
end

# from http://www.codecodex.com/wiki/Calculate_an_integer_square_root
def intsqrt8(square)  
    # Check our input  
    square = square.to_i # force integer  
    return 0 if square == 0 # quick exit  
    raise RangeError if square < 0  
    # Actual computation  
    n = iter(1, square)  
    n1 = iter(n, square)  
    n1, n = iter(n1, square), n1 until n1 >= n - 1  
    n1 = n1 - 1 until n1*n1 <= square  
    return n1  
end  
def iter(n, square) (n+(square/n))/2 end

require "bigdecimal"
require "benchmark/ips"

Benchmark.ips do |x|
  n = 10**40
  puts "integer squareroot tests for n = #{n}"
  x.report("iroot2"     ) { n.iroot2    }
  x.report("irootn(2)"  ) { n.irootn(2) }
  x.report("newtons"    ) { intsqrt7(n) }
  x.report("codecodex"  ) { intsqrt8(n) }
  x.compare!
end

All will correctly compute the exact integer squareroot, but bbm is clearly faster.

2.4.0 :024 > load 'irootstest.rb'
integer squareroot tests for n = 10000000000000000000000000000000000000000
Warming up --------------------------------------
              iroot2     4.368k i/100ms
           irootn(2)     4.375k i/100ms
             newtons     3.491k i/100ms
           codecodex     2.880k i/100ms
Calculating -------------------------------------
              iroot2     43.969k (± 3.0%) i/s -    222.768k in   5.071034s
           irootn(2)     43.990k (± 3.1%) i/s -    223.125k in   5.077264s
             newtons     35.369k (± 2.9%) i/s -    178.041k in   5.038327s
           codecodex     29.093k (± 3.0%) i/s -    146.880k in   5.053165s

Comparison:
           irootn(2):    43990.3 i/s
              iroot2:    43969.0 i/s - same-ish: difference falls within error
             newtons:    35369.0 i/s - 1.24x  slower
           codecodex:    29093.5 i/s - 1.51x  slower

 => true 
  1. It is well understood in numerical analysis, Number Theory, cryptography, etc,
    an integer nth_root n is the largest integer such that root*n <= num. It is not a
    floating point value, nor just the truncation of a floating point value. A
    calculator "root" is, by definition, a floating point value, and is an approximation
    of some possible exact value, to the precision of the system to produce it.

Other languages have at least the equivalent of an isqrt function that work
strictly on integers, not floats, for the reasons given above.

  1. Usefulness in advanced mathematical and numerical analysis fields.

Most people/programmers probably first (and mostly) think of Ruby as a Rails (web)
language. Whereas Python is being promoted and used widely now in a number of
mathematical and numerical analysis based projects.

https://jupyter.org/
https://ipython.org/
https://github.com/jupyterhub/jupyterhub
http://www.sagemath.org/
https://en.wikipedia.org/wiki/SageMath

I think Ruby is way easier to use (in general) for these applications, but it doesn't
have anywhere near the breadth of libraries and focus from its user base to be applied
to them. So Python is being used, basically by default, because it's easier to program
in than C/C++, and being made fast enough to satisfy more users. I think as a focus for
Ruby 3x3 it can start to create more necessary primitive methods to give it more creed
to be applicable to these fields.

  1. roots as primitive (machine level) methods.

Implementing the bbm algorithm as a primitive method can be at least 1-3 orders
faster than the high level implementations shown in irooot2 and irootn.

The following is why, with computing squareroot, but conceptually applies to all roots too.

For a cpu with n even bits (32, 64) an integer squareroot will only be composed of at most
n/2 bits. So a 64 bit cpu can retain in one register the squareroot of a 128 bit number,
i.e. a value of 2**128 - 1 = 340,282,366,920,938,463,463,374,607,431,768,211,455. For an
arbitrary sized integer it will take take (log2(num).ceil)/64 +1 cpu mem locations to
represent the number.

So at the primitive level, it would first determine how many register length chunks the
maximum root value would need. If it needed only one register (a num < 2128), then all
the bbm process is super simple and fast. If the root value takes more than one register,
the primitive algorithm would start with the MSchunk, do the bbm, and switch in the lower
chunks until finished. For roots > 2, this becomes even faster because any nth_root n requires at
most (num.bit_length/n + 1) bit to represent it. So the cube (3) root can be fit in 64 bits for
any integer < 2
(64*3) = 2**192 = 62,77,101,735,386,680,763,835,789,423,207,666,416,102,355,444,464,034,512,896

No other algorithm has the flexibility and efficiency of the bbm because it requires no
additions, subtractions, or divisions, only a multiplication via an exponentiation. And that
can be done efficiently for large numbers with an appropriate algorithm (better than brute force).
All the other math (besides the > and == comparisons) involve in place bit operations -- >>, <<, ^, |.
which at the cpu are one cycle (or better) operations.

I hope this has sufficiently helped to address these issues/questions.

Updated by stomar (Marcus Stollsteimer) almost 8 years ago

Matz, akr:

sorry for the noise, it's of course completely clear. I only didn't read close enough and missed the switch from the proposed instance method and n-th root to a class method. Stupid.

Now it's also clear to me that you were only talking about square roots, not higher roots.

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

No, I am sorry for any confusion.

My initial concern was for accurately computing exact integer squareroots
for my immediate use case needs.

After finding the bbm algorithm for squareroots, and modifying it to
compute any nth-root, I realized this should be the primitive to create,
because the squareroot (n=2) is just a particular instance, which is the
most frequently used root.

Updated by Student (Nathan Zook) almost 8 years ago

Jabari Zakiya wrote:

Let me attempt to address some of the issues/questions that have recently been presented.
Excuse me if any of this redundant, at the time I am writing.

  1. Fastest algorithm

After extensive searching and testing, the fastest and most flexible algorithm
to exactly compute not only the integer squareroot, but also any root, is the
one I'll just name the "binary bit method" (bbm). See benchmarks below of code.

The methods iroot2 and irootn are now in my roots gem and use bbm.

I encourage people to take this code and include any other proposed methods to
create a reference test suite for assessing performance (assuming the other methods
correctly compute the exact root values).

First, I love the idea of giving ruby the ability to play in the MP space. Anything to allow this beautiful tool to be used more broadly must be a good thing. I very much support adding such support. (Yes, this is me volunteering.)

Having said this, to do so involves a number of issues. For instance, I just attempted to time Integer#prime? on a 4096-bit number. Got a floating point overflow. Same for 2048 bits. So, I tried 1024. It's still running. In fact, I expect my computer to have a mechanical breakdown before the routine returns true.

The articles which I linked included both a faster bit-by-bit algorithm, and mentioned Newton-Raphson-derived methods, which clearly converge with better O(). The given benchmark is of a 133-bit number. I would like to see timing numbers for 128, 512, 1024, 2048, and 4096-bit integers before suggesting that we are going there. Serious work looks a much larger integers, btw.

I'll see what I can whip up for benchmarks...

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

Hey that's great! I'd love to help work on this too, though
I'm not much of a C programmer. I can help with testing though.

Below are test results comparing the bbm algorithm used in
iroot2 and irootn versus the general Newton's method.

As you can see, for the range of values used, Newton becomes
much slower as the numbers gets bigger.

First bbm, for squareroot, is order O(log2(n)/2)), which I
think beats Newton. But also compared to Newton, all those
additions and divisions done in the inner loop become more
expensive (cpu intensive) as the numbers get bigger (taking care
of carries, etc), whereas the bbm merely does inplace bit operations.

You obviously can make Newton's mechanically better, but I suspect
it will never be faster on a real computer than bbm because of
its cpu work cost to implement. In fact my I7 cpu laptop's fan started
to kick in running these benchmarks, which I let cool down between runs.

Both methods did create the same correct results.

Test suite:

class Integer
  def irootn(n)
    return nil if self < 0 && n.even?
    raise "root n is < 2 or not an Integer" unless n.is_a?(Integer) && n > 1
    num  = self.abs
    bits_shift = (num.bit_length)/n + 2
    root, bitn_mask = 0, (1 << bits_shift)
    until (bitn_mask >>= 1) == 0
      root |= bitn_mask
      root ^= bitn_mask if root**n > num
    end
    root *= (self < 0 ? -1 : 1)
  end

  def iroot2; irootn(2) end
end


def intsqrt7(n)  # Newton's method
  x = n
  y = (n + 1)/2
  while y < x
    x = y
    y = (x + n/x)/2
  end
  x
end

require "benchmark/ips"

Benchmark.ips do |x|
  exp = 4000; n = 10**exp
  puts "integer squareroot tests for n = 10**#{exp}"
  x.report("iroot2"     ) { n.iroot2    }
  x.report("irootn(2)"  ) { n.irootn(2) }
  x.report("newtons"    ) { intsqrt7(n) }
  x.compare!
end

Benchmark results:

2.4.0 :042 > load 'irootstest.rb'
integer squareroot tests for n = 10**500
Warming up --------------------------------------
              iroot2    70.000  i/100ms
           irootn(2)    69.000  i/100ms
             newtons    43.000  i/100ms
Calculating -------------------------------------
              iroot2    636.901  (±14.9%) i/s -      3.150k in   5.089664s
           irootn(2)    676.828  (± 5.6%) i/s -      3.381k in   5.012279s
             newtons    427.424  (± 4.4%) i/s -      2.150k in   5.040686s

Comparison:
           irootn(2):      676.8 i/s
              iroot2:      636.9 i/s - same-ish: difference falls within error
             newtons:      427.4 i/s - 1.58x  slower

 => true 
2.4.0 :043 > load 'irootstest.rb'
integer squareroot tests for n = 10**1000
Warming up --------------------------------------
              iroot2    22.000  i/100ms
           irootn(2)    22.000  i/100ms
             newtons    14.000  i/100ms
Calculating -------------------------------------
              iroot2    228.501  (± 3.9%) i/s -      1.144k in   5.014841s
           irootn(2)    227.442  (± 4.8%) i/s -      1.144k in   5.042017s
             newtons    139.466  (± 3.6%) i/s -    700.000  in   5.027213s

Comparison:
              iroot2:      228.5 i/s
           irootn(2):      227.4 i/s - same-ish: difference falls within error
             newtons:      139.5 i/s - 1.64x  slower

 => true 
2.4.0 :044 > load 'irootstest.rb'
integer squareroot tests for n = 10**2000
Warming up --------------------------------------
              iroot2     6.000  i/100ms
           irootn(2)     6.000  i/100ms
             newtons     3.000  i/100ms
Calculating -------------------------------------
              iroot2     67.819  (± 2.9%) i/s -    342.000  in   5.049151s
           irootn(2)     68.311  (± 2.9%) i/s -    342.000  in   5.013356s
             newtons     38.863  (± 2.6%) i/s -    195.000  in   5.024956s

Comparison:
           irootn(2):       68.3 i/s
              iroot2:       67.8 i/s - same-ish: difference falls within error
             newtons:       38.9 i/s - 1.76x  slower

 => true 
2.4.0 :045 > load 'irootstest.rb'
integer squareroot tests for n = 10**4000
Warming up --------------------------------------
              iroot2     1.000  i/100ms
           irootn(2)     1.000  i/100ms
             newtons     1.000  i/100ms
Calculating -------------------------------------
              iroot2     17.380  (± 5.8%) i/s -     87.000  in   5.011300s
           irootn(2)     17.417  (± 5.7%) i/s -     87.000  in   5.000415s
             newtons      9.512  (± 0.0%) i/s -     48.000  in   5.050706s

Comparison:
           irootn(2):       17.4 i/s
              iroot2:       17.4 i/s - same-ish: difference falls within error
             newtons:        9.5 i/s - 1.83x  slower

 => true 
2.4.0 :046 > 

Updated by Student (Nathan Zook) almost 8 years ago

def isqrt(n)
    r = 0
    o = 1 << ((n.bit_length >> 1 ) << 1)
    while o != 0 do
        t = r + o
        if n >= t
            n -= t
            r = (r >> 1) + o
        else
            r >>= 1
        end
        o >>= 2
    end
    r
end

def bench(n)
  Benchmark.ips do |x|
    puts "integer squareroot tests for n = #{n}"
    x.report("iroot2"     ) { n.iroot2    }
    x.report("isqrt"       ) { isqrt(n) }
    x.compare!
  end
  nil
end

bench(41)
integer squareroot tests for n = 10 ** 41
Warming up --------------------------------------
              iroot2     3.264k i/100ms
               isqrt     3.088k i/100ms
Calculating -------------------------------------
              iroot2     33.749k (± 0.6%) i/s -    169.728k in   5.029342s
               isqrt     31.469k (± 1.4%) i/s -    157.488k in   5.005489s

Comparison:
              iroot2:    33748.7 i/s
               isqrt:    31469.0 i/s - 1.07x  slower

=> nil
irb(main):245:0> bench(500)
integer squareroot tests for n = 10 ** 500
Warming up --------------------------------------
              iroot2   100.000  i/100ms
               isqrt    97.000  i/100ms
Calculating -------------------------------------
              iroot2      1.002k (± 0.7%) i/s -      5.100k in   5.092307s
               isqrt    976.542  (± 0.8%) i/s -      4.947k in   5.066141s

Comparison:
              iroot2:     1001.6 i/s
               isqrt:      976.5 i/s - 1.03x  slower

=> nil
irb(main):246:0> bench(1000)
integer squareroot tests for n = 10 ** 1000
Warming up --------------------------------------
              iroot2    31.000  i/100ms
               isqrt    38.000  i/100ms
Calculating -------------------------------------
              iroot2    322.487  (± 1.6%) i/s -      1.643k in   5.095980s
               isqrt    392.904  (± 1.3%) i/s -      1.976k in   5.029970s

Comparison:
               isqrt:      392.9 i/s
              iroot2:      322.5 i/s - 1.22x  slower

=> nil
irb(main):247:0> bench(2000)
integer squareroot tests for n = 10 ** 2000
Warming up --------------------------------------
              iroot2     7.000  i/100ms
               isqrt    15.000  i/100ms
Calculating -------------------------------------
              iroot2     74.351  (± 1.3%) i/s -    378.000  in   5.084339s
               isqrt    154.343  (± 1.3%) i/s -    780.000  in   5.054739s

Comparison:
               isqrt:      154.3 i/s
              iroot2:       74.4 i/s - 2.08x  slower

=> nil
irb(main):248:0> bench(4000)
integer squareroot tests for n = 10 ** 4000
Warming up --------------------------------------
              iroot2     1.000  i/100ms
               isqrt     5.000  i/100ms
Calculating -------------------------------------
              iroot2     15.017  (± 0.0%) i/s -     76.000  in   5.061323s
               isqrt     56.217  (± 1.8%) i/s -    285.000  in   5.070483s

Comparison:
               isqrt:       56.2 i/s
              iroot2:       15.0 i/s - 3.74x  slower

=> nil

It is interesting to note that isqrt is actually slower in the low end--this is because the ruby interpreter is doing more per loop. This in spite of the fact that this method does not require multiplies!

But back to my earlier question: if we want folks to use Ruby for multi-precision work, why not encourage them to link to GMP? (And use GMP's sqrt?) We've supported this since 2.1. I know that the square root there will be much, much faster. If we want to implement a non-GMP solution (because of the license issue), then we have to decide how much work to put into it.

I've read Zimmerman's paper (he appears to be the author of the GMP code), https://hal.inria.fr/inria-00072854/document, and it does not look too hard to implement, assuming that we have the primitives already in place. Implementing the primitives would not be overly difficult, but again, we need to start asking just how far to go. Any serious MP work must drop down to asm, because of things like the carry bit and full-word multiplication.

I'm sorry to have confused you about Newton's method. The fast way to compute square roots using Newton's method is to compute 1/sqrt(n), then multiply by n. (This can be made to always give the correct answer.) This is called "Newton-Raphson" in the literature. It has been amazingly hard to find an mp integer implementation anywhere, and I'm good enough to know better than to trust something that I just put together.

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

Ah, these are good and interesting results Nathan. Thanks for posting.
It would be interesting to see how both faired as primitive implementations.
I will definitely study the code and play with it.

However bbm may still be the smallest/efficient/fastest way to exactly do
roots > 2, and can be used as the benchmark algorithm to test others against.
It certainly eliminates having to create specific optimal implementations for
individual roots > 2.

However, here's another reason this issue is important to the larger Ruby 3x3 goal.

In the library file prime.rb is the method Integer#prime?.

class Integer
...
...
  def prime?
    return self >= 2 if self <= 3
    return true if self == 5
    return false unless 30.gcd(self) == 1
    (7..Math.sqrt(self).to_i).step(30) do |p|
      return false if
        self%(p)    == 0 || self%(p+4)  == 0 || self%(p+6)  == 0 || self%(p+10) == 0 ||
        self%(p+12) == 0 || self%(p+16) == 0 || self%(p+22) == 0 || self%(p+24) == 0
    end
    true
  end
...
...

As in typical prime algorithms, you just check for the primes <= the number's squareroot.
Here Math.sqrt(self).to_i determines the integer squareroot, but as we know (and as I
found out, which initiated all this), as the numbers gets > ~10**28 , the approximation for
the squareroot becomes more and more larger than the actual value. This will cause prime?
to do increasingly more work than necessary as the numbers get bigger.
(Completely as an aside, for numbers this size you should probably look to use: n.to_bn.prime?)

In prime.rb I counted three places where Math.sqrt is used to find the integer squareroot.
Creating a primitive integer squareroot method would not only make these results exact, but smaller
as n gets bigger, increasing the accuracy and speed of any application which uses them.

It would be interesting to see what an audit of the entire Ruby code base would reveal where
this is used to provide the integer squareroot, versus the floating point one.

But also to Nathan's point, until I started looking for better alternatives to work with larger
numbers, I was totally unaware of the resources in the openssl library that can be applied
to arbitrary sized integers. Since these resource already exist within the Ruby ecosystem, maybe
a first step is to assess and test how well these existing resources meet these needs and
then standardize using them ubiquitously within the core library to optimize their performance.

From a users perspective, I first want accurate results, then speed. A fast (or slow) incorrect
result is of no value in doing mathematical or numerical analysis heavy applications.

Updated by tompng (tomoya ishida) almost 8 years ago

Newton seems to be faster than bbm, if the initial x is closer to √n.
when I use x=1<<((n.bit_length+1)/2) for the initial x in newton's method,
the iteration count (n=10**4000) became 11 (was 6656 when initial x is n).

I think the calculation cost is:

newton:              O(log(k) * k**1.585))
Nathan Zook's isqrt: O(k**2)
bbm:                 O(k * k**1.585))

k=log(n)
bignum mult: O(k**1.585) # Bignum uses karatsuba algorithm

benchmark

class Integer
  def irootn(n)
    return nil if self < 0 && n.even?
    raise "root n is < 2 or not an Integer" unless n.is_a?(Integer) && n > 1
    num  = self.abs
    bits_shift = (num.bit_length)/n + 2
    root, bitn_mask = 0, (1 << bits_shift)
    until (bitn_mask >>= 1) == 0
      root |= bitn_mask
      root ^= bitn_mask if root**n > num
    end
    root *= (self < 0 ? -1 : 1)
  end

  def iroot2; irootn(2) end
end


def intsqrt7(n)  # Newton's method
  x = n
  y = (n + 1)/2
  while y < x
    x = y
    y = (x + n/x)/2
  end
  x
end

def intsqrt7_with_good_initial_value(n)  # Newton's method with √n < initial_x <= 2*√n
  raise if n<0
  return n if n<=1
  x = 1<<((n.bit_length+1)/2)
  y = (x + n/x)/2
  while y < x
    x = y
    y = (x + n/x)/2
  end
  x
end

require "benchmark/ips"
raise unless 100000.times.all?{|i|intsqrt7_with_good_initial_value(i) == i.iroot2}
Benchmark.ips do |x|
  exp = 4000; n = 10**exp
  puts "integer squareroot tests for n = 10**#{exp}"
  x.report("iroot2"  ) { n.iroot2    }
  x.report("newtons" ) { intsqrt7(n) }
  x.report("newtons_fast" ) { intsqrt7_with_good_initial_value(n) }
  x.compare!
end

result

integer squareroot tests for n = 10**4000
Warming up --------------------------------------
              iroot2     1.000  i/100ms
             newtons     1.000  i/100ms
        newtons_fast    90.000  i/100ms
Calculating -------------------------------------
              iroot2     13.054  (±15.3%) i/s -     64.000  in   5.063850s
             newtons      2.455  (± 0.0%) i/s -     13.000  in   5.337181s
        newtons_fast    939.494  (± 7.3%) i/s -      4.680k in   5.022312s

Comparison:
        newtons_fast:      939.5 i/s
              iroot2:       13.1 i/s - 71.97x  slower
             newtons:        2.5 i/s - 382.74x  slower

Updated by akr (Akira Tanaka) almost 8 years ago

Jabari Zakiya wrote:

In the library file prime.rb is the method Integer#prime?.

class Integer
...
...
  def prime?
    return self >= 2 if self <= 3
    return true if self == 5
    return false unless 30.gcd(self) == 1
    (7..Math.sqrt(self).to_i).step(30) do |p|
      return false if
        self%(p)    == 0 || self%(p+4)  == 0 || self%(p+6)  == 0 || self%(p+10) == 0 ||
        self%(p+12) == 0 || self%(p+16) == 0 || self%(p+22) == 0 || self%(p+24) == 0
    end
    true
  end
...
...

As in typical prime algorithms, you just check for the primes <= the number's squareroot.
Here Math.sqrt(self).to_i determines the integer squareroot, but as we know (and as I
found out, which initiated all this), as the numbers gets > ~10**28 , the approximation for
the squareroot becomes more and more larger than the actual value. This will cause prime?
to do increasingly more work than necessary as the numbers get bigger.
(Completely as an aside, for numbers this size you should probably look to use: n.to_bn.prime?)

Math.sqrt(n) can be smaller than mathematical sqrt(n).

% ruby -ve '1.upto(50) {|i| j = 10**i; j2 = Math.sqrt(j*j).to_i; p [i, j2 - j] if j != j2 }'
ruby 2.5.0dev (2017-02-21 trunk 57675) [x86_64-linux]
[23, -8388608]
[24, -16777216]
[25, 905969664]
[26, 4764729344]
[27, 13287555072]
[28, -416880263168]
[29, -8566849142784]
[30, 19884624838656]
[31, -364103705034752]
[32, 5366162204393472]
[33, -54424769012957184]
[34, -544247690129571840]
[35, -3136633892082024448]
[36, 42420637374017961984]
[37, -461237341797878857728]
[38, -2251190176543965970432]
[39, -60290833628396821413888]
[40, 303786028427003666890752]
[41, 620008645040778319495168]
[42, 44885712678075916785549312]
[43, 139372116959414099130712064]
[44, -10985679223259661757684121600]
[45, -70242710975464448780069240832]
[46, -68601809640529787052340805632]
[47, 4384584304507619735463404765184]
[48, 43845843045076197354634047651840]
[49, -535097230524518206803127585210368]
[50, 7629769841091887003294964970946560]

So, Integer#prime? can underestimate the limit to search factors.

Assume i is 1000000000000000031.
I think (ii).prime? returns true.
1000000000000000031 is not searched because
Math.sqrt(i
i).to_i is 1000000000000000000.
(I couldn't complete (i*i).prime? because it takes too long time.)

% ruby -e 'i=1000000000000000031; p i, Math.sqrt(i**2).to_i'
1000000000000000031
1000000000000000000

I feel this is a good evidence that we need integer sqrt.

It would be interesting to see what an audit of the entire Ruby code base would reveal where
this is used to provide the integer squareroot, versus the floating point one.

Simple search result of 'sqrt(.*to_i' using gem-codesearch.
https://github.com/akr/gem-codesearch

Oompa-euler-1.0.3/lib/euler.rb:      max = Math.sqrt(to_factor).to_i + 1
Oompa-euler-1.0.3/lib/euler.rb:        5.step(Math.sqrt(self).to_i, 2) do |x|
Oompa-euler-1.0.3/lib/euler.rb:    3.step(Math.sqrt(max).to_i, 2) { |i|
algebra-0.2.3/lib/algebra/polynomial-factor-int.rb:      (Math.log(c * Math.sqrt(s))/Math.log(u)).to_i + 1
blackwinter-gsl-1.15.3.2/examples/random/randomwalk.rb:sigma = Math::sqrt(N).to_i
boggle-0.0.2/bin/boggle:opts[:size] = Math.sqrt(opts[:board].size).to_i if !opts[:board].empty?
boggler-0.0.1/lib/boggler/grid.rb:      @size = Math.sqrt(@dice.length).to_i
boqwij-0.1.0/lib/boqwij/integer_extensions.rb:        5.step(Math.sqrt(self).to_i, 2) do |x|
boqwij-0.1.0/lib/boqwij/integer_extensions.rb:      max = Math.sqrt(to_factor).to_i + 1
clusterer-0.1.9/lib/clusterer/clustering.rb:        options[:no_of_clusters] ||= Math.sqrt(objects.size).to_i
collective-0.2.1/lib/collective/checker.rb:    @estimate > 0 ? Math.sqrt(@estimate).to_i : 1
cw-0.3.3/lib/cw/read.rb:      #        @magnitude = Math.sqrt(magnitude_squared).to_i
euler-1.1.0/lib/euler.rb:      max = Math.sqrt(to_factor).to_i + 1
euler-1.1.0/lib/euler.rb:        5.step(Math.sqrt(self).to_i, 2) do |x|
euler-1.1.0/lib/euler.rb:    3.step(Math.sqrt(max).to_i, 2) { |i|
euler-1.1.0/lib/rudoku.rb:    @sqrt_n = Math.sqrt(@n).to_i
gecoder-1.1.1/lib/gecoder/interface/constraints/int/arithmetic.rb:        Gecode::Raw::sqrt(@model.active_space, int_op.to_int_var.bind, 
gecoder-with-gecode-1.1.1.1/lib/gecoder/interface/constraints/int/arithmetic.rb:        Gecode::Raw::sqrt(@model.active_space, int_op.to_int_var.bind, 
gphys-1.5.1/lib/numru/gphys/interpolate.rb:        ndiv[i] = Math.sqrt( (kx[i+1]-kx[i])**2 + (ky[i+1]-ky[i])**2).to_i
green_shoes-1.1.374/samples/sample56.rb:  .text(lambda {|d| d.class_name[0, Math.sqrt(d.node_value).to_i / 8]})
gsl-2.1.0.2/examples/random/randomwalk.rb:sigma = Math::sqrt(N).to_i
gsl-nmatrix-1.15.3.2/examples/random/randomwalk.rb:sigma = Math::sqrt(N).to_i
hotcocoa-0.6.3/lib/hotcocoa/graphics/elements/sandpainter.rb:      #@grains = (sqrt((ox-x)*(ox-x)+(oy-y)*(oy-y))).to_i
huge_enumerable-0.0.2/lib/huge_enumerable.rb:    increment = next_prime(( 2 * domain_size / (1 + Math.sqrt(5)) ).to_i)
ifmapper-2.0.4/lib/IFMapper/FXSpline.rb:      num = Math.sqrt( x2 + y2 ).to_i / 16
jwthumbs-0.1.0/lib/shutter.rb:			gridsize = Math.sqrt(@images.length).ceil.to_i
lazylist-0.3.2/examples/examples.rb:  not (2..Math.sqrt(x).to_i).find { |d| x % d == 0 }
magic_cloud-0.0.4/lib/magic_cloud/layouter/place.rb:          case (Math.sqrt(1 + 4 * sign * t) - sign).to_i & 3
mapbaidu-0.0.7/lib/map_baidu.rb:      x = Math.sqrt(radius**2 - y**2).ceil.to_i
mapcache-0.1.1/lib/mgmaps_export.rb:      tpf_x = Math.sqrt(tpf).to_i
mapcache-0.1.1/lib/mgmaps_export.rb:      tpf_x = Math.sqrt(tpf * 2).to_i
matrix_disp-0.0.3/lib/matrix_disp.rb:            for i in (0...Math.sqrt(other[0].size).to_i) do
matrix_disp-0.0.3/lib/matrix_disp.rb:              for x in (0...Math.sqrt(other[0].size).to_i) do
matrix_disp-0.0.3/lib/matrix_disp.rb:                for y in (0...Math.sqrt(other[0].size).to_i) do
matrix_disp-0.0.3/lib/matrix_disp.rb:                    adj[z] = other[0][x+(y*Math.sqrt(other[0].size).to_i)]
matrix_disp-0.0.3/lib/matrix_disp.rb:                resultado = resultado + tmp[i+(j*Math.sqrt(other[0].size).to_i)]*det(adj)
matrix_disp-0.0.3/lib/matrix_disp.rb:                resultado = resultado - tmp[i+(j*Math.sqrt(other[0].size).to_i)]*det(adj)
minesweeper-console-1.0.1/test/minesweeper/console/prettyprinter/minefield_pretty_printer_test.rb:          Math.sqrt(@string_representation.length).to_i
mlanett-hive-0.4.0/lib/hive/checker.rb:    @estimate > 0 ? Math.sqrt(@estimate).to_i : 1
mtah-ruby-treemap-0.0.3.2/lib/treemap/node.rb:            (base_size * Math.sqrt(self.bounds.width * self.bounds.height) / 125).to_i
naga_perfectsquare-1.0.1/lib/naga_perfectsquare.rb:    Math.sqrt(self) == Math.sqrt(self).to_i
narray-0.6.1.2/src/lib/narray/narray_ext.rb:      m = ((n+Math::sqrt(n))*1.27).to_i
narray-bigmem-0.0.0/lib/narray_ext.rb:      m = ((n+Math::sqrt(n))*1.27).to_i
nsudoku-0.2.0/lib/nsudoku/checker.rb:      @width = Math.sqrt(@sudoku.length).to_i
nsudoku-0.2.0/lib/nsudoku/checker.rb:      @block_width = Math.sqrt(@width).to_i
nsudoku-0.2.0/lib/nsudoku/solver.rb:      @width = Math.sqrt(sudoku.length).to_i
nsudoku-0.2.0/lib/nsudoku/solver.rb:      @block_width = Math.sqrt(@width).to_i
nsudoku-0.2.0/lib/nsudoku.rb:    @width = Math.sqrt(@sudoku.length).to_i
nsudoku-0.2.0/lib/nsudoku.rb:    @block_width = Math.sqrt(@width).to_i
numb-0.186.0/lib/numb/sum_of_squares.rb:      s = Math.sqrt(i).to_i
numeric_memoist-0.0.2/examples/isqrt_example.rb:      $stderr.puts "calculate sqrt(#{self}).to_i (<= MaxIntFloat)"
numeric_memoist-0.0.2/examples/isqrt_example.rb:      return Math.sqrt(self).to_i
numeric_memoist-0.0.2/examples/isqrt_example.rb:    $stderr.puts "calculate sqrt(#{self}).to_i"
numeric_memoist-0.0.2/examples/isqrt_example.rb:  # STDERR> calculate sqrt(123456789).to_i (<= MaxIntFloat)
numeric_memoist-0.0.2/examples/isqrt_example.rb:  # STDERR> calculate sqrt(986960440108935861883449099987613301336842162813430234017512025).to_i
numru-narray-1.0.3/lib/numru/narray_ext.rb:      m = ((n+Math::sqrt(n))*1.27).to_i
pixeldistance-0.2.1/lib/pixeldistance.rb:    Math.sqrt((x1-x2)**2 + (y1-y2)**2).to_i >> (21 - zoom)
prime-multiplication-1.0/lib/PrimeMultiplication.rb:		(2..(Math.sqrt(number).to_i)).each do |i|
prime_numbers-0.0.4/lib/prime_numbers/prime_int.rb:      2.upto(Math.sqrt(num).to_i) do |x|
prime_table-0.0.2/lib/prime_table/prime.rb:      sqrt = Math.sqrt(self).to_i
primes-utils-2.7.0/lib/primes/utils.rb:        sqrtN = Math.sqrt(n).to_i
primes-utils-2.7.0/lib/primes/utils.rb:        sqrtN = Math.sqrt(n).to_i
primes-utils-2.7.0/lib/primes/utils.rb:            factors << p; r -=1; n /= p; sqrtN = Math.sqrt(n).to_i
primes-utils-2.7.0/lib/primes/utils.rb:      sqrtN = Math.sqrt(num).to_i   # sqrt of end_num (end of range)
primes-utils-2.7.0/lib/primes/utils.rb:      if start_num <= Math.sqrt(num).to_i     # for one array of primes upto N
print_square-0.0.2/lib/print_square/command_runner.rb:        size = Math.sqrt(number).to_i
quasi-0.0.6/lib/halton.rb:  m=Math.sqrt(q).to_int
rlsm-1.8.1/lib/rlsm/monoid.rb:      @order = Math::sqrt(@table.size).to_i
royw-drbman-0.0.6/examples/primes/lib/primes/sieve_of_eratosthenes.rb:    sqr_primes = primes(Math.sqrt(maximum).to_i, drbman)
ruby-compiler-0.1.1/vendor/ruby/lib/prime.rb:    (7..Math.sqrt(self).to_i).step(30) do |p|
ruby-spark-1.2.1/benchmark/comparison/ruby.rb:    upper = Math.sqrt(x.to_f).to_i
ruby-spark-1.2.1/lib/spark/rdd.rb:      max_sample_size = Integer::MAX - (num_st_dev * Math.sqrt(Integer::MAX)).to_i
ruby-treemap-0.0.4/lib/treemap/node.rb:            (base_size * Math.sqrt(self.bounds.width * self.bounds.height) / 125).to_i
ruby-treemap-fork-0.0.4/lib/treemap/node.rb:            (base_size * Math.sqrt(self.bounds.width * self.bounds.height) / 125).to_i
rubyvis-0.6.1/examples/5_pv_hierarchies/bubble_charts.rb:  .text(lambda {|d| d.class_name[0, Math.sqrt(d.node_value).to_i / 8]})
rusby-0.1.2/lib/rusby/profiler.rb:    SQRT_ITERATIONS = Math.sqrt(1000).to_i
simstring_pure-1.1.1/lib/simstring_pure.rb:      (alpha * Math.sqrt(query_size * y_size)).ceil.to_i
sudoku_maker-0.0.2/lib/sudoku_maker/maker.rb:      @decision_num = Math.sqrt(num).to_i
sudoku_solver-1.0.0/lib/sudoku_solver/solver.rb:      @dimension = Math.sqrt(@width).to_i
tactical_tic_tac_toe-1.0.0/lib/board.rb:      Math.sqrt(marked_spaces.count).to_i
taskjuggler-3.6.0/lib/taskjuggler/reports/ChartPlotter.rb:        rr = (r / Math.sqrt(2.0)).to_i
tdiary-contrib-5.0.3/plugin/category_to_tagcloud.rb:		level = ((Math.sqrt(count) - min) * factor).to_i
tic_tac_toes-0.1.8/lib/tic_tac_toes/ui/serializer.rb:        row_size = Math.sqrt(board_structure.count).to_i
ul-wukong-4.1.1/lib/wukong/widget/reducers/bin.rb:        self.num_bins = Math.sqrt(total_count).to_i
whitestone-1.0.2/etc/extra_tests/output_examples_code.rb:  (2..Math.sqrt(n).to_i).map { |i| n % i }.all? { |rem| rem != 0 }
wukong-4.0.0/lib/wukong/widget/reducers/bin.rb:        self.num_bins = Math.sqrt(total_count).to_i

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

I did a simplifications in isqrt1, replacing all the '+'s with '|' operations,
leaving only the math subtraction operation n -= t . I'll see if I can get rid of that.

This makes it a bit faster, especially for numbers < 10**400, where it starts being
faster than iroot2.

This suggests, for optimal overall speed, that a hybrid method can be crafted, like I
proposed for using Math.sqrt(n).to_i, like:

def sqrt_i(n); n < UPPER_BOUND ? n.iroot2 : isqrt1(n)  end

class Integer
  def irootn(n)
    return nil if self < 0 && n.even?
    raise "root n is < 2 or not an Integer" unless n.is_a?(Integer) && n > 1
    num  = self.abs
    bits_shift = (num.bit_length)/n + 2
    root, bitn_mask = 0, (1 << bits_shift)
    until (bitn_mask >>= 1) == 0
      root |= bitn_mask
      root ^= bitn_mask if root**n > num
    end
    root *= (self < 0 ? -1 : 1)
  end

  def iroot2; irootn(2) end    
end

def isqrt(n)
    r = 0
    x = 1 << ((n.bit_length >> 1 ) << 1)
    while x != 0 do
        t = r + x
        if n >= t
            n -= t
            r = (r >> 1) + x
        else
            r >>= 1
        end
        x >>= 2
    end
    r
end

def isqrt1(n)
    r = 0
    x = 1 << ((n.bit_length >> 1 ) << 1)
    until x == 0
        t = r | x
        r >>= 1
        (n -= t; r |= x) if n >= t
        x >>= 2
    end
    r
end

Benchmark.ips do |x|
  exp = 400; n = 10**exp
  puts "integer squareroot tests for n = 10**#{exp}"
  x.report("iroot2"     ) { n.iroot2    }
#  x.report("irootn(2)"  ) { n.irootn(2) }
  x.report("isqrt(n)"   ) { isqrt(n)    }
  x.report("isqrt1(n)"  ) { isqrt1(n)   }
  x.compare!
end
2.4.0 :118 > load 'irootstest.rb'
integer squareroot tests for n = 10**50
Warming up --------------------------------------
              iroot2     4.157k i/100ms
            isqrt(n)     3.590k i/100ms
           isqrt1(n)     4.029k i/100ms
Calculating -------------------------------------
              iroot2     41.870k (± 3.7%) i/s -    212.007k in   5.070841s
            isqrt(n)     36.039k (± 3.6%) i/s -    183.090k in   5.087128s
           isqrt1(n)     40.580k (± 3.4%) i/s -    205.479k in   5.069789s

Comparison:
              iroot2:    41870.0 i/s
           isqrt1(n):    40579.8 i/s - same-ish: difference falls within error
            isqrt(n):    36039.1 i/s - 1.16x  slower

 => true 
2.4.0 :119 > load 'irootstest.rb'
integer squareroot tests for n = 10**100
Warming up --------------------------------------
              iroot2     1.385k i/100ms
            isqrt(n)   879.000  i/100ms
           isqrt1(n)   883.000  i/100ms
Calculating -------------------------------------
              iroot2     14.004k (± 3.2%) i/s -     70.635k in   5.049521s
            isqrt(n)      8.558k (± 3.9%) i/s -     43.071k in   5.040986s
           isqrt1(n)      9.173k (± 3.8%) i/s -     45.916k in   5.012857s

Comparison:
              iroot2:    14003.5 i/s
           isqrt1(n):     9173.2 i/s - 1.53x  slower
            isqrt(n):     8557.9 i/s - 1.64x  slower

 => true 
2.4.0 :120 > load 'irootstest.rb'
integer squareroot tests for n = 10**200
Warming up --------------------------------------
              iroot2   425.000  i/100ms
            isqrt(n)   377.000  i/100ms
           isqrt1(n)   393.000  i/100ms
Calculating -------------------------------------
              iroot2      4.248k (± 4.1%) i/s -     21.250k in   5.010948s
            isqrt(n)      3.722k (± 4.5%) i/s -     18.850k in   5.075627s
           isqrt1(n)      3.876k (± 4.9%) i/s -     19.650k in   5.081749s

Comparison:
              iroot2:     4248.2 i/s
           isqrt1(n):     3876.3 i/s - 1.10x  slower
            isqrt(n):     3721.7 i/s - 1.14x  slower

 => true 
2.4.0 :121 > load 'irootstest.rb'
integer squareroot tests for n = 10**300
Warming up --------------------------------------
              iroot2   252.000  i/100ms
            isqrt(n)   219.000  i/100ms
           isqrt1(n)   238.000  i/100ms
Calculating -------------------------------------
              iroot2      2.477k (± 6.5%) i/s -     12.348k in   5.008024s
            isqrt(n)      2.263k (± 6.6%) i/s -     11.388k in   5.060793s
           isqrt1(n)      2.407k (± 4.0%) i/s -     12.138k in   5.050951s

Comparison:
              iroot2:     2476.8 i/s
           isqrt1(n):     2407.2 i/s - same-ish: difference falls within error
            isqrt(n):     2262.7 i/s - same-ish: difference falls within error

 => true 
2.4.0 :122 > load 'irootstest.rb'
integer squareroot tests for n = 10**400
Warming up --------------------------------------
              iroot2   101.000  i/100ms
            isqrt(n)   156.000  i/100ms
           isqrt1(n)   165.000  i/100ms
Calculating -------------------------------------
              iroot2    972.069  (± 6.0%) i/s -      4.848k in   5.006825s
            isqrt(n)      1.495k (± 7.3%) i/s -      7.488k in   5.038471s
           isqrt1(n)      1.611k (± 7.0%) i/s -      8.085k in   5.045823s

Comparison:
           isqrt1(n):     1610.8 i/s
            isqrt(n):     1494.9 i/s - same-ish: difference falls within error
              iroot2:      972.1 i/s - 1.66x  slower

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

Thanks Akira for that list.

So, too small integer squareroots potentially can cause miss finding primes in some apps,
and being too large can create doing more work than necessary in some apps, and just being
wrong in other apps has unknown consequences. :(

I know how this affected my specific apps, and was diligent (lucky) enough to track it down.

How many other situations are out there being affected by this that people don't even know
are giving wrong answers, or know something is amiss and have no hint of what's causing it?

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

So the clear winner, and new champeen is...newtons_fast, at least for squareroots.

Just goes to show, don't f**k with Newton!

class Integer
  def irootn(n)
    return nil if self < 0 && n.even?
    raise "root n is < 2 or not an Integer" unless n.is_a?(Integer) && n > 1
    num  = self.abs
    bits_shift = (num.bit_length)/n + 2
    root, bitn_mask = 0, (1 << bits_shift)
    until (bitn_mask >>= 1) == 0
      root |= bitn_mask
      root ^= bitn_mask if root**n > num
    end
    root *= (self < 0 ? -1 : 1)
  end

  def iroot2; irootn(2) end    
end

def isqrt1(n)
    r = 0
    x = 1 << ((n.bit_length >> 1 ) << 1)
    until x == 0
        t = r | x
        r >>= 1
        (n -= t; r |= x) if n >= t
        x >>= 2
    end
    r
end

def newtons_fast(n)  # Newton's method with √n < initial_x <= 2*√n
  raise if n<0
  return n if n<=1
  x = 1<<((n.bit_length+1)/2)
  y = (x + n/x)/2
  while y < x
    x = y
    y = (x + n/x)/2
  end
  x
end

Benchmark.ips do |x|
  exp = 400; n = 10**exp
  puts "integer squareroot tests for n = 10**#{exp}"
  x.report("iroot2"     ) { n.iroot2    }
  x.report("isqrt1(n)"  ) { isqrt1(n)   }
  x.report("newtons_fast(n)") { newtons_fast(n)
  x.compare!
end
2.4.0 :138 > load 'irootstest.rb'                                      
integer squareroot tests for n = 10**5
Warming up --------------------------------------
              iroot2    85.000k i/100ms
           isqrt1(n)    87.646k i/100ms
     newtons_fast(n)   191.796k i/100ms
Calculating -------------------------------------
              iroot2      1.014M (± 2.9%) i/s -      5.100M in   5.031392s
           isqrt1(n)      1.049M (± 3.0%) i/s -      5.259M in   5.018645s
     newtons_fast(n)      2.846M (± 4.6%) i/s -     14.385M in   5.066760s

Comparison:
     newtons_fast(n):  2845709.1 i/s
           isqrt1(n):  1048817.1 i/s - 2.71x  slower
              iroot2:  1014498.5 i/s - 2.81x  slower

 => true 
2.4.0 :139 > load 'irootstest.rb'
integer squareroot tests for n = 10**50
Warming up --------------------------------------
              iroot2     4.077k i/100ms
           isqrt1(n)     3.992k i/100ms
     newtons_fast(n)    24.377k i/100ms
Calculating -------------------------------------
              iroot2     41.189k (± 6.1%) i/s -    207.927k in   5.071463s
           isqrt1(n)     40.293k (± 3.4%) i/s -    203.592k in   5.058807s
     newtons_fast(n)    260.960k (± 3.4%) i/s -      1.316M in   5.050191s

Comparison:
     newtons_fast(n):   260959.7 i/s
              iroot2:    41188.8 i/s - 6.34x  slower
           isqrt1(n):    40292.7 i/s - 6.48x  slower

 => true 
2.4.0 :140 > load 'irootstest.rb'
integer squareroot tests for n = 10**500
Warming up --------------------------------------
              iroot2    74.000  i/100ms
           isqrt1(n)   127.000  i/100ms
     newtons_fast(n)     3.775k i/100ms
Calculating -------------------------------------
              iroot2    741.359  (± 4.6%) i/s -      3.700k in   5.001895s
           isqrt1(n)      1.258k (± 7.0%) i/s -      6.350k in   5.077988s
     newtons_fast(n)     37.762k (± 4.5%) i/s -    188.750k in   5.008756s

Comparison:
     newtons_fast(n):    37761.5 i/s
           isqrt1(n):     1257.6 i/s - 30.03x  slower
              iroot2:      741.4 i/s - 50.94x  slower

 => true 
2.4.0 :141 > load 'irootstest.rb'
integer squareroot tests for n = 10**5000
Warming up --------------------------------------
              iroot2     1.000  i/100ms
           isqrt1(n)     5.000  i/100ms
     newtons_fast(n)   340.000  i/100ms
Calculating -------------------------------------
              iroot2     11.196  (± 8.9%) i/s -     56.000  in   5.013699s
           isqrt1(n)     53.056  (± 3.8%) i/s -    265.000  in   5.002572s
     newtons_fast(n)      3.402k (± 3.1%) i/s -     17.000k in   5.002400s

Comparison:
     newtons_fast(n):     3401.9 i/s
           isqrt1(n):       53.1 i/s - 64.12x  slower
              iroot2:       11.2 i/s - 303.85x  slower

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

Another reason this needs to change is because Math.sqrt(n).to_i
has an upper limit before crashing.

Here's something else to consider:

There is all that code out there using Math.sqt(n).to_i and I presume
most people expect it to return the correct result (which many probably
don't even realize what that should be).

Is there some possible way for the smart Ruby Gurus to fix the parser so
that when it sees the phrase Math.sqrt(xyz).to_i it uses whatever
chosen correct implementation decided upon, so all that old legacy code
out there, in the core, gems, and anywhere else, will just magically work?

I know it's a hack, but Ruby is designed to make programmers happy, and it
wasn't their fault this bug exists. I know it would make me happy to not
have to go back over everything I've done and replace it with a new method name.

Just something to consider.

2.4.0 :166 > load 'irootstest.rb'
integer squareroot tests for n = 10**308
Warming up --------------------------------------
     newtons_fast(n)     5.042k i/100ms
   Math.sqrt(n).to_i   167.539k i/100ms
Calculating -------------------------------------
     newtons_fast(n)     51.171k (± 4.3%) i/s -    257.142k in   5.034762s
   Math.sqrt(n).to_i      2.411M (± 6.1%) i/s -     12.063M in   5.022522s

Comparison:
   Math.sqrt(n).to_i:  2411213.7 i/s
     newtons_fast(n):    51170.9 i/s - 47.12x  slower

 => true 
2.4.0 :167 > load 'irootstest.rb'
integer squareroot tests for n = 10**309
Warming up --------------------------------------
     newtons_fast(n)     4.190k i/100ms
   Math.sqrt(n).to_iFloatDomainError: Infinity
        from irootstest.rb:93:in `to_i'
        from irootstest.rb:93:in `block (2 levels) in <top (required)>'
        from /home/jzakiya/.rvm/gems/ruby-2.4.0/gems/benchmark-ips-2.7.2/lib/benchmark/ips/job/entry.rb:53:in `call_times'
        from /home/jzakiya/.rvm/gems/ruby-2.4.0/gems/benchmark-ips-2.7.2/lib/benchmark/ips/job.rb:220:in `block in run_warmup'
        from /home/jzakiya/.rvm/gems/ruby-2.4.0/gems/benchmark-ips-2.7.2/lib/benchmark/ips/job.rb:206:in `each'
        from /home/jzakiya/.rvm/gems/ruby-2.4.0/gems/benchmark-ips-2.7.2/lib/benchmark/ips/job.rb:206:in `run_warmup'
        from /home/jzakiya/.rvm/gems/ruby-2.4.0/gems/benchmark-ips-2.7.2/lib/benchmark/ips/job.rb:185:in `block in run'
        from /home/jzakiya/.rvm/gems/ruby-2.4.0/gems/benchmark-ips-2.7.2/lib/benchmark/ips/job.rb:184:in `times'
        from /home/jzakiya/.rvm/gems/ruby-2.4.0/gems/benchmark-ips-2.7.2/lib/benchmark/ips/job.rb:184:in `run'
        from /home/jzakiya/.rvm/gems/ruby-2.4.0/gems/benchmark-ips-2.7.2/lib/benchmark/ips.rb:59:in `ips'
        from irootstest.rb:85:in `<top (required)>'
        from (irb):167:in `load'
        from (irb):167
        from /home/jzakiya/.rvm/rubies/ruby-2.4.0/bin/irb:11:in `<main>'
2.4.0 :168 > 

Updated by Student (Nathan Zook) almost 8 years ago

First, please, PLEASE do not credit me with the optimized bit-by-bit method. I'm a formally trained mathematician, and I DO NOT want credit for someone else's work. It came from the above-referenced Reddit post.

Second, it appears that we must have a pretty efficient MP divide, or any direct version of Newton's method will perform poorly. Obviously, I now need to go ahead & see what I get with Zimmerman's method...
And probably look into Newton-Raphson for comparison. As I stated, this will take work. BTW, performance of Newton's method depends a lot on N. If I come up with a rival, we will likely need a lot of tests to validate speed claims.

Third, please don't get me started on Prime.rb when it comes to large numbers. :D The implementation there is simply not to be used even for full-word numbers. Try timing (1 << 56) + 81 to see what I mean. I've actually done experiments with Miller-Rabin, and...I'll probably be submitting some code soon.

I had not thought about the implications of Math.sqrt overestimating. It is a float, so yeah.

Actually, I need to leave prime.rb alone if I'm going to work here. There is some crazy work done for small (< 2^^64) numbers...

Updated by Student (Nathan Zook) almost 8 years ago

# Core Algorithm by Paul Zimmerman, article entitled
# Karatsuba Square Root
# https://hal.inria.fr/inria-00072854/document
# Presumably used in GMP.

# Personal computations derived from implementation details (page 5)
# n >= b**4 / 4
# b = 2**64**k
# n * 4 >= b ** 4
# n.length + 2 >= b.length * 4
# (n.length + 2) >> 2 >= b.length
# (n.length + 2) >> 2 >= 64 * k
# ((n.length + 2) >> 2) / 64 = k

def sqrtrem(n)
  nlength = n.bit_length
  if nlength >= (64 * 4 - 2)
    bbits = (n.bit_length + 2 >> 2) / 64 * 64
  elsif nlength >= (32 * 4 - 2)
    bbits = (n.bit_length + 2 >> 2) / 32 * 32
  elsif nlength >= (16 * 4 - 2)
    bbits = (n.bit_length + 2 >> 2) / 16 * 16
  else # single word now -- my computer has 64-bit mantissas!
    s = Math.sqrt(n).to_i
    r = n - s * s
    return [s, r]
  end

  b = 1 << bbits
  bmask = b - 1
  a3 =  n >> bbits * 3
  a2 = (n >> bbits * 2) & bmask
  a1 = (n >> bbits    ) & bmask
  a0 =  n               & bmask

  i, j = sqrtrem((a3 << bbits) + a2)
  q, u = ((j << bbits) + a1).divmod(i << 1)
  s = (i << bbits) + q
  r = (u << bbits) + a0 - q ** 2

  if r < 0
    r += (s << 1) - 1
    s -= 1
  end

  [s, r]
end

def sqrt_z(n)
  s, r = sqrtrem(n)
  s
end


Benchmark.ips do |x|
   n = 10 ** 5000
   x.report("iroot2") { n.iroot2 }
   x.report("sqrt_z") { sqrt_z(n) }
end

Warming up --------------------------------------
              iroot2     1.000  i/100ms
              sqrt_z   987.000  i/100ms
Calculating -------------------------------------
              iroot2      8.665  (± 0.0%) i/s -     44.000  in   5.078551s
              sqrt_z     10.015k (± 1.6%) i/s -     50.337k in   5.027514s
=>

I've been at this for way too long today.

A few notes:

  1. Assuming no architectural weirdness, it looks like--for this PARTICULAR case--Zimmerman's algorithm is doing about 4x better than what you are calling newtons_fast. First, this is not a surprise, as no one uses Newton's method this way. What is interesting is that Zimmerman's algorithm is designed to be good on GMP, and I am using stock ruby.
  2. That hack at the top can almost certainly be dramatically improved (It's my contribution). This is a demo of Zimmerman's work--which doesn't even really want to talk about numbers smaller than 2**254. Please don't bother looking at performance under 1000 bits until that is cleaned up.
  3. In a note that Zimmerman included in his article, he indicated that Newton-Raphson would probably be faster. As I stated, I have been unable to find an implementation, and I am loathe to present anything without high confidence that it is correct. I will do some more looking tomorrow.
  4. In some sense it is silly to do timing tests like this. We have no idea how much time is being spent in ruby and how much in C. It is entirely likely that we could have 10x speed improvements (or much, much more) across certain ranges for some of these methods by dropping into C. As I said, it is a question of how much work we want to do. At a minimum, we should do a GMP-linked build and see what Zimmerman's work can really do. Again, I don't know the legal considerations, and I would like guidance before attempting to reimplement an MP library from scratch.

Updated by Student (Nathan Zook) almost 8 years ago

Sorry to spam like this, but I really have to disagree regarding some of the implications drawn for Math.sqrt(n). This returns a float. If you don't know about the pointy edges of IEEE-754, but you do care, then shame on you. But allow me to calm you regarding Numeric#prime?. Let p be the largest prime < 2 ** 32. Run (p * p).prime?. When we fix that problem, we can take care of the rounding issue.

Updated by Student (Nathan Zook) almost 8 years ago

Following the discussion from the second answer at http://cs.stackexchange.com/questions/37596/arbitrary-precision-integer-square-root-algorithm, I implemented an integer Newton - Raphson on the inverse square root. This involved correcting an error and clarifying some things. The results were disappointing, as the asymptotic behavior is clearly beaten by the transcribed Zimmerman algorithm. The crossover was somewhat beyond 10 ** 1000. For reference, I made a faster Newton method by using Math.sqrt for the initial guess.

I also adjusted my transcription of the Zimmerman algorithm to reflect the fact that Matz implemented a simple , portable Math.sqrt. Before we use it, we would need to do serious testing, of course.

One more round of benchmarks. I have removed the warmup & comparison sections. I also implemented a newton_faster method, which uses Math.sqrt for the initial guess.

integer squareroot tests for n = 10**50
Calculating -------------------------------------
     newtons_fast(n)    176.714k (± 1.3%) i/s -    891.495k in   5.045686s
    newton_faster(n)    389.709k (± 0.8%) i/s -      1.952M in   5.009508s
           sqrt_z(n)    166.894k (± 1.4%) i/s -    847.715k in   5.080312s
   inverse Newton(n)    250.008k (± 0.4%) i/s -      1.271M in   5.083042s

integer squareroot tests for n = 10**500
Calculating -------------------------------------
     newtons_fast(n)     31.530k (± 3.4%) i/s -    158.496k in   5.032977s
    newton_faster(n)     61.087k (± 3.1%) i/s -    306.020k in   5.014480s
           sqrt_z(n)     33.167k (± 0.5%) i/s -    167.128k in   5.039057s
   inverse Newton(n)     51.252k (± 3.6%) i/s -    257.972k in   5.041610s

integer squareroot tests for n = 10**1000
Calculating -------------------------------------
     newtons_fast(n)     14.085k (± 0.8%) i/s -     71.656k in   5.087601s
    newton_faster(n)     24.078k (± 3.3%) i/s -    121.074k in   5.034512s
           sqrt_z(n)     25.410k (± 2.3%) i/s -    127.500k in   5.020581s
   inverse Newton(n)     28.921k (± 2.7%) i/s -    145.350k in   5.030223s

integer squareroot tests for n = 10**2000
Calculating -------------------------------------
     newtons_fast(n)      3.994k (± 0.7%) i/s -     20.247k in   5.069925s
    newton_faster(n)      6.698k (± 0.7%) i/s -     34.017k in   5.079015s
           sqrt_z(n)     19.176k (± 2.3%) i/s -     96.951k in   5.058731s
   inverse Newton(n)  13.724k (± 1.7%) i/s -     68.952k in   5.025785s

integer squareroot tests for n = 10**4000
Calculating -------------------------------------
     newtons_fast(n)      1.035k (± 0.4%) i/s -      5.253k in   5.074204s
    newton_faster(n)      1.600k (± 0.9%) i/s -      8.000k in   5.001751s
           sqrt_z(n)     12.478k (± 0.7%) i/s -     62.475k in   5.006905s
   inverse Newton(n)      6.094k (± 0.7%) i/s -     30.957k in   5.079890s

integer squareroot tests for n = 10**5000
Calculating -------------------------------------
     newtons_fast(n)    628.308  (± 0.6%) i/s -      3.162k in   5.032783s
    newton_faster(n)    915.131  (± 0.4%) i/s -      4.641k in   5.071494s
           sqrt_z(n)     10.107k (± 0.6%) i/s -     50.796k in   5.026155s
   inverse Newton(n)      4.539k (± 3.0%) i/s -     22.850k in   5.038822s

It is possible that "inverse Newton" could be modified to advantage the particulars of our implementation of bignum, or GMP, should it be linked. I don't want spam any more performances after this until it is determined if we want to drop down to C for this.

One last thing to play with might be Halley's method. This converges cubically, but requires 4 multiplies. This should compute faster. But don't expect to see it very soon.

Updated by Student (Nathan Zook) almost 8 years ago

All of the algorithms are now at https://github.com/NathanZook/ruby_sqrt. Hope that's okay.

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

Hey Nathan, Great Work!

I'm seeing an error in inverse_newton_sqrt(n) being one-off.

2.4.0 :224 > n = 10**10 ; puts sqrt_z(n), newtons_fast(n), newton_faster(n), inverse_newton_sqrt(n) 
100000
100000
100000
100000
 => nil 
2.4.0 :225 > n = 10**11 ; puts sqrt_z(n), newtons_fast(n), newton_faster(n), inverse_newton_sqrt(n) 
316227
316227
316227
316227
 => nil 
2.4.0 :226 > n = 10**12 ; puts sqrt_z(n), newtons_fast(n), newton_faster(n), inverse_newton_sqrt(n) 
1000000
1000000
1000000
1000000
 => nil 
2.4.0 :227 > n = 10**13 ; puts sqrt_z(n), newtons_fast(n), newton_faster(n), inverse_newton_sqrt(n) 
3162277
3162277
3162277
3162277
 => nil 
2.4.0 :228 > n = 10**14 ; puts sqrt_z(n), newtons_fast(n), newton_faster(n), inverse_newton_sqrt(n) 
10000000
10000000
10000000
10000000
 => nil 
2.4.0 :229 > n = 10**15 ; puts sqrt_z(n), newtons_fast(n), newton_faster(n), inverse_newton_sqrt(n) 
31622776
31622776
31622776
31622776
 => nil 
2.4.0 :230 > n = 10**16 ; puts sqrt_z(n), newtons_fast(n), newton_faster(n), inverse_newton_sqrt(n)
100000000
100000000
100000000
99999999
 => nil 
2.4.0 :231 > n = 10**17 ; puts sqrt_z(n), newtons_fast(n), newton_faster(n), inverse_newton_sqrt(n)
316227766
316227766
316227766
316227765
 => nil 
2.4.0 :232 > n = 10**20 ; puts sqrt_z(n), newtons_fast(n), newton_faster(n), inverse_newton_sqrt(n)
10000000000
10000000000
10000000000
9999999999
 => nil 
2.4.0 :233 > n = 10**30 ; puts sqrt_z(n), newtons_fast(n), newton_faster(n), inverse_newton_sqrt(n)
1000000000000000
1000000000000000
1000000000000000
1000000000000000
 => nil 
2.4.0 :234 > n = 10**40 ; puts sqrt_z(n), newtons_fast(n), newton_faster(n), inverse_newton_sqrt(n)
100000000000000000000
100000000000000000000
100000000000000000000
99999999999999999999
 => nil 
2.4.0 :235 > n = 10**50 ; puts sqrt_z(n), newtons_fast(n), newton_faster(n), inverse_newton_sqrt(n)
10000000000000000000000000
10000000000000000000000000
10000000000000000000000000
9999999999999999999999999
 => nil 
2.4.0 :236 > 

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

Oh I see where at least part of the error comes from.

In this

def inverse_newton_sqrt(n)
  raise if n < 0
  return Math.sqrt(n).to_i if n < 1 << 53

See top of this thread at #1
I showed earlier I found that Math.sqrt(n).to_i starts given incorrect
results for n > 9_999_899_999_899_999_322_536_673_279

I replaced if n < 1 << 53 with if n <= 9_999_899_999_899_999_322_536_673_279

This eliminated the shown errors upto n = 10**40, when it started to be one-off again.

This is on a 64-bit Linux OS I7 cpu laptop.

Updated by stomar (Marcus Stollsteimer) almost 8 years ago

Jabari Zakiya wrote:

I showed earlier I found that Math.sqrt(n).to_i starts given incorrect
results for n > 9_999_899_999_899_999_322_536_673_279

I replaced if n < 1 << 53 with if n <= 9_999_899_999_899_999_322_536_673_279

Is this guaranteed to be platform independent?

And a more general consideration:

I assume Integer.sqrt() will be implemented in C, not in Ruby? How significant or useful are all those benchmarks then? Micro-optimizing Ruby code for speed and then reimplementing it in C seems to be the wrong approach.

Updated by Student (Nathan Zook) almost 8 years ago

Jabari Zakiya wrote:

Oh I see where at least part of the error comes from.

In this

def inverse_newton_sqrt(n)
  raise if n < 0
  return Math.sqrt(n).to_i if n < 1 << 53

See top of this thread at #1
I showed earlier I found that Math.sqrt(n).to_i starts given incorrect
results for n > 9_999_899_999_899_999_322_536_673_279

9999899999899999322536673279 < 1 << 53
=> false

So this cannot be the source of this particular problem. I am not surprised that there are errors in inverse_newton_sqrt. It was quickly put together, designed to credibly determine if it is faster than the Zimmerman algorithm. Getting the last bit correct on perfect squares is actually hard. If it appeared that it would be a superior algorithm, I would have needed to have proven convergence. I had expected that the +2 in ins_find_initial_exponent would have sufficed, but it looks like I probably needed to add another bit because of the use of fixed point rather than floating point in the algorithm.

I will credit you with finding the errors.

The question now is, "where do we go from here?" Clearly, faster Newton and Zimmerman are the contenders. At this moment, faster Newton wins almost to 10 * 1000, which is about 3000 bits, on speed alone. It is also clearly correct. (It conceivably does not converge, but that is easy to fix.)

The Zimmerman algorithm only works if I got that hack at the top correct. Actually, if we switch to a different algorithm at that point, we can probably eliminate a full pass (or two??) through the algorithm, which will speed it up lot for smaller numbers. I guess I should reach out to him & see if he will release his work to the Ruby license, in which case I can rely upon his proofs & techniques... :D

Both will benefit if we drop to C, but Zimmerman will gain a lot more. (Besides all of the ruby that happens in the loop, consider also the tricks that can be played with extracting a3-a0, for instance.)

Updated by Student (Nathan Zook) almost 8 years ago

Marcus Stollsteimer wrote:

Jabari Zakiya wrote:

I showed earlier I found that Math.sqrt(n).to_i starts given incorrect
results for n > 9_999_899_999_899_999_322_536_673_279

I replaced if n < 1 << 53 with if n <= 9_999_899_999_899_999_322_536_673_279

Is this guaranteed to be platform independent?

If this were the issue, it would be. Math.sqrt() runs through some IEEE-754-specificed functions in C. I know of no funny business in these particular functions.

And a more general consideration:

I assume Integer.sqrt() will be implemented in C, not in Ruby? How significant or useful are all those benchmarks then? Micro-optimizing Ruby code for speed and then reimplementing it in C seems to be the wrong approach.

I do expect that we want to go with C, but I don't have an answer. The purpose of these benchmarks has been to get some idea of which algorithms are credible. We have run large enough samples that it is clear which one have better O() performance based on the way that ruby handles large numbers. So long as we are smart, that does not depend on the ruby code.

And see my just-posted comment for more. ;)

Actions #53

Updated by Student (Nathan Zook) almost 8 years ago

It is now clear that Halley's method requires five multiplies, not the four previously reported by wikipedia. This makes it uninteresting as a candidate.

Actions #54

Updated by nobu (Nobuyoshi Nakada) almost 8 years ago

  • Status changed from Open to Closed

Applied in changeset r57705.


Integer.sqrt [Feature #13219]

Updated by Student (Nathan Zook) almost 8 years ago

Jabari Zakiya wrote:

Oh I see where at least part of the error comes from.

In this

def inverse_newton_sqrt(n)
  raise if n < 0
  return Math.sqrt(n).to_i if n < 1 << 53

See top of this thread at #1
I showed earlier I found that Math.sqrt(n).to_i starts given incorrect
results for n > 9_999_899_999_899_999_322_536_673_279

I replaced if n < 1 << 53 with if n <= 9_999_899_999_899_999_322_536_673_279

This eliminated the shown errors upto n = 10**40, when it started to be one-off again.

This is on a 64-bit Linux OS I7 cpu laptop.

Actually, with some understanding of how floating point arithmetic works, the first even root past 1 << 53 is likely to be bad:

irb(main):1081:0>
def msi(n)
  x = Math.sqrt(n).to_i
  big = x * x > n
  small = x * x + 2 * x < n
  puts [n, x, big, small]
end

irb(main):1082:0> msi(1<<53)
9007199254740992
94906265
false
false
=> nil
irb(main):1083:0> msi(94906266 ** 2 - 1)
9007199326062755
94906266
true
false
=> nil

;)

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

I want to raise for consideration, as a user, the API behavior for these methods.

I noticed in the other squareroot methods they raise an error for n < 0
whereas my version returns 'nil' when taking an even root of a negative integer.

Let me promote why this shouldn't be treated as a runtime error.

The reason I chose to return 'nil' is because in some applications where I'm
processing lots of data I don't want the program to bomb when encountering
negative integers, I want to continue and just choose how to process those conditions.

First, mathematically it is perfectly allowable to take the even root of a negative number,
it's just a complex number, so conceptually this shouldn't be an error, just unwanted for this method.

With irootn|2 I know I will either get a real integer, which means n was a valid integer (and root)
or 'nil', which means it didn't have a real (as in complex part) root.

If I wanted to check for negativity in the first place I could always do something like:

def my_app_sqrt(n); n < 0 ? Complex(0, n.abs.iroot2) : n.iroot2 end

or using the root method from my Roots gem:

def my_app_sqrt(n); n < 0 ? n.root(2) : n.iroot2 end

As a user, I want to choose how to handle cases when my app may try to take an even root
of a negative integer/number and not automatically throw an error that crashes my program.

I think as long as this is clearly documented (either way) for the API a user will know
exactly how to deal with these conditions, but returning 'nil' gives the user total control.

Updated by stomar (Marcus Stollsteimer) almost 8 years ago

Jabari Zakiya wrote:

I noticed in the other squareroot methods they raise an error for n < 0
whereas my version returns 'nil' when taking an even root of a negative integer.

I strongly agree with nobu's implementation. That's exactly the behavior I was about to suggest when I noticed it was already there.

The reason I chose to return 'nil' is because in some applications where I'm
processing lots of data I don't want the program to bomb when encountering
negative integers, I want to continue and just choose how to process those conditions.

It doesn't have to bomb.

  1. for typical use cases (number theory, etc.) you will have only positive values anyway
  2. you can check your data before using the method
  3. you can use a begin...rescue block and handle the exception appropriately

Returning nil for a mathematical operation doesn't make much sense. And a silent failure you are not aware of will bite you much more than an exception.

First, mathematically it is perfectly allowable to take the even root of a negative number,
it's just a complex number, so conceptually this shouldn't be an error, just unwanted for this method.

"Not an error, but unwanted"???

The point is that by premise the method doesn't make any sense in the complex domain. It's supposed to return the largest integer (natural number) less or equal to the square root. This wouldn't work for complex numbers at all: how would you truncate to an integer? how would you do the comparison? (Complex numbers are not ordered.)

Therefore, isqrt only does make sense for natural numbers. (Different from the "normal" sqrt.)

Updated by jzakiya (Jabari Zakiya) almost 8 years ago

I disagree, but as long as the API explicitly states what's happens for this case a user can adapt accordingly

Updated by nobu (Nobuyoshi Nakada) almost 8 years ago

  • Description updated (diff)
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0