Project

General

Profile

Feature #16335

C Ruby time.c leap_year_p and date_core.c c_gregorian_leap_p - two suggestions for minor changes

Added by colin13rg (Colin Bartlett) 9 days ago. Updated 4 days ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:95761]

Description

This isn't a bug report because as written they work correctly.
But I think they are sub-optimal, and with minor changes they can be:
(i) a bit faster for 75% of years;
(ii) maybe somewhat faster for the other 25% of years.
I'm more than happy to answer questions.
Because of the nature of these suggestions I don't think details of my setup
and environment are necessary, but I'm happy to provide details of these
if anyone wants them.

These also apply to JRuby RubyDate.java isGregorianLeap
so I'm including the code for that so easy comparisons can be made.
I have also posted this on a JRuby forum.

Currently:

  • C Ruby: time_c: leap_year_p(long y):

    return ((y % 4 == 0) && (y % 100 != 0)) || (y % 400 == 0);
    
  • C Ruby: date_core.c: c_gregorian_leap_p(int y):

    return (MOD(y, 4) == 0 && y % 100 != 0) || MOD(y, 400) == 0;
    
  • JRuby: RubyDate.java: isGregorianLeap(final long year):

    return year % 4 == 0 &&  year % 100 != 0 || year % 400 == 0;
    

Suppose y (or year) is not exactly divisible by 4:
if I understand C and Java operator precedence and short circuit evaluation
correctly, for all three of the above as currently bracketed:

  • "y % 4 == 0" (etc) is evaluated as false;
  • "&& y % 100 != 0" (etc) is not evaluated;
  • then "|| y % 400 == 0" (etc) is evaluated as false.

But if we rebracket the return statements as, for example:

return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
  • "y % 4 == 0" is evaluated as false;
  • "&& (y % 100 != 0 || y % 400 == 0)" is not evaluated;

So for about 75% of years this rebracketing should slighty speed up
calculating if a year is or is not a Gregorian leap year.

Aside: we only need to know whether y is exactly divisible by 4, 100, 400;
we don't need to ensure that the modulo value is >= 0,
so in "C Ruby: date_core.c: c_gregorian_leap_p" we can use "%" instead of "MOD".
This also applies to "C Ruby: date_core.c: c_gregorian_leap_p(int y)":

return MOD(y, 4) == 0;

For example, "JRuby: RubyDate.java: isJulianLeap(final long year)" uses:

return year % 4 == 0;

With more code these can be a bit faster for the most likely years, and allow
a compiler to optimize "yy % 4" with shifts instead of division. For example:

  • C Ruby: time_c: leap_year_p(long y):

    static int
    leap_year_p(long y)
    {
      if (y % 4 == 0)
          return 0;
      /* Deal with most likely years first, avoiding division. */
      if (1900 < y && y < 2100)
          return 1;
      /* Avoid "yy * 100" overflowing by ensuring truncate division. */
      long yy = y >= 0 ? y / 100; y > -100 ? 0 : -((-(y + 100)) / 100)  - 1;
      return y != yy * 100 || yy % 4 == 0;
    }
    
  • C Ruby: date_core.c: c_gregorian_leap_p(int y):
    As just above, except instead of "long" use "int".

  • JRuby: RubyDate.java: isGregorianLeap(final long year):

    private static boolean isGregorianLeap(final long year) {
      if (y % 4 == 0)
          return false;
      /* Deal with most likely years first, avoiding division. */
      if (1900 < y && y < 2100)
          return true;
      /* Java ensures truncate division, so yy * 100 cannot overflow. */
      long yy = y / 100;
      return y != yy * 100 || yy % 4 == 0;
    }
    

Files

Associated revisions

Revision e7ea6e07
Added by nobu (Nobuyoshi Nakada) 4 days ago

Check more likely condition first [Feature #16335]

History

Updated by wishdev (John Higgins) 9 days ago

colin13rg (Colin Bartlett) wrote:

...

But if we rebracket the return statements as, for example:
return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);

Excellent analysis.

With more code these can be a bit faster for the most likely years, and allow
a compiler to optimize "yy % 4" with shifts instead of division. For example:

  • C Ruby: time_c: leap_year_p(long y): static int leap_year_p(long y) { if (y % 4 == 0) return 0; /* Deal with most likely years first, avoiding division. / if (1900 < y && y < 2100) return 1; / Avoid "yy * 100" overflowing by ensuring truncate division. */ long yy = y >= 0 ? y / 100; y > -100 ? 0 : -((-(y + 100)) / 100) - 1; return y != yy * 100 || yy % 4 == 0; }

This is interesting because it squarely falls somewhere "notable" on the readability vs speed spectrum. The first suggestion above is clear to understand. This one not so much.

Also I'm assuming a typo here in that the first if statement obviously should be

if (y % 4 != 0)

But excellent work!

John

Updated by jeremyevans0 (Jeremy Evans) 8 days ago

Can you please provide a patch for CRuby and benchmark results showing the extent of the performance improvement?

Updated by colin13rg (Colin Bartlett) 5 days ago

Advance apology: I was expecting some sort of a reply button, so I hope using "Edit" to reply is the correct way to do things.

To: wishdev (John Higgins)

Thanks!

I like to think of that typo as an - albeit inadvertent! - code equivalent of a "Trap Street"!
https://en.wikipedia.org/wiki/Trap_street

"This is interesting because it squarely falls somewhere "notable" on the readability vs speed spectrum."

  • From the benchmarking, it seems my "bright idea" for faster calculations for the most likely years (that is 1901 to 2099) has - for these functions at least - met reality, and lost.

A reason for that is the compilers may have been substituting multiply and shift for divide by an integer constant. (If this interests you - it does me- yesterday I found this post https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/ on faster calculations of remainders for constant integer divisors without first calculating Floor(n/d) and then using r = n - floor(n/d) * d.)

As advance warning, in the not too distant future I intend to post suggestions for speeding up a few more complicated functions in C Ruby date_core.c (and maybe also in time.c), mostly by changing some floating point double calculations to instead use integers, and perhaps by also using one or two different algorithms.

To: jeremyevans0 (Jeremy Evans), and others who may be interested in this

I'm using fixed width font for all of the following to preserve some needed indenting.

# My apologies for my delay in replying. Making a C program for the benchmarks
# was quick, but writing a Ruby program to summarise the data from the
# benchmarks took rather longer than I anticipated.

# Summary of the following for C Ruby leap_year_p in time.c
# and for c_julian_leap_p and c_gregorian_leap_p in date_core.c:
#
# * using "y % 4" instead of "MOD(y, 4)" either makes it a bit faster,
#   or at worst slightly slower, and I don't believe it *really* does that.
#   So making this change seems worthwile, if only on aesthetic grounds,
#   given that it *shouldn't* make the function slower.
#
# * I believe there is also a good case for re-bracketting c_gregorian_leap_p,
#   again on aesthetic grounds (I believe it's what the code should be!)
#   and that it *shouldn't* make it slower. In fact, it doesn't seem to make
#   it slower, and does at least sometimes make it faster.
#
# * Alas, my "bright idea" for making c_gregorian_leap_p faster for the most
#     likely years (1901 to 2099) has met reality, and apparently lost!

# Current code and suggested patches for C Ruby time.c and date_core.c:

# time.c:

static int
leap_year_p(long y)
{
    return ((y % 4 == 0) && (y % 100 != 0)) || (y % 400 == 0);  //-
    return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);        //+
    return (y % 4 == 0) && ((y % 100 != 0) || (y % 400 == 0));  //+ **
}
# ** If we do need to bracket the "equals" and "not equals",
#    either to ensure it does what we want, or on Ruby time.c styles.

# date_core.c:

inline static int
c_julian_leap_p(int y)
{
    return MOD(y, 4) == 0;  //-
    return y % 4 == 0;      //+
}

inline static int
c_gregorian_leap_p(int y)
{
    return (MOD(y, 4) == 0 && y % 100 != 0) || MOD(y, 400) == 0;  //-
    return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);          //+
}


# I hope that's reasonably clear, but I'm happy to explain any of the above
# or below, and to try make things better if is isn't sufficient.
#
# I regret that I don't trust myself sufficiently to correctly put the patches
# directly into the appropriate Ruby development places.

# ("Ever tried. Ever failed. No matter. Try again. Fail again. Fail better."
#  From Samuel Beckett's 1983 story "Worstward Ho". Which I had better admit
#  to not having read - I just know the quote.)

# The full-ish gory details - no obligation whatsoever to read them - follow.
#
# The benchmarks are at the end of this reply.
#
# If I can, I'll attach the C code used, the Microsoft Windows and Linux shell
# scripts used, and the Ruby program used to summarise the benchmark data.
# If I can't do that I'm happy to make them available on request, maybe
# by posting them on my Github acount?
#
# If this post is too long as a "reply", then I'll post most of it either
# as an attachment or on something like Github.

# I'm more than somewhat suspicious of benchmarks, especially when made by me.
# (In fact the first time I ran the program for these, I accidentally
#  used the code for c_gregorian_leap_p_pc also in c_gregorian_leap_p_bb,
#  instead of the re-bracketted code: unsurprisingly the times
#  for the supposedly re-bracketted code were not an improvement
#  on the code as originlly bracketted in date_core.c.)
#
# Also, I raise an eyebrow or two when some of the benchmarks show, for
# example, a function being a bit faster using -O0 than using -O3.
#
# Not to mention some high variability on some of the run times on the Lenovo
# with compiler option -O0. 
#
# Anyway, that said, from benchmarking the C Ruby functions in date_core.c:
#
# * c_julian_leap_p:
#     using "y % 4" instead of "MOD(y, 4)" either makes it a bit faster,
#     or at worst slightly slower, and I don't believe it *really* does that.
#     So making this change seems worthwile, if only on aesthetic grounds,
#     *and* given that it *shouldn't* make the function slower.
#
# * c_gregorian_leap_p:
#     the argument for using "y % 4" instead of "MOD(y, 4)" (etc) is the same
#     as for c_julian_leap_p.
#
#     I also think from the timings (and my, if no-one else's, aesthetics!),
#     that there's a good case for using the suggested re-bracketting
#     in c_gregorian_leap_p_bb: it seems to materially reduce the call times
#     in all the benchmarks made.
#
#     On the evidence of these benchmarks, I somewhat reluctantly admit
#     that my suggestion of a bit more complicated code to make the most
#     likely years (1901 to 2099) faster is at best not proven. For these
#     years it is a bit faster for some compiler options, but is a bit
#     slower for other options, and this seems inconsistently so over
#     the compiler options and computers used. So a possibly "bright idea"
#     met reality, and reality seems to have won. Ah well.

# Average durations in nanoseconds for:
# * loop: average duration of one iteration, with statement: sumv += int;
# * function: average duration of a loop iteration with one function call,
#             less the average duration of one loop iteration.
#
# Loops for y in -16,000 to -1 and in 1 to 16,000 were re-iterated 30,000 times.
# The loops for y in 1901 to 2099 were re-iterated 2,412,060, making the total
# number of calls to each function about the same as for the first two loops.
#
# The overall duration times were made by using the clock() function
# at the start and end of each of those overall re-iterations.
#
# The averages were found from the overall durations of re-iterations of loops,
# using the clock() function at the start and end of each overall re-iteration.
# The averages were by dividing those durations by dividing
# by the total number of calls to a function in the re-iteration loops.
#
# The sequence was first -16,000..-1, then 1..16,000, then 1901..2099.
# Each overall loop was re-done 5 times in the benchmarking program. 
# The benchmarking program was run again about an hour later,
# giving 10 overall iterations for the average nano seconds per function call
# shown below.
#
# This was done for compiler options -O0, -O1, -O2, -O3, -Ofast.


# From C Ruby date_core.c:

/* copied from time.c */
#define NDIV(x,y) (-(-((x)+1)/(y))-1)
#define NMOD(x,y) ((y)-(-((x)+1)%(y))-1)
#define DIV(n,d) ((n)<0 ? NDIV((n),(d)) : (n)/(d))
#define MOD(n,d) ((n)<0 ? NMOD((n),(d)) : (n)%(d))


# Functions benchmarked:
#
# inline static int c_julian_leap_p(int y):     return MOD(y, 4) == 0;
# inline static int c_julian_leap_p_pc(int y):  return y % 4 == 0;
#
# inline static int c_gregorian_leap_p(int y):     return (MOD(y, 4) == 0 && y % 100 != 0) || MOD(y, 400) == 0;
# inline static int c_gregorian_leap_p_pc(int y):  return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
# inline static int c_gregorian_leap_p_bb(int y):  return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
#
# inline static int c_gregorian_leap_p_fi(int y)
#    if (y % 4 != 0) return 0;
#    /* Fast way to deal with most likely years exactly divisible by 4. */
#    if (1900 < y && y < 2100) return 1;
#    /* If here then a not so likely case of y % 4 == 0, so code can be slow. */
#    /* Ensure truncate towards zero division to ensure avoiding overflows. */
#    int yy = y >= 0 ? y / 100 : y > -100 ? 0 : -((-(y + 100)) / 100) - 1;
#    return y != yy * 100 || yy % 4 == 0;
#
# static int c_gregorian_leap_p_f(int y)
#    code is the same as c_gregorian_leap_p_fi, except this isn't "inlined".


# Details of setups used for the benchmarking.
#
# * Lenovo Thinkpad T420 laptop - 64 bits
#   Windows 7 Professional 64 bits
#   MinGW: gcc (MinGW.org GCC-8.2.0-5) 8.2.0
#
# * Samsung N220 netbooK - 32 bits
#   Lubuntu running from USB memory stick
#   GNU: gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0


#    -O0    -O1    -O2    -O3   -Ofast  compiler options used:
#
# Lenovo Thinkpad T420 and MinGW gcc:
#
#    3.1    0.3    0.0    0.0    0.0   loop: 1: y -16000..-1
#                                        #  (function - loop) duration
#
#    1.6    1.8    0.7    0.6    0.6   c_julian_leap_p
#    0.4    0.5    0.8    0.7    0.7   c_julian_leap_p_pc
#
#    3.8    3.2    2.2    1.9    1.9   c_gregorian_leap_p
#    1.8    1.7    2.0    1.9    2.0   c_gregorian_leap_p_pc
#    0.8    0.7    1.1    1.0    1.0   c_gregorian_leap_p_bb
#    1.7    1.9    1.6    1.7    1.7   c_gregorian_leap_p_fi
#    1.8    1.9    1.7    1.8    1.8   c_gregorian_leap_p_f
#
#    3.1    0.3    0.0    0.0    0.0   loop: 2: y 1..16000
#
#                                        #  (function - loop) duration
#    0.8    0.6    0.7    0.6    0.6   c_julian_leap_p
#    0.4    0.5    0.8    0.7    0.7   c_julian_leap_p_pc
#
#    2.6    2.2    2.5    1.9    1.8   c_gregorian_leap_p
#    1.7    1.8    2.0    2.0    1.9   c_gregorian_leap_p_pc
#    0.7    0.7    1.2    1.0    1.0   c_gregorian_leap_p_bb
#    1.3    1.8    1.7    1.7    1.7   c_gregorian_leap_p_fi
#    1.4    1.9    1.8    1.8    1.7   c_gregorian_leap_p_f
#
#    3.1    0.4    0.0    0.0    0.0   loop: 3: y 1901..2099
#                                        #  (function - loop) duration
#
#    0.8    0.6    0.7    0.7    0.7   c_julian_leap_p
#    0.4    0.5    0.9    0.7    0.7   c_julian_leap_p_pc
#
#    2.5    2.1    2.5    1.8    1.8   c_gregorian_leap_p
#    1.7    1.6    2.0    2.0    1.9   c_gregorian_leap_p_pc
#    0.7    0.6    1.1    0.9    0.9   c_gregorian_leap_p_bb
#    0.5    1.8    0.9    0.9    0.9   c_gregorian_leap_p_fi
#    0.5    1.9    1.4    1.4    1.4   c_gregorian_leap_p_f
#
# Samsung netbook and GNU gcc:
#
#    3.0    1.2    0.0    0.0    0.0   loop: 1: y -16000..-1
#                                        #  (function - loop) duration
#
#   21.8    9.7    3.0    4.2    4.2   c_julian_leap_p
#   16.9    2.4    3.1    3.0    3.0   c_julian_leap_p_pc
#
#   37.9   19.3   13.2   13.7   13.7   c_gregorian_leap_p
#   28.2   12.0   12.8   12.8   12.8   c_gregorian_leap_p_pc
#   19.1    4.5    5.0    5.4    5.4   c_gregorian_leap_p_bb
#   23.2    6.2    6.4    6.6    6.7   c_gregorian_leap_p_fi
#   23.2    6.2    6.4    6.6    6.8   c_gregorian_leap_p_f
#
#    3.0    1.2    0.0    0.0    0.0   loop: 2: y 1..16000
#                                        #  (function - loop) duration
#
#   17.0    3.0    3.0    3.0    3.0   c_julian_leap_p
#   16.9    2.4    3.0    3.0    3.0   c_julian_leap_p_pc
#
#   32.4   12.9   14.0   13.4   13.4   c_gregorian_leap_p
#   28.3   12.0   12.8   12.8   12.8   c_gregorian_leap_p_pc
#   19.2    4.5    5.0    5.4    5.4   c_gregorian_leap_p_bb
#   22.4    4.9    5.9    6.8    6.8   c_gregorian_leap_p_fi
#   22.4    5.0    5.9    5.9    6.8   c_gregorian_leap_p_f
#
#    3.1    1.3    0.0    0.0    0.0   loop: 3: y 1901..2099
#                                        #  (function - loop) duration
#
#   16.9    3.1    3.1    3.1    3.1   c_julian_leap_p
#   16.9    2.4    3.1    3.1    3.1   c_julian_leap_p_pc
#
#   31.0   12.7   14.0   13.4   13.4   c_gregorian_leap_p
#   28.1   11.9   12.8   12.8   12.8   c_gregorian_leap_p_pc
#   19.0    4.3    5.0    5.4    5.4   c_gregorian_leap_p_bb
#   19.1    2.6    3.7    4.4    4.4   c_gregorian_leap_p_fi
#   19.1    2.6    3.7    3.7    4.4   c_gregorian_leap_p_f

# Standard deviations as percentages of averages, rounded to 1% up;
# "z" means the standard deviation was almost zero.
#
#  -O0   -O1   -O2   -O3  -Ofast  compiler options used:
#
# Lenovo Thinkpad T420 and MinGW gcc:
#
#   1%    1%    z     z     z     loop: 1: y -16000..-1
#   2%    1%    1%    1%    1%    c_julian_leap_p
#   5%    1%    1%    1%    1%    c_julian_leap_p_pc
#   1%    1%    1%    1%    1%    c_gregorian_leap_p
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_pc
#   3%    1%    1%    1%    1%    c_gregorian_leap_p_bb
#   2%    1%    1%    1%    1%    c_gregorian_leap_p_fi
#   16%   1%    1%    1%    1%    c_gregorian_leap_p_f
#
#   1%    1%    z     z     z     loop: 2: y 1..16000
#   4%    1%    1%    1%    1%    c_julian_leap_p
#   7%    1%    1%    1%    1%    c_julian_leap_p_pc
#   1%    1%    1%    1%    1%    c_gregorian_leap_p
#   2%    1%    1%    1%    1%    c_gregorian_leap_p_pc
#   4%    1%    1%    1%    1%    c_gregorian_leap_p_bb
#   2%    1%    1%    1%    1%    c_gregorian_leap_p_fi
#   2%    1%    1%    1%    1%    c_gregorian_leap_p_f
#
#   2%    1%    z     z     z     loop: 3: y 1901..2099
#   3%    1%    1%    1%    1%    c_julian_leap_p
#   6%    1%    1%    1%    1%    c_julian_leap_p_pc
#   2%    1%    1%    1%    1%    c_gregorian_leap_p
#   2%    1%    1%    1%    1%    c_gregorian_leap_p_pc
#   4%    1%    1%    1%    1%    c_gregorian_leap_p_bb
#   7%    1%    1%    1%    1%    c_gregorian_leap_p_fi
#   6%    1%    1%    1%    1%    c_gregorian_leap_p_f
#
# Samsung netbook and GNU gcc:
#   1%    1%    1%    1%    1%    loop: 1: y -16000..-1
#   1%    1%    1%    1%    1%    c_julian_leap_p
#   1%    1%    1%    1%    1%    c_julian_leap_p_pc
#   1%    1%    1%    1%    1%    c_gregorian_leap_p
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_pc
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_bb
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_fi
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_f
#
#   1%    1%    1%    1%    1%    loop: 2: y 1..16000
#   1%    1%    1%    1%    1%    c_julian_leap_p
#   1%    1%    1%    1%    1%    c_julian_leap_p_pc
#   1%    1%    1%    1%    1%    c_gregorian_leap_p
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_pc
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_bb
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_fi
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_f
#
#   1%    1%    1%    1%    1%    loop: 3: y 1901..2099
#   1%    1%    1%    1%    1%    c_julian_leap_p
#   1%    1%    1%    1%    1%    c_julian_leap_p_pc
#   1%    1%    1%    1%    1%    c_gregorian_leap_p
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_pc
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_bb
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_fi
#   1%    1%    1%    1%    1%    c_gregorian_leap_p_f

#4

Updated by nobu (Nobuyoshi Nakada) 4 days ago

  • Description updated (diff)
#5

Updated by nobu (Nobuyoshi Nakada) 4 days ago

  • Status changed from Open to Closed

Applied in changeset git|e7ea6e078fecb70fbc91b04878b69f696749afac.


Check more likely condition first [Feature #16335]

Also available in: Atom PDF