Advance apology: I was expecting some sort of a reply button, so I hope using "Edit" to reply is the correct way to do things.
To: wishdev (John Higgins)
Thanks!
I like to think of that typo as an - albeit inadvertent! - code equivalent of a "Trap Street"!
https://en.wikipedia.org/wiki/Trap_street
"This is interesting because it squarely falls somewhere "notable" on the readability vs speed spectrum."
- From the benchmarking, it seems my "bright idea" for faster calculations for the most likely years (that is 1901 to 2099) has - for these functions at least - met reality, and lost.
A reason for that is the compilers may have been substituting multiply and shift for divide by an integer constant. (If this interests you - it does me- yesterday I found this post https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/ on faster calculations of remainders for constant integer divisors without first calculating Floor(n/d) and then using r = n - floor(n/d) * d.)
As advance warning, in the not too distant future I intend to post suggestions for speeding up a few more complicated functions in C Ruby date_core.c (and maybe also in time.c), mostly by changing some floating point double calculations to instead use integers, and perhaps by also using one or two different algorithms.
To: jeremyevans0 (Jeremy Evans), and others who may be interested in this
I'm using fixed width font for all of the following to preserve some needed indenting.
# My apologies for my delay in replying. Making a C program for the benchmarks
# was quick, but writing a Ruby program to summarise the data from the
# benchmarks took rather longer than I anticipated.
# Summary of the following for C Ruby leap_year_p in time.c
# and for c_julian_leap_p and c_gregorian_leap_p in date_core.c:
#
# * using "y % 4" instead of "MOD(y, 4)" either makes it a bit faster,
# or at worst slightly slower, and I don't believe it *really* does that.
# So making this change seems worthwile, if only on aesthetic grounds,
# given that it *shouldn't* make the function slower.
#
# * I believe there is also a good case for re-bracketting c_gregorian_leap_p,
# again on aesthetic grounds (I believe it's what the code should be!)
# and that it *shouldn't* make it slower. In fact, it doesn't seem to make
# it slower, and does at least sometimes make it faster.
#
# * Alas, my "bright idea" for making c_gregorian_leap_p faster for the most
# likely years (1901 to 2099) has met reality, and apparently lost!
# Current code and suggested patches for C Ruby time.c and date_core.c:
# time.c:
static int
leap_year_p(long y)
{
return ((y % 4 == 0) && (y % 100 != 0)) || (y % 400 == 0); //-
return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0); //+
return (y % 4 == 0) && ((y % 100 != 0) || (y % 400 == 0)); //+ **
}
# ** If we do need to bracket the "equals" and "not equals",
# either to ensure it does what we want, or on Ruby time.c styles.
# date_core.c:
inline static int
c_julian_leap_p(int y)
{
return MOD(y, 4) == 0; //-
return y % 4 == 0; //+
}
inline static int
c_gregorian_leap_p(int y)
{
return (MOD(y, 4) == 0 && y % 100 != 0) || MOD(y, 400) == 0; //-
return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0); //+
}
# I hope that's reasonably clear, but I'm happy to explain any of the above
# or below, and to try make things better if is isn't sufficient.
#
# I regret that I don't trust myself sufficiently to correctly put the patches
# directly into the appropriate Ruby development places.
# ("Ever tried. Ever failed. No matter. Try again. Fail again. Fail better."
# From Samuel Beckett's 1983 story "Worstward Ho". Which I had better admit
# to not having read - I just know the quote.)
# The full-ish gory details - no obligation whatsoever to read them - follow.
#
# The benchmarks are at the end of this reply.
#
# If I can, I'll attach the C code used, the Microsoft Windows and Linux shell
# scripts used, and the Ruby program used to summarise the benchmark data.
# If I can't do that I'm happy to make them available on request, maybe
# by posting them on my Github acount?
#
# If this post is too long as a "reply", then I'll post most of it either
# as an attachment or on something like Github.
# I'm more than somewhat suspicious of benchmarks, especially when made by me.
# (In fact the first time I ran the program for these, I accidentally
# used the code for c_gregorian_leap_p_pc also in c_gregorian_leap_p_bb,
# instead of the re-bracketted code: unsurprisingly the times
# for the supposedly re-bracketted code were not an improvement
# on the code as originlly bracketted in date_core.c.)
#
# Also, I raise an eyebrow or two when some of the benchmarks show, for
# example, a function being a bit faster using -O0 than using -O3.
#
# Not to mention some high variability on some of the run times on the Lenovo
# with compiler option -O0.
#
# Anyway, that said, from benchmarking the C Ruby functions in date_core.c:
#
# * c_julian_leap_p:
# using "y % 4" instead of "MOD(y, 4)" either makes it a bit faster,
# or at worst slightly slower, and I don't believe it *really* does that.
# So making this change seems worthwile, if only on aesthetic grounds,
# *and* given that it *shouldn't* make the function slower.
#
# * c_gregorian_leap_p:
# the argument for using "y % 4" instead of "MOD(y, 4)" (etc) is the same
# as for c_julian_leap_p.
#
# I also think from the timings (and my, if no-one else's, aesthetics!),
# that there's a good case for using the suggested re-bracketting
# in c_gregorian_leap_p_bb: it seems to materially reduce the call times
# in all the benchmarks made.
#
# On the evidence of these benchmarks, I somewhat reluctantly admit
# that my suggestion of a bit more complicated code to make the most
# likely years (1901 to 2099) faster is at best not proven. For these
# years it is a bit faster for some compiler options, but is a bit
# slower for other options, and this seems inconsistently so over
# the compiler options and computers used. So a possibly "bright idea"
# met reality, and reality seems to have won. Ah well.
# Average durations in nanoseconds for:
# * loop: average duration of one iteration, with statement: sumv += int;
# * function: average duration of a loop iteration with one function call,
# less the average duration of one loop iteration.
#
# Loops for y in -16,000 to -1 and in 1 to 16,000 were re-iterated 30,000 times.
# The loops for y in 1901 to 2099 were re-iterated 2,412,060, making the total
# number of calls to each function about the same as for the first two loops.
#
# The overall duration times were made by using the clock() function
# at the start and end of each of those overall re-iterations.
#
# The averages were found from the overall durations of re-iterations of loops,
# using the clock() function at the start and end of each overall re-iteration.
# The averages were by dividing those durations by dividing
# by the total number of calls to a function in the re-iteration loops.
#
# The sequence was first -16,000..-1, then 1..16,000, then 1901..2099.
# Each overall loop was re-done 5 times in the benchmarking program.
# The benchmarking program was run again about an hour later,
# giving 10 overall iterations for the average nano seconds per function call
# shown below.
#
# This was done for compiler options -O0, -O1, -O2, -O3, -Ofast.
# From C Ruby date_core.c:
/* copied from time.c */
#define NDIV(x,y) (-(-((x)+1)/(y))-1)
#define NMOD(x,y) ((y)-(-((x)+1)%(y))-1)
#define DIV(n,d) ((n)<0 ? NDIV((n),(d)) : (n)/(d))
#define MOD(n,d) ((n)<0 ? NMOD((n),(d)) : (n)%(d))
# Functions benchmarked:
#
# inline static int c_julian_leap_p(int y): return MOD(y, 4) == 0;
# inline static int c_julian_leap_p_pc(int y): return y % 4 == 0;
#
# inline static int c_gregorian_leap_p(int y): return (MOD(y, 4) == 0 && y % 100 != 0) || MOD(y, 400) == 0;
# inline static int c_gregorian_leap_p_pc(int y): return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
# inline static int c_gregorian_leap_p_bb(int y): return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
#
# inline static int c_gregorian_leap_p_fi(int y)
# if (y % 4 != 0) return 0;
# /* Fast way to deal with most likely years exactly divisible by 4. */
# if (1900 < y && y < 2100) return 1;
# /* If here then a not so likely case of y % 4 == 0, so code can be slow. */
# /* Ensure truncate towards zero division to ensure avoiding overflows. */
# int yy = y >= 0 ? y / 100 : y > -100 ? 0 : -((-(y + 100)) / 100) - 1;
# return y != yy * 100 || yy % 4 == 0;
#
# static int c_gregorian_leap_p_f(int y)
# code is the same as c_gregorian_leap_p_fi, except this isn't "inlined".
# Details of setups used for the benchmarking.
#
# * Lenovo Thinkpad T420 laptop - 64 bits
# Windows 7 Professional 64 bits
# MinGW: gcc (MinGW.org GCC-8.2.0-5) 8.2.0
#
# * Samsung N220 netbooK - 32 bits
# Lubuntu running from USB memory stick
# GNU: gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0
# -O0 -O1 -O2 -O3 -Ofast compiler options used:
#
# Lenovo Thinkpad T420 and MinGW gcc:
#
# 3.1 0.3 0.0 0.0 0.0 loop: 1: y -16000..-1
# # (function - loop) duration
#
# 1.6 1.8 0.7 0.6 0.6 c_julian_leap_p
# 0.4 0.5 0.8 0.7 0.7 c_julian_leap_p_pc
#
# 3.8 3.2 2.2 1.9 1.9 c_gregorian_leap_p
# 1.8 1.7 2.0 1.9 2.0 c_gregorian_leap_p_pc
# 0.8 0.7 1.1 1.0 1.0 c_gregorian_leap_p_bb
# 1.7 1.9 1.6 1.7 1.7 c_gregorian_leap_p_fi
# 1.8 1.9 1.7 1.8 1.8 c_gregorian_leap_p_f
#
# 3.1 0.3 0.0 0.0 0.0 loop: 2: y 1..16000
#
# # (function - loop) duration
# 0.8 0.6 0.7 0.6 0.6 c_julian_leap_p
# 0.4 0.5 0.8 0.7 0.7 c_julian_leap_p_pc
#
# 2.6 2.2 2.5 1.9 1.8 c_gregorian_leap_p
# 1.7 1.8 2.0 2.0 1.9 c_gregorian_leap_p_pc
# 0.7 0.7 1.2 1.0 1.0 c_gregorian_leap_p_bb
# 1.3 1.8 1.7 1.7 1.7 c_gregorian_leap_p_fi
# 1.4 1.9 1.8 1.8 1.7 c_gregorian_leap_p_f
#
# 3.1 0.4 0.0 0.0 0.0 loop: 3: y 1901..2099
# # (function - loop) duration
#
# 0.8 0.6 0.7 0.7 0.7 c_julian_leap_p
# 0.4 0.5 0.9 0.7 0.7 c_julian_leap_p_pc
#
# 2.5 2.1 2.5 1.8 1.8 c_gregorian_leap_p
# 1.7 1.6 2.0 2.0 1.9 c_gregorian_leap_p_pc
# 0.7 0.6 1.1 0.9 0.9 c_gregorian_leap_p_bb
# 0.5 1.8 0.9 0.9 0.9 c_gregorian_leap_p_fi
# 0.5 1.9 1.4 1.4 1.4 c_gregorian_leap_p_f
#
# Samsung netbook and GNU gcc:
#
# 3.0 1.2 0.0 0.0 0.0 loop: 1: y -16000..-1
# # (function - loop) duration
#
# 21.8 9.7 3.0 4.2 4.2 c_julian_leap_p
# 16.9 2.4 3.1 3.0 3.0 c_julian_leap_p_pc
#
# 37.9 19.3 13.2 13.7 13.7 c_gregorian_leap_p
# 28.2 12.0 12.8 12.8 12.8 c_gregorian_leap_p_pc
# 19.1 4.5 5.0 5.4 5.4 c_gregorian_leap_p_bb
# 23.2 6.2 6.4 6.6 6.7 c_gregorian_leap_p_fi
# 23.2 6.2 6.4 6.6 6.8 c_gregorian_leap_p_f
#
# 3.0 1.2 0.0 0.0 0.0 loop: 2: y 1..16000
# # (function - loop) duration
#
# 17.0 3.0 3.0 3.0 3.0 c_julian_leap_p
# 16.9 2.4 3.0 3.0 3.0 c_julian_leap_p_pc
#
# 32.4 12.9 14.0 13.4 13.4 c_gregorian_leap_p
# 28.3 12.0 12.8 12.8 12.8 c_gregorian_leap_p_pc
# 19.2 4.5 5.0 5.4 5.4 c_gregorian_leap_p_bb
# 22.4 4.9 5.9 6.8 6.8 c_gregorian_leap_p_fi
# 22.4 5.0 5.9 5.9 6.8 c_gregorian_leap_p_f
#
# 3.1 1.3 0.0 0.0 0.0 loop: 3: y 1901..2099
# # (function - loop) duration
#
# 16.9 3.1 3.1 3.1 3.1 c_julian_leap_p
# 16.9 2.4 3.1 3.1 3.1 c_julian_leap_p_pc
#
# 31.0 12.7 14.0 13.4 13.4 c_gregorian_leap_p
# 28.1 11.9 12.8 12.8 12.8 c_gregorian_leap_p_pc
# 19.0 4.3 5.0 5.4 5.4 c_gregorian_leap_p_bb
# 19.1 2.6 3.7 4.4 4.4 c_gregorian_leap_p_fi
# 19.1 2.6 3.7 3.7 4.4 c_gregorian_leap_p_f
# Standard deviations as percentages of averages, rounded to 1% up;
# "z" means the standard deviation was almost zero.
#
# -O0 -O1 -O2 -O3 -Ofast compiler options used:
#
# Lenovo Thinkpad T420 and MinGW gcc:
#
# 1% 1% z z z loop: 1: y -16000..-1
# 2% 1% 1% 1% 1% c_julian_leap_p
# 5% 1% 1% 1% 1% c_julian_leap_p_pc
# 1% 1% 1% 1% 1% c_gregorian_leap_p
# 1% 1% 1% 1% 1% c_gregorian_leap_p_pc
# 3% 1% 1% 1% 1% c_gregorian_leap_p_bb
# 2% 1% 1% 1% 1% c_gregorian_leap_p_fi
# 16% 1% 1% 1% 1% c_gregorian_leap_p_f
#
# 1% 1% z z z loop: 2: y 1..16000
# 4% 1% 1% 1% 1% c_julian_leap_p
# 7% 1% 1% 1% 1% c_julian_leap_p_pc
# 1% 1% 1% 1% 1% c_gregorian_leap_p
# 2% 1% 1% 1% 1% c_gregorian_leap_p_pc
# 4% 1% 1% 1% 1% c_gregorian_leap_p_bb
# 2% 1% 1% 1% 1% c_gregorian_leap_p_fi
# 2% 1% 1% 1% 1% c_gregorian_leap_p_f
#
# 2% 1% z z z loop: 3: y 1901..2099
# 3% 1% 1% 1% 1% c_julian_leap_p
# 6% 1% 1% 1% 1% c_julian_leap_p_pc
# 2% 1% 1% 1% 1% c_gregorian_leap_p
# 2% 1% 1% 1% 1% c_gregorian_leap_p_pc
# 4% 1% 1% 1% 1% c_gregorian_leap_p_bb
# 7% 1% 1% 1% 1% c_gregorian_leap_p_fi
# 6% 1% 1% 1% 1% c_gregorian_leap_p_f
#
# Samsung netbook and GNU gcc:
# 1% 1% 1% 1% 1% loop: 1: y -16000..-1
# 1% 1% 1% 1% 1% c_julian_leap_p
# 1% 1% 1% 1% 1% c_julian_leap_p_pc
# 1% 1% 1% 1% 1% c_gregorian_leap_p
# 1% 1% 1% 1% 1% c_gregorian_leap_p_pc
# 1% 1% 1% 1% 1% c_gregorian_leap_p_bb
# 1% 1% 1% 1% 1% c_gregorian_leap_p_fi
# 1% 1% 1% 1% 1% c_gregorian_leap_p_f
#
# 1% 1% 1% 1% 1% loop: 2: y 1..16000
# 1% 1% 1% 1% 1% c_julian_leap_p
# 1% 1% 1% 1% 1% c_julian_leap_p_pc
# 1% 1% 1% 1% 1% c_gregorian_leap_p
# 1% 1% 1% 1% 1% c_gregorian_leap_p_pc
# 1% 1% 1% 1% 1% c_gregorian_leap_p_bb
# 1% 1% 1% 1% 1% c_gregorian_leap_p_fi
# 1% 1% 1% 1% 1% c_gregorian_leap_p_f
#
# 1% 1% 1% 1% 1% loop: 3: y 1901..2099
# 1% 1% 1% 1% 1% c_julian_leap_p
# 1% 1% 1% 1% 1% c_julian_leap_p_pc
# 1% 1% 1% 1% 1% c_gregorian_leap_p
# 1% 1% 1% 1% 1% c_gregorian_leap_p_pc
# 1% 1% 1% 1% 1% c_gregorian_leap_p_bb
# 1% 1% 1% 1% 1% c_gregorian_leap_p_fi
# 1% 1% 1% 1% 1% c_gregorian_leap_p_f