Misc #10278
Updated by nobu (Nobuyoshi Nakada) about 10 years ago
Mainly posting this for documentation purposes, because it seems like an obvious thing to try given we have ccan/list nowadays. Having shorter code along with branchless insert/delete, and using a common linked-list API, is very appealing (a toy sketch of the embedded-node/sentinel layout is appended at the end of this post).

On the other hand, benchmark results are a mixed bag: http://80x24.org/bmlog-20140922-032221.13002

Also, I may have introduced new bugs the tests didn't catch. The st_foreach* functions get a bit strange when dealing with packed-to-unpacked transitions while iterating.

Great thing: bighash is faster (as expected) because of branchless linked-list insertion. However, the major speedup in bighash probably isn't too important; most hashes are small and users never notice it.

vm2_bighash*    1.222

Also, we could introduce rb_hash_new_with_size() for use in insns.def (newhash) if people really care about the static bighash case (I don't think many do); a pre-sizing sketch is also appended below.

Real regressions: iteration seems slower because the loop conditions are more complex :<

hash_keys       0.978
hash_values     0.941

However, the hash_keys/hash_values regressions are pretty small.

Things that worry me:

vm1_attr_ivar*      0.736
vm1_attr_ivar_set*  0.845

WTF? I reran the attr_ivar tests, and the numbers got slightly better:

~~~
["vm1_attr_ivar",
 [[1.851297842, 1.549076322, 1.623306027, 1.956916541, 1.533218607,
   1.554089054, 1.702590516, 1.789863782, 1.711815018, 1.851260599],
  [1.825423191, 1.824934062, 1.542471471, 1.868502091, 1.79106375,
   1.884568825, 1.850712387, 1.797538962, 2.165696827, 1.866671482]]],
["vm1_attr_ivar_set",
 [[1.926496052, 2.04742421, 2.025571131, 2.047656291, 2.043747069,
   2.099586827, 1.953769267, 2.017580504, 2.440432603, 2.111254634],
  [2.365839125, 2.076282818, 2.112784977, 2.118754445, 2.091752673,
   2.161164561, 2.107439445, 2.128147747, 2.945295069, 2.131679632]]]]
Elapsed time: 91.963235593 (sec)
-----------------------------------------------------------
benchmark results:
minimum results in each 10 measurements.
Execution time (sec)
name                  orig    stll
loop_whileloop       0.672   0.670
vm1_attr_ivar*       0.861   0.872
vm1_attr_ivar_set*   1.255   1.406

Speedup ratio: compare with the result of `orig' (greater is better)
name                  stll
loop_whileloop       1.002
vm1_attr_ivar*       0.987
vm1_attr_ivar_set*   0.892
~~~

Note: these tests do not even hit st, and even if they did, these are tiny tables which are packed, so the linked-list implementation has no impact (especially not on lookup tests).

So yeah, probably something messy with the CPU caches. I always benchmark with the performance CPU governor, and the rerun ivar numbers were taken with the CPU pinned to a single core.

CPU: AMD FX-8320

Maybe I can access my other systems later.
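For readers unfamiliar with ccan/list, here is a minimal, self-contained toy sketch of the design it provides; it is not the actual st.c patch and does not use the ccan headers, and the names (node, head, entry, add_tail, del, entry_of) are made up for the example. The real analogues are ccan's struct list_node/list_head, list_add_tail(), list_del(), and list_for_each(). The point is that entries embed the node and the list is circular with a sentinel head, so insert and delete touch a fixed set of pointers with no NULL checks ("branchless"), while iteration terminates by comparing against the sentinel instead of NULL, which is where the more complex loop conditions come from.

~~~
/*
 * Toy illustration of the ccan/list-style layout: circular doubly-linked
 * list with a sentinel head, node embedded in each entry.
 */
#include <stddef.h>
#include <stdio.h>

struct node { struct node *next, *prev; };   /* cf. ccan's struct list_node */
struct head { struct node n; };              /* cf. ccan's struct list_head */

struct entry {                               /* stand-in for an st_table_entry */
    int key, value;
    struct node olist;                       /* embedded ordering node */
};

static void head_init(struct head *h) { h->n.next = h->n.prev = &h->n; }

/* branchless append, like list_add_tail(): no empty-list special case */
static void add_tail(struct head *h, struct node *n)
{
    n->prev = h->n.prev;
    n->next = &h->n;
    h->n.prev->next = n;
    h->n.prev = n;
}

/* branchless unlink, like list_del(): no NULL checks, no head special case */
static void del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* container_of-style recovery of the entry from its embedded node */
#define entry_of(nodep) \
    ((struct entry *)((char *)(nodep) - offsetof(struct entry, olist)))

int main(void)
{
    struct head h;
    struct entry e[3] = { {1, 10}, {2, 20}, {3, 30} };
    struct node *it;
    int i;

    head_init(&h);
    for (i = 0; i < 3; i++) add_tail(&h, &e[i].olist);
    del(&e[1].olist);

    /* iteration (cf. list_for_each) stops at the sentinel, not at NULL */
    for (it = h.n.next; it != &h.n; it = it->next)
        printf("%d => %d\n", entry_of(it)->key, entry_of(it)->value);
    return 0;
}
~~~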
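And a rough sketch of what rb_hash_new_with_size() would buy for the static bighash case. Only the existing st API is shown here (st_init_numtable(), st_init_numtable_with_size(), st_insert()); the rb_hash_*() wrapper itself is just the name proposed above and does not exist in this tree, and build_numtable() is a made-up helper for illustration.

~~~
/*
 * Sketch only: pre-sizing the underlying st_table is the whole point of a
 * rb_hash_new_with_size().  With a known element count, the table can be
 * allocated at its final size up front instead of growing and rehashing
 * while a big literal hash is filled in.
 */
#include <ruby.h>   /* pulls in ruby/st.h; plain "st.h" if using st.c standalone */

static st_table *
build_numtable(const st_data_t *keys, const st_data_t *vals, st_index_t n,
               int presize)
{
    /* presized: one allocation sized for n entries, no growth on insert */
    st_table *tab = presize ? st_init_numtable_with_size(n)
                            : st_init_numtable();
    st_index_t i;

    for (i = 0; i < n; i++)
        st_insert(tab, keys[i], vals[i]);
    return tab;
}
~~~

Whether that is worth a new C API depends entirely on how common big literal hashes are, which is exactly the doubt expressed above.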