In these difficult times no one wants to report bad or even merely weak results (which will someday destroy this hypocritical civilization). Since this is my personal blog and I am not looking for grants, I don't care.
Let's dissect one truly inspiring paper - they employed reinforcement learning and claim that
transparently producing 2% to 26% speedup
Wow, 26% is a really excellent result. So I decided to implement the proposed technique, but first I needed a source of latency values for each SASS instruction. I extracted the files containing latency tables from nvdisasm - their names have the _2.txt suffix.
Then I made a perl binding for my perl version of Ced (see the methods of the Cubin::Ced::LatIndex object), added a new pass (the -l option for dg.pl) and ran some experiments to dump latency values for each instruction. Because the order of connections is unknown, I implemented all 3 variants:
- current column with current row
- column from previous instruction with current row
- current column with row from previous instruction
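The three variants above can be sketched roughly as follows. This is a minimal illustration in Python, not the real Cubin::Ced::LatIndex API - the table layout (row-major dict of dicts) and all function names here are assumptions:

```python
# Hypothetical sketch of the three column/row pairing variants.
# A latency table is modeled as {row: {col: value}}; the real
# nvdisasm tables are only assumed to behave like this.

def latency(table, col, row):
    """Look up a latency value; None when the cell is missing."""
    try:
        return table[row][col]
    except KeyError:
        return None

def pairings(instrs):
    """instrs: list of (col, row) index pairs, one per instruction.
    Yields the three candidate (col, row) combinations for each
    instruction after the first."""
    for prev, cur in zip(instrs, instrs[1:]):
        yield {
            "cur_col/cur_row":  (cur[0],  cur[1]),   # variant 1
            "prev_col/cur_row": (prev[0], cur[1]),   # variant 2
            "cur_col/prev_row": (cur[0],  prev[1]),  # variant 3
        }
```

Each variant then drives its own lookup pass over the dumped tables, and the three result sets are compared separately.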
The results are discouraging:
- some instructions (~1.5% in the best case, variant 1) have no latency at all (for example S2R or XXXBAR)
- some instructions have more than one index into the same table - I fixed this by selecting the max value (see the function intersect_lat)
- when comparing with actual stall counts, the percentage of incorrect values is above 60% - even worse than just flipping a coin
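A rough sketch of the max-selection fix and the scoring, again as illustrative Python rather than the actual perl code in dg.pl (the real intersect_lat lives there; function names and the None-handling policy below are my assumptions):

```python
# Hypothetical sketch: when several indices point into the same table,
# keep the maximum value; then score predictions against observed
# stall counts.

def intersect_lat(values):
    """Resolve duplicate indices into one table by taking the max."""
    vals = [v for v in values if v is not None]
    return max(vals) if vals else None

def mismatch_rate(predicted, actual):
    """Fraction of instructions whose predicted latency differs from
    the observed stall count (a missing prediction counts as a miss)."""
    pairs = list(zip(predicted, actual))
    bad = sum(1 for p, a in pairs if p != a)
    return bad / len(pairs)
```

With this scoring, anything above 0.5 on a two-way guess is indeed worse than a coin flip.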
Some possible reasons for failure:
