In my first attempt I used the latency tables extracted from the MD file (located inside nvdisasm), and nothing good came of it.
The obvious reason is that the real latency table should live not in the disassembler but inside ptxas. The problem with that binary is that it is really huge: in SDK 13 it is 40 MB, and of course no symbols are included.
This is not surprising, because it contains a lot of things:
- a PTX parser
- lots of macros
- an optimizing compiler with 159 passes that doesn't use LLVM at all
- code generators for several different SMs
Besides, it has no tracepoints, and a big part of its strings are encrypted. So it took a lot of time and patience, but I finally found and extracted the right latency table.
And then a lot of discoveries came my way.