I've added some support for the DWARF debug info produced by NVIDIA nvcc to my dwarfdump. As everyone knows, DWARF is over-complicated, bloated and just disgusting; however, NVIDIA managed to take this nausea to a new level
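One small example of why DWARF parsing is tedious: most integers in .debug_abbrev and .debug_info (abbreviation codes, tags, attribute names and forms, DW_FORM_udata/DW_FORM_sdata values) are stored as variable-length LEB128 numbers, so a dumper has to decode fields one after another. Below is a minimal sketch of the unsigned and signed decoders as defined by the DWARF standard; the function names are my own illustration and not taken from my dwarfdump.

```python
from typing import Tuple


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 value at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        shift += 7
        if byte & 0x80 == 0:              # high bit clear: last byte
            return result, pos


def read_sleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a signed LEB128 value at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            if byte & 0x40:               # sign bit of the last byte is set
                result -= 1 << shift      # sign-extend
            return result, pos
```

For example, the bytes E5 8E 26 decode to 624485 with `read_uleb128`, and C0 BB 78 decode to -123456 with `read_sleb128`.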
Thursday, March 26, 2026
dwarf from nvcc
Wednesday, March 18, 2026
read a couple of books about compilers
LLVM Compiler for RISC-V Architecture
- there is no introduction to LLVM IR or the RISC-V-specific IR, so the long IR listings are very hard to follow
- the author doesn't give links to the source code implementing the algorithms. Fortunately, Elixir has indexed the whole LLVM source tree
Dive into Deep Learning Compiler
As far as I know, this is the only book on AI/ML compilers so far. TVM also looks very promising: unlike monsters such as XLA or IREE, it is compact and comprehensible for mere mortals
Drawbacks:
- the book is not finished: the last two chapters, about neural networks and deployment, are just placeholders
- it's unclear why the CUDA matrix multiplication wasn't compared against cuBLAS as the baseline
- or against OpenBLAS for the CPU version
Despite this, considering that the book is freely downloadable, I rate it 4 out of 5
Friday, March 6, 2026
SASS latency table: second try
In my first attempt I used latency tables extracted from the MD file embedded in nvdisasm, and nothing good came of it
The obvious reason is that the real latency table should live not in the disassembler but inside ptxas. The problem is that this binary is really huge: in SDK 13 it is 40 MB. And of course, no symbols are included
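Whether a binary like ptxas still carries a static symbol table can be checked by looking for a section of type SHT_SYMTAB in its ELF section headers (`readelf -S` shows the same information). Below is a minimal sketch, assuming a 64-bit little-endian ELF as shipped for x86-64 Linux; the function names are illustrative. A stripped, dynamically linked binary usually still keeps .dynsym (SHT_DYNSYM) for the dynamic linker, so only .symtab counts as "has symbols" here.

```python
import struct
import sys
from typing import List

SHT_SYMTAB = 2   # static symbol table, removed by `strip`
SHT_DYNSYM = 11  # dynamic symbols, kept for the dynamic linker


def section_types(data: bytes) -> List[int]:
    """Return sh_type of every section header in a 64-bit little-endian ELF."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch only handles ELF64 little-endian")
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x3A)
    if e_shoff == 0:
        return []  # no section header table at all
    if e_shnum == 0:  # extended numbering: real count is sh_size of section 0
        (e_shnum,) = struct.unpack_from("<Q", data, e_shoff + 0x20)
    types = []
    for i in range(e_shnum):
        # sh_type is the second 32-bit field of each Elf64_Shdr
        (sh_type,) = struct.unpack_from("<I", data, e_shoff + i * e_shentsize + 4)
        types.append(sh_type)
    return types


def has_static_symbols(data: bytes) -> bool:
    return SHT_SYMTAB in section_types(data)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        kinds = section_types(f.read())
    print("symtab:", SHT_SYMTAB in kinds, "dynsym:", SHT_DYNSYM in kinds)
```

Run it with the path to ptxas as the only argument; reading the whole 40 MB file into memory is fine for a one-off check.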
The size of ptxas is not surprising, since it contains a lot of things:
- a PTX parser
- lots of macros
- an optimizing compiler with 159 passes that doesn't use LLVM at all
- code generators for several different SMs
Besides, it has no tracepoints, and a large part of its strings are encrypted. So it took a lot of time and patience, but I finally found and extracted the right latency table
And then a lot of discoveries came my way
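Why the right table matters: a latency table says how many cycles a consumer instruction must wait after the instruction that produces its operand, so a wrong table gives wrong stall estimates. Below is a minimal sketch of how such a table could be consumed, assuming a simple in-order, single-issue pipeline and a table keyed by (producer class, consumer class). The class names, the `Instr` type and `default_latency` are hypothetical; this is not the layout or scheduling model used inside ptxas.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical table shape: (producer class, consumer class) -> cycles the
# consumer must wait after the producer issues. Assumption for illustration only.
LatencyTable = Dict[Tuple[str, str], int]


@dataclass
class Instr:
    op_class: str        # hypothetical instruction class, e.g. "alu" or "mem"
    dst: Optional[str]   # destination register, or None
    srcs: List[str]      # source registers read by this instruction


def schedule(instrs: List[Instr], table: LatencyTable, default_latency: int = 1) -> List[int]:
    """Issue cycle of each instruction on an in-order, single-issue pipeline.

    Only read-after-write dependencies are modelled; producer/consumer pairs
    missing from the table fall back to default_latency.
    """
    last_writer: Dict[str, Tuple[int, str]] = {}  # reg -> (issue cycle, class)
    cycles: List[int] = []
    for ins in instrs:
        t = cycles[-1] + 1 if cycles else 0  # at most one issue per cycle
        for reg in ins.srcs:
            if reg in last_writer:
                w_cycle, w_class = last_writer[reg]
                lat = table.get((w_class, ins.op_class), default_latency)
                t = max(t, w_cycle + lat)  # wait until the operand is ready
        if ins.dst is not None:
            last_writer[ins.dst] = (t, ins.op_class)
        cycles.append(t)
    return cycles


def stall_cycles(instrs: List[Instr], table: LatencyTable, default_latency: int = 1) -> int:
    """Cycles lost compared with issuing one instruction every cycle."""
    cycles = schedule(instrs, table, default_latency)
    return cycles[-1] + 1 - len(cycles) if cycles else 0
```

With a made-up entry `{("mem", "alu"): 20}`, a load followed by two dependent ALU instructions issues at cycles 0, 20 and 21, i.e. 19 stall cycles; plugging in a different table changes these numbers directly.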