instructions that cicc cannot generate
It occurred to me that we could also subtract the set of PTX instructions that cicc emits from the full set of PTX instructions, and so get the instructions cicc is simply unable to produce. So I added a new option -U to iptx.pl and got the file ptx_not_in_cicc.txt with 114 unique names.
Btw, PTX has only 268 unique names in total, so 114 is 42.5%. Here are the notable missing instructions:
- cctl for cache control
- lop3 - yes, I have seen it many times in SASS, so it must be generated later by ptxas during its optimization passes
- r2p
- all the video instructions (vadd, vabsdiff, vmad, vset and their SIMD forms like vadd2/vadd4)
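The -U option itself lives in iptx.pl (Perl) and is not reproduced here. The Python sketch below only illustrates the underlying idea, a plain set difference, under some loudly stated assumptions: a "name" is taken to be the PTX mnemonic before the first '.' (so ld.global.f32 counts as ld), the input is one instruction per line, and the helpers mnemonics, not_generated and share are hypothetical names that are not part of iptx.pl.

```python
import re

# Hypothetical: optional predicate guard like "@%p1 " or "@!%p1 ",
# then the mnemonic up to the first '.' (or operand).
_INSN = re.compile(r"^\s*(?:@!?%?\w+\s+)?([a-z][a-z0-9_]*)")


def mnemonics(lines):
    """Collect base mnemonics from PTX text lines (assumes one insn per line)."""
    names = set()
    for line in lines:
        line = line.split("//", 1)[0]            # drop line comments
        stripped = line.strip()
        if not stripped or stripped.startswith("."):
            continue                             # skip blanks and directives
        if stripped.endswith(":"):
            continue                             # skip labels
        m = _INSN.match(line)
        if m:
            names.add(m.group(1))
    return names


def not_generated(all_names, emitted_names):
    """Names in the full PTX list that never appear in compiler output."""
    return sorted(set(all_names) - set(emitted_names))


def share(missing, total):
    """Percentage of the total name list that is missing."""
    return 100.0 * len(missing) / len(total)


if __name__ == "__main__":
    # Toy data, not the real 268-name list.
    all_ptx = {"add", "ld", "st", "bra", "ret", "lop3", "vadd", "vabsdiff"}
    cicc_out = """
        .reg .b32 %r<4>;
        ld.global.u32 %r1, [%rd1];
        add.s32 %r2, %r1, 1;
        @%p1 bra $L__BB0_1;
        st.global.u32 [%rd2], %r2;
        ret;
    """
    missing = not_generated(all_ptx, mnemonics(cicc_out.splitlines()))
    print(missing)                                      # ['lop3', 'vabsdiff', 'vadd']
    print(f"{share(missing, all_ptx):.1f}%")            # 37.5%
    print(f"{100 * 114 / 268:.1f}%")                    # 42.5%, the ratio quoted above
```

Using sets makes the "minus" trivial and order-independent; the sorted list just makes the output file easy to diff between runs.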
This led me to the conclusion that the official MLIR for CUDA is badly incomplete.
IMHO MLIR was a very dubious idea from the start - what if some unscrupulous HW vendor prefers to hide many details of its hardware? Even worse, you usually use several MLIR dialects (like gpu, nvgpu, nvvm, linalg etc.), so at least one of them must be aware of all the others. This leads to an exponential explosion of complexity: while optimizing, you can run into operations from any of the dialects in use.