instructions that cicc cannot generate
It occurred to me that we could also subtract the set of PTX instructions produced by cicc from the full set of PTX instructions, and so get the instructions that cicc is simply unable to produce. So I added a new option -U to iptx.pl and got the file ptx_not_in_cicc.txt with 114 unique names
Btw PTX has only 268 unique names in total, so 114 is 42.5%. Some remarkable instructions that are missing:
- cctl for cache control
- lop3 - yeah, I have seen it many times in SASS, so it is generated later by ptxas during its optimization passes
- r2p
- and all video instructions
And this led me to the conclusion that the official MLIR for CUDA is totally incomplete
MLIR was a very dubious idea from the start, IMHO: what if we have some unscrupulous HW vendor who prefers to hide many details of its hardware? Even worse, you usually use several MLIR dialects (like gpu, nvgpu, nvvm, linalg etc), so at least one of them must be aware of all the others. This leads to an exponential explosion of complexity: during optimization you can expect items from each of the dialects in use
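The subtraction behind the -U option at the start of this section is just a set difference. Here is a minimal sketch in Python rather than the Perl of iptx.pl; the function name and the toy lists are hypothetical and only illustrate the idea, they are not the real data:

```python
def names_not_in_cicc(ptx_names, cicc_names):
    """Return PTX instruction names that never show up in cicc output, sorted."""
    return sorted(set(ptx_names) - set(cicc_names))


# Toy input only: the real lists have 268 and 154 unique names.
all_ptx = ["add", "cctl", "lop3", "mov", "r2p"]
from_cicc = ["add", "mov"]

missing = names_not_in_cicc(all_ptx, from_cicc)
print(missing)              # ['cctl', 'lop3', 'r2p']
print(f"{114 / 268:.1%}")   # 42.5%, the share quoted above
```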
some instructions are totally undocumented
Nvidia is probably using them for unfair competition. The two fattest are _mma.warpgroup and _mma. They have lots of unique attributes not present in other instructions. So to reduce noise during analysis I added the -w option to skip them
order of attributes is not important
Surprise, surprise. Officially, mma.sync takes its attributes in a fixed order:
- aligned
- shape like m8n8k4 and friends
- row
- col
- dtype
something like
mma.sync.aligned.m8n8k4.row.col.f16.f16.f16.f16
but this construction is also perfectly valid:
mma.sync.col.aligned.row.m8n8k4.f16.f16.f16.f16
This means that at least one of my hypotheses was correct: the parser just collects any valid attributes without caring about their relative order
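To illustrate what "order does not matter" means for the two spellings above, here is a small sketch that compares instructions as an opcode plus a bag of attributes. It is only a model of the observed behaviour, not the real ptxas parser; the helper name modifier_bag is made up, and treating every dot-separated piece after the opcode (including sync and the repeated f16) as an unordered attribute is my assumption:

```python
from collections import Counter


def modifier_bag(instr):
    """Split 'mma.sync.aligned...' into (opcode, multiset of the remaining attributes)."""
    opcode, *mods = instr.split(".")
    # Counter keeps duplicates such as the four f16 types but forgets their order.
    return opcode, Counter(mods)


documented = "mma.sync.aligned.m8n8k4.row.col.f16.f16.f16.f16"
shuffled = "mma.sync.col.aligned.row.m8n8k4.f16.f16.f16.f16"

same = modifier_bag(documented) == modifier_bag(shuffled)
print(same)  # True
```

A bag (multiset) is used instead of a plain set so that dropping one of the repeated f16 types still counts as a different instruction.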
extracted data is not complete
Let's check, for example, the instruction tcgen05.fence. Officially its attribute has a couple of possible values. However:
grep tcgen05.fence ptx_ops2.txt
160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 tcgen05.fence
as you can see, in the extracted data it has a zero mask for all attributes
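The same check can be run over every line instead of grepping one name at a time. A minimal sketch, assuming that each line of ptx_ops2.txt looks like the one above (an index, then hex mask bytes, then the instruction name); that layout is inferred from this single line, and the function names are hypothetical:

```python
def parse_ops_line(line):
    """Split one line into (index, mask bytes, instruction name)."""
    fields = line.split()
    index = fields[0]   # kept as text: its base is not known from one sample
    name = fields[-1]
    mask = bytes(int(b, 16) for b in fields[1:-1])
    return index, mask, name


def has_empty_mask(line):
    """True when every mask byte is zero, i.e. no attributes were extracted."""
    _, mask, _ = parse_ops_line(line)
    return not any(mask)


sample = "160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 tcgen05.fence"
print(parse_ops_line(sample)[2], has_empty_mask(sample))  # tcgen05.fence True
```

Running has_empty_mask over the whole file would list every instruction whose attributes were lost during extraction, not just tcgen05.fence.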
current status
As of today, I have been able to identify 60 attributes; they cover 134 unique instructions (50%) and 734 forms out of 1090 (67%)