Having a SASS assembler, it seems like an easy task to make a parser for it. So I made a parser for nvdisasm output.
Let's check a sample:
SHF.R.S32.HI R209, RZ, 0x2, R209 ;
Looks like an easy application for an LL(1) parser: you first select the instruction, then process its optional enums (separated by dots), and then just try to match the operands separated by commas, right? Well, no. The grammar of SASS is not regular, and there are lots of quirky cases.
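To make the naive idea concrete, here is a minimal Python sketch of that "opcode, dots, commas" split. The function name `naive_split` is mine and purely illustrative. It is not how my parser works; it is the strawman that the quirks listed next will break.

```python
def naive_split(line: str):
    """Split 'OPCODE.MOD1.MOD2 op1, op2 ;' the naive way (illustrative only)."""
    # Drop the trailing ';' that nvdisasm puts after each instruction.
    text = line.strip().rstrip(";").strip()
    # Everything up to the first space is the dotted instruction head.
    head, _, rest = text.partition(" ")
    # Assumption: the first dotted part is the opcode, the rest are enums.
    opcode, *modifiers = head.split(".")
    # Assumption: operands are always separated by commas.
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return opcode, modifiers, operands


print(naive_split("SHF.R.S32.HI R209, RZ, 0x2, R209 ;"))
```

Both assumptions marked in the comments hold for this sample and fail for the cases described in the following sections.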
Instruction names with '.'
It's perfectly legal to meet both "UIADD3" and "UIADD3.64" as instruction names. They have different encodings and are not even marked as ALTERNATE.
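One possible way to cope with dotted names is a longest-match lookup against the set of known mnemonics. The sketch below assumes a tiny hand-written set `KNOWN_MNEMONICS` and a hypothetical helper `split_mnemonic`; the `.X` suffix in the usage is just a stand-in for some enum. Longest match is only a heuristic: a real parser may have to keep both readings and let the operands decide.

```python
# Hypothetical subset of mnemonics; a real table comes from the machine description.
KNOWN_MNEMONICS = {"UIADD3", "UIADD3.64", "IMAD"}


def split_mnemonic(head: str, known=KNOWN_MNEMONICS):
    """Return (mnemonic, remaining_dotted_parts) using the longest known prefix."""
    parts = head.split(".")
    # Try the longest dotted prefix first, then shrink it one part at a time.
    for n in range(len(parts), 0, -1):
        candidate = ".".join(parts[:n])
        if candidate in known:
            return candidate, parts[n:]
    raise ValueError(f"unknown instruction: {head}")


print(split_mnemonic("UIADD3.64"))  # whole head is the mnemonic
print(split_mnemonic("UIADD3.X"))   # mnemonic plus one trailing enum
```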
Pseudo opcodes
We can observe an enum whose values are totally indistinguishable:
PSEUDO_OPCODE "nopseudo_opcode"=0 , "SHL"=0 , "ISCADD"=0 , "IADD"=0 , "MOV"=0;
and a sample of its use:
Opcode /LOOnly("LO"):wide /PSEUDO_OPCODE("nopseudo_opcode"):pseudo_opcode
Btw, the pseudo_opcode operand doesn't even have a corresponding encoding field. In essence, instructions like IMAD.IADD, IMAD.MOV & IMAD.SHL have exactly the same encoding form. I don't know how nvdisasm selects PSEUDO_OPCODE - probably they borrowed the hallucination generator from ChatGPT.
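The problem is easy to see if you try to invert the enum. A small Python sketch (the dict just mirrors the PSEUDO_OPCODE definition above, and `names_for` is a hypothetical helper):

```python
# Values copied from the PSEUDO_OPCODE enum shown above: every name maps to 0.
PSEUDO_OPCODE = {
    "nopseudo_opcode": 0,
    "SHL": 0,
    "ISCADD": 0,
    "IADD": 0,
    "MOV": 0,
}


def names_for(value: int, enum: dict) -> list:
    """Return every enum name that encodes to the given value."""
    return [name for name, v in enum.items() if v == value]


# All five names come back, so the value alone cannot pick one of them.
print(names_for(0, PSEUDO_OPCODE))
```

Going from text to encoding is fine (any of the names yields 0), but going from encoding to text cannot be done from this enum alone.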
Enums can contain '.' too
Yes - enum names can be something like SR_CTAID.X, SR_CTAID.Y & SR_CTAID.Z
Operands are not always separated with ','
BRX R2 -0x110 (*"INDIRECT_CALL"*)
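A tokenizer therefore has to accept whitespace as a separator too. Below is a rough Python sketch with a hypothetical `split_operands` helper. It makes two assumptions of mine: the trailing `(* ... *)` part is an annotation that can be dropped, and anything inside brackets is one operand even if it contains spaces.

```python
import re

# Assumption: '(* ... *)' is a trailing annotation, not an operand.
ANNOTATION = re.compile(r"\(\*.*?\*\)")


def split_operands(rest: str) -> list:
    """Split operands on commas or whitespace, keeping bracketed groups whole."""
    rest = ANNOTATION.sub("", rest)
    out, cur, depth = [], "", 0
    for ch in rest:
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        # A separator only counts outside of brackets.
        if depth == 0 and (ch == "," or ch.isspace()):
            if cur:
                out.append(cur)
            cur = ""
        else:
            cur += ch
    if cur:
        out.append(cur)
    return out


print(split_operands('R2 -0x110 (*"INDIRECT_CALL"*)'))
print(split_operands("R209, RZ, 0x2, R209"))
```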
nvdisasm can't show some fields
especially batch_t & pm_pred. A typical instruction tail looks like:
$( { '&' REQ:req '=' BITSET(6/0x0000):req_bit_set } )$
$( { '&' RD:rd '=' UImm(3/0x7):src_rel_sb } )$
$( { '&' WR:wr '=' UImm(3/0x7):dst_wr_sb } )$
$( { '?' USCHED_INFO("DRAIN"):usched_info } )$
$( { '?' BATCH_T("NOP"):batch_t } )$
$( { '?' PM_PRED("PMN"):pm_pred } )$
while nvdisasm output contains only &wr=0x1 for WR, &rd=0x2 for RD and ?something for USCHED_INFO
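So the parser has to fill in whatever is missing. A minimal Python sketch of that idea follows; `parse_tail` and `DEFAULTS` are hypothetical names, and reading the values in parentheses in the tail description above (0x7, "DRAIN", "NOP", "PMN") as the defaults is my assumption. The req bitset is not handled here.

```python
import re

# Assumed defaults, taken from the values in parentheses in the tail description.
DEFAULTS = {
    "wr": 0x7,
    "rd": 0x7,
    "usched_info": "DRAIN",
    "batch_t": "NOP",
    "pm_pred": "PMN",
}

# Matches '&wr=0x1', '&rd=0x2' or '?NAME'.
TAIL_RE = re.compile(r"&(wr|rd)=(0x[0-9a-fA-F]+)|\?(\w+)")


def parse_tail(text: str) -> dict:
    """Parse the printed part of the tail and default everything else."""
    fields = dict(DEFAULTS)
    for m in TAIL_RE.finditer(text):
        if m.group(1):
            fields[m.group(1)] = int(m.group(2), 16)
        else:
            fields["usched_info"] = m.group(3)
    return fields


print(parse_tail("&wr=0x1 &rd=0x2 ?DRAIN"))
```

Fields that nvdisasm never prints, like batch_t and pm_pred, always end up with their default here, so the original values cannot be recovered from the text.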
Results
SM | parsing rate | avg forms |
---|---|---|
5 | 1.0 | 1.0 |
55 | 1.0 | 1.0 |
57 | 1.0 | 1.0 |
70 | 1.0 | 1.002404 |
75 | 1.0 | 1.018318 |
86 | 1.0 | 1.0 |
90 | 1.0 | 1.001589 |
100 | 1.0 | 1.016845 |
120 | 1.0 | 1.000225 |
Source of ambiguity
Let's run pa with the -Ssv options to dump the original text and all matched forms. We can see something like:
BAR.SYNC.DEFER_BLOCKING 0x0
2 forms:
19342 @Pg.D(7) BAR .E:barmode .E:defer_blocking Sb:UImm E:Rc.D(255) req_bit_set:BITSET src_rel_sb:UImm(7) E:usched_info E:batch_t.D(0) E:pm_pred.D(0)
19286 @Pg.D(7) BAR .E:barmode .E:defer_blocking Sb:UImm ,Sc:UImm req_bit_set:BITSET src_rel_sb:UImm(7) E:usched_info E:batch_t.D(0) E:pm_pred.D(0)
The first form has an additional register operand Rc with default value 255, and the second has yet another UImm operand Sc with default value 0 (UImm(12/0)*:Sc). When the optional operand is omitted from the text, both forms match, so they cannot be distinguished.
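The effect can be reproduced with a toy matcher. This Python sketch is not the real pa logic: the `Operand` class, the greedy `matches` function and the "starts with R means register" rule are all simplifications of mine. Only the form numbers and the operand names and defaults come from the dump above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operand:
    name: str
    kind: str                      # "UImm" or "Reg"
    default: Optional[int] = None  # None means the operand is mandatory


# Only the operands that differ between the two BAR forms are modelled.
FORMS = {
    19342: [Operand("Sb", "UImm"), Operand("Rc", "Reg", 255)],
    19286: [Operand("Sb", "UImm"), Operand("Sc", "UImm", 0)],
}


def kind_of(token: str) -> str:
    # Crude assumption: anything starting with 'R' is a register.
    return "Reg" if token.startswith("R") else "UImm"


def matches(form, tokens) -> bool:
    """Greedy match: optional operands may be skipped, mandatory ones may not."""
    i = 0
    for op in form:
        if i < len(tokens) and kind_of(tokens[i]) == op.kind:
            i += 1
        elif op.default is None:
            return False
    return i == len(tokens)


def candidates(tokens):
    return [fid for fid, form in FORMS.items() if matches(form, tokens)]


print(candidates(["0x0"]))        # both forms survive
print(candidates(["0x0", "R5"]))  # only the form with Rc
```

As soon as the second operand is written out, the ambiguity disappears; with a bare `0x0` both forms survive.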