windows deep internals: nvidia sass disassembler, part 4

Friday, March 21, 2025

I've made a native SASS disassembler by adding C++ codegen (the code can be produced by ead.pl with the -C option). It works by dynamically loading the right disasm module - see the list of supported architectures in the map s_sms. For now it only dumps operands, with the -O option; instructions are not rendered yet (because rewriting a bunch of duck-typed Perl code in C++ is boring and tedious work). You can also dump attributes with the -e option. You can build those modules with something like "make sm90.so". Btw, dumb gcc allocates ~600 KB on the stack for local vars; with the -Os option stack consumption shrinks to normal values, but each module then takes 10 minutes to compile.

Tests show zero unrecognized instructions (and I am truly proud of this). However, in case you find one, I also added the -N option to dump its content as a bit-mask, which you can then pass to ead.pl with the same -N option to see what happened.

On the other hand, it seems that nvidia is trying to hide something important from us. Let's check libcublas.so from v12 - we can notice lots of sections:

.nv.merc.nv.info - genuine nvdisasm is unable to show their content
.nv.capmerc.text - the instructions they contain are clearly in some other format and cannot be disassembled. I added the -s option to disasm a single section by its index, so you can try it yourself
and they obviously have corresponding relocs in .nv.merc.rela.text sections
and even .nv.merc.rela.debug_frame & .nv.merc.symtab

Known problems

Duplicates

As you may notice, many instructions have several matches for the same input - LDC, for example. Let's check why. The first form is described as

[NonZeroRegister:Ra + SImm(17/0)*:Ra_offset]

and the second as

[ZeroRegister("RZ"):Ra + SImm(17/0)*:Ra_offset]

The problem here is that the enum NonZeroRegister also contains the value 255 for RZ, so the two forms are totally indistinguishable: for the first form the mask for the Ra field will be xxxxxxxx, and for the second the generated mask will be 11111111. As you can guess, such input matches both, and ead.pl shows exactly the same decoding for each:

LDC R1,C[0x0][0x28]
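
The overlap is easy to reproduce with a toy matcher. This is only a sketch and not code from ead.pl: the two masks are the ones quoted above, while the names mask_matches and matching_forms and the sample register values are made up for illustration.

```python
def mask_matches(mask: str, value: int) -> bool:
    """Return True if value fits a bit mask made of '0', '1' and 'x' (don't care)."""
    if value < 0 or value >= (1 << len(mask)):
        return False  # value does not fit into the field at all
    bits = format(value, "0{}b".format(len(mask)))
    return all(m == "x" or m == b for m, b in zip(mask, bits))

# Masks for the Ra field of the two LDC forms, as quoted in the text
NONZERO_RA = "xxxxxxxx"  # NonZeroRegister: the enum includes 255, so no bit is fixed
ZERO_RA = "11111111"     # ZeroRegister("RZ"): only 255

def matching_forms(ra: int) -> list:
    """List every form whose Ra mask accepts the given register index."""
    forms = []
    if mask_matches(NONZERO_RA, ra):
        forms.append("NonZeroRegister")
    if mask_matches(ZERO_RA, ra):
        forms.append("ZeroRegister")
    return forms

print(matching_forms(255))  # both forms accept RZ
print(matching_forms(1))    # only the first form accepts R1
```

Any input with Ra = 255 lands in both lists, which is exactly the duplicate match. Telling the forms apart would need the first mask to exclude 255, and an x/0/1 mask cannot express "anything but all ones".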

Branches

Theoretically, instructions with branches have so-called PROPERTIES, like bra:

...RSImm(58)*:sImm

PROPERTIES
 INSTRUCTION_TYPE = INST_TYPE_DECOUPLED_BRU_DEPBAR_RD_SCBD;
 BRANCH_TARGET_INDEX = INDEX(sImm) ;
 BRANCH_TYPE = BRT_BRANCH ;

It would seem - what's the problem? Let's just collect all instructions having BRANCH_TYPE and extract the field named in BRANCH_TARGET_INDEX. Sure, like in brx:

FORMAT PREDICATE @[!]Predicate(PT):Pg Opcode /DEPTH("nodepth"):depth
 [!]Predicate("PT"):Pp
','Register:Ra SImm(58/0)*:Ra_offset

PROPERTIES
 INSTRUCTION_TYPE = INST_TYPE_DECOUPLED_BRU_DEPBAR_RD_SCBD;
 BRANCH_TARGET_INDEX = INDEX(Ra) ;
 BRANCH_TYPE = BRT_BRANCH ;

Here the branch index points to the register field Ra. And yet another sample, for call:

FORMAT PREDICATE @[!]Predicate(PT):Pg Opcode /ABSONLY:abs /CALL_DEPTH("INC"):depth
 [!]Predicate("PT"):Pp
','C:Sa[UImm(5/0*):Sa_bank]*   [SImm(17)*:Sa_addr]

PROPERTIES
 INSTRUCTION_TYPE = INST_TYPE_DECOUPLED_BRU_DEPBAR_RD_SCBD;
 BRANCH_TARGET_INDEX = INDEX(Sa) ;
 BRANCH_TYPE = BRT_CALL ;

Here the branch index points to a constant bank (and btw you now need to store its name somewhere). Wonderful
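
To show why a single "extract the field" step is not enough, here is a rough sketch of handling the three cases. Everything in it is hypothetical: the Operand class, the kind tags "imm", "reg" and "cbank", and the function branch_target are not part of ead.pl. In particular, computing the immediate target as an offset from the next instruction is my assumption and may not match the real hardware semantics.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Operand:
    name: str                   # field name from the FORMAT line: "sImm", "Ra", "Sa"
    kind: str                   # hypothetical tag: "imm", "reg" or "cbank"
    value: int = 0              # immediate, register index, or offset inside the bank
    bank: Optional[int] = None  # constant bank number, only used for "cbank"

def branch_target(operands, target_name, next_pc):
    """Describe the operand that BRANCH_TARGET_INDEX points to."""
    op = next((o for o in operands if o.name == target_name), None)
    if op is None:
        raise KeyError("no operand named " + target_name)
    if op.kind == "imm":
        # ASSUMPTION: relative immediate counted from the next instruction
        return ("address", next_pc + op.value)
    if op.kind == "reg":
        # target is known only at run time: report the register index
        return ("register", op.value)
    if op.kind == "cbank":
        # target lives in constant memory: report (bank, offset)
        return ("cbank", (op.bank, op.value))
    raise ValueError("unexpected operand kind " + op.kind)

print(branch_target([Operand("sImm", "imm", 0x40)], "sImm", 0x100))
print(branch_target([Operand("Ra", "reg", 4)], "Ra", 0x100))
print(branch_target([Operand("Sa", "cbank", 0x28, bank=0)], "Sa", 0x100))
```

Only the first case gives an address that can be turned into a label during disassembly. The other two can only be reported symbolically.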

Other quirks

Still the same long-suffering instruction BRA, described in sm55 as:

FORMAT PREDICATE @[!]Predicate(PT):Pg Opcode /U(noU):uniform /LMT(noLMT):lmt
          { CC(CC):TestCC/Test(T):CCTest }

ENCODING
            Opcode12 = Opcode;
            Pred = Pg;
            PredNot = Pg@not;
            CCC_1 = Test;
            CA = 0;
            Imm24 = sImm;
            U = U;
            LMT = LMT;
            !NencBRA;
            !RegA;

There is no encoding for the field TestCC, and none for CCTest either. Instead we have just a 5-bit field Test; the enum Test has all 32 values, with T = 0xf. On the other hand, CC is a 1-bit enum with the only value CC=1. I have zero ideas which bit of Test should be used for CC
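
This kind of gap can be found mechanically by comparing FORMAT fields with the right-hand sides of ENCODING. The sketch below is not how ead.pl does it: the field list is transcribed by hand from the excerpt above (which is partial, so sImm is missing from it), and the helper names encoded_sources and unencoded are invented for this example.

```python
import re

# (type, name) pairs transcribed by hand from the FORMAT excerpt above
FORMAT_FIELDS = [
    ("Predicate", "Pg"),
    ("U", "uniform"),
    ("LMT", "lmt"),
    ("CC", "TestCC"),
    ("Test", "CCTest"),
]

# ENCODING lines from the excerpt, without the '!' entries
ENCODING = """
Opcode12 = Opcode;
Pred = Pg;
PredNot = Pg@not;
CCC_1 = Test;
CA = 0;
Imm24 = sImm;
U = U;
LMT = LMT;
"""

def encoded_sources(text):
    """Collect identifiers found on the right-hand side of 'field = source;' lines."""
    sources = set()
    for line in text.splitlines():
        m = re.match(r"\s*\w+\s*=\s*([A-Za-z_]\w*)", line)
        if m:  # constants such as 'CA = 0' do not match and are skipped
            sources.add(m.group(1))
    return sources

def unencoded(fields, text, allow_type=False):
    """Return names of FORMAT fields that no ENCODING line refers to."""
    sources = encoded_sources(text)
    return [name for ftype, name in fields
            if name not in sources and not (allow_type and ftype in sources)]

print(unencoded(FORMAT_FIELDS, ENCODING))                   # lookup by field name only
print(unencoded(FORMAT_FIELDS, ENCODING, allow_type=True))  # also accept the type name
```

Looking up by field name alone reports uniform, lmt, TestCC and CCTest as unencoded. Falling back to the type name resolves the first two and CCTest (through U, LMT and Test), and only TestCC is left with nothing.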

And sure, there are many others
