windows deep internals: nvidia sass disassembler, part 5

Wednesday, March 26, 2025

nvidia sass disassembler, part 5

Previous parts: 1, 2, 3 & 4

I've finally added native rendering for instructions; actually it is just a rewrite of the terrible Perl function make_inst. Because the output typically renders only a small fraction of the instructions, the data for formats is filled on demand via std::call_once. Results compared with the genuine nvdisasm:

mine	nvdisasm
LDC R1,c:[0][0x37C]	LDC R1, c[0x0][0x37c]
LDCU.64 UR8,c:[0][URZ+0x440]	LDCU.64 UR8, c[0x0][0x440]
LDC R16,c:[0][0x3B8]	LDC R16, c[0x0][0x3b8]
LDCU.64 UR12,c:[0][URZ+0x448]	LDCU.64 UR12, c[0x0][0x448]
LDCU UR4,c:[0][URZ+0x3AC]	LDCU UR4, c[0x0][0x3ac]
LDC._64 R4,c:[0][0x450]	LDC.64 R4, c[0x0][0x450]
LDCU.64 UR14,c:[0][URZ+0x380]	LDCU.64 UR14, c[0x0][0x380]
LDCU.64 UR10,c:[0][URZ+0x358]	LDCU.64 UR10, c[0x0][0x358]
HFMA2 R13,-RZ,RZ, 1.875000, 0.000000	HFMA2 R13, -RZ, RZ, 1.875, 0
ISETP.NE.S64.AND P2,PT,RZ,UR8,PT	ISETP.NE.S64.AND P2, PT, RZ, UR8, PT

IMHO very similar. There are some minor problems with the formatting of floating point values (I used FP16 to extract 16-bit values, but I don't know what E8M7Imm means in the format descriptor).
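To illustrate the FP16 extraction mentioned above, here is a minimal sketch of decoding an IEEE 754 binary16 bit pattern. The function names are hypothetical, and the lane layout (two 16-bit halves packed into one 32-bit immediate, lane 0 being the low half) is an assumption for the example, not something taken from the disassembler.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decode an IEEE 754 binary16 (FP16) bit pattern into a float.
float half_to_float(std::uint16_t h) {
    const int sign = (h >> 15) & 1;
    const int exp  = (h >> 10) & 0x1f;   // 5 exponent bits, bias 15
    const int frac = h & 0x3ff;          // 10 fraction bits
    float v;
    if (exp == 0) {
        // Zero or subnormal: frac * 2^-24.
        v = std::ldexp(static_cast<float>(frac), -24);
    } else if (exp == 31) {
        v = frac ? std::numeric_limits<float>::quiet_NaN()
                 : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + frac) * 2^(exp - 15 - 10).
        v = std::ldexp(static_cast<float>(frac + 1024), exp - 25);
    }
    return sign ? -v : v;
}

// Assumed layout: lane 0 is the low 16 bits, lane 1 the high 16 bits.
float packed_half_lane(std::uint32_t imm, int lane) {
    assert(lane == 0 || lane == 1);
    return half_to_float(static_cast<std::uint16_t>(imm >> (16 * lane)));
}
```

With this decoding the bit pattern 0x3F80 yields 1.875 and 0x0000 yields 0, the two values that appear in the HFMA2 line of the comparison.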

So the next thing to show is

labels for branches

As I mentioned, you can identify an instruction as a branch via its PROPERTIES, take the value in BRANCH_TARGET_INDEX and render it as a label address. There are two problems:

the branch offset varies in size: it can be 58 bits for sm_90, 50 bits for sm_75, 24 bits for sm_3 and so on
the branch offset is a signed value, so we need some method to detect that a value of known bit size is negative

To solve the first problem I've added a new hashmap, vwidth, to the instruction descriptor; it holds the bit width of each numeric operand, keyed by operand name.

Abstract note: you could store a reference to the mask instead of just the width, which would in theory make in-place patching of instruction operands possible.
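A rough sketch of what the mask idea could look like. Everything here is hypothetical: the Field type, the single 64-bit instruction word and the contiguous field are simplifications for illustration; real encodings may be wider or split one operand across several bit ranges.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical descriptor of one operand field inside a 64-bit word.
struct Field {
    unsigned shift;   // position of the lowest bit of the field
    unsigned width;   // number of bits, 1..63 in this sketch

    // Bit mask covering the field.
    std::uint64_t mask() const {
        assert(width >= 1 && width < 64 && shift + width <= 64);
        return ((std::uint64_t{1} << width) - 1) << shift;
    }
    // Read the operand value out of the instruction word.
    std::uint64_t extract(std::uint64_t word) const {
        return (word & mask()) >> shift;
    }
    // Return a copy of the word with the operand replaced by `value`
    // (bits of `value` that do not fit in the field are dropped).
    std::uint64_t patch(std::uint64_t word, std::uint64_t value) const {
        return (word & ~mask()) | ((value << shift) & mask());
    }
};
```

Keeping shift and width (and hence the mask) per operand gives both the width needed for sign detection and a way to write a new value back without re-encoding the whole instruction.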

The second problem can be solved with some elementary-school knowledge: suppose we have an integer that is only 4 bits wide, so 1 is +1 but 0xf is the negative integer -1. To check the sign we test bit 3 (the top bit, counting from 0), and to get the negative value we subtract 0x10. In general, for a value M bits wide:

to check sign: value & (1 << (M - 1))

and to get negative value: value - (1 << M)
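The two formulas above can be sketched as a single helper. The name sign_extend is mine; the subtraction is done in unsigned arithmetic so that it also behaves for widths close to 64 bits, and the final conversion relies on two's complement representation.

```cpp
#include <cassert>
#include <cstdint>

// Interpret the low `width` bits of `value` as a two's complement integer.
std::int64_t sign_extend(std::uint64_t value, unsigned width) {
    assert(width >= 1 && width <= 64);
    if (width == 64)
        return static_cast<std::int64_t>(value);
    const std::uint64_t one = 1;
    value &= (one << width) - 1;             // keep only the M low bits
    if (value & (one << (width - 1)))        // sign check: value & (1 << (M - 1))
        value -= one << width;               // negative: value - (1 << M), wraps modulo 2^64
    return static_cast<std::int64_t>(value);
}
```

For the 4-bit example, sign_extend(0xf, 4) gives -1 and sign_extend(1, 4) gives 1; the same call works with a width of 24, 50 or 58 taken from vwidth.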

Ok, so far so good, we have some labels (I am too lazy to make 2 passes to collect all the labels, so some labels are missing from the output. After all, this code is just a test case).
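For completeness, this is roughly what the skipped two-pass approach could look like. It is only a sketch: the Insn record, the label naming scheme and the "insn@addr" rendering are made up for the example and do not mirror the real disassembler.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical decoded instruction: only what the sketch needs.
struct Insn {
    std::uint64_t addr;     // address of the instruction
    bool has_target;        // true if it carries a branch target
    std::uint64_t target;   // absolute target address, valid if has_target
};

// Pass 1: record every branch target, then number labels in address order.
std::map<std::uint64_t, std::string> collect_labels(const std::vector<Insn>& code) {
    std::map<std::uint64_t, std::string> labels;
    for (const Insn& i : code)
        if (i.has_target) labels[i.target];   // insert with an empty name
    int n = 0;
    for (auto& kv : labels)
        kv.second = ".L_" + std::to_string(n++);
    return labels;
}

// Pass 2: emit the listing, printing each label before its instruction.
std::vector<std::string> render(const std::vector<Insn>& code) {
    const auto labels = collect_labels(code);
    std::vector<std::string> out;
    for (const Insn& i : code) {
        auto here = labels.find(i.addr);
        if (here != labels.end()) out.push_back(here->second + ":");
        std::string line = "insn@" + std::to_string(i.addr);
        if (i.has_target) {
            auto it = labels.find(i.target);
            assert(it != labels.end());   // pass 1 saw every target
            line += " -> " + it->second;
        }
        out.push_back(line);
    }
    return out;
}
```

The first pass is what makes backward targets work: by the time the second pass reaches an address, it already knows whether a later branch jumps back to it.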

One exception is the BSSY instruction:

FORMAT PREDICATE @[!]Predicate(PT):Pg Opcode
 [!]Predicate("PT"):Pp
','BD:barReg
','RSImm(32)*:Sa

PROPERTIES
 INSTRUCTION_TYPE = INST_TYPE_DECOUPLED_BRU_DEPBAR_RD_SCBD;
 MEM_SCBD = NONE ;
 MEM_SCBD_TYPE = BARRIER_INST ;
 MIN_WAIT_NEEDED = 1 ;
 VALID_IN_SHADERS = ISHADER_ALL ;
 PRED_INDEX = INDEX(Pp) ;

As you can see, it isn't marked as a branch in PROPERTIES. Instead it has the operand Sa of type RSImm, so we should check for this too.
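The combined check could be sketched like this. The data layout is an assumption made for the example: properties are modelled as a string map, BRANCH_TARGET_INDEX is assumed to hold the operand name directly, and operand types are plain strings such as "RSImm"; the real descriptor may store all of this differently.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified view of an instruction descriptor.
struct Operand {
    std::string type;   // e.g. "Predicate", "BD", "RSImm"
    std::string name;   // e.g. "Pp", "barReg", "Sa"
};

struct InstrDesc {
    std::unordered_map<std::string, std::string> properties;
    std::vector<Operand> operands;
};

// Returns the name of the operand that holds a branch offset,
// or an empty string if the instruction has none.
std::string branch_operand(const InstrDesc& d) {
    // Case 1: the instruction is explicitly marked in PROPERTIES.
    auto prop = d.properties.find("BRANCH_TARGET_INDEX");
    if (prop != d.properties.end()) {
        assert(!prop->second.empty());
        return prop->second;
    }
    // Case 2: BSSY-like instructions, recognised by an RSImm operand.
    for (const Operand& op : d.operands)
        if (op.type.rfind("RSImm", 0) == 0)   // type starts with "RSImm"
            return op.name;
    return {};
}
```

An instruction with neither the property nor an RSImm operand returns an empty name, which is the situation the next section runs into.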

And then we also have the so-called

Indirect Branches

like in the following excerpt from nvdisasm output:

@!P4 SYNC (*"BRANCH_TARGETS .L_x_5"*)

Wait a minute, SYNC has the following description:

FORMAT PREDICATE @[!]Predicate(PT):Pg Opcode
 [!]Predicate("PT"):Pp
','BD:barReg
$( { '&' REQ:req '=' BITSET(6/0x0000):req_bit_set } )$
$( { '?' USCHED_INFO("DRAIN"):usched_info } )$
$( { '?' BATCH_T("NOP"):batch_t } )$
$( { '?' PM_PRED("PMN"):pm_pred } )$ ;

PROPERTIES
 INSTRUCTION_TYPE = INST_TYPE_DECOUPLED_BRU_DEPBAR_RD_SCBD;
 MEM_SCBD = NONE ;
 MEM_SCBD_TYPE = BARRIER_INST ;
 MIN_WAIT_NEEDED = 1 ;
 VALID_IN_SHADERS = ISHADER_ALL ;
 PRED_INDEX = INDEX(Pp) ;

Do you see any operand that could hold the value of BRANCH_TARGETS? I can't.

So after some meditation I found that the args for SYNC are stored in the .nv.info section, inside EIATTR_INDIRECT_BRANCH_TARGETS.

Abstract note 2: this makes adding SASS support to IDA Pro very hard, because some information about instructions is stored in different sections (which may not even be loaded).

I'm convinced that this is far from the only such attribute: EIATTR_JUMPTABLE_RELOCS & EIATTR_SYSCALL_OFFSETS also look very promising. Unfortunately I didn't encounter them while testing, so if you have a .cubin containing those attributes, please share it.
