Monday, March 10, 2025

nvidia sass disassembler, part 2

Let's continue exploring the "machine descriptions". This time I'll try to figure out how to make the formatted output look more like that of the genuine nvdisasm.

For example, the format for one of the I2F variants looks like this:

FORMAT PREDICATE @[!]Predicate(PT):Pg Opcode /Float64:dstfmt /SRCFMT_U16_S16:srcfmt /Round1("RN"):rnd
Register:Rd
','C:Sb[UImm(5/0*):Sb_bank]*   [SImm(17)*:Sb_addr] /HSEL("H0"):hsel
$( { '&' REQ:req '=' BITSET(6/0x0000):req_bit_set } )$
$( { '&' RD:rd '=' UImm(3/0x7):src_rel_sb } )$
$( { '&' WR:wr '=' UImm(3/0x7):dst_wr_sb } )$
$( { '?' USCHED_INFO("DRAIN"):usched_info } )$
$( { '?' BATCH_T("NOP"):batch_t } )$
$( { '?' PM_PRED("PMN"):pm_pred } )$ ;

...

ENCODING
!i2f_Rd64__Cb_16b_unused;
BITS_3_14_12_Pg = Pg;
BITS_1_15_15_Pg_not = Pg@not;
BITS_13_91_91_11_0_opcode=Opcode;
BITS_3_77_75_sz=*dstfmt;
BITS_3_85_84_74_74_srcfmt=*srcfmt;
BITS_2_79_78_stride=rnd;
BITS_8_23_16_Rd=Rd;

BITS_5_58_54_Sb_bank,BITS_14_53_40_Sb_offset =  ConstBankAddress2(Sb_bank,Sb_addr);
BITS_2_61_60_hsel=hsel;
BITS_6_121_116_req_bit_set=req_bit_set;
BITS_3_115_113_src_rel_sb=VarLatOperandEnc(src_rel_sb);
BITS_3_112_110_dst_wr_sb=VarLatOperandEnc(dst_wr_sb);
BITS_2_103_102_pm_pred=pm_pred;
BITS_8_124_122_109_105_opex=TABLES_opex_0(batch_t,usched_info);

The field srcfmt is extracted from the mask BITS_3_85_84_74_74_srcfmt and has the enum type SRCFMT_U16_S16:

SRCFMT_U16_S16 "U16"=2 , "S16"=3;
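The mask names seem to describe their own layout: BITS_<total width>_<hi>_<lo>[_<hi>_<lo>...]_<field>. So BITS_3_85_84_74_74_srcfmt is 3 bits taken from bits 85..84 plus bit 74, and the widths always add up (BITS_13_91_91_11_0_opcode is 1 + 12 bits). Below is a minimal Python sketch of extracting such a field. It assumes the instruction is a single 128-bit integer and that the first range supplies the most significant bits (I have not verified that order); parse_mask and extract are hypothetical helper names:

```python
import re

# Enum table copied from the SRCFMT_U16_S16 definition above.
SRCFMT_U16_S16 = {2: "U16", 3: "S16"}

# BITS_<width>_<hi>_<lo>[_<hi>_<lo>...]_<field>
MASK_RE = re.compile(r"BITS_(\d+)((?:_\d+_\d+)+)_")

def parse_mask(name):
    """Split e.g. 'BITS_3_85_84_74_74_srcfmt' into (3, [(85, 84), (74, 74)])."""
    m = MASK_RE.match(name)
    if m is None:
        raise ValueError(f"not a mask name: {name}")
    width = int(m.group(1))
    nums = [int(x) for x in m.group(2).strip("_").split("_")]
    ranges = list(zip(nums[0::2], nums[1::2]))
    # Sanity check: the ranges must add up to the declared width.
    if sum(hi - lo + 1 for hi, lo in ranges) != width:
        raise ValueError(f"width mismatch in {name}")
    return width, ranges

def extract(insn, ranges):
    """Concatenate bit ranges of an instruction word.
    Assumption: the first (hi, lo) pair supplies the most significant bits."""
    value = 0
    for hi, lo in ranges:
        n = hi - lo + 1
        value = (value << n) | ((insn >> lo) & ((1 << n) - 1))
    return value

width, ranges = parse_mask("BITS_3_85_84_74_74_srcfmt")
insn = 0b01 << 84  # bits 85..84 = 01, bit 74 = 0
print(SRCFMT_U16_S16[extract(insn, ranges)])  # U16
```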
A leading slash '/' means that a dot '.' should be placed in the output.
 
Enums declared with a default value, like Round1("RN"), are printed (again with a dot) only if the value of the rnd field differs from the enum value "RN". The next field, Rd, is also an enum ("Register"), but this time without '/', so it is printed as is. To illustrate: if srcfmt is 2, rnd is 0 (equal to "RN") and Rd is 1, then the output should be I2F.U16 R1 (leaving aside dstfmt and the const bank operand).
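The rules above (a leading '/' becomes a dot, a value equal to the default is omitted, a field without '/' is printed as an operand) can be sketched in Python like this. The enum tables are hypothetical: only the SRCFMT_U16_S16 values and RN = 0 come from the description, the other Round1 entries are made up, and dstfmt and the const bank operand are left out to keep it short:

```python
# Hypothetical enum tables: SRCFMT_U16_S16 and RN=0 come from the post,
# the other Round1 names/numbers are placeholders for illustration.
ENUMS = {
    "SRCFMT_U16_S16": {2: "U16", 3: "S16"},
    "Round1": {0: "RN", 1: "RM", 2: "RP", 3: "RZ"},
    "Register": {i: f"R{i}" for i in range(8)},  # just a few registers for the demo
}

def format_insn(opcode, spec, values):
    """spec is a list of (enum, field, dotted, default) tuples:
    dotted  - the format had a leading '/', so print '.' + name
    default - name in parentheses, e.g. "RN" for Round1("RN"), or None
    """
    suffixes, operands = [], []
    for enum, field, dotted, default in spec:
        name = ENUMS[enum][values[field]]
        if dotted:
            # A suffix equal to its default value is omitted.
            if name != default:
                suffixes.append("." + name)
        else:
            # No '/': the value is printed as an operand, as is.
            operands.append(name)
    text = opcode + "".join(suffixes)
    if operands:
        text += " " + ", ".join(operands)
    return text

SPEC = [
    ("SRCFMT_U16_S16", "srcfmt", True, None),
    ("Round1", "rnd", True, "RN"),
    ("Register", "Rd", False, None),
]
print(format_insn("I2F", SPEC, {"srcfmt": 2, "rnd": 0, "Rd": 1}))  # I2F.U16 R1
```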
 
Fields that are not enums hold immediate values; for them there are types such as UImm, SImm and so on (see the function is_type).
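For illustration, here is a small Python sketch that handles only the two immediate types from the example, UImm and SImm, treating SImm as a two's complement value (my assumption). It is not the is_type function from my code; imm_type and decode_imm are hypothetical names, and anything after the width, like the '/0*' in UImm(5/0*), is simply ignored:

```python
import re

# Matches the start of an immediate type such as UImm(5/0*) or SImm(17)*.
# Anything after the width (the '/0*' part, the trailing '*') is ignored here.
IMM_RE = re.compile(r"(UImm|SImm)\((\d+)")

def imm_type(decl):
    m = IMM_RE.match(decl)
    if m is None:
        raise ValueError(f"not an immediate type: {decl}")
    return m.group(1), int(m.group(2))

def decode_imm(decl, raw):
    """Interpret raw bits according to an immediate type declaration."""
    kind, bits = imm_type(decl)
    raw &= (1 << bits) - 1
    if kind == "UImm":
        return raw
    # SImm: two's complement sign extension of a `bits`-wide field
    sign = 1 << (bits - 1)
    return (raw ^ sign) - sign

print(decode_imm("UImm(5/0*)", 0x1F))    # 31
print(decode_imm("SImm(17)*", 0x1FFFF))  # -1
```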
 
As you can see, the format syntax is fairly complex, and the right solution would be a real grammar parser like RecDescent. Being lazy, I just used several regexps instead (see the function cons_ae for details). They are Twitter-compatible: the longest of them is about 130 characters.
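To give an idea of the regexp approach, here is a simplified Python sketch (not the regexps from cons_ae) that picks typed fields out of the first FORMAT line of the I2F example. It only recognizes tokens of the form [@][/][placeholder]Type[(default)][*]:field and silently skips everything else, such as the bare Opcode:

```python
import re

# Simplified token regex: [@][/][placeholder]Type[(default)][*]:field
TOKEN_RE = re.compile(r"""
    (?P<at>@)?                    # '@' in front of a guard predicate
    (?P<slash>/)?                 # leading '/', printed as '.'
    (?P<ph>\[(?:!|-|~|\|\|)\])?   # placeholder [!] [-] [~] or [||]
    (?P<type>\w+)                 # enum or immediate type name
    (?:\((?P<default>[^)]*)\))?   # optional default in parentheses
    \*?:(?P<field>\w+)            # ':' followed by the field name
""", re.VERBOSE)

def parse_format(line):
    tokens = []
    for m in TOKEN_RE.finditer(line):
        tok = m.groupdict()
        if tok["default"] is not None:
            tok["default"] = tok["default"].strip('"')  # ("RN") -> RN
        tokens.append(tok)
    return tokens

line = 'PREDICATE @[!]Predicate(PT):Pg Opcode /Float64:dstfmt /SRCFMT_U16_S16:srcfmt /Round1("RN"):rnd'
for tok in parse_format(line):
    print(tok["field"], tok["type"], tok["default"], bool(tok["slash"]))
```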
 
However, as usual, there are several problems.

Undefined enums

Namely REQ, RD and WR. They are described as
 REQ req*;
 RD rd*;
 WR_EARLY wr_early*;
 WR wr*;
and, as you may notice, there are no req, rd or wr fields in the masks. I don't know what they are.

IDENTICAL

Some values actually refer to the same mask; this is implemented as a pseudo-table:
BITS_8_39_32_Rb=IDENTICAL(Rb,Rc)
The format description can then use both fields Rb and Rc, and the value of either is extracted from the mask BITS_8_39_32_Rb.
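One simple way to handle this, sketched in Python with hypothetical names, is to map every format field to the mask it is read from while reading the ENCODING section, so that Rb and Rc both resolve to BITS_8_39_32_Rb. Only plain "mask = field" and IDENTICAL(...) assignments are handled; lines with functions, tables, '*' or @not are skipped:

```python
import re

# 'mask = field;' or 'mask = IDENTICAL(f1, f2, ...)'
ASSIGN_RE = re.compile(r"\s*(\w+)\s*=\s*(?:IDENTICAL\(([^)]*)\)|(\w+))\s*;?\s*$")

def build_field_map(encoding_lines):
    """Map every format field name to the mask it is read from."""
    fields = {}
    for line in encoding_lines:
        m = ASSIGN_RE.match(line)
        if m is None:
            continue  # e.g. '*dstfmt', Pg@not, VarLatOperandEnc(...)
        mask, identical, plain = m.groups()
        names = identical.split(",") if identical is not None else [plain]
        for name in names:
            fields[name.strip()] = mask
    return fields

fields = build_field_map([
    "BITS_8_23_16_Rd=Rd;",
    "BITS_8_39_32_Rb=IDENTICAL(Rb,Rc)",
    "BITS_3_77_75_sz=*dstfmt;",
])
print(fields["Rc"])  # BITS_8_39_32_Rb
```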

 

Placeholders

Did you notice the strange-looking expression [!] before Predicate:Pg? It is actually a placeholder for the value of Pg@not. The others are:
  • [-] for V@negate
  • [~] for V@invert
  • [||] for V@absolute
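Filling the placeholders could look roughly like the Python sketch below. Two details are my assumptions rather than anything taken from the machine description: the nesting order (|x| innermost, then the ~ and - prefixes) and omitting an unnegated @PT guard, following the Predicate(PT) default. The function names are hypothetical:

```python
def render_predicate(pred, negated):
    """'@[!]Predicate(PT):Pg': [!] receives Pg@not.
    Assumption: an unnegated guard equal to the default PT is not printed."""
    if pred == "PT" and not negated:
        return ""
    return "@" + ("!" if negated else "") + pred

def render_operand(text, negate=False, invert=False, absolute=False):
    """Apply [||] (V@absolute), [~] (V@invert) and [-] (V@negate)."""
    if absolute:
        text = f"|{text}|"
    if invert:
        text = "~" + text
    if negate:
        text = "-" + text
    return text

print(render_predicate("P0", True))                      # @!P0
print(render_operand("R2", negate=True, absolute=True))  # -|R2|
```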

 

Const bank addresses

In the example above it is described as
','C:Sb[UImm(5/0*):Sb_bank]*   [SImm(17)*:Sb_addr]

In this case it could be printed as c[Sb_bank][Sb_addr], but it's not that simple. As you can see, its values are taken from ConstBankAddress2, and the correct output is c[Sb_bank][Sb_addr * 4].

I don't know whether Sb_bank should also be multiplied by 4. There is also another form:

C:srcConst[UImm(5/0*):constBank]* [ZeroRegister(RZ):Ra + SImm(17)*:immConstOffset]
It should be printed as c[constBank][Reg + immConstOffset * 4]. I don't know whether Reg should be multiplied too.
The good news is that there are no other ConstBankAddress variants, only ConstBankAddress0 and ConstBankAddress2.
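Both forms can be printed with one small Python helper. It follows the rules described here: only the address is multiplied by 4, while the bank and the register are printed unchanged, since it is unclear whether they need scaling. Printing the numbers in hex is my assumption, and const_bank is a hypothetical name:

```python
def const_bank(bank, addr, reg=None):
    """Format c[bank][addr * 4] or c[bank][reg + addr * 4]."""
    scaled = addr * 4  # only the address is scaled
    if reg is None:
        inner = f"{scaled:#x}"
    else:
        sign = "-" if scaled < 0 else "+"
        inner = f"{reg} {sign} {abs(scaled):#x}"
    return f"c[{bank:#x}][{inner}]"

print(const_bank(0, 0x58))     # c[0x0][0x160]
print(const_bank(3, 2, "R4"))  # c[0x3][R4 + 0x8]
```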

Results 

I've added an -i option to dump all recognized formats:

P @!  Predicate Pg
$
E     FTZ 1  ftz
E     DSTFMT_U8_S8_U16_S16_U32_S32 1 5 dstfmt
E     Float32 1 2 srcfmt
E     Round3 1  rnd
E     NTZ 1  ntz
E     Register 0  Rd
E ,-  Register 0  Rb
E     REQ 0  req
V     req_bit_set BITSET
E     RD 0  rd
V     src_rel_sb UImm
E     WR 0  wr
V     dst_wr_sb UImm
E     USCHED_INFO 0  usched_info
E     BATCH_T 0  batch_t
E     PM_PRED 0  pm_pred

and the resulting instruction text is .FTZ.U32.TRUNC.NTZ R3 ,R2- 0x0 0xFFFF 0x2 trans1

The output from nvdisasm is F2I.FTZ.U32.TRUNC.NTZ R3, R2. The last 3 fields of my output were taken from USCHED_INFO, BATCH_T and PM_PRED.
