It seems that latency values are the best-kept secret: I was able to find only one article on the internet, and its author didn't provide any code to decipher those tables. So here are my findings.
Disclaimer
All of the following are the shaky conclusions of my dark mind, almost certainly false and having no connection to reality
What they look like
Descriptions of latency tables are located in the *_2.txt files and look like this:
TABLE_OUTPUT(UGPR) : UDP_subset`{URd @URdRange,URd2 @URd2Range}
R2UR_S2UR`{URd @URdRange,URd2 @URd2Range}
OP_R2UR_COUPLED`{URd @URdRange,URd2 @URd2Range}
ULDC_VOTEU_UMOV_ULEPC`{URd @URdRange,URd2 @URd2Range}=
{
UDP_subset`{URd @URdRange,URd2 @URd2Range} : 1 4 7 7
R2UR_S2UR`{URd @URdRange,URd2 @URd2Range} : 1 1 1 1
OP_R2UR_COUPLED`{URd @URdRange,URd2 @URd2Range} : 4 4 1 10
ULDC_VOTEU_UMOV_ULEPC`{URd @URdRange,URd2 @URd2Range} : 1 4 1 1
};
Presumably you can take the previous instruction as the column and the current one as the row, and read the tick count from this table. However, there are several tables for each resource: TRUE, ANTI & OUTPUT. I have no idea how to select between them.
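To make the presumed lookup concrete, here is a minimal Python sketch. It only restates the guess above (previous instruction selects the column, current instruction selects the row), so it may well be wrong. The function names and the idea of passing the column order separately are mine, not part of the MD format; the numbers are copied from the TABLE_OUTPUT(UGPR) table above.

```python
# Column order as listed in the table header (before the '=' sign).
COLUMNS = ["UDP_subset", "R2UR_S2UR", "OP_R2UR_COUPLED", "ULDC_VOTEU_UMOV_ULEPC"]

BODY = """
UDP_subset`{URd @URdRange,URd2 @URd2Range} : 1 4 7 7
R2UR_S2UR`{URd @URdRange,URd2 @URd2Range} : 1 1 1 1
OP_R2UR_COUPLED`{URd @URdRange,URd2 @URd2Range} : 4 4 1 10
ULDC_VOTEU_UMOV_ULEPC`{URd @URdRange,URd2 @URd2Range} : 1 4 1 1
"""

def parse_rows(body):
    """Map each row's operation-set name to its list of tick values."""
    rows = {}
    for line in body.strip().splitlines():
        head, sep, tail = line.strip().rpartition(" : ")
        if not sep:
            continue  # not a row line
        name = head.split("`", 1)[0].strip()  # drop the `{...} connector part
        rows[name] = [int(v) for v in tail.split()]
    return rows

ROWS = parse_rows(BODY)

def lookup(prev_set, cur_set):
    """Assumed semantics: column = previous instruction, row = current one."""
    return ROWS[cur_set][COLUMNS.index(prev_set)]

print(lookup("OP_R2UR_COUPLED", "UDP_subset"))             # 7
print(lookup("ULDC_VOTEU_UMOV_ULEPC", "OP_R2UR_COUPLED"))  # 10
```

The sketch ignores the connector parts in curly braces entirely; they are discussed further below.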
Instructions are grouped into so-called operation sets. For example, the set
R2UR_S2UR={REDUX,REDUXudp_pipe,S2UR,S2URudp_pipe}
contains 4 instructions. In reality it contains 2, REDUX & S2UR, because the xxx_pipe names just refer to them again.
Conditions in curly braces are described in the Connector Conditions section: they are register sets and, optionally, a function (with the @ prefix) that must evaluate to true for an instruction to be connected to this table.
Also, for old SMs there may be a condition for the whole table, like
FMAI_OPS[CC_NON_CONST]`{inputCC,dummyCC,TestCC}
In newer SMs these conditions were moved to Connector Sets descriptors:
CONNECTOR SETS
GMMA_SB = OP_WARPGROUP[MODE_ARV]`{GMMA_GPR};
Here both CC_NON_CONST & MODE_ARV are names of connector condition functions.
Operation Sets
This is the simplest part; it can even be described in BNF:
op_set: op_name '=' op_expr ';'
op_expr: op_name | op_expr '+' op_expr | op_expr '-' op_expr | '(' op_expr ')' | inst_list
inst_list: '{' comma separated list of instruction names '}'
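The grammar above is small enough for a hand-written recursive-descent parser. Below is a minimal Python sketch of one. It assumes that '+' means set union and '-' means set difference, which the grammar itself does not say, and that both are left-associative. The DEMO set and the UMOV member in the usage lines are made up for illustration; only R2UR_S2UR comes from the text above.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[{}()+\-,=;])")

def tokenize(text):
    text, pos, out = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        out.append(m.group(1))
        pos = m.end()
    return out

class Parser:
    def __init__(self, tokens, known):
        self.toks, self.i, self.known = tokens, 0, known

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def expr(self):
        # op_expr: term (('+' | '-') term)*
        result = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            result = result | rhs if op == "+" else result - rhs
        return result

    def term(self):
        tok = self.take()
        if tok == "(":
            inner = self.expr()
            self.take(")")
            return inner
        if tok == "{":  # inst_list
            names = set()
            while self.peek() != "}":
                names.add(self.take())
                if self.peek() == ",":
                    self.take()
            self.take("}")
            return names
        return set(self.known[tok])  # reference to an earlier op_name

def parse_op_set(line, known):
    """Parse 'name = expr ;' and store the resulting set in known."""
    p = Parser(tokenize(line), known)
    name = p.take()
    p.take("=")
    known[name] = p.expr()
    p.take(";")
    return name

sets = {}
parse_op_set("R2UR_S2UR={REDUX,REDUXudp_pipe,S2UR,S2URudp_pipe};", sets)
parse_op_set("DEMO = (R2UR_S2UR + {UMOV}) - {REDUXudp_pipe};", sets)  # hypothetical
print(sorted(sets["DEMO"]))  # ['REDUX', 'S2UR', 'S2URudp_pipe', 'UMOV']
```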
As usual, there are lots of missing instructions. For example, for sm90 the descriptions are missing for ICMP, BFE, BFI, ISET, FCMP, GETFPFLAGS, SETFPFLAGS, CSET, PSET, VMAD, VADD, VMNMX, VSET, VSHL, VSHR, VSETP, XMAD, IMADSP, F2F64, F2I64, FRND64 etc., although the MD file still has some mentions of them, like enum XMADcop & mask NencBFE.
I think they were removed from the MD files deliberately; at least it's very hard to believe that this is just sloppiness from a company which was not too lazy to encrypt 41515 names of MLIR opcodes.
Connector Conditions
Typically they look like this:
Rd2Range = (((((MD_PRED(IDEST2_SIZE)) >= (1)) ? (MD_PRED(IDEST2_SIZE)) : (1)) - 1) >> 5) + 1;
MODE_ARV = ((mode == 0) _OR_ 0);
As you can see, they can refer to instruction predicates (via the keyword MD_PRED), to fields like mode, or even to previously defined conditions.
This is the hardest part to parse, because we can have totally imaginary things straight from unicorn land, like
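For reference, here is how the two conditions above could be written out by hand in Python. This is a sketch under two assumptions of mine: that MD_PRED(IDEST2_SIZE) yields a plain integer, and that _OR_ is an ordinary logical or. The function names are invented.

```python
def rd2_range(idest2_size):
    """Rd2Range: clamp the size to at least 1, then count 32-sized chunks."""
    size = idest2_size if idest2_size >= 1 else 1
    return ((size - 1) >> 5) + 1

def mode_arv(mode):
    """MODE_ARV = ((mode == 0) _OR_ 0), with _OR_ read as logical or."""
    return bool((mode == 0) or 0)

for size in (0, 32, 33, 64, 65):
    print(size, rd2_range(size))
# 0 1
# 32 1
# 33 2
# 64 2
# 65 3
```

In other words, under these assumptions Rd2Range is ceil(max(size, 1) / 32).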
CONNECTOR CONDITION
REORDER = ANNOTATED( "dilbertCheck" )
|| ANNOTATED( "zlfCounter" )
|| ANNOTATED( "firstResort_IFB" "_" "dilbertCheck" )
|| ANNOTATED( "firstResort_IFB" "_" "zlfCounter" )
|| ANNOTATED( "trans1" "_" "zlfCounter" )
|| ANNOTATED( "trans4" "_" "zlfCounter" )
|| ANNOTATED( "firstResort_IFB" "_" "cubeAlgo" "_" "zlfCounter" );
There are no annotations in the MD files, at least none with names like "dilbertCheck".
In old SMs the whole body of a condition is usually placed in square brackets, like
OP_P2R[Pr==0]`{PR_PRED}
Here Pr isn't a predicate: it is the field Pr, described as CCPR:Pr and having the encoding CCPR = CCPR;
Another example of hallucinations: the field b is described as BOnly and has no encoding at all:
FORMAT PREDICATE @[!]Predicate(PT):Pg Opcode
/BOnly:b /LOD(noLOD):lod /LC(noLC):lc /TOFF1(noTOFF):toff
ENCODING
Opcode10 = Opcode;
LCB = lc;
PredDst = Pd;
LODB = LOD;
AOFFIB = TOFF1;
DC = DC;
NDV = NDV;
ParamA = ParamA;
Wmsk = wmsk;
NODEP = NODEP;
Pred = Pg;
PredNot = Pg@not;
Dest = Rd;
RegA = Ra;
RegB = Rb;
!NencTEXB;
OEUSchedInfo = usched_info;
In my script I tried to convert as many conditions to C++ as I could, but I still think that for old SMs like 3 & 4 it does not work properly.
Connector Sets
These are most widely used since sm70 and look like this:
ON_MATH_PRED_READERS = OP_P2R`{PR_PRED} + OP_R2P`{PR_PRED} + OP_CSMTEST[VTG_PRED]`{PR_PRED} + OP_VOTE`{Pr,Pq,Pp,Pa,Pb,Pc,Ps,Plg};
As you can see, a set can contain both a connector condition in square brackets and fields with predicates.
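A connector set line like the one above can be split into its members with a single regular expression. The Python sketch below is my own reading of the syntax (operation set name, optional condition in square brackets, backtick, field list in curly braces, optional @function per field), not a full parser; the dictionary keys are arbitrary.

```python
import re

# NAME, optional [CONDITION], then `{field, field @func, ...}
MEMBER = re.compile(r"(\w+)(?:\[([^\]]*)\])?`\{([^}]*)\}")

def parse_field(text):
    """Split 'URd @URdRange' into ('URd', 'URdRange'); the function is optional."""
    name, _, func = text.strip().partition("@")
    return name.strip(), (func.strip() or None)

def parse_connector_set(line):
    name, _, rhs = line.strip().rstrip(";").partition("=")
    members = []
    for m in MEMBER.finditer(rhs):
        op_set, cond, fields = m.groups()
        members.append({
            "op_set": op_set,
            "condition": cond,  # None when there are no square brackets
            "fields": [parse_field(f) for f in fields.split(",")],
        })
    return name.strip(), members

LINE = ("ON_MATH_PRED_READERS = OP_P2R`{PR_PRED} + OP_R2P`{PR_PRED} + "
        "OP_CSMTEST[VTG_PRED]`{PR_PRED} + OP_VOTE`{Pr,Pq,Pp,Pa,Pb,Pc,Ps,Plg};")

name, members = parse_connector_set(LINE)
print(name, len(members))                             # ON_MATH_PRED_READERS 4
print(members[2]["op_set"], members[2]["condition"])  # OP_CSMTEST VTG_PRED
print(parse_field("URd @URdRange"))                   # ('URd', 'URdRange')
```

Using finditer instead of splitting on '+' avoids breaking on a '+' that might appear inside a bracketed condition.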
Following the breadcrumbs
I added the -g option to ead.pl to parse the file with latency tables and produce references to them for each instruction:
- because an instruction can belong to several groups, and several groups can refer to a latency table as a column or a row, nv_instr now has pointers to NV_tabrefs for columns and rows
- NV_tabrefs has a pointer to NV_Tab & the connector condition for the whole table in the field filter
- NV_Tab has lists of columns & rows, which also have connector conditions for fields in NV_cond_list
So the connection logic is:
- for the previous instruction, check each referenced table and call its connector condition, if present
- for matched tables, iterate over the connected columns and also call their connector conditions
- store all matched tables and column indexes
- for the current instruction, check each previously matched table and call its connector condition, if present
- for matched tables, iterate over the connected rows and also call their connector conditions
- dump the table value at the column from the previous instruction and the row of the current one
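The steps above can be sketched in Python roughly as follows. This is not the actual generated code: Tab, TabRef and Instr are simplified stand-ins I made up for NV_Tab, NV_tabrefs and nv_instr, conditions are plain callables over a dict of field values, and the Rd != 255 condition is purely hypothetical. The table values (6 ticks) are taken from the disassembler output shown further below.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Cond = Optional[Callable[[dict], bool]]  # None means "no condition"

def passes(cond, fields):
    return cond is None or cond(fields)

@dataclass
class Tab:
    name: str
    values: list  # values[row][col]

@dataclass
class TabRef:
    tab: Tab
    index: int           # column or row index inside the table
    filter: Cond = None  # connector condition for the whole table
    cond: Cond = None    # connector condition for this column/row

@dataclass
class Instr:
    name: str
    fields: dict
    cols: list = field(default_factory=list)  # refs used when acting as previous
    rows: list = field(default_factory=list)  # refs used when acting as current

def latencies(prev, cur):
    # Steps 1-3: collect matching tables and column indexes for prev.
    matched = {}
    for ref in prev.cols:
        if passes(ref.filter, prev.fields) and passes(ref.cond, prev.fields):
            matched.setdefault(ref.tab.name, []).append(ref.index)
    # Steps 4-6: match rows of cur against those tables and read the values.
    out = []
    for ref in cur.rows:
        if ref.tab.name not in matched:
            continue
        if not (passes(ref.filter, cur.fields) and passes(ref.cond, cur.fields)):
            continue
        for col in matched[ref.tab.name]:
            out.append((ref.tab.name, ref.index, col, ref.tab.values[ref.index][col]))
    return out

true_gpr = Tab("TRUE GPR", [[6], [6]])  # rows: FXU_OPS, FMAI_OPS; col: MATH_OPS
xmad = Instr(
    "XMAD", {"Rd": 3, "Ra": 3, "Rb": 255},
    cols=[TabRef(true_gpr, 0)],
    rows=[TabRef(true_gpr, 0),
          TabRef(true_gpr, 1, cond=lambda f: f["Rd"] != 255)],  # hypothetical
)
print(latencies(xmad, xmad))
# [('TRUE GPR', 0, 0, 6), ('TRUE GPR', 1, 0, 6)]
```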
I also added caching for NV_cond_list; the cache hit rate hovers around 40%:
conditions 17492 (8678) cached 7143
Now my sass disassembler has the -S option to show values from latency tables:
S> tab TRUE GPR row 0 (FXU_OPS){Rd:3} col 0 (MATH_OPS){Rb:255 Ra:3}: 6
S> tab TRUE GPR row 1 (FMAI_OPS){Rd:3} col 0 (MATH_OPS){Rb:255 Ra:3}: 6
XMAD line 20275 n 150 20 render items
> XMAD R3,R2.reuse,R8.reuse,RZ &0 ?trans1
As you can see, the instruction XMAD here has 2 references to the same table TRUE(GPR): the first time as row FXU_OPS, the second as row FMAI_OPS. In both cases the latency is 6 ticks.