пятница, 30 мая 2025 г.

nvidia sass assembler

I am very skeptical about patching of existing .cubin files - it requires too much book-keeping. Let's say we want to insert several additional instructions into some function - then we need

  1. extend section containing code for those function by patching sections table
  2. patch symbols table/relocs
  3. disasm whole function and build code-flow graph for all instructions in function
  4. fix offsets for jumps
  5. fix attributes like EIATTR_INDIRECT_BRANCH_TARGETS & EIATTR_JUMPTABLE_RELOCS
  6. and so on

While points 1-2 can be implemented with ELF patching libraries like elftools it is anyway too much tedious labour

For example CuAssembler prefers to create new .cubin files from scratch. In any case we need some engine to generate sass instructions and this task is perfectly achieve-able when you have ready disassembler. So I add to my sass disasm engine some primary features for code generation:

  • dictionary of all instructions for given SM - method INV_disasm::get_instrs
  • for each instruction add encoders describing how to put values for fields, tables, constant banks & scheduling

As illustration I've implemented interactive sass assembler (with some help of readline for auto-completion)

воскресенье, 4 мая 2025 г.

nvidia sass latency tables

It seems that latency values are the best kept secret - I was able to find only article in internet and author didn't provided any code to decipher those tables. So

Disclaimer

All of the following are the shaky conclusions of my dark mind, almost certainly false and having no connection to reality

 

How they are look like

Descriptions of latency tables are located in files *_2.txt and look like
TABLE_OUTPUT(UGPR) : UDP_subset`{URd @URdRange,URd2 @URd2Range}
                      R2UR_S2UR`{URd @URdRange,URd2 @URd2Range}
                       OP_R2UR_COUPLED`{URd @URdRange,URd2 @URd2Range}
                        ULDC_VOTEU_UMOV_ULEPC`{URd @URdRange,URd2 @URd2Range}=
{
    UDP_subset`{URd @URdRange,URd2 @URd2Range} : 1 4 7 7
    R2UR_S2UR`{URd @URdRange,URd2 @URd2Range} : 1 1 1 1
    OP_R2UR_COUPLED`{URd @URdRange,URd2 @URd2Range} : 4 4 1 10
    ULDC_VOTEU_UMOV_ULEPC`{URd @URdRange,URd2 @URd2Range} : 1 4 1 1
};