Thursday, November 13, 2025

sass register reuse

Let's continue building useful things on top of the perl-driven Ced. This time I added a couple of new options for register reuse to the test script dg.pl.

What is it, anyway? As usual, Nvidia doesn't want you to know. In SASS it is implemented as a set of operand attributes "reuse_src_XX", usually located in scheduler tables like TABLES_opex_X (newer ones, like reuse_src_e & reuse_src_h, are enums of type REUSE).

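We can consider register reuse as a hint to the GPU that the value of some source operand will be needed again by a following instruction in the same operand slot, so it should be kept in a small operand reuse cache instead of being read from the register file again. This saves register file reads and helps avoid register bank conflicts; it does not change register allocation. In other words, it works as a kind of register cache.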

So the first question is: how can we detect the size of this cache? I made a new pass (option -u) that collects all "reuse" attributes and finds the maximum number of them active simultaneously; see function add_ruc.
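To make the idea of "active simultaneously" concrete, here is a minimal Python sketch of such a counting pass. It is not the perl add_ruc, and its lifetime rule is an assumption: a value marked .reuse stays cached until the next instruction reads the same operand slot, and everything is dropped at a label (a potential branch target). The helper names parse_srcs and max_live_reuse are hypothetical, and slot numbering is simply the operand position after the destination.

```python
import re

def parse_srcs(line):
    """Return [(slot, reg, reuse)] for the register source operands of one SASS line."""
    body = line.split(";")[0].strip()
    if body.startswith("@"):  # drop a guard predicate such as "@P0" or "@!P1"
        body = body.split(None, 1)[1] if " " in body else ""
    parts = body.split(None, 1)
    if len(parts) < 2:
        return []  # labels, EXIT, NOP, ...
    ops = [o.strip() for o in parts[1].split(",")]
    srcs = []
    # ops[0] is the destination; positional slot numbering is an assumption
    for slot, op in enumerate(ops[1:]):
        m = re.match(r"[-|~!]*(R\d+|RZ)(\S*)", op)
        if m:  # constants, immediates and predicates are skipped
            srcs.append((slot, m.group(1), ".reuse" in m.group(2)))
    return srcs

def max_live_reuse(lines):
    """Maximum number of operand slots holding a reused value at the same time."""
    live = {}  # slot -> register assumed to sit in the reuse cache
    best = 0
    for line in lines:
        if line.strip().endswith(":"):  # branch target: assume nothing is cached
            live.clear()
            continue
        srcs = parse_srcs(line)
        for slot, _reg, _reuse in srcs:  # reading a slot consumes/replaces its entry
            live.pop(slot, None)
        for slot, reg, reuse in srcs:  # .reuse keeps the value for a later instruction
            if reuse:
                live[slot] = reg
        best = max(best, len(live))
    return best

listing = [
    "FFMA R0, R2.reuse, R3.reuse, R4 ;",
    "FFMA R1, R2, R3.reuse, R5 ;",
    ".L_x_1:",
    "FMUL R6, R7.reuse, R7 ;",
]
print(max_live_reuse(listing))  # 2
```

Counting live slots rather than reuse flags per instruction is a design choice: a flag set on one instruction and a flag set on the next one can overlap in time when they target different slots.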

The results are not very exciting: I was unable to find any function in cuBLAS with a cache size greater than 2. I remember coming across a statement in one of the numerous papers about dissecting GPUs that it is equal to 4; unfortunately, I can't remember the name of that paper :-(


 

And the next question is: can we automatically detect where registers can be reused and patch the SASS accordingly?

Monday, November 10, 2025

barriers & registers tracking for sass disasm

Finally, I added register tracking to my perl sass disasm.

Now I can do some full-featured analysis of SASS, like finding candidate pairs of instructions to swap or to run in so-called "dual" mode, and all of this in barely 1200 LoC of perl code.

Let's think about what it means for a pair of instructions to be fully independent:

  1. they should belong to the same basic block; in a case like
      IADD R8, -R3, RZ
    .L_x_14:
      FMUL R11, R3.reuse, R3
    the two instructions must be treated as located in different blocks, since the label .L_x_14 marks a potential branch target
  2. they should not depend on the same barriers
  3. neither of them should update registers used by the other
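The three rules above can be written down as a small predicate. This is a hypothetical Python sketch: the Insn fields (basic block id, set of barriers, register read/write sets) are assumptions about what the CFG, barrier and register tracking passes produce, not the data layout of the perl code.

```python
from dataclasses import dataclass, field

@dataclass
class Insn:
    text: str
    block: int                                  # basic block id from the CFG pass
    barriers: set = field(default_factory=set)  # barriers it sets or waits on
    reads: set = field(default_factory=set)     # source registers
    writes: set = field(default_factory=set)    # destination registers

    def __post_init__(self):
        # RZ is the hardwired zero register and never creates a dependency
        self.reads.discard("RZ")
        self.writes.discard("RZ")

def independent(a, b):
    if a.block != b.block:               # rule 1: same basic block
        return False
    if a.barriers & b.barriers:          # rule 2: no shared barriers
        return False
    if a.writes & (b.reads | b.writes):  # rule 3: a must not clobber b's registers
        return False
    if b.writes & a.reads:               # rule 3: b must not clobber a's sources
        return False
    return True

i1 = Insn("IADD R8, -R3, RZ", block=0, reads={"R3", "RZ"}, writes={"R8"})
i2 = Insn("FMUL R11, R3.reuse, R3", block=1, reads={"R3"}, writes={"R11"})
i3 = Insn("FMUL R11, R3, R3", block=0, reads={"R3"}, writes={"R11"})
print(independent(i1, i2))  # False: different blocks
print(independent(i1, i3))  # True: both only read R3
```

Two instructions that merely read the same register (R3 here) stay independent; only a write on either side creates a conflict.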

So I implemented building of the control-flow graph (CFG), plus barrier and register tracking.
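Here is a minimal Python sketch of CFG construction, assuming nvdisasm-style listings where labels look like .L_x_14: and branch targets are written as `(.L_x_14). A new basic block starts at every label and after every control-transfer instruction. Edges go to the branch target and, for ordinary instructions, guarded branches and CALL, to the fall-through block. BRANCH_OPS, NO_FALLTHROUGH and the helper functions are hypothetical, and the opcode list is probably incomplete for newer architectures.

```python
import re

# Control-transfer opcodes that end a basic block (assumed, likely incomplete)
BRANCH_OPS = {"BRA", "BRX", "JMP", "JMX", "CALL", "RET", "EXIT"}
# Of those, the ones that never fall through when unguarded
NO_FALLTHROUGH = {"BRA", "BRX", "JMP", "JMX", "RET", "EXIT"}

def opcode(line):
    toks = line.split(";")[0].split()
    if toks and toks[0].startswith("@"):  # skip guard predicate
        toks = toks[1:]
    return toks[0].split(".")[0] if toks else ""

def split_blocks(lines):
    blocks, cur, labels = [], [], {}
    for line in lines:
        s = line.strip()
        if not s:
            continue
        if s.endswith(":"):  # a label starts a new block
            if cur:
                blocks.append(cur)
                cur = []
            labels[s[:-1]] = len(blocks)
            continue
        cur.append(s)
        if opcode(s) in BRANCH_OPS:  # a branch ends the current block
            blocks.append(cur)
            cur = []
    if cur:
        blocks.append(cur)
    return blocks, labels

def build_cfg(lines):
    blocks, labels = split_blocks(lines)
    edges = {i: set() for i in range(len(blocks))}
    for i, blk in enumerate(blocks):
        last = blk[-1]
        op = opcode(last)
        m = re.search(r"\(\s*(\.L\w+)\s*\)", last)  # e.g. BRA `(.L_x_14)
        target = labels.get(m.group(1)) if m else None
        if op in {"BRA", "JMP"} and target is not None and target < len(blocks):
            edges[i].add(target)
        guarded = last.startswith("@")
        if (op not in NO_FALLTHROUGH or guarded) and i + 1 < len(blocks):
            edges[i].add(i + 1)
    return blocks, edges

listing = ["IADD R8, -R3, RZ ;", ".L_x_14:", "FMUL R11, R3.reuse, R3 ;",
           "@P0 BRA `(.L_x_14) ;", "EXIT ;"]
blocks, edges = build_cfg(listing)
print(edges)  # {0: {1}, 1: {1, 2}, 2: set()}
```

With the block ids from such a pass, rule 1 of the independence check becomes a simple comparison of block numbers.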

Building the CFG