Friday, March 6, 2026

SASS latency table: second try

In my first attempt I used latency tables extracted from the MD file (located inside nvdisasm), and nothing good came of it

The obvious reason is that the real latency table should live not in the disassembler but inside ptxas. The problem with that file is that it is really huge: in SDK 13 it is 40 MB. And of course no symbols are included

This is not surprising, because it contains lots of things:

  • ptxas parser
  • lots of macros
  • an optimizing compiler with 159 passes that doesn't use LLVM at all
  • code generators for several different SMs

Besides, it does not have any tracepoints, and a big part of its strings is encrypted. So it took lots of time and patience, but I finally found and extracted the right latency table

And then a lot of discoveries came my way

Firstly, the indices in that table are internal IR opcodes, so they map poorly onto instruction tables. I made a quick and dirty script for analysis: of 461 indices only 284 can be automatically mapped onto instruction names, and 136 have forms like F2F (not F64) or HMMA.F32.{1684.E8M10|1688.F16|1688.E8M7}

Secondly, the values are counterintuitive. For example, MOV has latency 3, MOV.64 has 6, and suddenly both MOV.64.(HI | LO) have 9

Another example: I2F has 3, but I2F (not F64) has 13

I think we can throw almost all articles about SASS optimization in the trash bin

And lastly, it contains 342 opcodes for Mercury

MERCURY

Google gives practically no references to what it is, which confirms my hypothesis that everyone who understands nvdisasm output has been working for Chinese military Deepseek for a long time

In short: when you compile PTX for sm100+, ptxas produces a cubin/obj with lots of sections prefixed with .nv.merc. Almost everything is duplicated:
  • symbol table - section type 0x70000085. Almost identical to the original; only the section indices were patched
  • relocs - section type 0x70000082. The relocs differ from normal SASS ones and have the prefix R_MERCURY
  • attributes - section type 0x70000083. A few new attributes, like EIATTR_MERCURY_ISA_VERSION
  • and even debug info - I added it to my dwarfdump
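
For quick triage of a cubin's section headers, the type values from this list can be kept in a small lookup table. This is only a sketch: the descriptive labels and the function name are my own shorthand, not official names, and the only symbolic name taken from this post is SHT_CUDA_CAPMERC (type 0x70000016, from the code section header shown further down). The name-prefix fallback is also an assumption.

```python
# Section types quoted in this post. Labels are informal shorthand,
# not official ELF constant names (except SHT_CUDA_CAPMERC).
MERCURY_SECTION_TYPES = {
    0x70000085: "mercury symbol table",
    0x70000082: "mercury relocations (R_MERCURY_*)",
    0x70000083: "mercury attributes",
    0x70000016: "SHT_CUDA_CAPMERC (code)",
}


def describe_section(name: str, sh_type: int) -> str:
    """Classify a section by its sh_type, falling back to its name prefix."""
    label = MERCURY_SECTION_TYPES.get(sh_type)
    if label is not None:
        return label
    # Assumption: anything else with these prefixes is still mercury-related.
    if name.startswith((".nv.merc", ".nv.capmerc")):
        return "unknown mercury-related section"
    return "not a mercury section"
```

Feeding it the (name, sh_type) pairs from any ELF section header dump is enough to see at a glance how much of the file is duplicated.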

A section with code looks like this:

[31] .nv.capmerc.text._Z15warp_reduce_sumi type 70000016 [SHT_CUDA_CAPMERC] flags 10000000
00000000  18 00 00 00-01 00 00 C0|25 00 00 00-AE FA AA AA  ........%.......
00000010  02 00 00 00-41 5D 00 0A|41 0C 40 04-41 0C 40 04  ....A]..A.@.A.@.
00000020  41 0C 40 04-41 0C 40 04|D1 01 02 4B-1A 06 F8 00  A.@.A.@....K....
00000030  22 00 00 00-01 30 81 03|40 03 00 00-03 C0 02 C0  "....0..@.......
00000040  03 02 00 00-00 00 00 00|00 00 41 0C-40 04 D1 01  ..........A.@...
00000050  02 4B 1A 06-F8 00 22 00|00 00 01 30-81 03 40 03  .K...."....0..@.
00000060  00 00 03 C0-02 C0 03 02|00 00 00 00-00 00 00 00  ................
00000070  41 0C 40 04-D1 01 02 4B|1A 06 F8 00-22 00 00 00  A.@....K...."...
00000080  01 30 81 03-40 03 00 00|03 C0 02 C0-03 02 00 00  .0..@...........
00000090  00 00 00 00-00 00 41 0C|40 04 D1 01-02 4B 1A 06  ......A.@....K..
000000A0  F8 00 22 00-00 00 01 30|81 03 40 03-00 00 03 C0  .."....0..@.....
000000B0  02 C0 03 02-00 00 00 00|00 00 00 00-41 0C 40 04  ............A.@.
000000C0  D1 01 02 4B-1A 06 F8 00|22 00 00 00-01 30 81 03  ...K...."....0..
000000D0  40 03 00 00-03 C0 02 C0|03 02 00 00-00 00 00 00  @...............
000000E0  00 00 41 0C-40 04 42 33|00 0E 50 05              ..A.@.B3..P.

The first DWORD is obviously the index of the SASS code section. As you can see, the entropy is low, so it can't contain a hash/HMAC or a compressed stream. But the funniest part is that the attributes for this section contain values that are out of range:
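
The entropy remark can be checked numerically rather than by eye. A minimal sketch, assuming the section bytes are already loaded into a Python bytes object; the function name is mine. Encrypted, hashed or compressed data usually sits close to 8 bits per byte, while repetitive structured records score much lower.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # byte value -> number of occurrences
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Keep in mind that a short buffer can never reach 8 bits per byte, so the number is only meaningful in comparison with other sections of similar size.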

60: EIATTR_COOP_GROUP_INSTR_OFFSETS len 0028
00000000  40 00 00 00-70 00 00 00|A0 00 00 00-D0 00 00 00  @...p...........
00000010  00 01 00 00-80 01 00 00|E0 01 00 00-40 02 00 00  ............@...
00000020  A0 02 00 00-00 03 00 00                          ........

Offsets such as 0x1e0, 0x240, 0x2a0 and 0x300 correspond to the original SASS code section and lie outside this section's range of 0-0xec
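
The range check is easy to reproduce. The sketch below assumes the attribute payload is a flat array of little-endian 32-bit offsets and that the capmerc section is 0xec bytes long, as in the hexdump; the variable names are mine.

```python
import struct

# Payload of EIATTR_COOP_GROUP_INSTR_OFFSETS copied from the dump (len 0x28).
payload = bytes.fromhex(
    "40000000 70000000 A0000000 D0000000"
    "00010000 80010000 E0010000 40020000"
    "A0020000 00030000"
)

CAPMERC_SIZE = 0xEC  # size of the .nv.capmerc.text section in the hexdump

# Assumption: the payload is a plain array of little-endian u32 offsets.
offsets = [value for (value,) in struct.iter_unpack("<I", payload)]
outside = [hex(value) for value in offsets if value >= CAPMERC_SIZE]

print([hex(value) for value in offsets])
print(outside)
```

Under these assumptions every value from 0x100 upward falls outside the 0xec-byte section, which includes the four offsets named in the text.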

Also, nvdisasm is unable to show Mercury sections; no one doubts the extent of Nvidia's paranoia. So I added some initial support for Mercury sections to my nvd

I have no idea whether this is a completely new instruction set or just an internal IR bytecode
