In my first attempt I used latency tables extracted from the MD file (located inside nvdisasm), and nothing good came of it.
The obvious reason is that the real latency table should not live in the disassembler - it must be inside ptxas. The problem with that file is that it is really huge - in SDK 13 it is 40 MB. Of course no symbols are included.
This is not surprising, because it contains lots of things:
- ptxas parser
- lots of macros
- an optimizing compiler with 159 passes that does not use LLVM at all
- code generators for several different SMs
Besides, it does not have any tracepoints, and a big part of its strings are encrypted. So it took lots of time and patience, but I finally found and extracted the right latency table.
And then a lot of discoveries came my way.
Firstly, the indices in that table are internal IR opcodes, so they map poorly onto the instruction tables. I made a quick and dirty script for analysis - out of 461 indices only 284 can be automatically mapped onto instruction names, and 136 have a form like F2F (not F64) or HMMA.F32.{1684.E8M10|1688.F16|1688.E8M7}
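To illustrate why names like these resist automatic mapping, here is a minimal sketch of a classifier for the name forms quoted above. It is not the quick and dirty script itself: the three categories, the regular expressions and the reading of `{a|b|c}` as a list of alternatives are all my assumptions.

```python
import re

# Assumption: a "plain" name is dot-separated upper-case/digit tokens, e.g. F2F.
_PLAIN = re.compile(r"[A-Z0-9_]+(\.[A-Z0-9_]+)*")
# Assumption: {a|b|c} lists alternative suffixes for one table entry.
_ALT = re.compile(r"\{([^{}]*)\}")


def classify(name: str) -> str:
    """Return 'plain', 'alternatives' or 'qualified' (hypothetical categories)."""
    if _PLAIN.fullmatch(name):
        return "plain"
    if _ALT.search(name):
        return "alternatives"
    return "qualified"


def expand_alternatives(name: str) -> list[str]:
    """Expand every {a|b|c} group into the concrete names it stands for."""
    match = _ALT.search(name)
    if match is None:
        return [name]
    expanded = []
    for alt in match.group(1).split("|"):
        candidate = name[:match.start()] + alt.strip() + name[match.end():]
        expanded.extend(expand_alternatives(candidate))
    return expanded


for sample in ("F2F", "F2F (not F64)", "HMMA.F32.{1684.E8M10|1688.F16|1688.E8M7}"):
    print(f"{sample!r}: {classify(sample)} -> {expand_alternatives(sample)}")
```

Only the "plain" form can be looked up directly in an instruction table. The other two need either expansion or a hand-written rule, which is roughly where automatic mapping stops.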
Secondly - the values are counterintuitive. For example, MOV has latency 3, MOV.64 has 6, and suddenly both MOV.64.(HI | LO) have 9.
Another example - I2F has 3, but I2F (not F64) has 13.
I think we can throw almost all articles about SASS optimization into the trash bin.
And lastly - it contains 342 opcodes for
MERCURY
- symbol table - section type 0x70000085. Almost identical to the original - only section indices were patched
- relocs - section type 0x70000082. They differ from normal SASS relocs and have the prefix R_MERCURY
- attributes - section type 0x70000083. A few new attributes, like EIATTR_MERCURY_ISA_VERSION
- and even debug info - I added it to my dwarfdump
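A minimal sketch of how these section types could be recognized when walking ELF section headers. It assumes a little-endian ELF64 file with the standard `Elf64_Shdr` layout. Only the name SHT_CUDA_CAPMERC (type 0x70000016, the code section shown in this post) comes from the dump; the other labels are descriptive placeholders of mine, not official constant names.

```python
import struct

# Standard Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset,
# sh_size, sh_link, sh_info, sh_addralign, sh_entsize (little-endian).
SHDR64 = struct.Struct("<IIQQQQIIQQ")

# Type values are from this post; labels other than SHT_CUDA_CAPMERC
# are placeholders, not names taken from any header file.
MERCURY_SECTION_TYPES = {
    0x70000016: "SHT_CUDA_CAPMERC",
    0x70000082: "mercury relocs",
    0x70000083: "mercury attributes",
    0x70000085: "mercury symbol table",
}


def describe_section(shdr: bytes) -> str:
    """Format one 64-byte section header with a label for known mercury types."""
    fields = SHDR64.unpack(shdr)
    sh_type, sh_flags, sh_size = fields[1], fields[2], fields[5]
    label = MERCURY_SECTION_TYPES.get(sh_type, "other")
    return f"type {sh_type:08x} [{label}] flags {sh_flags:x} size {sh_size:#x}"


# Synthetic header for illustration: type, flags and size follow the code
# section in this post, every other field is simply zeroed.
example = SHDR64.pack(0, 0x70000016, 0x10000000, 0, 0, 0xEC, 0, 0, 0, 0)
print(describe_section(example))
```

In a real tool the headers would be read from the file at `e_shoff`, `e_shentsize` bytes each; here a packed synthetic header keeps the sketch self-contained.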
The section with code looks like this:
[31] .nv.capmerc.text._Z15warp_reduce_sumi type 70000016 [SHT_CUDA_CAPMERC] flags 10000000
00000000 18 00 00 00-01 00 00 C0|25 00 00 00-AE FA AA AA ........%.......
00000010 02 00 00 00-41 5D 00 0A|41 0C 40 04-41 0C 40 04 ....A]..A.@.A.@.
00000020 41 0C 40 04-41 0C 40 04|D1 01 02 4B-1A 06 F8 00 A.@.A.@....K....
00000030 22 00 00 00-01 30 81 03|40 03 00 00-03 C0 02 C0 "....0..@.......
00000040 03 02 00 00-00 00 00 00|00 00 41 0C-40 04 D1 01 ..........A.@...
00000050 02 4B 1A 06-F8 00 22 00|00 00 01 30-81 03 40 03 .K...."....0..@.
00000060 00 00 03 C0-02 C0 03 02|00 00 00 00-00 00 00 00 ................
00000070 41 0C 40 04-D1 01 02 4B|1A 06 F8 00-22 00 00 00 A.@....K...."...
00000080 01 30 81 03-40 03 00 00|03 C0 02 C0-03 02 00 00 .0..@...........
00000090 00 00 00 00-00 00 41 0C|40 04 D1 01-02 4B 1A 06 ......A.@....K..
000000A0 F8 00 22 00-00 00 01 30|81 03 40 03-00 00 03 C0 .."....0..@.....
000000B0 02 C0 03 02-00 00 00 00|00 00 00 00-41 0C 40 04 ............A.@.
000000C0 D1 01 02 4B-1A 06 F8 00|22 00 00 00-01 30 81 03 ...K...."....0..
000000D0 40 03 00 00-03 C0 02 C0|03 02 00 00-00 00 00 00 @...............
000000E0 00 00 41 0C-40 04 42 33|00 0E 50 05 ..A.@.B3..P.
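A quick way to judge whether a section like this could hold a hash or a compressed stream is to measure its Shannon entropy. The sketch below does that for the bytes of the dump above and also reads the first little-endian DWORD. The function name and the choice of bits per byte as the unit are mine.

```python
import math
import struct
from collections import Counter

# Bytes of .nv.capmerc.text._Z15warp_reduce_sumi, copied from the dump above.
DUMP_HEX = """
18 00 00 00 01 00 00 C0 25 00 00 00 AE FA AA AA
02 00 00 00 41 5D 00 0A 41 0C 40 04 41 0C 40 04
41 0C 40 04 41 0C 40 04 D1 01 02 4B 1A 06 F8 00
22 00 00 00 01 30 81 03 40 03 00 00 03 C0 02 C0
03 02 00 00 00 00 00 00 00 00 41 0C 40 04 D1 01
02 4B 1A 06 F8 00 22 00 00 00 01 30 81 03 40 03
00 00 03 C0 02 C0 03 02 00 00 00 00 00 00 00 00
41 0C 40 04 D1 01 02 4B 1A 06 F8 00 22 00 00 00
01 30 81 03 40 03 00 00 03 C0 02 C0 03 02 00 00
00 00 00 00 00 00 41 0C 40 04 D1 01 02 4B 1A 06
F8 00 22 00 00 00 01 30 81 03 40 03 00 00 03 C0
02 C0 03 02 00 00 00 00 00 00 00 00 41 0C 40 04
D1 01 02 4B 1A 06 F8 00 22 00 00 00 01 30 81 03
40 03 00 00 03 C0 02 C0 03 02 00 00 00 00 00 00
00 00 41 0C 40 04 42 33 00 0E 50 05
"""


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0 for constant data, 8 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


section = bytes.fromhex("".join(DUMP_HEX.split()))
(first_dword,) = struct.unpack_from("<I", section, 0)

print(f"size        = {len(section):#x}")
print(f"first DWORD = {first_dword:#x}")
print(f"entropy     = {shannon_entropy(section):.2f} bits/byte")
```

Compressed or hashed data usually sits close to 8 bits per byte. This dump uses fewer than 32 distinct byte values, so its entropy cannot exceed 5 bits per byte.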
The first DWORD of this section (0x18) is obviously the index of the SASS code section. As you can see, the entropy is low, so it can't contain a hash/HMAC or some compressed stream. But the funniest part is that the attributes for this section contain values that are out of range:
60: EIATTR_COOP_GROUP_INSTR_OFFSETS len 0028
00000000 40 00 00 00-70 00 00 00|A0 00 00 00-D0 00 00 00 @...p...........
00000010 00 01 00 00-80 01 00 00|E0 01 00 00-40 02 00 00 ............@...
00000020 A0 02 00 00-00 03 00 00 ........
The group instruction values 0x1e0, 0x240, 0x2a0 & 0x300 correspond to the original SASS code section and are out of the range 0-0xec.
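The range check is easy to reproduce. The sketch below decodes the attribute payload as little-endian DWORDs and flags every offset that does not fit into the 0xec bytes of the capmerc section. Treating the payload as a flat array of 32-bit offsets is my assumption, based on how the values read in the dump.

```python
import struct

# Payload of EIATTR_COOP_GROUP_INSTR_OFFSETS (len 0x28), copied from the dump.
PAYLOAD = bytes.fromhex(
    "40 00 00 00 70 00 00 00 A0 00 00 00 D0 00 00 00 "
    "00 01 00 00 80 01 00 00 E0 01 00 00 40 02 00 00 "
    "A0 02 00 00 00 03 00 00"
)

CAPMERC_SIZE = 0xEC  # size of the .nv.capmerc.text section shown earlier


def decode_offsets(payload: bytes) -> list[int]:
    """Interpret the payload as an array of little-endian 32-bit offsets."""
    if len(payload) % 4:
        raise ValueError("payload length is not a multiple of 4")
    return list(struct.unpack(f"<{len(payload) // 4}I", payload))


offsets = decode_offsets(PAYLOAD)
out_of_range = [off for off in offsets if off >= CAPMERC_SIZE]

print("offsets:     ", [hex(off) for off in offsets])
print("out of range:", [hex(off) for off in out_of_range])
```

On this payload the check flags six of the ten values: the four mentioned above plus 0x100 and 0x180.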
Also, nvdisasm is unable to show mercury sections - no one doubts the extent of Nvidia's paranoia. So I added some initial support for mercury sections to my nvd.
I have no idea whether this is a completely new instruction set or just internal IR bytecode.