I am very skeptical about patching existing .cubin files: it requires too much bookkeeping. Say we want to insert several additional instructions into some function. Then we need to:
- extend the section containing the code of that function by patching the section table
- patch the symbol table and relocations
- disassemble the whole function and build a control-flow graph for all of its instructions
- fix the offsets of jumps
- fix attributes like EIATTR_INDIRECT_BRANCH_TARGETS & EIATTR_JUMPTABLE_RELOCS
- and so on
While points 1-2 can be implemented with ELF patching libraries like elftools, it is still too much tedious labour.
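To make the jump-fixing part of that bookkeeping concrete, here is a minimal Python sketch. It is not code from any real patching tool: the `Instr` class, `insert_instrs` and `rel_offset` are hypothetical names, the 16-byte instruction size is taken from the sm75 dump at the end of this post, and counting the displacement from the next instruction is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

INSTR_SIZE = 16  # bytes per instruction, as seen in the sm75 dump in this post


@dataclass
class Instr:
    text: str
    target: Optional[int] = None  # absolute byte offset of the branch target, if any


def insert_instrs(code: List[Instr], index: int, new: List[Instr]) -> List[Instr]:
    """Insert `new` before code[index] and shift every branch target that
    points at or past the insertion point, so it still reaches the same
    original instruction."""
    cut = index * INSTR_SIZE
    shift = len(new) * INSTR_SIZE
    patched = []
    for ins in code:
        tgt = ins.target
        if tgt is not None and tgt >= cut:
            tgt += shift
        patched.append(Instr(ins.text, tgt))
    return patched[:index] + list(new) + patched[index:]


def rel_offset(code: List[Instr], i: int) -> Optional[int]:
    """Relative displacement of code[i], ASSUMING it is counted from the
    address of the next instruction."""
    ins = code[i]
    if ins.target is None:
        return None
    return ins.target - (i + 1) * INSTR_SIZE


code = [Instr("BRA", 0x30), Instr("IADD"), Instr("MOV"), Instr("EXIT")]
patched = insert_instrs(code, 2, [Instr("NOP"), Instr("NOP")])
print(hex(rel_offset(code, 0)), hex(rel_offset(patched, 0)))
```

Even this toy version has to touch every branch in the function, and it ignores indirect branches and jump tables, which is where attributes like EIATTR_INDIRECT_BRANCH_TARGETS come in.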
For example, CuAssembler prefers to create new .cubin files from scratch. In any case we need some engine to generate SASS instructions, and this task is perfectly achievable when you have a ready disassembler. So I added some basic code-generation features to my SASS disasm engine:
- a dictionary of all instructions for the given SM (method INV_disasm::get_instrs)
- for each instruction, encoders describing how to put values into fields, tables, constant banks & scheduling
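As a rough idea of what a field encoder has to do, here is a Python sketch that stores a value into a 128-bit instruction word. It is not the engine's actual code: `put_field` and `get_field` are hypothetical helpers, and the bit positions used in the usage lines are made up for illustration.

```python
from typing import List, Tuple

WORD_BITS = 128  # one sm75 instruction occupies 16 bytes

Chunks = List[Tuple[int, int]]  # (bit position, bit length), low bits of the value first


def put_field(word: int, chunks: Chunks, value: int) -> int:
    """Return `word` with `value` stored into the given bit chunks."""
    total = sum(length for _, length in chunks)
    if value < 0 or value >> total:
        raise ValueError(f"value {value} does not fit into {total} bits")
    for pos, length in chunks:
        if pos + length > WORD_BITS:
            raise ValueError("chunk is outside the instruction word")
        mask = (1 << length) - 1
        word &= ~(mask << pos)          # clear the old bits
        word |= (value & mask) << pos   # put the new ones
        value >>= length
    return word


def get_field(word: int, chunks: Chunks) -> int:
    """Read a value back from the same chunks."""
    value, shift = 0, 0
    for pos, length in chunks:
        value |= ((word >> pos) & ((1 << length) - 1)) << shift
        shift += length
    return value


# Hypothetical layouts: an 8-bit register field and a field split into two chunks
RD = [(16, 8)]
SPLIT = [(32, 4), (60, 4)]
word = put_field(0, RD, 11)
word = put_field(word, SPLIT, 0xA5)
print(hex(word), get_field(word, RD), hex(get_field(word, SPLIT)))
```

Supporting several chunks per field matters because a field does not have to occupy one contiguous bit range.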
As an illustration I've implemented an interactive SASS assembler (with some help from readline for auto-completion).
It lets you pick the SM and output file, ask for the possible values of fields and so on. For example, if you want to add a couple of ldc/imad instructions for sm75, just run ina -o output-file.name ./sm75.so:
> ldc
4 forms:
1) @Pg LDC E:sz E:ad E:Rd c:[Sa_bank][E:Ra Ra_offset] req_bit_set src_rel_sb dst_wr_sb E:usched_info E:batch_t
2) @Pg LDC E:sz E:ad E:Rd c:[Sa_bank][E:Ra Ra_offset] req_bit_set src_rel_sb dst_wr_sb E:usched_info E:batch_t
3) @Pg LDC E:sz E:Rd c:[E:URa][E:Rb Sa_offset] req_bit_set src_rel_sb dst_wr_sb E:usched_info E:batch_t
4) @Pg LDC E:sz E:Rd c:[E:URa][E:Rb Sa_offset] req_bit_set src_rel_sb dst_wr_sb E:usched_info E:batch_t
Here LDC has 4 different encoding forms (IMAD has 50). Pick a form by its number:
LDC 1
LDC R0,c:[0][0] &0 &0 &0 ?OFF_DECK_DRAIN ?NOP
I chose the first form, and the prompt now shows the text of the LDC instruction with all fields at their default values. Now we can ask about the encoding of a field and set it, for example:
i sz
MaskLen 3 .Enum SZ_U8_S8_U16_S16_32_64 DefVal 4:
7 INVALID7
6 INVALID6
5 _64
4 _32
3 S16
2 U16
1 S8
0 U8
sz 3
LDC.S16 R0,c:[0][0] &0 &0 &0 ?OFF_DECK_DRAIN ?NOP Rd 11
LDC.S16 R11,c:[0][0] &0 &0 &0 ?OFF_DECK_DRAIN ?NOP
and so on. Notice that the prompt always shows the instruction, with all patched fields, in text form. You can also show all filled fields of the current instruction with kv:
sz:3 (S16)
ad:0 (IA) DEF
Ra:255 (RZ) DEF
Rd:11 (R11)
Pg:7 (P7) DEF
Tab3 batch_t:0 (NOP) DEF
Tab2 dst_wr_sb:0
Tab3 usched_info:0 (OFF_DECK_DRAIN)
Tab1 src_rel_sb:0
As you can see, the fields usched_info & batch_t belong to the same table (Tab3) and so have a restricted set of possible values:
i batch_t
MaskLen 8 Enum BATCH_T DefVal 0:
5 BARRIER_EXEMPT
4 BATCH_END
3 REQ_BAR
2 BATCH_START_TILE
1 BATCH_START
0 NOP
TAB(batch_t):
batch_t usched_info
99 3 3
98 3 2
97 3 1
91 2 27
90 2 26
9 0 9
...
batch_t 1
cannot find row: 1 0
LDC.S16 R11,c:[0][0] &0 &0 &0 ?OFF_DECK_DRAIN ?NOP usched_info 27
LDC.S16 R11,c:[0][0] &0 &0 &0 ?W11 ?BATCH_START
There is no row with 1 for batch_t and 0 for usched_info. Nevertheless the value for batch_t was saved, because there are rows containing 1 for it. Then I set usched_info to 27; the row "1 27" exists, so the scheduling fields were patched successfully.
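The behaviour in this session can be modelled with a small Python sketch. It is only my reading of the transcript, not the assembler's code: `ROWS` holds the few rows printed above plus the "1 27" row mentioned in the text (the real table is much larger), and `set_field` is a hypothetical name.

```python
COLUMNS = ("batch_t", "usched_info")

# Subset of the table: the (batch_t, usched_info) rows shown in this post
ROWS = {(3, 3), (3, 2), (3, 1), (2, 27), (2, 26), (0, 9), (1, 27)}


def set_field(state: dict, name: str, value: int) -> bool:
    """Store `value` if at least one row contains it in column `name`.
    Returns True when the resulting combination is a complete existing row."""
    col = COLUMNS.index(name)
    if not any(row[col] == value for row in ROWS):
        raise ValueError(f"no row has {name} = {value}")
    state[name] = value
    row = tuple(state[c] for c in COLUMNS)
    if row not in ROWS:
        print("cannot find row:", *row)
        return False
    return True


state = {"batch_t": 0, "usched_info": 0}
set_field(state, "batch_t", 1)        # saved, but the row (1, 0) is missing
set_field(state, "usched_info", 27)   # now (1, 27) is an existing row
```

Accepting a value that does not yet form a complete row is what allows fields of one table to be set one at a time; the combination only has to be valid by the time the instruction is saved.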
To quit enter q; to show the renderer for the current instruction, r; to save the current instruction, w; and to return to instruction selection, b.
Numerical fields accept decimal values, binary values with the prefix 0b and hexadecimal values with the prefix 0x; nothing unexpected.
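For reference, the same three notations can be parsed in a few lines of Python. This sketch only shows the convention; `parse_number` is a hypothetical helper and not how the assembler itself does it.

```python
def parse_number(text: str) -> int:
    """Parse decimal, 0b-prefixed binary or 0x-prefixed hexadecimal input."""
    s = text.strip().lower()
    if s.startswith("0x"):
        return int(s[2:], 16)
    if s.startswith("0b"):
        return int(s[2:], 2)
    return int(s, 10)


# Three spellings of the same usched_info value
print(parse_number("27"), parse_number("0x1B"), parse_number("0b11011"))
```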
Placeholder fields for operand modifiers can be:
- XX@not for ! (predicates only)
- XX@negate for -
- XX@invert for ~
- XX@absolute for ||
Now let's see what we generated: run ina -i output-file.name ./sm75.so:
0: LDC.S16 R11,c:[0][0] &0 &0 &0 ?W11 ?BATCH_START
10: IMAD.U32.X R10,R11,R0,c:[0xA][0x40],P0 &0 ?W7EG ?BATCH_START_TILE
hd output-file.name
00000000 82 7b 0b ff 00 00 00 00 00 06 00 00 00 36 00 04 |.{...........6..|
00000010 24 76 0a 0b 00 10 80 02 00 04 0e 00 00 ce 0f 08 |$v..............|
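The dump can be cut back into instructions with a short Python sketch. The bytes are copied from the hd output above; `split_instrs` is a hypothetical helper, and treating each 16-byte record as one little-endian 128-bit word is an assumption that happens to fit this dump.

```python
INSTR_SIZE = 16  # the two instructions above sit at offsets 0 and 0x10

DUMP = bytes.fromhex(
    "827b0bff00000000 0006000000360004"
    "24760a0b00108002 00040e0000ce0f08"
)


def split_instrs(blob: bytes, size: int = INSTR_SIZE):
    """Return (offset, 128-bit word) pairs, reading each record as little-endian."""
    if len(blob) % size:
        raise ValueError("length is not a multiple of the instruction size")
    return [
        (off, int.from_bytes(blob[off:off + size], "little"))
        for off in range(0, len(blob), size)
    ]


for off, word in split_instrs(DUMP):
    # Bits 16..23 hold 0x0b and 0x0a here, which lines up with Rd = R11 and R10
    print(f"{off:x}: {word:032x} bits16-23={(word >> 16) & 0xff}")
```

That bits 16..23 carry Rd is only an observation from these two instructions, not a documented layout.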