Friday, April 4, 2025

ptx instructions emitted by nvidia compiler. part 2

Part 1 described v10.
Today let's check cicc v12. The first thing that catches your eye is its size - almost 76 MB! It also contains at least 5 different decryptors - Nvidia really wants to hide something from its grateful clients

Why is it so fat?

Because it contains at least 4 code generators: for arm32, aarch64, x86 & nvptx
+ at least 27 llvm bitcode blobs (signature 0x42 0x43 0xc0 0xde) - they mostly contain bodies of intrinsic functions like nvvm_mulq/nvvm_divq, but on some of them llvm-dis just crashes:

#0  0x000055abcc64743d in llvm::Intrinsic::getIntrinsicInfoTableEntries (id=0, T=...) at /home/redp/disc/src/llvm-project/llvm/lib/IR/Function.cpp:1339
1339      unsigned TableVal = IIT_Table[id-1];
>>> where
#0  0x000055abcc64743d in llvm::Intrinsic::getIntrinsicInfoTableEntries (id=0, T=...) at /home/redp/disc/src/llvm-project/llvm/lib/IR/Function.cpp:1339
#1  0x000055abcc5fe41f in UpgradeIntrinsicFunction1 (F=0x55abce63cf18, NewFn=@0x7ffc60414e70: 0x0) at /home/redp/disc/src/llvm-project/llvm/include/llvm/IR/Function.h:204
#2  0x000055abcc60111a in llvm::UpgradeIntrinsicFunction (F=F@entry=0x55abce63cf18, NewFn=@0x7ffc60414e70: 0x0) at /home/redp/disc/src/llvm-project/llvm/lib/IR/AutoUpgrade.cpp:1226
#3  0x000055abcc584778 in (anonymous namespace)::BitcodeReader::globalCleanup (this=0x55abce608e30) at /home/redp/disc/src/llvm-project/llvm/lib/Bitcode/Reader/BitcodeReader.cpp:3696
#4  0x000055abcc5856cc in (anonymous namespace)::BitcodeReader::parseModule (this=<optimized out>, ResumeBit=<optimized out>, ShouldLazyLoadMetadata=<optimized out>, Callbacks=...) at /home/redp/disc/src/llvm-project/llvm/lib/Bitcode/Reader/BitcodeReader.cpp:4385
#5  0x000055abcc5959ca in (anonymous namespace)::BitcodeReader::parseBitcodeInto (Callbacks=..., IsImporting=false, ShouldLazyLoadMetadata=false, M=0x55abce5f3d80, this=0x55abce608e30) at /usr/include/c++/9/bits/std_function.h:564
#6  llvm::BitcodeModule::getModuleImpl (this=<optimized out>, Context=..., MaterializeAll=<optimized out>, ShouldLazyLoadMetadata=<optimized out>, IsImporting=<optimized out>, Callbacks=...) at /home/redp/disc/src/llvm-project/llvm/lib/Bitcode/Reader/BitcodeReader.cpp:7981
#7  0x000055abcc596070 in llvm::BitcodeModule::getLazyModule (this=0x7ffc60415d10, Context=..., ShouldLazyLoadMetadata=<optimized out>, IsImporting=<optimized out>, Callbacks=...) at /usr/include/c++/9/bits/std_function.h:263
#8  0x000055abcc550e49 in main (argc=<optimized out>, argv=<optimized out>) at /home/redp/disc/src/llvm-project/llvm/include/llvm/Support/CommandLine.h:1399

>>> p id
$1 = 0

At the very least they should check that the index can become negative (id is 0 here, so id-1 is out of range), no? Who would doubt that llvm is very reliable and secure
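Back to the embedded blobs: a rough way to locate them is to scan the binary for the signature quoted above. This is only a minimal sketch, not the tool I actually used. The function name find_bitcode_offsets is made up for illustration, a hit can be a false positive, and the signature alone does not tell you where a blob ends.

```python
# Signature of a raw LLVM bitcode stream: 'B' 'C' 0xC0 0xDE
BITCODE_MAGIC = bytes([0x42, 0x43, 0xC0, 0xDE])

def find_bitcode_offsets(data):
    """Return every offset in data where the bitcode signature occurs."""
    offsets = []
    pos = data.find(BITCODE_MAGIC)
    while pos != -1:
        offsets.append(pos)
        # continue searching right after the current hit
        pos = data.find(BITCODE_MAGIC, pos + 1)
    return offsets

def scan_file(path):
    """Read a whole file (hypothetical helper) and return candidate offsets."""
    with open(path, "rb") as f:
        return find_bitcode_offsets(f.read())
```

Calling scan_file("cicc") gives a list of candidate offsets. Each candidate still has to be carved out and fed to llvm-dis to see whether it is a real module.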

So if they process llvm bitcode, they must also link half of the llvm runtime to do it. But they also use

MLIR

How can I be sure? Well, just run strings cicc | grep -i cpp

llvm::StringRef llvm::getTypeName() [with DesiredTypeName = llvm::IPSCCPPass]
llvm::StringRef llvm::getTypeName() [with DesiredTypeName = llvm::SCCPPass]
See definition in NVPTXISelLowering.cpp
 

Oops, do they mean this? If you are still not convinced, you can search for 'Pass'
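The search for 'Pass' can be narrowed down to the getTypeName strings shown above, which gcc generates from __PRETTY_FUNCTION__. Below is a minimal sketch that pulls the pass class names out of a binary. The regex is my assumption about how those strings look (based only on the two samples above), and pass_names is a hypothetical helper name.

```python
import re

# Matches the tail of strings like
# "llvm::StringRef llvm::getTypeName() [with DesiredTypeName = llvm::SCCPPass]"
TYPE_NAME_RE = re.compile(rb"DesiredTypeName = ([\w:<>, ]+?)\]")

def pass_names(data):
    """Return a sorted list of unique type names that end with 'Pass'."""
    names = set()
    for m in TYPE_NAME_RE.finditer(data):
        names.add(m.group(1).decode("ascii", "replace"))
    return sorted(n for n in names if n.endswith("Pass"))

def pass_names_in_file(path):
    """Read a whole file (hypothetical helper) and list pass names in it."""
    with open(path, "rb") as f:
        return pass_names(f.read())
```

Running pass_names_in_file("cicc") should list every pass class whose name survived in the binary this way. Passes that never go through getTypeName will not show up.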
Actually I don't like mlir - the idea is probably ok, but the implementation is the worst possible:
  • why is tablegen implemented in c++?
  • even ancient gcc allows you to write plugins as dynamically loaded .so files - but llvm does not, and that makes development very slow
  • not even simple support for profiling of passes
  • the so-called 'verifiers' are actually just integrity checkers - and btw they are called twice in each pass. And no - you can't disable them. A verifier must answer simple questions like "are you sure that this particular instruction will ever be placed in the output? if so - what conditions must the input meet?"

and so on

So I am too lazy to verify whether their PTX codegen is exactly llvm NVPTX or has some patches

Results

I've made a decryptor for some of the data:
  • compiler errors
  • list of llvm attributes - 1 & 2. Don't ask me why there are two of them
  • ptx templates to place in the output - also 1 & 2

llvm attrs & ptx templates are encrypted. All 41515. Just imagine this picture - your boss asks you to encrypt the names of all used llvm attributes scattered across the whole llvm source tree
