Friday, October 17, 2025

perl modules for CUBINs patching

After playing with my ced for a while I came to the conclusion that the DSL I implemented for editing is not enough: it would be good to have subroutines to patch repeated/similar instructions, to check that the patched instruction is what I want, to patch attributes/relocs, etc.
In other words, I need a full-fledged PL. Although I've read Andrew Appel's "Modern Compiler Implementation" book series and "Crafting Interpreters", I think making my own PL is overkill, so I wrote several XS modules for Perl to edit/patch CUBIN files. Why Perl?
  • I can write almost anything I want in it
  • when I can't, I can always develop my own module(s)
  • it doesn't make me feel sick the way pseudo-languages like Python do
  • and it is damn good and fast when you try to sketch out prototypes for things you have no idea how to make

 

ELF::FatBinary

for extracting/replacing CUBIN files from FatBinaries
see details here


Cubin::Ced 

In essence this is a wrapper around Ced: it allows you to disassemble/patch SASS instructions
Currently it doesn't support register tracking
See the doc in POD format


Cubin::Attrs

Module to extract/patch attributes of CUBIN files, as well as relocs
See the doc in POD format

Sample

Wednesday, October 1, 2025

addresses of cuda kernel functions

Quote from the official documentation:

It is not allowed to take the address of a __device__ function in host code

It has long ceased to surprise me that the whole of CUDA is made up of ridiculous restrictions. What if I told you that paranoid nvidia lies as usual, and you actually can get the addresses of kernel functions in your host code?

But first let's check what workarounds we can employ to get function pointers. I don't know for what pedagogical purpose this code was intentionally written so poorly and does not free the allocated memory (and now millions of brainless artificial idiots will copy-paste it forever), so I made a patched version. You can see that an attempt to read from the addresses gathered earlier with cudaMemcpyFromSymbol results in error 1 (invalid argument)
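For readers who have not seen this workaround, here is a minimal sketch of the usual pattern. It is not the patched version mentioned above. The names op_fn, twice, d_twice_ptr and apply are made up for illustration. The address of the __device__ function is taken in device code, stored in a __device__ variable, copied to the host with cudaMemcpyFromSymbol and then handed to a kernel. The last step tries to read the bytes behind that address from the host, which is the read that, as described above, fails with invalid argument.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical names, for illustration only.
typedef int (*op_fn)(int);

__device__ int twice(int x) { return 2 * x; }

// The address of twice() is taken in device code, which is allowed,
// and parked in a device-side variable.
__device__ op_fn d_twice_ptr = twice;

// Kernel that calls whatever device function pointer it is given.
__global__ void apply(op_fn f, int in, int *out) { *out = f(in); }

int main() {
    // Copy the stored function address from the device variable to the host.
    op_fn h_ptr = nullptr;
    cudaError_t err = cudaMemcpyFromSymbol(&h_ptr, d_twice_ptr, sizeof(h_ptr));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyFromSymbol: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("device address of twice: %p\n", reinterpret_cast<void *>(h_ptr));

    int *d_out = nullptr;
    err = cudaMalloc(&d_out, sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The host cannot call h_ptr, but it can pass it back to a kernel.
    apply<<<1, 1>>>(h_ptr, 21, d_out);
    int h_out = 0;
    err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        printf("apply(twice, 21) = %d\n", h_out);
    else
        fprintf(stderr, "kernel/cudaMemcpy: %s\n", cudaGetErrorString(err));

    // Try to read the code bytes behind the function address from the host.
    // The post reports that this kind of read ends with error 1.
    unsigned char code[16];
    err = cudaMemcpy(code, reinterpret_cast<const void *>(h_ptr),
                     sizeof(code), cudaMemcpyDeviceToHost);
    printf("reading function body: error %d (%s)\n",
           static_cast<int>(err), cudaGetErrorString(err));

    cudaFree(d_out);  // unlike the criticized sample, release what was allocated
    return 0;
}
```

Build it with something like nvcc sketch.cu -o sketch. The address that gets printed is only meaningful on the device side.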

OK, but we could just return the address of a function directly from another kernel function, right? So I made a quick & dirty hack
I brute-forced all combinations of cf1(__device__/__constant__) and variants of cudaMemcpyFromSymbol/cudaMemcpy, with no luck
So it's time to run

cuda-gdb