With predicates for operand sizes and properties for type/identification, we could write register tracking (well, at least up to sm90). But first we should familiarize ourselves with a couple of CUDA-specific things.
Uniform registers
For a concise introduction you can read this paper, especially paragraph 3.5.2:
Turing introduces a new feature intended to improve the maximum achievable arithmetic throughput of the main, floating-point capable datapaths, by adding a separate, integer-only, scalar datapath (named the uniform datapath) that operates in parallel with the main datapath
Regular instructions can access both uniform and regular registers. Uniform datapath instructions, instead, focus on uniform instructions almost exclusively
So, for example, on SM75 you have 255 regular registers and 63 uniform registers UR0-UR62 (plus URZ, clearly mapped to RZ) + uniform predicates UP0-UP6. Given that they are "typically updating array indices, loop indices or pointers" and that VRAM size can be up to 192 GB, one would expect this to be a whole new set of 64-bit wide registers for accessing arrays > 4 GB.
Well, reality is much more boring: they are just a virtual mapping of regular 32-bit registers. Proof:
S2R R3, SR_TID.X
S2UR UR4, SR_CTAID.Y
Here both SR_XX are so-called "special registers", 32 bits wide. Also EIATTR_MAXREG_COUNT (itself 16-bit) always contains the value 0xff. I have seen curious cases where "nvdisasm --print-life-ranges" shows GPR 223 and UGPR 35. If I can use a calculator, 223 + 35 = 258, which is more than 0xff = 255.
I have no idea how those URs are mapped to real registers (and uniform predicates to ordinary predicates); at least there are no EIATTRs for such a mapping. Obviously they are not mapped 1:1:
S2R R10, SR_CTAID.Z ; R10 now contains value from special register SR_CTAID.Z
ULDC.64 UR10, c[0x0][0x118]
IMAD.WIDE R2, R10, R3, c[0x0][0x168] ; and here its value is still alive
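For register tracking this means R and UR have to be kept apart. Here is a minimal Python sketch of that idea applied to the snippet above. It is not how nvd works: the helpers `dest_reg` and `track` are hypothetical, it assumes the first operand of every instruction is the destination, and it ignores that a .64 load writes a register pair.

```python
import re

# Hypothetical helper: treat the first operand as the destination register.
# Returns (register_file, index), e.g. ("UR", 10) or ("R", 10), or None.
DEST_RE = re.compile(r"^\s*\S+\s+(UR|R)(\d+)\b")

def dest_reg(code):
    m = DEST_RE.match(code)
    return (m.group(1), int(m.group(2))) if m else None

def track(lines):
    """Map each written register to the mnemonic of its last writer."""
    last_writer = {}
    for line in lines:
        code = line.split(";")[0]      # drop the trailing comment
        reg = dest_reg(code)
        if reg is not None:
            last_writer[reg] = code.split()[0]
    return last_writer

snippet = [
    "S2R R10, SR_CTAID.Z",
    "ULDC.64 UR10, c[0x0][0x118]",
    "IMAD.WIDE R2, R10, R3, c[0x0][0x168]",
]
state = track(snippet)
```

Because the key is the pair (register file, index), the ULDC.64 write to UR10 leaves the entry for R10 untouched, so at IMAD.WIDE the tracker still reports S2R as the last writer of R10.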
Also it's totally unclear how functions get the initial values for these URs. I wrote a parser for EIATTR_PARAM_CBANK & EIATTR_KPARAM_INFO for my nvd, and it seems that they are often loaded from exactly nowhere:
/*30*/ ULDC.64 UR4,c[0][0x118];
; unknown cb off 118
/*40*/ IMAD.WIDE R2,PT,R7,R6,c[0][0x168] &req={0};
; cb in section 254, offset 168 - 160 = 8
As you can see, the kernel parameters in the const bank start at 0x160, while UR4 was loaded from offset 0x118.
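The offset check behind those comments can be sketched roughly like this. It is a simplified Python illustration, not the actual nvd code: the function name `classify_cb_offset` is hypothetical and the parameter area size of 0x10 is an assumption made up for the example (a real tool would take base and size from the EIATTRs mentioned above).

```python
def classify_cb_offset(off, param_base, param_size):
    """Return a note for a c[0][off] operand, in the spirit of the output above."""
    if param_base <= off < param_base + param_size:
        # Offset falls inside the kernel parameter area.
        return "param offset %x - %x = %x" % (off, param_base, off - param_base)
    # Anything else is not described by the parameter attributes.
    return "unknown cb off %x" % off

PARAM_BASE = 0x160   # taken from the example above
PARAM_SIZE = 0x10    # assumed size, for illustration only

print(classify_cb_offset(0x168, PARAM_BASE, PARAM_SIZE))
print(classify_cb_offset(0x118, PARAM_BASE, PARAM_SIZE))
```

The first call lands inside the parameter area at offset 8, the second one falls below 0x160 and stays unknown, which is exactly the situation with UR4.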
Wide loading