The reason I decided to use CUDA was the hardware video encoding: the cards we are using support it, and it is much faster than software implementations.
I can't give details, but the basic idea is:
1. Load a 2D layered texture.
2. Loop over all input files:
   a. Combine the input files, using the layered texture as an input to the equation.
   b. Add the result to the video.
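To make the shape of that loop body concrete, here is a minimal CPU-side C++ sketch of the "combine" step. Everything in it is an assumption for illustration: the real equation is not given, so `mix` is a placeholder per-channel blend, and `Pixel`, `LayeredTexture`, and `combine_frame` are hypothetical names. In CUDA the two loops would roughly become one thread per pixel, with the layered texture sampled on the device.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same memory layout as CUDA's uchar4.
struct Pixel { std::uint8_t r, g, b, a; };

// CPU stand-in for a 2D layered texture: `layers` images of width x height.
struct LayeredTexture {
    int width, height, layers;
    std::vector<Pixel> texels;  // size = layers * height * width

    const Pixel& at(int x, int y, int layer) const {
        return texels[(static_cast<std::size_t>(layer) * height + y) * width + x];
    }
};

// Placeholder equation (an assumption): blend a and b, weighted by w in 0..255.
inline std::uint8_t mix(std::uint8_t a, std::uint8_t b, std::uint8_t w) {
    return static_cast<std::uint8_t>((a * (255 - w) + b * w + 127) / 255);
}

// Combine two input frames, using one layer of the texture as the weight.
std::vector<Pixel> combine_frame(const LayeredTexture& lut, int layer,
                                 const std::vector<Pixel>& a,
                                 const std::vector<Pixel>& b) {
    std::vector<Pixel> out(a.size());
    for (int y = 0; y < lut.height; ++y) {
        for (int x = 0; x < lut.width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * lut.width + x;
            const Pixel& w = lut.at(x, y, layer);
            out[i] = { mix(a[i].r, b[i].r, w.r),
                       mix(a[i].g, b[i].g, w.g),
                       mix(a[i].b, b[i].b, w.b),
                       255 };
        }
    }
    return out;
}
```

The outer "loop over all input files" would call `combine_frame` once per pair of inputs and hand each result to the encoder.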
The key for me was that I could do everything in CUDA without transferring the frame back and forth between host memory and video RAM.
The 2D layered texture, once loaded, stays in video RAM. The generated frame can be converted to YUV in video RAM, and the video frame is then generated in video RAM as well.
So the only host transfers in the inner loop are loading the two input textures and writing out the video frame.
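For reference, the per-pixel arithmetic behind an RGB to YUV conversion is small enough to sketch. This is plain C++ so it can be checked on the host. It assumes full-range BT.601 coefficients (the JPEG/JFIF variant), which may not be the matrix or range the hardware encoder expects, and the names `Yuv`, `clamp_to_byte`, and `rgb_to_yuv` are hypothetical. A real encoder input would usually also need a specific planar layout that this does not cover.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Yuv { std::uint8_t y, u, v; };

// Round to nearest and clamp into the 0..255 range of a byte.
inline std::uint8_t clamp_to_byte(float value) {
    return static_cast<std::uint8_t>(
        std::min(255.0f, std::max(0.0f, std::round(value))));
}

// Full-range BT.601 (JFIF) conversion; an assumed choice of matrix.
inline Yuv rgb_to_yuv(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const float y =  0.299f    * r + 0.587f    * g + 0.114f    * b;
    const float u = -0.168736f * r - 0.331264f * g + 0.5f      * b + 128.0f;
    const float v =  0.5f      * r - 0.418688f * g - 0.081312f * b + 128.0f;
    return { clamp_to_byte(y), clamp_to_byte(u), clamp_to_byte(v) };
}
```

Because the function only touches its arguments, the same body could be placed in a device function and run per pixel without the frame leaving video RAM.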
I found the process of getting CUDA up and running in Visual Studio painless, really easy.
However, the details of writing CUDA code turned out to be a very acute pain. I found the documentation to be awful. A lot of things seem to have changed over the various versions, and you have two schemes to work with: total CUDA and CUDA C++.
When I was finally happy with my code, it took me a whole day to get it to compile. Not because I had done anything wrong; the code was correct. Hidden away in the project settings was a value that forced CUDA to target compute 1.0 and shader model 1.0.
Once I found and changed that, everything compiled fine.
But failed to run.
After more research and many internet searches, I found out that my problem was not with the code; this time it was that my display drivers were older than the CUDA SDK I had installed. Why this should be a complete fail is beyond me, but hey ho.
Once I had updated my display drivers from 3.1 to 3.3, it finally ran!
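A mismatch like this can be caught at startup rather than by trial and error. The CUDA runtime reports versions as a single integer, encoded as 1000 * major + 10 * minor, through `cudaDriverGetVersion` and `cudaRuntimeGetVersion`. The sketch below is plain C++ that only decodes and compares two such integers. The helper names are hypothetical, and the "driver must be at least as new as the runtime" rule is a simplifying assumption.

```cpp
// Decoded form of CUDA's integer version encoding (1000 * major + 10 * minor).
struct CudaVersion { int major; int minor; };

constexpr CudaVersion decode_cuda_version(int encoded) {
    return { encoded / 1000, (encoded % 1000) / 10 };
}

// Assumed rule of thumb: the installed driver must support a CUDA version
// at least as new as the runtime the program was built against.
constexpr bool driver_supports_runtime(int driver_encoded, int runtime_encoded) {
    return driver_encoded >= runtime_encoded;
}

// In a real program the two integers would come from
// cudaDriverGetVersion(&driver) and cudaRuntimeGetVersion(&runtime).
```

With the numbers from this post, a driver reporting 3010 against a runtime of 3030 would fail the check, so the program could print a "please update your drivers" message instead of failing to run.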
I have done a hell of a lot of HLSL, not just for games. I've used HLSL as part of a multi-touch remote input system, which was fun. Image filtering in HLSL is really fun to work with. I can see why you wrote a load of filters; it's amazing what a small equation can produce when applied to pixels.
Thinking about the whole pipeline I am working on, I know I am going to have to promote one of the textures from uchar4. I haven't decided yet what format to use. My instincts say promoting to 16 bit, but I will have to see what difference loading a 1920 by 1080 image makes in the various formats.
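The raw size side of that comparison is simple arithmetic and can be worked out before measuring anything. The sketch below assumes four channels per pixel and tightly packed rows with no pitch padding; `frame_bytes` is a hypothetical helper name. Actual load times will also depend on the file decoding and the transfer path, which this does not model.

```cpp
#include <cstddef>

// Bytes needed for one tightly packed frame (no row padding assumed).
constexpr std::size_t frame_bytes(std::size_t width, std::size_t height,
                                  std::size_t channels,
                                  std::size_t bytes_per_channel) {
    return width * height * channels * bytes_per_channel;
}

// 1920 x 1080, four channels per pixel:
constexpr std::size_t kBytesUchar4  = frame_bytes(1920, 1080, 4, 1);  //  8,294,400
constexpr std::size_t kBytesUshort4 = frame_bytes(1920, 1080, 4, 2);  // 16,588,800
constexpr std::size_t kBytesFloat4  = frame_bytes(1920, 1080, 4, 4);  // 33,177,600
```

So going from 8 bit to 16 bit per channel doubles each host transfer from about 8.3 MB to about 16.6 MB per frame, and 32 bit float doubles it again.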