I'm not sure there is a single "vendor-independent" solution that covers every case, but with the exception of CUDA (which is NVIDIA only), each of the options below gives you vendor-independent timing for its own API.
For CUDA - you can use the cudaEvent_t type with cudaEventCreate, cudaEventRecord, cudaEventSynchronize and cudaEventElapsedTime (don't forget to call cudaEventDestroy at the end). cudaEventElapsedTime reports the time between two recorded events in milliseconds. Look these functions up in the CUDA docs if you are using CUDA.
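For OpenCL - you can use clGetEventProfilingInfo (this is vendor independent, but applies only to OpenCL code). The command queue has to be created with CL_QUEUE_PROFILING_ENABLE, and the timestamps come back in nanoseconds.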
For OpenGL - read about the GL_ARB_timer_query extension (promoted to core in OpenGL 3.3, so technically this is also vendor independent).
For Direct3D - you can profile DirectX 11 with queries too. Search for it; I think MJP (one of the guys from GameDev.net) had an article about them, though I can't remember the link. (And yes, this is also vendor independent.)