I am more or less interested in whether it would make sense to try to harness a GPU (or several) for collision calculations server side, or if the latency costs associated with something like that would negate any benefit. I'm looking at offloading bottlenecks server side when there are thousands of objects flying around at once.
You won't get much, if any, benefit from using a GPU. GPUs are horrendously slow at processing a single thread (by roughly a factor of 100). They compensate for that by processing thousands of threads at once. This works great for graphics, since there are thousands if not millions of pixels and the communication goes one way (it's fine for the first pixel to appear on screen tens of milliseconds later).
But when you have a feedback loop, as with physics, that latency really hurts, and it's not easy to find thousands of independent work items. In fact, even with thousands of objects flying around, it's a lot smarter to sort them spatially before doing any expensive collision detection, instead of brute-force testing every pair. So what you need is fast single-threaded processing, which is exactly what the CPU provides.
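To make the spatial sorting step concrete, here is a minimal sketch of a uniform-grid broad phase for 2D circles. The `Circle` struct, cell size parameter, and `broadPhase` function are assumptions made purely for illustration, not part of any particular engine:

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical object: a 2D circle with position and radius.
struct Circle {
    float x, y, r;
};

// Pack a 2D cell coordinate into a single 64-bit key for the hash map.
static uint64_t cellKey(int cx, int cy) {
    return (static_cast<uint64_t>(static_cast<uint32_t>(cx)) << 32) |
           static_cast<uint32_t>(cy);
}

// Broad phase: bucket objects into grid cells roughly the size of the largest
// object, then only consider pairs that share a cell or a neighboring cell.
std::vector<std::pair<int, int>> broadPhase(const std::vector<Circle>& objs,
                                            float cellSize) {
    std::unordered_map<uint64_t, std::vector<int>> grid;
    for (int i = 0; i < static_cast<int>(objs.size()); ++i) {
        int cx = static_cast<int>(std::floor(objs[i].x / cellSize));
        int cy = static_cast<int>(std::floor(objs[i].y / cellSize));
        grid[cellKey(cx, cy)].push_back(i);
    }

    std::vector<std::pair<int, int>> candidates;
    for (int i = 0; i < static_cast<int>(objs.size()); ++i) {
        int cx = static_cast<int>(std::floor(objs[i].x / cellSize));
        int cy = static_cast<int>(std::floor(objs[i].y / cellSize));
        // Check this cell and its 8 neighbors so pairs straddling a cell
        // boundary are not missed.
        for (int dx = -1; dx <= 1; ++dx) {
            for (int dy = -1; dy <= 1; ++dy) {
                auto it = grid.find(cellKey(cx + dx, cy + dy));
                if (it == grid.end()) continue;
                for (int j : it->second) {
                    if (j <= i) continue;  // report each pair only once
                    candidates.emplace_back(i, j);
                }
            }
        }
    }
    return candidates;  // the expensive narrow phase runs only on these pairs
}
```

With objects spread out over the world, the candidate list stays close to linear in the object count instead of the quadratic blow-up of testing every pair against every other pair.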
That doesn't mean the CPU can't do compute-intensive work. The latest Intel CPUs have 256-bit vector operations, four cores, and Hyper-Threading. For the majority of workloads, a modern CPU offers a well-balanced mix of instruction-level, thread-level, and data-level parallelism (ILP, TLP, and DLP).
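As a flavor of what that data-level parallelism looks like on the CPU, here is a hedged sketch using 256-bit AVX intrinsics to test eight candidate circle pairs in one pass. The structure-of-arrays layout and the `overlapMask8` function are assumptions for illustration; compile with AVX enabled (e.g. `-mavx`):

```cpp
#include <immintrin.h>

// Test 8 circle pairs at once: returns a bitmask where bit k is set if
// pair k overlaps. Inputs are structure-of-arrays: ax/ay/ar hold the first
// circle of each pair, bx/by/br the second. Each array must hold 8 floats.
int overlapMask8(const float* ax, const float* ay, const float* ar,
                 const float* bx, const float* by, const float* br) {
    __m256 dx = _mm256_sub_ps(_mm256_loadu_ps(ax), _mm256_loadu_ps(bx));
    __m256 dy = _mm256_sub_ps(_mm256_loadu_ps(ay), _mm256_loadu_ps(by));
    __m256 rs = _mm256_add_ps(_mm256_loadu_ps(ar), _mm256_loadu_ps(br));

    // Compare squared distance against squared radius sum to avoid a sqrt.
    __m256 dist2 = _mm256_add_ps(_mm256_mul_ps(dx, dx), _mm256_mul_ps(dy, dy));
    __m256 rs2   = _mm256_mul_ps(rs, rs);
    __m256 hit   = _mm256_cmp_ps(dist2, rs2, _CMP_LE_OQ);
    return _mm256_movemask_ps(hit);
}
```

Feeding the candidate pairs from the broad phase above through a routine like this keeps the whole pipeline on the CPU, with no round trip to a GPU and back.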
And it's only going to get better. Fused multiply-add instructions are already in the AVX specification, and gather/scatter instructions would dramatically speed up parallel indirect addressing. In the meantime, GPUs are struggling to fight Amdahl's Law and are forced to spend more die space on latency optimizations, which limits their compute density. So both are converging toward a device that is both latency-optimized and throughput-optimized. The GPU is worthless on its own, so in the long run the CPU will prevail.