math & physics
I woke up this morning, had a moment of inspiration, and figured out a cool way to partially parallelize force based physics simulations. I've been getting into writing articles/tutorials, so here you go:
I believe that this method is a bit different than the one that appears in GPU Gems 3.
PS: My computer is old and not parallel and thus the absence of a working code sample.
Update: fixed a few minor mistakes in the article (and reduced the number of interactions that need to be calculated even further).
Made a significant update to my n-body paper. It now includes better/bigger/more diagrams and I hope, a clearer explanation of the technique.
Edit: To make a correction pointed out by EricTheRed, it does not run in O(N) time (for asymptotically big N), but rather could theoretically run in N time so long as there are at least N/2 cores.
... it theoretically runs in O(N) time on highly parallel hardware.
The number of cores does not affect the complexity so does it also run in O(n) time in a single thread?
Maybe I do not correctly understand the exact meaning of complexity denoted by O-notation. What I've done is taken the N-Body force-interaction problem, which requires at least N(N-1)/2 forces to be calculated (typically in serial), and rearranged the them into groups such that it should be possible to calculate the forces N/2 at a time without any fighting over shared memory (supposing you have N/2 cores). Ideally, the time for computations should therefor be linear in N.
Yesterday I spent a bit of time reading about GPGPU and was disappointed to find that apparently GPUs are quite limited in the kinds of problems they are suited for (eg, one must avoid shared global memory as much as possible and one must break work into certain sized chunks depending on hardware, and transferring data to and from GPU is slow as well). Unfortunately, it seems that the parallelization trick I came up with may only be practical on systems with many CPUs (ie, supercomputer) or on specialized FPGA's.
Big-O notation is used to describe how the complexity increases asymptotically (i.e. as n gets very big). For your algorithm, it may run in O(n) if the number of cores is greater than n/2, but that is not the asymptotic behavior. That is one of the shortcomings of using big-O. Your algorithm is still O(n\\^2), but the constant factor is divided by the number of cores. This constant factor is an important part of the practical performance, but unfortunately cannot be expressed in big-O. Saying that the algorithm is highly parallelizable is accurate and will get your point across.
Eric,thanks for the explanation and correction. I understand the difference now and will edit my claims accordingly.