After carefully testing your fast sine function again (the c version, not the assembly version), it does seem to be about twice as fast as the standard function. =)
So starting implementing your dimensional sorting suggestion. Using insertion-sort to sort the objects in each dimension is super fast since there's good coherency between frames. Then the idea was to take each object and use the sorted lists to figure out which other objects are nearby in each dimension. Across the 3 dimensions, there would be some duplicate nearby objects, so I'd cull those and then finally do interactions for what's left over.
However, dimensional sorting doesn't seem to work if my people/objects are different sizes, which is desirable for my project. Still thinking about workaround...
Further, for each dimension (say x), each object potentially collides with all the objects inside of a slice in the other dimensions (all the objects in the collision range in x, but with any y or z). This is really expensive if the object density is high. This isn't an issue for a 1D system, and less for a 2D system, but seems like it would be crippling in 3D. Is there something I'm missing?
When I get a chance I may finish up the sort thing and get some real numbers anyway.
@alphadog Well, I've already established that yes, certain functions are being called upwards of 10million times a second. Even if I can reduce the number of interaction checks I can do with spatial partitioning, those functions will be called a lot. For example, I just benchmarked my collision algorithm, and commenting out its contents doubles the performance of my application. Performance is ok now, but once I add artificial intelligence for each object, its going to get slow.
The idea of writing a sort-algorithm in assembly never even crossed my mind. I don't think I will try it. Do you have a suggestion for a spatial subdivision algorithm that works well for many dynamic objects? My intuition says the cost would outweigh the benefit because of all the updating, but I'm not so experienced with these things.
Adding an initial box-test to my collision algorithm boosted performance by about 15%, which is not groundbreaking, but nice for 3 extra lines of code.
I also benched the circular, doubly-linked list that I was storing my objects/people in versus ordinary arrays. Iterating through the objects for interactions and rendering is only between 0 and 20% faster (total application performance) with arrays. I'll stick with the convenience of the linked-list for now.