At http://raytracey.blogspot.com.au/ there are 1000 cubes getting raytraced (actually, pathtraced!) in realtime. How does this work so fast? You'd think it would need a separate intersection test per cube (and 1000 of those would kill the computer for sure).
Really good spatial subdivision structures I guess. Either a BVH or a kd-tree that cuts way down on the number of cubes you actually have to test for an individual ray.
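A minimal Python sketch of why a hierarchy helps, under stated assumptions: this is not the demo's code, just 1000 unit cubes on a 10x10x10 grid, a naive median-split BVH, and a counter for ray/box tests. The names ray_box, build and closest_hit are made up for illustration.

```python
import math

def ray_box(origin, direction, lo, hi):
    """Slab test. Returns the entry distance t >= 0, or None on a miss."""
    t_near, t_far = 0.0, math.inf
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this slab: miss unless the origin lies inside it.
            if o < a or o > b:
                return None
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near

def build(boxes):
    """Naive BVH: split the box list at the median along the longest axis."""
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    if len(boxes) == 1:
        return (lo, hi, None, None)          # leaf: bounds are the cube itself
    axis = max(range(3), key=lambda i: hi[i] - lo[i])
    boxes = sorted(boxes, key=lambda b: b[0][axis])
    mid = len(boxes) // 2
    return (lo, hi, build(boxes[:mid]), build(boxes[mid:]))

def closest_hit(root, origin, direction):
    """Return (closest hit distance, number of box tests performed)."""
    best, tests, stack = math.inf, 0, [root]
    while stack:
        lo, hi, left, right = stack.pop()
        tests += 1
        t = ray_box(origin, direction, lo, hi)
        if t is None or t >= best:
            continue                          # whole subtree skipped with one test
        if left is None:
            best = t
        else:
            stack += [left, right]
    return best, tests

# 1000 unit cubes spaced 2 units apart.
cubes = [((2 * x, 2 * y, 2 * z), (2 * x + 1, 2 * y + 1, 2 * z + 1))
         for x in range(10) for y in range(10) for z in range(10)]
root = build(cubes)
best, tests = closest_hit(root, (-5.0, 0.5, 0.5), (1.0, 0.0, 0.0))
print(best, tests, "box tests instead of", len(cubes))
```

A real tracer would use a better split heuristic (SAH) and visit the nearer child first, but even this crude version skips most of the cubes for a single ray.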
There were some interesting articles from NVIDIA recently about parallel BVH traversal and BVH construction. They're written about CUDA but the same principles should apply to OpenCL or to Direct3D11 compute shaders.
I did some experiments with an instancing ray tracer. Ray tracing can actually benefit from instancing much more than regular rendering: it doesn't matter which instance you intersect the ray with.
I couldn't achieve realtime, but the total triangle count is very high: each dragon has about 1M tris!
Yes! That's what I figured, it wasn't spatial division, it's something else. So how did you make yours, even though it wasn't quite fast enough?
The closest I could get to it would be like what stainless said, with distance fields, and just detect the closest primitive...
Well, I have nothing fancy: one global hierarchy plus a local per-object hierarchy. Each hierarchy is a recursive list of AABBs with about 100 boxes per branch, so the total depth is very small. Such a hierarchy would be terribly slow if used on a per-ray basis, but I intersect groups of 32N rays with one AABB/triangle list, so I get reduced branching and good parallel performance.
Oh, so it is spatial division... You know, I had this crazy idea that I could get instancing perfect as long as objects didn't overlap, using the old % (mod) trick on position. As long as the copies don't share space, you only need to intersect the model once and then take the position modulo the cell size (it comes out like a grid at first). I was thinking you could work with that.
But often you do share space, so maybe it's pointless...
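A rough Python sketch of the mod trick, assuming identical copies on a regular grid with exactly one copy per cell. The function name fold and the cell size are made up for illustration.

```python
import math

def fold(point, cell):
    """Map a world-space point to (cell index, position inside that cell).

    Every cell holds the same model, so the local position can be tested
    against a single copy of the geometry.
    """
    index = tuple(math.floor(c / cell) for c in point)
    local = tuple(c - i * cell for c, i in zip(point, index))
    return index, local

# Two points exactly one cell apart land on the same local position.
print(fold((5.5, -0.5, 2.0), 2.0))
print(fold((7.5, -0.5, 2.0), 2.0))
```

This only folds a point. A ray still has to be walked from cell to cell until it hits something, and a model that pokes out of its cell breaks the assumption.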
Instancing here keeps memory lower, but you still have to traverse the whole hierarchy. Basically that's what instancing is about in ray tracing.
You still have to move to the next cell if you don't find an intersection in the current one. I tried grid-based instancing for grass, but it's slow, especially near the horizon, where rays can traverse many cells without hitting anything.
Yeah, there's no free lunch...
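To put rough numbers on the horizon problem, here is a small Python sketch of cell stepping in the style of Amanatides and Woo, in 2D for brevity. It assumes grass of height 0.5 on a grid of unit cells; cells_crossed and cells_through_grass are made-up names, not code from the grass experiment.

```python
import math

def cells_crossed(start, end, cell=1.0):
    """List the 2D grid cells visited by the segment start -> end."""
    (x, y), (ex, ey) = start, end
    dx, dy = ex - x, ey - y
    ix, iy = math.floor(x / cell), math.floor(y / cell)
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    # t in 0..1 along the segment where the next cell boundary is crossed.
    t_max_x = ((ix + (dx > 0)) * cell - x) / dx if dx else math.inf
    t_max_y = ((iy + (dy > 0)) * cell - y) / dy if dy else math.inf
    # t needed to cross one whole cell on each axis.
    t_delta_x = cell / abs(dx) if dx else math.inf
    t_delta_y = cell / abs(dy) if dy else math.inf
    cells = [(ix, iy)]
    while min(t_max_x, t_max_y) <= 1.0:
        if t_max_x < t_max_y:
            ix += step_x
            t_max_x += t_delta_x
        else:
            iy += step_y
            t_max_y += t_delta_y
        cells.append((ix, iy))
    return cells

def cells_through_grass(angle_deg, grass_height=0.5, cell=1.0):
    """Cells crossed between entering the grass layer and reaching the ground.

    angle_deg is the ray's angle below the horizon.
    """
    run = grass_height / math.tan(math.radians(angle_deg))
    return len(cells_crossed((0.25, 0.25), (0.25 + run, 0.25), cell))

for angle in (45, 10, 1):
    print(angle, "degrees:", cells_through_grass(angle), "cells")
```

The horizontal run inside the grass layer grows as 1/tan(angle), so a grazing ray may step through dozens of cells, each needing its own intersection attempt, before it reaches the ground.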
Finally I got some time to give a better answer.
Grid traversal gets more and more inefficient the further your ray goes (this can mostly be solved with an SVO (sparse voxel octree) or similar, though).
Anyway, instanced ray tracing isn't that much faster than standard uninstanced ray tracing. I'd say it's even slower. Imagine a scene with a 1M-triangle mesh. Let's place this mesh at 5 positions in the world.
How a simple standard ray tracer works:
- (possible to precompute) Create an acceleration structure (e.g. a Kd-tree) for the whole scene (e.g. built with SAH, to get realtime performance); leaves contain triangles
- Traverse each ray through the Kd-tree
How a simple instanced ray tracer works:
- (possible to precompute) Create an acceleration structure for our mesh (a Kd-tree with SAH); leaves contain triangles
- (possible to precompute) Create an acceleration structure for the scene, where each leaf contains another Kd-tree (the object's one)
- Traverse the ray through the scene hierarchy
- If a leaf is hit, traverse the ray through the object hierarchy (with the ray transformed to local object space)
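The instanced steps above can be sketched in Python roughly as follows. This is a toy under loud assumptions: instances differ by translation only (rotation or scale would need a full inverse transform), plain loops stand in for both Kd-trees, and boxes stand in for triangles. Mesh, Instance and trace are made-up names.

```python
import math
from dataclasses import dataclass

def ray_box(origin, direction, lo, hi):
    """Slab test. Returns the entry distance t >= 0, or None on a miss."""
    t_near, t_far = 0.0, math.inf
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:
                return None
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near

@dataclass
class Mesh:
    lo: tuple      # local-space bounds of the whole mesh
    hi: tuple
    prims: list    # local-space boxes standing in for triangles

@dataclass
class Instance:
    mesh: Mesh     # shared reference: geometry is stored only once
    offset: tuple  # translation placing this copy in the world

def trace(instances, origin, direction):
    best = math.inf
    for inst in instances:                    # stand-in for the scene hierarchy
        # Move the ray into the object's local space instead of moving the mesh.
        local_origin = tuple(o - s for o, s in zip(origin, inst.offset))
        if ray_box(local_origin, direction, inst.mesh.lo, inst.mesh.hi) is None:
            continue
        for lo, hi in inst.mesh.prims:        # stand-in for the object hierarchy
            t = ray_box(local_origin, direction, lo, hi)
            if t is not None:
                best = min(best, t)           # t is unchanged by a pure translation
    return best

mesh = Mesh((0, 0, 0), (1, 1, 1), [((0, 0, 0), (1, 1, 1))])
instances = [Instance(mesh, (10.0 * k, 0.0, 0.0)) for k in range(1, 6)]
print(trace(instances, (0.0, 0.5, 0.5), (1.0, 0.0, 0.0)))
```

Note the extra work per ray: one ray transform per instance whose bounds are hit, plus a fresh traversal of the object structure each time. That overhead is where the instanced version can lose to a single flat tree.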
Point 1 (building the structure) will be a lot slower in the standard ray tracer: for 5 meshes of 1M tris each and an ideal O(N log N) SAH Kd-tree builder, we create the Kd-tree about 5.6 times slower.
On the important point 2 (traversal), the standard ray tracer will most likely beat the instanced ray tracer (assuming we have a good layout of the Kd-tree in memory, so we won't lose that much on cache misses).
Note that in the first case our memory demands will be more than 5 times bigger! So this is what instancing saves. Performance, not so much!
We would need to benchmark it, though.
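For reference, the "about 5.6 times" figure falls out of the N log N model directly. A quick Python check, assuming build cost is exactly proportional to N log2 N and ignoring the tiny 5-leaf scene tree on the instanced side (build_cost is a made-up helper):

```python
import math

def build_cost(n):
    """Idealized O(N log N) build cost, in arbitrary units."""
    return n * math.log2(n)

tris_per_mesh, copies = 1_000_000, 5

flat_build = build_cost(copies * tris_per_mesh)   # one tree over all 5M triangles
instanced_build = build_cost(tris_per_mesh)       # one tree over the shared mesh
build_ratio = flat_build / instanced_build

# Triangle storage alone: 5M triangles flat versus 1M shared.
triangle_ratio = (copies * tris_per_mesh) / tris_per_mesh

print(f"build: {build_ratio:.2f}x, triangles: {triangle_ratio:.0f}x")
```

The base of the logarithm cancels in the ratio, so the result is 5 * log(5M) / log(1M), roughly 5.58.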
Yes, the biggest advantage of an instancing raytracer is memory footprint: my test scenes easily exceeded a 1G triangle count. The second advantage is dynamic scenes: rebuilding the objects' acceleration structure is much faster than rebuilding the Kd-tree of the whole scene.