xcrypt at February 19th, 2012 07:26 — #1
I'm wondering how lights are usually handled in Direct3D10.
I'd figure something out myself, but I think it's better to know the standard protocol.
Imagine you have a level with a thousand lights; when you pass those lights to a shader, you obviously can't handle every light at once. My idea was to iterate over all the lights in the scene, sort them by squared distance to the object that is about to get shaded, then pick the first 10 or so and pass those to the shader. But even that seems like a huge operation, because you'd need to do it per frame, per object...
Any help is appreciated.
reedbeta at February 19th, 2012 11:31 — #2
In a forward shading renderer (that is the "typical" kind of renderer in which material properties and lighting are all evaluated in one pixel shader), for correct results you should identify which lights affect which objects, using bounding volume intersection tests and suchlike. For instance, for a point light with a spherical region of effect, you'd push the bounding sphere through your scene graph and see what objects it intersects. Each of those objects must then be rendered with that light. You can't impose an arbitrary cutoff like 10 lights, or you may cause artifacts and miss some lights in some cases. Additive blending can be used to accumulate light, with each object being rendered as many times as necessary to accumulate all the lights acting on it. As an optimization you might include shaders that compute two or more lights at once, to cut down on the total number of draws.
Another approach is deferred shading. If you haven't heard about this, google it; there are many, many articles on the subject. Briefly, the idea is to render material properties (color, normal, specular intensity and power, etc.) into an offscreen buffer (called a "G-buffer" for some reason), then do lighting in image space by writing a pixel shader that fetches the material properties from the G-buffer and evaluates the lighting equation. This decouples material shaders from lighting shaders, greatly reduces the number of draw calls, and doesn't require you to track which lights hit which objects; but it also requires more memory bandwidth, and places restrictions on the shading model due to all its parameters having to fit into the G-buffer.
In either of these approaches, good frustum/occlusion culling is essential for good performance; it pays to spend some effort making sure you don't draw things that aren't visible. And none of this is really about D3D10; these are generic rendering approaches that are applicable to any API. Finally, since you mentioned the workload per frame of figuring out what to draw, note that it's often possible to exploit temporal coherence here, since what to draw this frame is usually pretty similar to what you drew last frame. Try to build a data structure that allows you to hang on to some of the information from last frame rather than starting from scratch; of course it also must be able to adapt to changing circumstances, but you can still save a lot of performance this way.
xcrypt at February 19th, 2012 13:43 — #3
Thanks, very valuable information.
Rendering things like that is indeed generic, but I just wanted to point out that I'm using D3D10.
Could you give a simple example of the data structure you mentioned, or just names/links/references?
Something that could help me get started
EDIT: I mean the data structure you mentioned last, the one that wouldn't store each frame from scratch but the differences between frames
reedbeta at February 19th, 2012 22:11 — #4
As an example, in many cases in a renderer you want to sort objects by shader, and draw the shaders in a certain order. Rather than accumulating all objects in a giant list and then sorting them, you could have a list for each shader; when you determine that an object needs to be drawn, add it to the list for its specific shader. These per-shader lists of objects are cleared and reconstructed each frame, but the list of shaders is retained from one frame to the next and is already in order, so you don't need to do any sorting. This is a form of what's called bucket sort, where the shaders are the buckets.
The set of shaders being rendered might not churn at all, depending on your app, but if it does churn, you can add new shaders by insertion sort when necessary (insertion sort is actually quite fast for nearly-sorted lists, although beware of corner cases that lead to inserting many elements at once), and delete shaders that have had no objects added for a few frames.
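A minimal sketch of what those persistent, sorted buckets might look like in C++ (the names DrawList, ShaderBucket, etc. are hypothetical, and a real shader sort key would encode more than a single int):

```cpp
#include <vector>
#include <algorithm>

// Hypothetical types: a shader identified by a sort key, plus a per-frame
// bucket of objects to draw with it.
struct Object { int id; };
struct ShaderBucket
{
    int sortKey;                   // determines draw order
    std::vector<Object*> objects;  // rebuilt every frame
};

class DrawList
{
public:
    // Find the bucket for this shader, inserting it in sorted position if
    // new. The bucket list persists across frames, so it's usually a hit.
    void Add(int shaderKey, Object* obj)
    {
        auto it = std::lower_bound(
            m_buckets.begin(), m_buckets.end(), shaderKey,
            [](const ShaderBucket& b, int key) { return b.sortKey < key; });
        if (it == m_buckets.end() || it->sortKey != shaderKey)
            it = m_buckets.insert(it, ShaderBucket{shaderKey, {}});
        it->objects.push_back(obj);
    }

    // Clear the per-frame object lists but keep the sorted bucket list.
    void NewFrame()
    {
        for (auto& b : m_buckets)
            b.objects.clear();
    }

    // Walk buckets in order; no sorting needed at draw time.
    template <typename DrawFn>
    void Render(DrawFn draw) const
    {
        for (const auto& b : m_buckets)
            for (Object* o : b.objects)
                draw(b.sortKey, o);
    }

private:
    std::vector<ShaderBucket> m_buckets;  // kept sorted by sortKey
};
```

Note that clearing a std::vector keeps its capacity, so after the first few frames the per-bucket lists stop allocating too.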
This approach of bucket sorting with inter-frame-persistent buckets, which are created and deleted relatively rarely, can be used in various cases where objects need to be sorted but the set of objects and their sort order usually won't change too quickly.
Also, try not to alloc/dealloc a lot of objects on the heap when doing all these operations, as that will also eat into your performance. Intrusive linked lists (i.e. those that store the next/prev pointers as members in the object itself, rather than having an external heap-alloced node struct, like std::list and friends) are helpful for this.
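For illustration, here's a bare-bones intrusive doubly-linked list in C++ (Renderable and IntrusiveList are made-up names). The links live inside the object itself, so inserting and removing allocate nothing and removal is O(1) given the object pointer:

```cpp
#include <cstddef>

// Hypothetical renderable with intrusive links: the next/prev pointers
// live inside the object, so putting it in a list allocates nothing.
struct Renderable
{
    int         id;
    Renderable* prev;
    Renderable* next;
};

// Minimal intrusive doubly-linked list. The objects are owned elsewhere
// (e.g. in a preallocated pool); the list just threads through them.
struct IntrusiveList
{
    Renderable* head = nullptr;

    void PushFront(Renderable* r)
    {
        r->prev = nullptr;
        r->next = head;
        if (head) head->prev = r;
        head = r;
    }

    void Remove(Renderable* r)  // O(1): no search, no free()
    {
        if (r->prev) r->prev->next = r->next; else head = r->next;
        if (r->next) r->next->prev = r->prev;
        r->prev = r->next = nullptr;
    }
};
```

Contrast with std::list, where every insert heap-allocates an external node and erasing requires having an iterator to it. (Boost.Intrusive provides a production-quality version of this idea.)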
xcrypt at February 20th, 2012 08:45 — #5
Hmm, just one question: why would you prefer a list over a self-balancing-BST structure that autosorts on insertion? (std::set for example)
EDIT: And also, why would you reconstruct the per-shader list every frame?
thenut at February 20th, 2012 10:24 — #6
It's also not a good idea to render too many dynamic lights, as that will chew into your framerate. I typically limit myself to 8 at most, or 4 if the lights cast shadows. Even that can push the limits of your hardware. If I need more than that, I'd consider baking lightmaps for static light sources, or computing dynamic lightmaps on demand. Deferred lighting is another approach.
why would you prefer a list over a self-balancing-BST structure that autosorts on insertion?
The difference is preallocation vs. dynamic allocation. Preallocating an array containing all your shaders, with each shader holding a list of the objects to be rendered with it that frame, is faster than constantly inserting into, removing from, and re-sorting a single list every frame. Since the shader array is already sorted, you just blast through it and render the objects assigned to each shader, which is O(n) time. If you wanted more control you could implement it as a BST, but you'd only benefit from that with hundreds of shaders. I'm not sure what allocation mechanism std::set uses internally, but you'll almost certainly want array-based allocation so you don't waste time allocating and deallocating nodes on the heap, or dealing with memory fragmentation.