]]>For mine, the Sponza using the pbrt way...

227,500 triangles

depth = 18

min tri per node = 31

30MB total size for the treeSeems to work fine for the Sponza, but on others, it just doesn't. at all.

]]>Well, I don't think we can calculate the size of the tree just by using the 8 bytes node! I mean, what about the list of triangles each one holds? On each split, 2 sides get a chunk, So for the very first split of 310K triangles, the minimum (not including overlapping triangles) would be 155K each sides. At 4 bytes as an index, that's already 1.2MB just for the first split, as a minimum, for the first depth only. Or is it me that's not doing it right?

]]>Nice numbers Vilem! This illustrates what I was saying about building the tree too far down...see how the size explodes exponentially when you decrease the maximum triangles per node. Quite dramatic really, cutting the triangles per node by 4x could make the tree as much as 40x bigger!

]]>Just for information - it might help a little:

Sample scene - Sponza Atrium

One of the worst scenes for either Kd-trees or BVHs, so you'll get really bad trees (plenty of useless nodes) for it! Although this is actually the point why we test on it. I strongly recommend trying Sponza...Ray tracer - old GPU ray tracer, mono-tracing in OpenCL application, no actual optimization (just for general testing of tree quality) - e.g. simple ray tracing algorithm. So this one is just brain-dead simple implementation of ray tracer. Wald triangles used, I think...

Total Triangle Count - 67462 tris

Criteria: SAH Splitting, Primitive criteria = 64, Sah intersect cost = 80, Sah traversal cost = 1, Sah empty bonus = 0.5

Created 6081 nodes

Size of nodes = 48648 bytes (less than 50 KiB)

Size of all Kd-tree data = 512696 bytes (0.5 MiB)

Building done in 906 ms (1 core building algorithm posted here)

Average performance = 6.32 MRays/sCriteria: SAH Splitting, Primitive criteria = 16 (e.g. if there is 16 or less triangles, create node), Sah intersect cost = 80, Sah traversal cost = 1, Sah empty bonus = 0.5

Created 114775 nodes

Size of nodes = 918200 bytes (e.g. less than 1 MiB)

Size of all Kd-tree data = 3606472 bytes (e.g. a bit more than 3 MiB)

Building done in 1401 ms (1 core building algorithm posted here)

Average performance = 7.46 MRays/sCriteria: SAH Splitting, Primitive criteria = 4, Sah intersect cost = 80, Sah traversal cost = 1, Sah empty bonus = 0.5

Created 5626177 nodes

Size of all nodes = 45009416 bytes (e.g. 45 MiB)

Size of all Kd-tree data = 112243156 bytes (e.g.112 MiB)

Building done in 6848 ms (1 core building algorithm posted here)

Average performance = 3.72 MRays/s

]]>You have 310K triangles, so if you have 300MB of triangle data that's about a kilobyte per triangle. I assume your triangle struct is not actually that big. So either the memory is coming from somewhere else or you're generating a

hellof a lot of split triangles.I'm not asking you because I want to play a guessing game; you should instrument your code and

find outwhere the memory is going!

]]>Each nodes is 8 bytes, but what about the list of triangles they hold if they are leafs?

]]>I still think that number of nodes is quite a bit too high compared to the number of triangles, but at least it's more reasonable, not like 92 million or something. But at 8 bytes per node, 211,881 nodes is only 1.6 MB. So where is your 300 MB coming from?

]]>I meant 700MB Reedbeta, not 700KB. The nodes are 8 bytes, but each leaf holds a list of triangle indexes, so at start, the left node may have 200k tri, and the right node 500k. then the right node get split, say 250k on each site, all the way down. Unless I'm doing this all wrong as usual.

I just loaded a 310,000 poly model, and that created 211,881 nodes and that took about 300MB (not KB).

]]>By a "qword" you mean 8 bytes, I'm guessing? So then if you're getting 700 MB of allocation, that's 92 million nodes. Since that's far more nodes than triangles, I'd say something's very wrong! Can you make your code count the number of nodes alloc'd to see if this figure is correct?

For 900K triangles with 20 triangles per leaf node, that's 45K leaf nodes, so 90K nodes altogether. At 8 bytes per node, that's only \~700KB of memory. Hmm...did you by any chance confuse 700KB for 700MB?

]]>The node is a qword. For the depth, I use that of pbrt...

MaxDepth = int(8 + 1.3 * log2(PolyCount));

and for the min poly per leaf I use...

log2(PolyCount)

So for a 900k poly, it gives me a depth of about 33 and 19 or so poly min per leaf. Is that too much you think?

]]>That sounds extremely excessive. How big is your node structure? It should be quite small, I'm guessing around 16-32 bytes. Even assuming you keep building the tree all the way down to \~1 triangle per node, that would only give 27-55 MB for 900K triangles (\~1.8M nodes). And you shouldn't keep subdividing that far anyway; stop when there's some small number of triangles in the node (\~8-64 I'm guessing), and that will cut the memory by 90-98%. (The memory for a tree is dominated by its leaf nodes, so every extra level of subdivision roughly doubles the memory for the tree.)

]]>Is it normal that the kd-tree space goes up to 5GB of RAM for only 400K triangles? It's not always the case, but on some scenes, it does that, and it takes much longer to build. Sometimes it goes realy fast on 900K triangles and takes only 700MB !!??

]]>oh, I see, I'll stay with your kd-tree code! Thanks for explaining, I appreciate all the help

]]>It actually depends...

Kd-trees are winning a lot when you're tracing random rays through hierarchy (e.g. path tracing) or generally incoherent rays... on the other hand they're losing a bit when testing against packets (because they're not well suited for packet tracing), although note that packet tracing is definitely not a way to go...

BVHs are quite good for coherent rays (but they're winning by just a few percents over decent Kd-trees), and overally their tree quality is worse (because doing full SAH on them is way slower than doing it on Kd-trees).

So actually Kd-trees produce better quality trees well suited for incoherent rays, random rays and good for packets (they're losing just a little).

BIHs are a mix between Kd-trees and BVH - you build them as BVH thus getting faster build times, but traversing them as packet-friendly Kd-trees - thus being very good for ray packets. On the other hand BIHs are outperformed by Kd-trees when your rays are a bit incoherent (they're losing even with reflective surfaces, not even mentioning path tracing).

What is quite a concurrent for Kd-trees is actually QBVH - tetrary BVH. It gives quite good results even for incoherent rays, they're a lot faster to build (although they're still just approximating SAH with binning - we can do the same for Kd-trees and they're not winning that much in building times).

So it really depends - even basic grid or nested grid is usable acceleration structure ... somewhere. Actually SAH Kd-trees have one huge advantage - they give very good quality for almost any kind of scene (where BVH or QBVH dies on long thin triangles - Kd-trees can handle them, even triangle fans can be handled when giving correct numbers to SAH computation).

]]>So in overall, is kd-tree faster (at render time, not build time) than bvh, bih, octree and the rest?

]]>So I can point you to Havran's PhD thesis for Kd-trees - http://dcgi.felk.cvut.cz/home/havran/phdthesis.html

- it's quite huge, but descriptive. But what might be even better is http://pbrt.org/pbrt-2ed-chap4.pdf
- PBRT chapter 4 online for free (not just SAH Kd-trees, but also BVHs and grids are described there) ... I bought the book and it's quite awesome - if you're into ray tracing like me, definitely buy it, it's worth it.
Actually you can speed up SAH Kd-tree builder with few tricks, and it can even generate trees in realtime - like adding threading support and pre-sorting bounding edges, and so on... for me optimizing it further is mostly "try & fail" with profiler.

Though if you really need more speed, you can use idea of "Binned SAH building" that is used in BVHs or BIHs and adapt it to Kd-trees - I haven't tried it yet at Kd-trees (only at BVHs and QBVHs), but there is paper from Saarland http://graphics.cg.uni-saarland.de/fileadmin/cguds/Users/cygnus/Danilewsk-GPU-kdTree.pdf describing a bit about Binning SAH Kd-tree building (on Gpu - but actually Gpu and Cpu doesn't differ that much and I think that getting idea from this is not that hard)

So there is actually quite huge research how to find best splitting plane. Anyway if you're not targeting ray tracing - but physics, consider median splitting instead of SAH - it's almost instant to build (especially with pre-sorting the primitives), because it actually splits geometry to two halves, which might be better suited for physics (and maybe even for frustum testing - but one would need to profile this).

]]>@Villeim , yes i thought they were two different things, can you point me at some paper / theory for computing the best split plane ??

i am interested to revamp those level compilers of mine

]]>#v71

I'm mostly referring to ray test of course, as mentioned if you don't work just with triangle ray tracer, splitting NURBS or such surfaces can be a nightmare (although I'm not saying that it's impossible). Another problem is, that splitting triangle generates more than 2 triangles in general case (and this is bad - F.e. on Power plant model or such - it can generate quite a huge amount of triangles - which is very bad).Although as for frustum testing - working with indices like me (btw. first it were pointers, but I needed to move the algorithm to OpenCL, where passing pointers is quite illogical - so the IDs are the way to go) helps a lot. It's quite easy and fast to remove duplicated IDs of triangles after getting triangle list to-test (to-draw). Although note that this probably doesn't count for rasterizers, where splitting would be necessary (or maybe polygon offset?) - it's better to use VBOs - which implies it's better to use hierarchical bounding volumes (BVH) rather than Kd-trees.

#Alienizer

Minimal number of tree nodes can be checked and stored during construction (global variables - or give CKDTree class a variable to store it).

The tree depth also (max function for recursive variable in argument of procedure/method and some variable in class/global variable to store the depth).E.g. you can actually pre-compute (it's actually just store, not actual computing) these variables during construction.

Ehm, and btw. no need to thank so much I just posted the code + little explanation I wrote some time ago and have been working on it since (today it's actually quite complete - no memory leaks, not so slow building times, good resulting kd-trees) + i've got quite nice results with GPU ray tracing in these Kd-trees.

]]>@Villeim Otte , when you say its not necessary to split the triangle, are you referring to a frustum intersection test ar a ray test ?

to be honest i have not understood how the triangle spanning more nodes cannot be drawn two times if both nodes are inside the frustum .

can you explain me that please ?

]]>One more thing. Is there some kind of algo to determine the tree depth and the min numb of tri per nodes?

]]>WOW :lol: Vilem Otte

This works so beautifully and fast, it fixed my problems of missing triangles in the render too!

I don't know how to thank you for taking the time to explain in such details and provide the code too! That's really, really, really nice of you......... me-->thank_you * INFINITY;

]]>Okay, so I'll try to point out full answer...

#v71

You start correctly - with bounding box containing all the triangles, but some kind of heuristics is necessary (and even you must have some) - basically you do "In the middle of current axis" and loop axes... this will end up in very bad tree thus kinda destroying purpose of Kd-trees. In practice you have to use either true correct Surface Area Heuristics (further SAH), or some kind of approximation of SAH.Also splitting triangles is not necessary - you assign it to both sub-cells and it won't get rendered double times (correct traversal ends on splitting plane!).

#Alienizer

So i'll try to explain how to build and traverse Kd-trees on my code...Basically the core procedure (or method - it's based upon whether your code is procedural or object oriented) is looking like this:

`void CKDTree::RecursiveBuildTree(unsigned int mNodeID, vector<CTriangle> *mCurrent, vector<CAABB> *mPrimBounds, CAABB mBounds, unsigned int *mPrimIDs, unsigned int mPrimCount, CKDTreeBoundEdge *mEdges[3], unsigned int mRecursionDepth)`

Where:

mNodeID - ID to current node we're working on

mCurrent - pointer to whole triangle list

mPrimBounds - pointer to all triangle bounds from triangle list

mBounds - current node bounds

mPrimIDs - IDs of triangles in current node

mPrimCount - number of triangles in current node NOTE: mPrimIDs and mPrimCount could be merged

mEdges - bounding edge of all triangle bounding boxes for each axis and for each object there are 2 - start and end bouding edge

mRecursionDepth - self-explaining itselfSo basically in Kd-tree building you do:

1.) Check whether the current node can be leaf- if it can, make leaf out of it:`// Check whether we hit conditions for leaf if(mPrimCount <= this->mMaxPrimitivesInNode || mRecursionDepth > this->mMaxRecursionDepth) { // mTriangleIDs is std::vector containing just uint indices to triangles std::vector, grab pointer to end (e.g. where triangle ids for this node starts) unsigned int mPointer = (unsigned int)mTriangleIDs.size(); // Push triangle ids from this node to indicies std::vector for(unsigned int i = 0; i < mPrimCount; ++i) { mTriangleIDs.push_back(mPrimIDs[i]); } // Init leaf node with primitive count mPrimCount at mPointer position from mTriangleIDs buffer this->mNodes[mNodeID].InitLeaf(mPointer, mPrimCount); // voila! return; }`

Now if we don't get to this code, we need to split the node...

2.) Splitting the node

2.1) Finding the split axisThere is about a zillion ways how to find sometimes better, sometimes worse splitting axis. I'll put here correct SAH splitting:`// Get longest axis int mAxis = mBounds.GetLongestAxis(); // Split position - if we couln't find a split with good sah cost, let's split in the middle of longest axis float mSplit = mBounds.mMin[mAxis] + (mBounds.mMax[mAxis] - mBounds.mMin[mAxis]) * 0.5f; // Some variables for SAH unsigned int mSahBestAxis = 0; int mSahBestPos = -1; int mSahBestCost = 2000000000; int mSahBelow, mSahAbove; float mTotalArea = mBounds.GetSurfaceArea(); float mInvTotalArea = 1.0f / mTotalArea; CVector3 mDimensions = mBounds.mMax - mBounds.mMin; // Loop through all axis for(unsigned int i = 0; i < 3; ++i) { // Note: this part can be pre-computed to speed up Kd-tree build // Let's get bounding edges (start and end) from triangle bounding boxes for(unsigned int j = 0; j < mPrimCount; ++j) { int mCurrentPrimitive = mPrimIDs[j]; mEdges[mAxis][2 * j] = CKDTreeBoundEdge((*mPrimBounds)[mCurrentPrimitive].mMin[mAxis], mCurrentPrimitive, true); mEdges[mAxis][2 * j + 1] = CKDTreeBoundEdge((*mPrimBounds)[mCurrentPrimitive].mMax[mAxis], mCurrentPrimitive, false); } sort(&mEdges[mAxis][0], &mEdges[mAxis][2 * mPrimCount]); // Now we find the best splitting pos for current axis // Rule: It will be on bounding edge of one of the primitives mSahBelow = 0; mSahAbove = mPrimCount; // So loop through all bounding edges and compute SAH on left and right, store lowest price one for(unsigned int j = 0; j < mPrimCount * 2; ++j) { if(mEdges[mAxis][j].mType == CKDTreeBoundEdge::END) { --mSahAbove; } float mSplitCandidate = mEdges[mAxis][j].mPosition; if(mSplitCandidate > mBounds.mMin[mAxis] && mSplitCandidate < mBounds.mMax[mAxis]) { unsigned int mOtherAxis0 = (mAxis + 1) % 3; unsigned int mOtherAxis1 = (mAxis + 2) % 3; float mBelowArea = 2.0f * (mDimensions[mOtherAxis0] * mDimensions[mOtherAxis1] + (mSplitCandidate - mBounds.mMin[mAxis]) * (mDimensions[mOtherAxis0] + mDimensions[mOtherAxis1])); float mAboveArea = 2.0f * (mDimensions[mOtherAxis0] * mDimensions[mOtherAxis1] + (mBounds.mMax[mAxis] - mSplitCandidate) * (mDimensions[mOtherAxis0] + mDimensions[mOtherAxis1])); float mBelow = mBelowArea * mInvTotalArea; float mAbove = mAboveArea * mInvTotalArea; float mEmptyBonus = (mBelow == 0.0f || mAbove == 0.0f) ? this->mSahEmptyBonus : 0.0f; float mCost = this->mSahTraversalCost + this->mSahIntersectCost * (1.0f - mEmptyBonus) * (mBelow * mSahBelow + mAbove * mSahAbove); if(mCost < mSahBestCost) { mSahBestCost = mCost; mSahBestAxis = mAxis; mSahBestPos = j; } } if(mEdges[mAxis][j].mType == CKDTreeBoundEdge::START) { ++mSahBelow; } } // Continue on next axis mAxis = (mAxis + 1) % 3; } // Store best splittng axis and position (it's our split plane) mAxis = mSahBestAxis; mSplit = mEdges[mSahBestAxis][mSahBestPos].mPosition;`

2.2) Assigning primitives to left or right or left and right sub-nodesSo let's continue...`// Allocate left and right + set counters to zero (I know I could use std::vector - but it's slower ;) - found out through heavy profiling of this code) unsigned int* mLeft = new unsigned int[(*mCurrent).size()]; unsigned int mLeftCount = 0; if(!mLeft) { // Out of memory } unsigned int* mRight = new unsigned int[(*mCurrent).size()]; unsigned int mRightCount = 0; if(!mRight) { // Out of memory } // Note: Those out of memory checks are necessary // I used this Kd-tree on F.e. Power plant model, which is quite huge and I actually got out of memory sometimes :D // Not a problem now though - bought more RAM :D - 16GB is enough for now :P // Now loop through all primitives and get their min and max bound on splitting axis for(unsigned int i = 0; i < mPrimCount; ++i) { float mMax = max(max((*mCurrent)[mPrimIDs[i]].mVerts[0][mAxis], (*mCurrent)[mPrimIDs[i]].mVerts[1][mAxis]), (*mCurrent)[mPrimIDs[i]].mVerts[2][mAxis]); float mMin = min(min((*mCurrent)[mPrimIDs[i]].mVerts[0][mAxis], (*mCurrent)[mPrimIDs[i]].mVerts[1][mAxis]), (*mCurrent)[mPrimIDs[i]].mVerts[2][mAxis]); // If their max is lesser than splitting pos, they-re left-only if(mMax <= mSplit) { mLeft[mLeftCount] = mPrimIDs[i]; mLeftCount++; } // If their min is greater than spliting pos, they-re right-only else if(mMin >= mSplit) { mRight[mRightCount] = mPrimIDs[i]; mRightCount++; } // Otherwise assign it to BOTH - left AND right else { mLeft[mLeftCount] = mPrimIDs[i]; mLeftCount++; mRight[mRightCount] = mPrimIDs[i]; mRightCount++; } }`

2.3) Now calculate sub-nodes bounding boxes + recursively build themSo...`// Calculate left and right bounds CAABB mLeftBounds = mBounds; CAABB mRightBounds = mBounds; mLeftBounds.mMax[mAxis] = mSplit; mRightBounds.mMin[mAxis] = mSplit; // Build left sub-tree (left sub-node is at node+1 address) this->RecursiveBuildTree(mNodeID + 1, mCurrent, mPrimBounds, mLeftBounds, mLeft, mLeftCount, mEdges, mRecursionDepth + 1); // Next child will be right bound - now we can init current node as interior unsigned int mAboveChild = this->mNextFreeNode; this->mNodes[mNodeID].InitInterior(mAxis, mAboveChild, mSplit); // Build right sub-tree this->RecursiveBuildTree(mAboveChild, mCurrent, mPrimBounds, mRightBounds, mRight, mRightCount, mEdges, mRecursionDepth + 1); // Clean used data delete [] mLeft; delete [] mRight;`

So that's all!

3.) The TraversalNow the another important thing is the traversal. Because we have some triangles duplicated (note: splitting is actually very wrong way to do this! - we don't always have to work with triangles, and splitting F.e. NURBS surfaces is a nightmare), we need to write correct traversal!

`// Stack traversal element, contains pointer to current node, start and end of ray struct CKDTreeStackElem { const CKDTreeNode *mNode; float mNear, mFar; }; // KDtree intersect routine int CKDTree::Intersect(vector<CTriangle> *mTriangles, const CRay &mRay, CVector3 *mBarycentric, float *mDepth, int *mID) { // Start and end of ray in start (yeah, no near/far clipping planes like in bad rasterizers! - just intervals!) float mNear = 0.0f; float mFar = INFINITY; // Early out those rays, that miss the whole scene (bounding box of it) if(!this->mBounds.IntersectRay(mRay, &mNear, &mFar)) { return 0; } // Stack for stack traversal CKDTreeStackElem mStack[KD_TREE_STACK_SIZE]; int mStackPos = 0; // Temp variables int mHit = 0; float mTempDistance = mFar; CVector3 mTempBarycentric; // We start at root node const CKDTreeNode* mNode = &this->mNodes[0]; // While the node to traverse exists (it could be while(1), but it's necessary to handle EMPTY scenes) while(mNode != NULL) { // If current nearest hit is closer than ray start, this node won't have nearest hit if((*mDepth) < mNear) { break; } // If node is interior node if(!mNode->IsLeaf()) { // get it's splittng axis and hit-distance to splitting plane hit int mAxis = mNode->GetSplitAxis(); float mHitPosPlane = (mNode->GetSplitPosition() - mRay.mOrigin[mAxis]) * mRay.mInvDirection[mAxis]; // Order whether left or right goes first and second (in terms of distance) const CKDTreeNode* mFirst; const CKDTreeNode* mSecond; int mBelowFirst = (mRay.mOrigin[mAxis] < mNode->GetSplitPosition()) || (mRay.mOrigin[mAxis] == mNode->GetSplitPosition() && mRay.mDirection[mAxis] >= 0.0f); // Left first, then right if(mBelowFirst) { mFirst = mNode + 1; mSecond = &this->mNodes[mNode->GetAboveChild()]; } // Right first, then left else { mFirst = &this->mNodes[mNode->GetAboveChild()]; mSecond = mNode + 1; } // Only first if(mHitPosPlane > mFar || mHitPosPlane <= 0.0f) { mNode = mFirst; } // Only second else if(mHitPosPlane < mNear) { mNode = mSecond; } // Otherwise push second on stack (with correct ray start/end), and continue in first else { mStack[mStackPos].mNode = mSecond; mStack[mStackPos].mNear = mHitPosPlane; mStack[mStackPos].mFar = mFar; ++mStackPos; mNode = mFirst; mFar = mHitPosPlane; } } // Otherwise we got a leaf node else { // Just test against all primitives (the 1 primitive branch is just optimization for really huge scenes with tiny geometrym you can drop it out) unsigned int mNumPrims = mNode->GetPrimitivesCount(); if(mNumPrims == 1) { const CTriangle &mTri = (*mTriangles)[mNode->mPrimitive]; if(mTri.Intersect(mRay, &mTempBarycentric, &mTempDistance)) { if(mTempDistance < (*mDepth) && mTempDistance > 0.0f) { (*mDepth) = mTempDistance; *mBarycentric = mTempBarycentric; *mID = mNode->mPrimitive; mHit = 1; } } } else { //const unsigned int *mPrims = mNode->mPrimitives; const unsigned int *mPrims = &this->mTriangleIDs[mNode->mPrimitivePtr]; for(unsigned int i = 0; i < mNumPrims; ++i) { const CTriangle &mTri = (*mTriangles)[mPrims[i]]; if(mTri.Intersect(mRay, &mTempBarycentric, &mTempDistance)) { if(mTempDistance < (*mDepth) && mTempDistance > 0.0f) { (*mDepth) = mTempDistance; *mBarycentric = mTempBarycentric; *mID = this->mTriangleIDs[mNode->mPrimitivePtr + i]; mHit = 1; } } } } // If there are nodes on stack if(mStackPos > 0) { // Pop out --mStackPos; mNode = mStack[mStackPos].mNode; mNear = mStack[mStackPos].mNear; mFar = mStack[mStackPos].mFar; } // Otherwise end traversal else { break; } } } // Return result return mHit; }`

Okay, this should be all... I hope it can help a little

]]>ok thanks v71

]]>Yes, you have , the drawback is that at rendering time, you have to check if the triangle has been already rendered to avoid drawing two times, this adds another stage, scan trhough all the triangles, see if a flag has been set , if not draw and set to true if yes , skip it.

I found this very inefficient so, at the cost of putting more triangles in both node, i skip totally the conditions checking at runtime, because if you don't you need to build vertex buffer object during the rendering time, for me its a no no.

I split triangles, create vbo in an offline phase and at rendering time i just call the vbos

]]>Very clear thank you. But what if I can't split a triangle? Do I simply include it in both children?