vilem_otte at August 19th, 2012 17:17 — #1
As I mentioned in the thread - http://devmaster.net/forums/topic/16715-fastest-way-to-display-a-static-model/
- here is my little attempt at explaining those "why"s. I'd be glad if anyone replied on whether I should continue the explanation (e.g. covering how triangle rendering & shading work and why, how to access buffers, etc. etc. - there is really a TON of stuff to explain)... Now a few little notes to begin with...
I'm really NOT writing this (or the accompanying application) as something high-performance - not that it couldn't be turned into a high-performance renderer in the end - the whole point is to explain how rendering works and mainly WHY it works the way it does.
So what do we need (to compile & use the application)? Just GTK, a C99-compliant compiler and, ehm... that's probably all.
What is our goal in this part? Only to write two basic types into the renderer - the framebuffer and the (vertex) buffer - plus additional procedures for clearing the framebuffer and drawing points (in a really easy and extremely useless way for now - just to get something on screen).
The source + headers + makefile with explanation are here: http://www.otte.cz/Graphics_Lesson1.tar.gz. Now the details (i.e. the article):
1. The Framebuffer object
The word framebuffer is quite self-explanatory - it's the buffer that holds the frame (i.e. the result of rendering). That explains why it has to have a width, a height and some data. Additional fields hold things like the number of channels (useful when you have different types of framebuffer, like RGBA (4 channels) or DEPTH (1 channel)) and the size of one pixel in bytes (useful for simply stepping through the buffer pixel by pixel).
So why do we need a framebuffer? Basically it's what allows us to draw to a texture (the stored dimensions + pixel size + channels are really texture information) - and textures are good. Our resulting image is a texture, and if we want, for example, shadow mapping for shadows, we need render-to-texture - this is where framebuffers are quite useful. Note that we can bind different textures to the framebuffer sequentially (for now by changing the framebuffer's parameters and data pointer, though I'd like to point out that properly there should be a pointer (or pointers) to some texture_t type). If this thread has some success, I'll extend this so we can attach e.g. multiple textures to a framebuffer (and thus explain how MRT works).
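To make the fields above concrete, here is a minimal sketch of what such a framebuffer type could look like in C99. The names are illustrative - they don't necessarily match the ones in the linked archive:

```c
#include <stdlib.h>

/* Hypothetical framebuffer type; field names are illustrative,
   not necessarily those used in the accompanying source. */
typedef struct
{
    unsigned int width;      /* dimensions in pixels */
    unsigned int height;
    unsigned int channels;   /* e.g. 4 for RGBA, 1 for DEPTH */
    unsigned int pixel_size; /* bytes per pixel, for stepping through data */
    unsigned char *data;     /* width * height * pixel_size bytes */
} framebuffer_t;

framebuffer_t *framebuffer_create(unsigned int w, unsigned int h,
                                  unsigned int channels, unsigned int pixel_size)
{
    framebuffer_t *fb = malloc(sizeof(framebuffer_t));
    fb->width = w;
    fb->height = h;
    fb->channels = channels;
    fb->pixel_size = pixel_size;
    /* calloc zero-initializes, so a fresh framebuffer starts cleared */
    fb->data = calloc((size_t)w * h, pixel_size);
    return fb;
}
```

Storing `channels` and `pixel_size` separately is what lets the same type serve as an RGBA color buffer (4 channels, 4 bytes) or a DEPTH buffer (1 channel, 4 bytes for a float) without any code changes.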
2. The (Vertex)Buffer object
From now on I'll call this just a buffer. It's quite similar to the framebuffer, but stores 1D data. Buffers mostly store vertices (4D floats) and other per-vertex attributes (texture coordinates, normals, etc. etc.). So...
Why vertex buffers? Why not just call Vertex(x, y, z, w) all the time? Because each call wastes a little computing power (a very, very little amount per call - but doing it a million times a frame adds up to quite a bit), and for static meshes the waste is even bigger. For static objects we can create the buffer once at start-up and then never touch it again; for dynamic objects we update only the vertices that actually changed (so the rest aren't touched at all). On top of that, updating a buffer in a loop is faster than calling Vertex(x, y, z, w) a zillion times (you save the call overhead multiplied a zillion times).
But that isn't all - a vertex buffer is stored in a single memory block, which means that transforming the whole buffer in a loop is a lot faster than doing it vertex by vertex (memory access is much faster, because the next vertex to transform will most likely already be in cache - be it on the CPU or the GPU). The matrix-vertex multiplication (and practically the whole of vertex processing) can then be batch-processed - and that's a win (in performance, of course)!
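The cache-friendly batch transform described above can be sketched like this (a hypothetical helper, not taken from the archive - it assumes 4D vertices packed tightly and a row-major 4x4 matrix):

```c
#include <stddef.h>

/* Illustrative sketch: transform a contiguous buffer of 4D vertices by a
   4x4 row-major matrix in one tight loop. Because the vertices sit in a
   single memory block, the next vertex is almost always already in cache,
   and there is no per-vertex call overhead. */
void transform_buffer(float *verts, size_t count, const float m[16])
{
    for (size_t i = 0; i < count; ++i)
    {
        float *v = verts + i * 4;
        float x = v[0], y = v[1], z = v[2], w = v[3];
        v[0] = m[0]*x  + m[1]*y  + m[2]*z  + m[3]*w;
        v[1] = m[4]*x  + m[5]*y  + m[6]*z  + m[7]*w;
        v[2] = m[8]*x  + m[9]*y  + m[10]*z + m[11]*w;
        v[3] = m[12]*x + m[13]*y + m[14]*z + m[15]*w;
    }
}
```

A loop like this is also exactly the shape a compiler (or a later SSE rewrite) can vectorize, which is the batch-processing win mentioned above.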
So how do we work with buffers? Because this renderer is a state machine, we work with a single framebuffer/vertex buffer at a time - the bound one. I mostly followed the way OpenGL does it (DirectX isn't far off either - in the end they're both quite similar)
- so you have to generate the object, bind it, and then you can work with the bound object (i.e. fill it with data, read its parameters and write into its data). Writing to buffers is quite simple for now; it stays that way until we meet parallelization (then it becomes not-so-simple) - I hope I'll ever get that far, so I can explain why we need to map/unmap buffers in OpenGL.
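The generate/bind/fill pattern can be sketched as a toy state machine in the OpenGL style. All the names here are my own, not the ones from the archive:

```c
#include <stddef.h>

/* Hypothetical gen/bind interface in the OpenGL style. The renderer keeps
   one "current" buffer, and all data operations implicitly target it -
   that's what makes it a state machine. */
#define MAX_BUFFERS 64

typedef struct { float *data; size_t size; } buffer_t;

buffer_t buffers[MAX_BUFFERS];
int used[MAX_BUFFERS];
int bound = -1;                  /* index of the currently bound buffer */

int buffer_gen(void)             /* like glGenBuffers, for one buffer */
{
    for (int i = 0; i < MAX_BUFFERS; ++i)
        if (!used[i]) { used[i] = 1; return i; }
    return -1;                   /* out of buffer slots */
}

void buffer_bind(int id) { bound = id; }

void buffer_data(float *data, size_t size)  /* acts on the bound buffer */
{
    if (bound < 0) return;
    buffers[bound].data = data;
    buffers[bound].size = size;
}
```

Usage mirrors OpenGL: `int id = buffer_gen(); buffer_bind(id); buffer_data(verts, n);` - note that `buffer_data` never receives the id, it works purely through the bound state.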
The last things are the basic operations - clearing the currently bound framebuffer (just see the file basic_ops.c) and draw operations with a brain-dead-simple point write (warning: no clipping occurs, to keep it really brain-dead simple).
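As a sketch, these two operations might look like the following (illustrative names, not the ones from basic_ops.c; and unlike the deliberately unclipped version described above, this point write includes a bounds check, so you can see where clipping against the framebuffer edges would go):

```c
#include <string.h>

/* Minimal illustrative framebuffer view for the two basic operations;
   see the framebuffer discussion earlier for the full type. */
typedef struct {
    unsigned int width, height, pixel_size;
    unsigned char *data;
} fb_t;

/* Clear every byte of the framebuffer to one value (0 = black for RGBA). */
void fb_clear(fb_t *fb, unsigned char value)
{
    memset(fb->data, value, (size_t)fb->width * fb->height * fb->pixel_size);
}

/* Write one pixel; the bounds check rejects out-of-range coordinates,
   which is exactly the clipping the brain-dead-simple version omits. */
void fb_put_pixel(fb_t *fb, int x, int y, const unsigned char *pixel)
{
    if (x < 0 || y < 0 || (unsigned)x >= fb->width || (unsigned)y >= fb->height)
        return;
    memcpy(fb->data + ((size_t)y * fb->width + x) * fb->pixel_size,
           pixel, fb->pixel_size);
}
```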
Well, this should probably be everything for now. I hope I'll get at least a bit of feedback on whether it's helpful and whether I should continue, or whether I should rather stick to programming (instead of explaining - I know I'm not a good teacher). If this is understandable and explains things at least a bit - and if you allow me to continue - I'd like to get to lines, triangles, vertex processing and pixel processing next time(s). And then I might gather enough courage to post it as an article (and feel like a real ninja on DevMaster :ph34r: ).
And by the way, thanks for reading this.
alienizer at August 19th, 2012 18:33 — #2
WOW, nice! Thank you, and YES please keep going like a Ninja :ph34r:
You are perhaps jumping to conclusions too fast - maybe you're trying to explain too many things at once? Dunno, I understand it all, but I'm not sure a beginner will.
What I think is: first explain how the GPU works. For example, does it scan the viewport pixels, convert them to vertices and call the vertex shader? Or does it loop over the list of triangles and call the vertex shader for each vertex? When you call a gl function, is it executed right away, or does it go into a command buffer until you call another gl function? You know, the basic operation of the GPU, so we know what it does. Like explaining how a car works: it needs gas for the pistons to go up and down, the front wheels do the steering (not the back wheels - they'd steer too fast), etc. Without going into so much technical detail that it gets complicated.
Once we have a good grasp of the GPU, then explain what it expects from the programmer in order to display his model - including limitations, such as 32 textures max or whatever, so the programmer doesn't start thinking "oh, I can fit the whole thing in 2,000 textures and draw the whole thing at once". By "what it expects" I mean some of the gl functions needed to get going: why they are needed, what they do, and why in that specific order. That kind of stuff.
I'm probably missing tons of stuff, but that's just a general idea of what I think the layout of a tutorial should be - for everyone, not just for the geniuses all of you are. If I'd had that tutorial when I first started, I would never have posted those dumb questions of mine on this board!
vilem_otte at August 19th, 2012 19:35 — #3
Basically I'd like to continue this into a really full article explaining how today's GPUs work - because in the old times (which is what most articles talk about) we had a bunch of vertices, set a texture and a light, sent it all to the GPU, and it magically created a scene out of it (that's the much-blamed fixed function pipeline). Today's GPUs work very much like our multicore CPUs (i.e. not even close to what they did before) - they're general-purpose, parallel and vectorized (like SSE in our CPUs). Explaining all of that in a single article is quite a TON of information (and most people would run away from it).
Yet lots of people are still learning from articles that discuss and use fixed function, and in my opinion that's very wrong - it's unnatural for a GPU to work that way today. It's like writing a 386 emulator for your Core i7 and then using it to actually do stuff (except it's even worse, because the 386 was architecturally closer to the Core i7 than an old GPU is to a new programmable one).
I could also include OpenGL code with each article - that would give an idea of how much shorter it is to just ask the GPU to do the work (i.e. how much work the OpenGL library actually does for us). It could also help OpenGL beginners understand what is going on inside the library. (Thanks for pointing me this way.)
Anyway, your point that there should be an introduction explaining how GPUs work today - the basics, before I drop the implementation on the reader - is very good. (The purpose of this thread is mainly to find a good way to structure the article(s); if they turn out usable, and if the staff here agree, I'd like to publish them here on DevMaster.net.)
Okay... time to play Nightcore and start coding
And another thing came to mind - I'm one of the Linux guys, but I think it would be good to also provide a Windows binary + source (because I doubt most beginners work on Linux).
alienizer at August 19th, 2012 20:53 — #4
Very good points. The new GPU stuff is a must, and drop the old - there are many tutorials on that old stuff already. Windows, yes; Linux, yes; but what about something more general? Pseudo-code, so everyone can understand it, not just C++ users - and pseudo-code is platform independent.
Maybe write a table of content first to make sure it's well organized and covers everything you want it to cover?
How about a title for the tutorial?...
In depth tutorial - Today's GPU - by Vilem Otte
vilem_otte at August 19th, 2012 23:15 — #5
On one hand pseudo-code is fine (for description), but in my opinion it's also important to give some real C code that actually shows the stuff in motion.
So far in my "pseudo" table of contents I've got:
1.) The basics - how the CPU <-> GPU interaction actually works - basically everything a graphics developer should know about the GPU
2.) Implementing a basic software renderer (i.e. emulating the top-level library + driver + GPU) - the "first triangle". Of course it will be a lot simplified (we'll fuse those three together to keep it simple - it would really get quite complicated if we didn't) - but one still gets a picture of how much work it takes to draw a single triangle
3.) Doing the same magic as the GPU (i.e. rendering an actual scene with textures, lighting and maybe shadows) - to show that the created library is capable of actually rendering stuff (and I hope I'll manage to get at least some fps on the Core i3 in my laptop).
4.) ... (Any hints here?)
Basically I will break 2.) into a few pieces, and 3.) into two or three. The whole thing should show how much is happening internally (1), how rendering actually works (2), and that the whole exercise wasn't as useless as it seems (3).
Note: fusing those three together is also necessary to get reasonably good performance. I'd recommend looking at Mesa for an actual re-implementation of EXACTLY what the graphics library, driver and hardware do... after 5 minutes it should be clear that simplifying things is really necessary (especially for people who don't know much about how rendering works - dropping an implementation plus a simulation and description of virtual hardware on them would be really scary, in my opinion). And the last thing is performance - doing it the Mesa way is slower on the CPU, and I really mean a lot slower.
rouncer at August 20th, 2012 17:50 — #6
As far as I know, the main things to avoid with DirectX are overloading the bus with too many state sets or matrix sends when animating objects, relative to the number of points being projected. If you hit A before B, you've completely screwed it up and won't get max projections. Another no-no is using the geometry shader for instancing - whoops, that'll pump 3 times too many projections into it and slow you to a halt as well. This idiot did all of the above, so I definitely know what to avoid.
A nifty thing to do is run the whole thing on the GPU - doing all the animation on the GPU circumvents the bus-send problem, for approximately 4 times the speed on my computer, in an experiment.
All that just gives you the max achievable instance count, so if you want to render a detailed city you still need to render appropriately - I bet all this is in your doc, I'll scan-read it over. 2 cents added.
Forgive my French, but this is extremely idiotic and I prefer to call it rooting the bus - avoid it at all costs, it's the lamest excuse for render code there is.
vilem_otte at August 21st, 2012 19:13 — #7
Okay, so far I've implemented the whole thing (just a few little details remain) - it's not optimal and it's written in plain C (that's why it went THAT fast). It seems it will really be big, and I mean it.
I asked myself a few questions about what I do and don't want to include (in the article, I mean). Because, well... it's quite huge (and I'm glad I decided to merge the whole thing together and not split it into a virtual machine, driver, top-level library and client application - it would be a lot bigger then than it is now).
So far I've implemented (and that's probably everything that will be in the description): rasterization as fixed function (i.e. the client application can't rework it - just like in OpenGL/Direct3D), programmable vertex & pixel shading (once you get the idea of how it works here, you'll most likely see that adding a geometry shader isn't that hard - same goes for tessellation shaders), and the calls for the "driver/top-level library". The point is to give the user the idea that HE is writing most of the stuff; the GPU just does what it's told, like a normal CPU (ehm... in this project it actually IS a normal CPU). It's all in a single project and everything runs on the same single CPU, but the code is structured (partly commented - I still need to finish that) and uses different naming conventions to give the user an idea of what would be done on the device and in the driver in a real scenario.
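On a CPU, the "programmable" stages can be modelled simply as function pointers that the client application supplies - the "device" just invokes whatever the user bound, like a normal CPU calling a callback. A sketch (all names here are hypothetical, not taken from the project):

```c
/* Sketch: a programmable vertex stage as a client-supplied function
   pointer. The renderer core never knows what the shader does; it just
   calls the bound function for every vertex. Names are illustrative. */
typedef struct { float position[4]; } vertex_t;

typedef void (*vertex_shader_fn)(vertex_t *v, const float *uniforms);

vertex_shader_fn current_vs;  /* state-machine style: one bound shader */

void use_vertex_shader(vertex_shader_fn vs) { current_vs = vs; }

/* The rasterizer core would call this for every vertex in the bound buffer. */
void process_vertices(vertex_t *verts, int count, const float *uniforms)
{
    for (int i = 0; i < count; ++i)
        current_vs(&verts[i], uniforms);
}

/* An example client-written "shader": scale xyz by uniforms[0]. */
void scale_shader(vertex_t *v, const float *uniforms)
{
    for (int i = 0; i < 3; ++i)
        v->position[i] *= uniforms[0];
}
```

The same pattern extends naturally to a pixel shader callback, which is why adding further programmable stages (geometry, tessellation) to such a design is mostly more of the same.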
Now it's time to put a "real scene" into it (one with a texture & per-pixel lighting, maybe even shadow maps if my Core i3 manages to do it interactively) and write the article (with the purpose of helping people understand what is actually going on when rendering - at least I hope it will help them understand).
alienizer at August 26th, 2012 17:41 — #8
We all can't wait :ph34r:
vilem_otte at August 28th, 2012 18:22 — #9
It's about time to report my progress. Right now I have around 10 pages written in a word processor, though that still covers only about two thirds of the stuff I'd like to cover. The application is complete and just needs a few comments, and I'm happy it runs quite fast on my Core i3 and very fast on my Core i7 (even though it wasn't written to be fast).
I'll also upload the application within a few days at most - I still need to make and debug the Windows port.
alienizer at August 29th, 2012 00:05 — #10
Sounds very exciting! I hope the sysop will put it in its own section.
vilem_otte at August 30th, 2012 19:21 — #11
Okay, the article is structured and mostly done (as of now it has some 17 pages of text, including a few images and math)... now I just need to get 2 sample applications ready (plus some tech notes about them for the article) and then write some final words. Note that I actually have no idea what to do with it; I'll definitely upload it to my server (as a PDF) and post the link here, hopefully getting some feedback on any errors (if there are any - nobody is perfect and we all make mistakes, especially in work done "overnight" after the day job)... and if it's good enough, I'd be honored to post it here on DevMaster (if that's somehow possible).
reedbeta at August 30th, 2012 19:32 — #12
Vilem, we would be happy to have it hosted here on DevMaster as an article, or perhaps split up into an article series given the length. On the articles page can you see the "Submit Article" button? (I'm not sure if it's only there for mods, or for all users.) Once you're ready, you can submit there, or just PM me or Dia and we can put stuff up there.
vilem_otte at September 4th, 2012 19:49 — #13
Okay, after about 2 days I managed to get back to this project - and so far everything is checked off as complete except the demonstration programs. I'm working on them right now and will most probably finish them today (meaning Wednesday) or tomorrow.
alienizer at September 4th, 2012 22:54 — #14
vilem_otte at September 5th, 2012 06:40 — #15
Okay, the first sample application is done (sorry for the delay, I've actually been sleeping between these posts). It still needs a code cleanup & a bit more commenting - but it's finally working. It was heavily inspired by cubes with the very old NeHe texture.
Here is an image of the wonderful application! Note that it features everything from clipping and perspective-correct texture coordinates (i.e. the stuff the device does these days without our interference), through vertex shading (where the matrix-vector multiplications happen), to pixel shading (where the texture is sampled).
And no project would be complete without a development glitch. In this one, the clipping went crazy (I'm not totally sure what happened, but I was working with our beloved W coordinate of the vertex and messed something up).
I like the glitch! :ph34r:
Note: it is a bit slow, though it could be made at least 10 times faster with a while spent on optimizations (using more intrinsics in the code, more static memory instead of dynamic, modifying the half-space triangle rasterization procedures to work on NxN blocks instead of single pixels; also, perspective-correct texture coordinate interpolation is currently done from barycentric coordinates per pixel - using deltas would make it a lot faster; etc.). Basically I think the code is quite descriptive and you can see what's going on - and that was my point. It was created during evenings and free time over quite a short period; optimizing it would take at least another week, and the code would end up a lot messier than it is now. Okay, time to clean up the code, comment it, and do the second sample (the scene model and textures are ready!).
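For readers wondering what "perspective-correct texture coordinates from barycentric coordinates per pixel" means in practice, here's a sketch with hypothetical names. The key fact is that attributes divided by w interpolate linearly in screen space, so we interpolate u/w, v/w and 1/w and divide at the end:

```c
/* Sketch of perspective-correct texture coordinate interpolation from
   barycentric weights (b0, b1, b2). Doing the three divisions per vertex
   per pixel, as here, is simple but slow - the incremental-delta scheme
   mentioned above avoids most of this work. */
typedef struct { float u, v, w; } tc_t;  /* texcoord + clip-space w */

void interp_texcoord(const tc_t *a, const tc_t *b, const tc_t *c,
                     float b0, float b1, float b2,
                     float *out_u, float *out_v)
{
    /* interpolate 1/w, u/w and v/w linearly in screen space... */
    float inv_w = b0 / a->w + b1 / b->w + b2 / c->w;
    float u = b0 * a->u / a->w + b1 * b->u / b->w + b2 * c->u / c->w;
    float v = b0 * a->v / a->w + b1 * b->v / b->w + b2 * c->v / c->w;
    /* ...then recover the perspective-correct attribute by dividing */
    *out_u = u / inv_w;
    *out_v = v / inv_w;
}
```

When all three w values are equal, this degenerates to plain (affine) barycentric interpolation - which is exactly why affine texture mapping looks fine for triangles facing the camera head-on and warps badly otherwise.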
vilem_otte at September 5th, 2012 07:45 — #16
Okay, I'm going to take a little break - anyway, the paper & sample are at http://www.otte.cz/Graphics.pdf and http://www.otte.cz/Graphics_Sample1.tar.gz
Any feedback is welcome - if there is a mistake or anything missing, please write here and I'll add it to the article (or the sample). If it's okay, please say so and I'll try to contact Reedbeta about putting the document here on DevMaster.
I'll add the second sample later in the process.
alienizer at September 12th, 2012 12:05 — #17
Very well done! I can't wait to see what's next
It would be better if Reedbeta pinned this thread to the top so it doesn't keep sliding down the list - like the link on how to search the site.
reedbeta at September 12th, 2012 12:49 — #18
Hold your horses, Alienizer. Vilem and I are going to be working on an article version of this.
alienizer at September 12th, 2012 17:49 — #19
Oh ok, sorry - I thought he was being ignored, since nobody had posted anything about what he did!