I am a newbie to the field of Digital Image Processing and Computer Graphics. I am reading about Geometry for 3D vision, and I am finding it very hard to understand the concepts of perspective projection and homogeneous coordinates. I tried googling and various books, and I also read a similar post from one of the members of this forum, but could not get a satisfactory answer.
I have the following doubts and hope to have them cleared:
Why do we need homogeneous coordinates?
I also read that points in projective space are expressed in homogeneous coordinates (REF: Image Processing, Analysis, and Machine Vision, by Milan Sonka, Vaclav Hlavac, Roger Boyle). Why can't we represent the points using the same Cartesian coordinates?
If there is/are any error(s) in the questions, I apologize. Please consider my case sympathetically.
- Why do we need homogeneous coordinates?
You need them for affine transformations and projection. With an ordinary 3x3 matrix you can only apply linear transformations such as rotation, scaling and shearing. The point at (0, 0, 0) will always remain at that position. A transformation matrix is essentially just a linear system of equations. A vector (x, y, z) transformed by a 3x3 matrix M is essentially just x*M.xAxis + y*M.yAxis + z*M.zAxis. It's obvious that you can't get a translation out of that if all the components of the vector are zero, so you need an extra term: x*M.xAxis + y*M.yAxis + z*M.zAxis + translation. By using homogeneous coordinates, you add a w-component to the vector (which usually is an implied 1) in order to be able to include the translation in the matrix: x*M.xAxis + y*M.yAxis + z*M.zAxis + w*M.translation. To get to the actual 3d point in space, you should divide the resulting vector by its w-component. If you make sure that w is always 1, you can skip that divide. Therefore common calculations only involve 3d vectors with an implied w=1 and 3x4 or 4x3 matrices with an implied last row/column of (0, 0, 0, 1).
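As a rough sketch of the formula above (plain Python, no libraries; the helper names `scale`, `add` and `transform` are made up for this example, and the matrix is stored simply as its four axis/translation vectors):

```python
def scale(vec, s):
    """Multiply every component of vec by the scalar s."""
    return [s * c for c in vec]

def add(*vecs):
    """Component-wise sum of any number of equally sized vectors."""
    return [sum(components) for components in zip(*vecs)]

def transform(point, x_axis, y_axis, z_axis, translation):
    """x*xAxis + y*yAxis + z*zAxis + w*translation, as in the text."""
    x, y, z, w = point
    return add(scale(x_axis, x), scale(y_axis, y),
               scale(z_axis, z), scale(translation, w))

# Identity rotation plus a translation by (5, -2, 3).
x_axis = [1, 0, 0, 0]
y_axis = [0, 1, 0, 0]
z_axis = [0, 0, 1, 0]
translation = [5, -2, 3, 1]

origin = [0, 0, 0, 1]  # w = 1, so the translation term is applied
moved = transform(origin, x_axis, y_axis, z_axis, translation)
print(moved)  # [5, -2, 3, 1]
```

Without the w-component the origin could never move, because every term would be multiplied by zero. With w = 1 the translation vector is added exactly once, and the result still has w = 1, so no divide is needed.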
For perspective projections, you need to be able to divide. For a point in view space, the projected point on a 2d screen can be calculated by dividing x and y by z. A matrix multiplication on its own can't do this, because dividing by one of the vector's own components is not a linear operation. However, using the properties of homogeneous coordinates, the division by z can be achieved simply by copying 'z' to 'w' using a 4x4 transformation matrix; the usual divide by w afterwards then performs the division by z.
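A minimal illustration of the "copy z to w" trick, assuming column vectors and a bare-bones matrix that does nothing else (a real projection matrix would also handle field of view and the near/far planes; `mat_vec`, `PROJECTION` and `project` are names invented for this sketch):

```python
def mat_vec(matrix, vector):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Leaves x, y and z alone; the last row copies the incoming z into w.
PROJECTION = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],  # w_out = z_in
]

def project(point):
    """Apply the matrix, then divide by w to get 2D screen coordinates."""
    x, y, z, w = mat_vec(PROJECTION, point)
    return (x / w, y / w)

near = project([4.0, 2.0, 2.0, 1.0])  # point at depth z = 2
far = project([4.0, 2.0, 4.0, 1.0])   # same x and y, twice as far away
print(near)  # (2.0, 1.0)
print(far)   # (1.0, 0.5)
```

The matrix itself never divides anything. It only arranges for w to hold z, so that the divide by w ends up being the divide by z, and the point that is farther away lands closer to the centre of the screen.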
Note that while mathematically homogeneous coordinates are very well-defined, in computer graphics we merely use them as a convenience to be able to specify the calculations that we need. Most computer graphics applications will never use homogeneous coordinates to their full extent. Before the perspective transformation, 'w' will always equal 1, and the only place where an actual division by w takes place is when projecting the points onto a 2D surface before rendering. No one cares about the property that (1, 2, 3, 1) and (2, 4, 6, 2) are essentially the same point. In fact, having w=2 will break most code, as it simply assumes that w=1, and it may not even be possible to explicitly store a w component.
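To make that last point concrete, here is a small sketch (the function names are hypothetical) comparing a proper divide by w with code that just assumes w=1:

```python
def to_cartesian(h):
    """Convert a homogeneous point to 3D by dividing by w."""
    x, y, z, w = h
    return (x / w, y / w, z / w)

def to_cartesian_assuming_w1(h):
    """What a lot of code effectively does: drop w and hope it was 1."""
    x, y, z, _ = h
    return (float(x), float(y), float(z))

a = (1, 2, 3, 1)
b = (2, 4, 6, 2)

print(to_cartesian(a) == to_cartesian(b))  # True
print(to_cartesian_assuming_w1(a) == to_cartesian_assuming_w1(b))  # False
```

Mathematically both tuples describe the point (1, 2, 3), but the shortcut version only gets that right when w really is 1.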
Your coherent explanation has resolved all my doubts. Thank you very much, .oisyn.