What nick talks about is also known as Morton order, and it traces the Z-order space-filling curve. A neat advantage of this order is that you can implement affine texture mapping in it with nearly zero overhead.
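To make the Z shape concrete, here is a minimal sketch of a Morton index for a 16x16 texture. It assumes u goes into the even bits and v into the odd bits, which is the same layout the address computation in the span loop further down produces. The name morton_index is made up for illustration.

```c
/* Morton (Z-order) index of texel (x, y) in a 16x16 texture:
   bit i of x lands in bit 2*i, bit i of y lands in bit 2*i + 1. */
static unsigned morton_index(unsigned x, unsigned y)
{
    unsigned index = 0;
    for (unsigned i = 0; i < 4; i++) {
        index |= ((x >> i) & 1u) << (2 * i);
        index |= ((y >> i) & 1u) << (2 * i + 1);
    }
    return index;
}
```

Walking the indices 0, 1, 2, 3 visits (0,0), (1,0), (0,1), (1,1): one small "Z". The next four indices repeat that Z in the neighbouring 2x2 block, and so on recursively, which is why nearby texels end up at nearby addresses.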
The usual per-pixel work for texture mapping is to compute the texture address and then increment or decrement the u and v coordinates. In Morton order the same thing can be done roughly like this:
(I use 4.4 fixed point, which is way too imprecise to be practical, but it's fine for showing the idea behind it.)
This is how the u and v coordinates look before converting them:
(w = whole bits, f = fraction bits)
u := wwwwffff
v := wwwwffff
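For reference, getting a texel coordinate into this 4.4 layout is just a multiply by 16. A minimal sketch, assuming non-negative coordinates below 16 texels (a 16x16 texture); to_fixed_4_4 is a hypothetical helper name:

```c
/* 4.4 fixed point: whole texels in the upper nibble (wwww),
   sixteenths of a texel in the lower nibble (ffff).
   Assumes 0 <= texels < 16; the fractional remainder is truncated. */
static unsigned to_fixed_4_4(double texels)
{
    return (unsigned)(texels * 16.0) & 0xFFu;
}
```

For example 2.5 texels becomes 0x28: whole part 0010, fraction 1000 (8/16).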
Now interleave zero bits between the whole bits of the u and v coordinates (the fraction bits stay packed):
u := w0w0w0wffff
v := w0w0w0wffff
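The zero-interleaving step can be sketched with a plain per-bit loop. The 4-whole-bit width is tied to the 4.4 example, and dilate_zeros is a hypothetical name:

```c
/* Spread the 4 whole bits of a 4.4 value so that a zero gap bit sits
   between each pair: wwwwffff -> w0w0w0wffff (11 bits).
   The 4 fraction bits stay packed at the bottom. */
static unsigned dilate_zeros(unsigned fixed44)
{
    unsigned whole = (fixed44 >> 4) & 0xFu;
    unsigned frac = fixed44 & 0xFu;
    unsigned spread = 0;
    for (unsigned i = 0; i < 4; i++)
        spread |= ((whole >> i) & 1u) << (2 * i);
    return (spread << 4) | frac;
}
```

Note that dilating the all-ones value 0xFF gives 0x55F, which is exactly the mask used after each addition.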
Convert your U and V deltas the same way, but fill the gaps with ones instead of zeros:
dudx = w1w1w1wffff
dvdx = w1w1w1wffff
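A sketch of the delta conversion, assuming the same 4.4 layout; dilate_ones is a hypothetical name. It is the zero-gap spread with the three gap bits (0x2A0, i.e. bits 5, 7 and 9) forced to one:

```c
/* Deltas get the same spreading, but the gap bits are set to one:
   wwwwffff -> w1w1w1wffff. A set gap bit lets a carry out of one
   whole bit ripple across the gap into the next whole bit. */
static unsigned dilate_ones(unsigned fixed44)
{
    unsigned whole = (fixed44 >> 4) & 0xFu;
    unsigned frac = fixed44 & 0xFu;
    unsigned spread = 0;
    for (unsigned i = 0; i < 4; i++)
        spread |= ((whole >> i) & 1u) << (2 * i);
    return (spread << 4) | frac | 0x2A0u;  /* 0x2A0 = the three gap bits */
}
```

Under this sketch a negative step is passed as its 8-bit two's complement (for example -1/16 as 0xFF); the addition then wraps modulo 16 texels, which gives a tiling texture.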
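And here comes the trick: the deltas can simply be added to the coordinates as long as you mask out the gap bits after the addition. The ones in the delta's gaps carry any overflow from one whole bit across the gap into the next whole bit, so the addition works directly in Morton order.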
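A small worked example of the carry crossing a gap, using the 0x55F mask from the loop below; morton_add is a hypothetical helper:

```c
#define GAP_MASK 0x55Fu  /* 010101011111b: keep whole bits and fraction bits */

/* Add a delta whose gap bits are 1 to a coordinate whose gap bits
   are 0, then clear the gaps again. */
static unsigned morton_add(unsigned coord, unsigned delta)
{
    return (coord + delta) & GAP_MASK;
}
```

Take u = 1.9375 (dilated 0x01F) and a step of 1/16. With ones in the gaps the step is 0x2A1, and 0x01F + 0x2A1 = 0x2C0, which masks to 0x040: whole part 2, fraction 0, i.e. 2.0 as expected. With zeros in the gaps the step would be 0x001, and 0x01F + 0x001 = 0x020 sets only gap bit 5, so the mask leaves 0x000 and the carry is lost. Stepping back by 1/16 (0xFF in 4.4, dilated to 0x7FF) from 0x040 gives 0x01F again.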
for (int x = left; x < right; x++)
{
    // get texture address in morton order:
    int addr = (u >> 4) | ((v >> 4) << 1);
    putpixel(x, y, texture[addr]); // yadda yadda
    // increment uv coordinates per pixel
    u = u + dudx;
    v = v + dvdx;
    // mask out the gap bits (0x55F == 010101011111b)
    u &= 0x55F;
    v &= 0x55F;
}
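To check that the incremental Morton walk really matches a naive per-pixel interleave, here is a self-contained sketch. It assumes 4.4 inputs and a 16x16 tiling texture, replaces putpixel with storing the address, and uses hypothetical names (dilate44, walk_span_morton, walk_span_reference):

```c
#define GAP_MASK 0x55Fu  /* 010101011111b: whole bits + fraction bits */
#define GAP_BITS 0x2A0u  /* 001010100000b: the three gap bits */

/* 4.4 value wwwwffff -> w0w0w0wffff */
static unsigned dilate44(unsigned fixed44)
{
    unsigned whole = (fixed44 >> 4) & 0xFu, spread = 0;
    for (unsigned i = 0; i < 4; i++)
        spread |= ((whole >> i) & 1u) << (2 * i);
    return (spread << 4) | (fixed44 & 0xFu);
}

/* The span loop with putpixel replaced by storing the texel address.
   u0, v0, du, dv are plain 4.4 values; a negative step is passed as
   its 8-bit two's complement (e.g. -1/16 -> 0xFF). */
static void walk_span_morton(unsigned u0, unsigned v0, unsigned du,
                             unsigned dv, int count, unsigned *addr_out)
{
    unsigned u = dilate44(u0), v = dilate44(v0);
    unsigned dudx = dilate44(du) | GAP_BITS;  /* w1w1w1wffff */
    unsigned dvdx = dilate44(dv) | GAP_BITS;
    for (int x = 0; x < count; x++) {
        addr_out[x] = (u >> 4) | ((v >> 4) << 1);
        u = (u + dudx) & GAP_MASK;
        v = (v + dvdx) & GAP_MASK;
    }
}

/* Reference: step plainly in 4.4 (wrapping every 16 texels) and
   interleave the whole bits from scratch at every pixel. */
static void walk_span_reference(unsigned u0, unsigned v0, unsigned du,
                                unsigned dv, int count, unsigned *addr_out)
{
    unsigned u = u0 & 0xFFu, v = v0 & 0xFFu;
    for (int x = 0; x < count; x++) {
        addr_out[x] = (dilate44(u) >> 4) | ((dilate44(v) >> 4) << 1);
        u = (u + du) & 0xFFu;
        v = (v + dv) & 0xFFu;
    }
}
```

Running both with, say, u0 = 0x2C, v0 = 0xF3, du = 0x13 and dv = 0xE7 (a negative v step) produces identical address sequences; the Morton version just never re-interleaves inside the loop.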
The net cost per pixel is two AND operations. That's next to nothing, considering that your cache coherency will skyrocket. Oh, and you'll lose some bits due to the interleaving with zeros (n whole bits now occupy 2n - 1 bits), but that shouldn't be a problem if you put everything in MMX registers or limit the texture size and subtexel precision a bit.
The only expensive part is interleaving the whole bits with ones and zeros. Unfortunately there is no really cheap way to do this unless your CPU offers a special instruction for that kind of bit shuffling, but you only have to do it once per polygon (or once per perspective-correction span), so it's really not that bad.
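Without a dedicated instruction, one common alternative to a per-bit loop is the shift-and-mask spread (the "binary magic numbers" trick), which needs only a few operations per value. This is a swapped-in technique, not necessarily what the original code did; spread4, dilate_coord and dilate_delta are hypothetical names for the 4.4 case:

```c
/* Spread the low 4 bits of x into even bit positions 0, 2, 4, 6
   using shifts and masks instead of a per-bit loop. */
static unsigned spread4(unsigned x)
{
    x &= 0xFu;                   /* 0000dcba */
    x = (x | (x << 2)) & 0x33u;  /* 00dc00ba */
    x = (x | (x << 1)) & 0x55u;  /* 0d0c0b0a */
    return x;
}

/* Per-polygon setup: 4.4 coordinate -> w0w0w0wffff */
static unsigned dilate_coord(unsigned fixed44)
{
    return (spread4(fixed44 >> 4) << 4) | (fixed44 & 0xFu);
}

/* Per-polygon setup: 4.4 delta -> w1w1w1wffff (gap bits forced to one) */
static unsigned dilate_delta(unsigned fixed44)
{
    return dilate_coord(fixed44) | 0x2A0u;
}
```

For larger textures the same pattern extends with one more shift/mask stage per doubling of the bit count; a small lookup table indexed by the whole bits is another option.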
Hope that sheds some light on how easy it is to implement texture swizzling efficiently.