The result is actually slower, but I might know why:
I generated a lookup table like you said:
const half aCosT =
1.49259,........... and so on...
then referenced it like this:
half theta_r = aCosT[(VdotN * 0.5) * 64]; <- VdotN stays between 0.0f and 1.0f
I remember reading in an ATI paper that indexing an array with a float value is costly. Is there any way to perform this lookup without using a float (or half)?