Thanks for the links, Kusma and Nils.
I'll use Google's cache to read the pages.
IE and Mozilla fail to open them directly =/
If I remember correctly, the cast ops are slow because they actually result in a function call to _ftol.
More precisely, with _ftol two FPU state changes happen.
First the FPU's current rounding mode is changed, then the
desired bitmask is built, and finally the FPU rounding mode
is set back to the original state.
This is done to guarantee that the ANSI C standard is fully
respected, no matter the platform, because there's no
dedicated asm instruction***.
*** maybe due to the fact that IEEE 754 is still under
R&D, and may change in the future. Just a wild guess...
These state changes stall the FPU twice per cast.
Collaterally, the designated CPU pipe has to sit and rot
while waiting for the FPU to finish.
To add insult to injury, the state changes are unnecessary
in most cases anyway.
Simply write a piece of assembly code that pops the float you want from the floating point stack. That will automatically truncate it. I can post a piece of code when I'm home from work in a few hours or so.
I didn't mention that I know little to no asm.
I recognize jumps, and some basic instructions.
I know that the count of "push" and "pop" must always match.
But that's all.
As far as my job is concerned, I don't need to know assembly.
Yet game programming is my 2nd hobby, and I'll have to
buy a good book someday.
Examining the asm produced by the compiler isn't enough.
Kind regards everyone,
Ciao ciao : )