mihail121 at February 6th, 2010 07:05 — #1
Hi, does anyone know why this is the case:
// Given a single precision float flt (23 bit mantissa) with intV and fltV being the integer and the fraction part respectively, so: flt = (float) intV + fltV
float flt = ...
// Do this
flt = flt + (float) (1 << 23)
// Now somehow the mantissa of flt has become exactly the bitwise representation of intV, so:
(int) flt & (mantissa_mask = (1<<23)-1)
// gives exactly intV, the truncated flt value.
I would be grateful if somebody could provide some info on how this great magic works!
edit: No, sorry, the code seems to perform a nearest rounding.
reedbeta at February 6th, 2010 14:08 — #2
When you add two floating-point numbers, conceptually speaking they have to be shifted to line up their radix points with each other before you can add. The number (1 << 23) is a 1 followed by 23 bits of zero, so as a float its radix point lies at the very end of its mantissa, and its mantissa is all zeros (the 1 being implied). When you add flt to it, flt is first shifted so that its radix point lies at the same place. Thus the fractional part of flt falls off the end of the mantissa and the integer part is all that is left. Do the add, and you wind up with the integer part of flt in the mantissa field, at least up to rounding.
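To make the alignment argument concrete, here is a minimal C sketch. The helper name `mantissa_after_magic_add` is made up for illustration, and the sketch assumes a 32-bit IEEE 754 `float` and the default round-to-nearest-even mode. It reads the raw bits with `memcpy` instead of converting the value, and it stores the sum in a `volatile` so that it really is rounded to single precision.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add 2^23 to x, then return the 23-bit mantissa
   field of the sum by reinterpreting the float's bits.
   Assumes 32-bit IEEE 754 floats and 0 <= x < 2^23. */
uint32_t mantissa_after_magic_add(float x)
{
    /* 8388608 = 1 << 23. The volatile store forces the sum to be
       rounded to single precision before we look at its bits. */
    volatile float sum = x + 8388608.0f;
    float s = sum;

    uint32_t bits;
    memcpy(&bits, &s, sizeof bits);   /* copy the raw bit pattern */

    return bits & ((1u << 23) - 1u);  /* keep only the mantissa field */
}
```

Under those assumptions, 5.25f should come out as 5 and 5.75f as 6, while exact ties such as 2.5f go to the even neighbour (2), which matches the nearest-rounding behaviour noted in the edit of the first post.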
If you want truncation you may be able to switch to round-toward-zero mode (google for floating point rounding modes). Or, if your numbers are all positive, an easy trick is to subtract 0.5 before you round it.
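Here is a rough C sketch of the subtract-0.5 variant, with the same assumptions as before: 32-bit IEEE 754 floats and the default round-to-nearest-even mode. The name `truncate_via_half_offset` is invented for illustration. Two caveats are worth stating up front. An input that is an exact odd integer lands on a tie after the subtraction and comes out one too low. Very small inputs can also pull the sum below 2^23, where the exponent changes and the mantissa no longer holds the integer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: approximate truncation of a positive float by
   subtracting 0.5 before the 2^23 add, then reading the mantissa field.
   Assumes 32-bit IEEE 754 floats, round-to-nearest-even, and an input
   of at least 0.5 and below 2^23. */
uint32_t truncate_via_half_offset(float x)
{
    /* volatile stores force each step to be rounded to single precision */
    volatile float shifted = x - 0.5f;
    volatile float sum = shifted + 8388608.0f;   /* 8388608 = 1 << 23 */
    float s = sum;

    uint32_t bits;
    memcpy(&bits, &s, sizeof bits);   /* copy the raw bit pattern */

    return bits & ((1u << 23) - 1u);  /* keep only the mantissa field */
}
```

With these assumptions, 5.75f and 5.25f both give 5, but 3.0f gives 2 because 2.5 ties to the even neighbour. If exact integers matter, switching the rounding mode is probably the safer route.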
unrealsolo at May 29th, 2010 09:30 — #3
Another way of thinking about it is to imagine the numbers represented as a bitfield/integer that has enough bits to store the numbers in fixed point, but where you only have a window of bits to operate with. You then add, multiply, and divide as with a normal integer, and the result window moves so that the most significant 1 occupies the highest bit in the window. I'm not sure if this makes it easier, but it helps me when doing fixed-point operations.
oisyn at May 30th, 2010 18:36 — #4
Note that a float doesn't store its most significant 1, as it is always implied. And obviously, the trick will only work for positive numbers between 0 and 2^23 (if you want the next 11 bits as well, you could apply the same trick by adding 2^32, then shifting the resulting int representation by 23 bits and adding them to the lower 23 bits already calculated).