
Float

September 26th, 2005

Why now?

My friend over at SciFiHiFi suggests that Microsoft likes trees, but Apple likes hash tables. Well, I think that Microsoft prefers integer arithmetic, while Apple (or at least Cocoa) likes floating point. Here’s a bunch of examples I found:

(Apple does it with doubles – hey, that’s a good bumper sticker.)

Ahem. Ok, so Apple is floating-point happy. (And it didn’t always use to be this way, of course. Quickdraw used integer coordinates, and Quartz switched to floating point, remember?) So, as a Mac programmer, I should really get a good handle on these floating point thingies. But floating point numbers seem pretty mysterious. I know that they can’t represent every integer, but I don’t really know which integers they actually can represent. And I know that they can’t represent every fraction, and that as the numbers get bigger, the fractions they can represent get less and less dense, and the space between each number becomes larger and larger. But how fast? What does it all LOOK like?

Background

Well, here’s what I do know. Floating point numbers are a way of representing fractions. The Institute of Electrical and Electronics Engineers made it up and gave it the memorable name "IEEE 754," thus ensuring it would be teased as a child. Remember scientific notation? To represent, say, 0.0000000000004381, we can write instead 4.381 x 10^-13. That takes a lot less space, which means we don’t need as many bits. The 4.381 is called the mantissa, and that -13 is the exponent, and 10 is the base.

Floating point numbers are just like that, except the parts are all represented in binary and the base is 2. So the number .01171875, that’s 3 / 256, would be written 11 x 2^-1000. (Both parts are binary: 11 is 3, and -1000 is -8, since 256 is 2^8.)

Or would it? After all, 3/256 is 6/512, right? So maybe it should be 110 x 2^-1001?

Or why not 1.1 x 2^-111?

Ding! That happens to be the right one; that is, the one that computers use, the one that’s part of the IEEE 754 standard. To represent a number with floating point, we multiply or divide by 2 until the number is at least 1 but less than 2, and then that number becomes the mantissa, and the exponent is the number of times we had to multiply (in which case it’s negative) or divide (positive) to get there.

Tricksy hobbit

Since our mantissa is always at least 1 but less than 2, the 1 bit will always be set. So since we know what it will always be, let’s not bother storing it at all. We’ll just store everything after the decimal place. That’s like saying “We’re never going to write 0.5 x 10^-3, we’ll always write 5 x 10^-4 instead. So we know that the leftmost digit will never be 0. That saves us one whole numeral, the numeral 0.” A WHOLE numeral. Woo-hoo? But with binary, taking out one numeral means there’s only one numeral left. We always know that the most significant digit is 1, so we don’t bother storing it. What a hack! Bit space in a floating point representation is almost as expensive as housing in Silicon Valley.

So the number 3 / 256 would be written in binary as .1 x 2^-111 (that exponent is binary for -7), and we just remember that there’s another 1 in front of the decimal point.

Unfair and biased

We need to store both positive exponents, for representing big numbers, and negative exponents, for representing little ones. So do we use two’s complement to store the exponent, that thing we hated figuring out in school but finally have a grasp on? The standard for representing integers? (The exponent is, after all, an integer.) Nooooooo. That would be toooooo easy. Instead, we bias the number. That just means that the number is always stored as unsigned, ordinary positive binary representation, but the REAL number is what we stored, minus 127! (That’s the bias for a 32-bit float; a double uses 1023.) So if the bits say the exponent is 15, that means the REAL exponent is -112.

A good sign

We also need to represent positive and negative numbers (remember, a negative exponent means a small but positive number; we don’t have a way of representing actual negative numbers yet). At least there’s no real trickery involved here – we just tack on a bit that’s 0 for a positive number, and 1 for a negative number.

Let’s make a float

So now that we have a handle on all the bizarre hacks that go into representing a float, let’s make sure we did it right. Let’s put together some bits and call them a float. Let me bang some keys on the top of my keyboard: -358974.27. There. That will be our number.

First, we need a place to put our bits that’s the same size as a float. Unsigned types have simple bit manipulation semantics, so we’ll use one of those, and start with 0.

   unsigned val = 0;

Ok, next, our number is negative, so let’s set the negative bit. In IEEE 754, this is the most significant bit.

   unsigned val = 0;
   val |= 1u << 31;

All right. Start dividing by 2. I divided 18 times and wound up with 1.369 something, which is between 1 and 2. That means that the exponent is 18. But remember, we have to store it biased, which means that we add 127. In IEEE 754, we get 8 bits for the exponent, and they go in the next 8 most significant bits.

   unsigned val = 0;
   val |= 1u << 31;
   val |= (18 + 127) << 23;

Now the mantissa. Ugh. Ok, 358974.27 in straight binary is 1010111101000111110.010001010001 and then a bunch of other 0s and 1s. So the mantissa is that, minus the decimal place. And IEEE 754 says we get 23 bits for it. So first, chop off the most significant bit, because we know it will always be one, and throw out the decimal point, and then round to 23 bits. That’s 01011110100011111001001, which is, uhh, 3098569. There. That’s our mantissa, which occupies the remaining 23 bits.

   unsigned val = 0;
   val |= 1u << 31;
   val |= (18 + 127) << 23;
   val |= 3098569;

Ok, let’s pretend it’s a float, print it out, and see how we did!

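#include <stdio.h>
int main(void) {
   unsigned val = 0;
   val |= 1u << 31;          /* sign bit: negative */
   val |= (18 + 127) << 23;  /* biased exponent in the next 8 bits */
   val |= 3098569;           /* 23-bit mantissa, leading 1 dropped */
   /* reinterpret the bits as a float (assumes unsigned and float are both 32 bits) */
   printf("Our number is %f, and we wanted %f\n", *(float*)&val, -358974.27f);
   return 0;
}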

This outputs:

Our number is -358974.281250, and we wanted -358974.281250

Hey, it worked! Or worked close enough! I guess -358974.27 can’t be represented exactly by floating point numbers.

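(I first assumed that on a little-endian machine like Intel, you’d have to do some byte-swapping to make that work. You don’t: the unsigned and the float are stored with the same byte order, so the bits line up either way.)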

Loose ends

There are a few loose ends here. Remember, we get the mantissa by multiplying or dividing until our number is between 1 and 2, but what if our number started out as zero? No amount of multiplying or dividing will ever change it.

So we cheat a little. We give up some precision and make certain exponents “special.”

When the stored exponent is all bits 1 (which would ordinarily mean that the real exponent is 128, which is 255 minus the bias), then everything takes on a special meaning:

  • If the mantissa is zero, then the number is infinity. If the sign bit is also set, then the number is negative infinity, which is like infinity but less optimistic.
  • If the mantissa is anything else, then the number isn’t. That is, it’s Not a Number, and Not a Numbers aren’t anything. They aren’t even themselves. Don’t believe me?
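    #include <stdio.h>
    int main(void) {
       unsigned val = -1;         /* all 32 bits set: exponent all 1s, mantissa nonzero */
       float f = *(float*)&val;   /* reinterpret those bits as a float */
       int isEqual = (f==f);      /* a NaN compares unequal even to itself */
       printf("%f %s %f\n", f, isEqual ? "equals" : "does not equal", f);
       return 0;
    }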
    

    This outputs nan does not equal nan. Whoa, that’s cosmic.

When the stored exponent is all bits 0 (which would ordinarily mean that the real exponent is -127, which is 0 minus the bias), then everything means something else:

  • If all the other bits are also 0, then the floating point number is 0. So all-bits-0 corresponds to floating-point 0…phew! Some sanity!
  • If all the bits are 0 EXCEPT the sign bit, then we get negative 0, which is an illiterate imperfect copy of 0 brought about by Lex Luthor’s duplicator ray.
  • If any of the other bits are 1, then we get what is called a denormal. Denormals allow us to represent some even smaller numbers, at the cost of precision and (often) performance. A lot of performance. We’re talking over a thousand cycles to handle a denormal. It’s too involved a topic to go into here, but there’s a really interesting discussion of the choices Apple has made for denormal handling, and why, and how they’re changing for Intel, that’s right here.

Please stop boring us

So I set out to answer the question “What does it all LOOK like?” We’re ready to paint a pretty good picture.

Imagine the number line. Take the part of the line between 1 and 2 and chop it up into eight million evenly spaced pieces (8388608, to be exact, which is 2^23). Each little chop is a number that we can represent in floating point.

Now take that interval, stretch it out to twice its length, and move it to the right, so that it covers the range from 2 to 4. Each little chop gets twice as far from its neighbor as it was before.

Stretch the new interval again, to twice its length, so that it covers the range 4 to 8. Each chop is now four times as far away from its neighbor as it was before. Between, say, 5 and 6, there are only about two million numbers we can represent, compared to the eight million between 1 and 2.

Here, I’ll draw you a picture.

There are some interesting observations here:

  • As your number gets bigger, your accuracy decreases – that is, the space between the numbers you can actually represent increases. You knew that already.
  • But the accuracy doesn’t decrease gradually. Instead, you lose accuracy all at once, in big steps. And every accuracy decrease happens at a power of 2, and you lose half your accuracy – meaning you can only represent half as many numbers in a fixed-length range.
  • Speaking of which, every power of 2 is exactly representable, up to and including 2^127 for floats and 2^1023 for doubles.
  • Oh, and every integer from 0 up to and including 2^24 (floats) or 2^53 (doubles) can be exactly represented. This is interesting because it means a double can exactly represent anything a 32-bit int can; there is nothing lost in the conversion from int->double->int.

Zero to One

On the other side of one, things are so similar that I can use the same picture.

The squiggly line represents a change in scale, because I wanted to draw in some denormals, represented by the shorter brown lines.

  • At each successive half, the density of our lines doubles.
  • Below .125, I drew a gradient (because I’m lazy) to show that the lines are so close together as to be indistinguishable from this distance.
  • 1/2, 1/4, 1/8, etc. are all exactly representable, down to 2^-126 for normalized numbers, and 2^-149 with denormals.
  • The smallest "regular" (normal) floating point number is 2^-126, which is about .0000000000000000000000000000000000000117549435. The first (largest) denormal for a float is that times .99999988079071044921875.
  • Denormals, unlike normalized floats, are regularly spaced. The smallest denormal is 2^-149, which is about .000000000000000000000000000000000000000000001401298.
  • The C standard says that <float.h> defines the macro FLT_MIN, which is the smallest normalized floating point number. Don’t be fooled! Denormals allow us to create and work with floating point numbers even smaller than FLT_MIN.
  • “What’s the smallest floating point number in C?” is a candidate for the most evil interview question ever.

So that’s what floating point numbers look like. Now I know! If I made a mistake somewhere, please post a correction in the comments.

 

The Internet!

π = 3.2860203432

good job explaining stuff! i was rather astonished at all the float uses in cocoa when i first moved over to macs, but now, it seems second nature.

[...] ridiculous_fish doesn’t post much, but the posts are always interesting. Take the most recent post, which is about the encoding of floating point numbers. I laughed at this bit: If the man [...]

Alex Rosenberg

Apple’s recent love-affair with floats isn’t all a rosy picture. As you’ve noticed, not everybody really understands the properties of floating-point numbers.

AppKit is just one of many layers of code above CoreGraphics that improperly uses floats. For example, resizing a window vigorously can result in subviews moving ever so slightly around from where they should be. NSSplitView gives all of the fractional portion of a resize to the bottommost pane. This can be seen in FileMerge: slowly grow the window to watch the bottom pane open itself up.

Goldberg’s “What Every Computer Scientist Should Know About Floating-Point Arithmetic” should be required reading. is one place to find it.

One of the PITA things about floats is that most base ten decimals turn into infinitely repeating fractions in binary, so a float can only store the nearest value it can represent. So 0.1 turns into 0.100000001490116 if you’re not careful about it.

There is a very interesting essay on floating point number, rounding, overflow and underflow and numeric representation in general and SANE (Standard Apple Numerics Environment) in the preface to the “Apple Numerics Manual”, second edition. Reading, MA: Addison-Wesley, 1988 by William Kahan a professor at Berkeley.

Kahan’s similar but less humorous essay on the IEEE 754 standard is found here:

http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

and his web page is full of papers on the subject:

http://www.cs.berkeley.edu/~wkahan/

The original essay isn’t found in Apple’s successor document which might, however, be interesting to people following this topic:

Inside Macintosh: PowerPC Numerics
http://developer.apple.com/documentation/Performance/Conceptual/Mac_OSX_Numerics/Mac_OSX_Numerics.pdf

I agree… Apple’s occasional overuse of floats (and doubles) bothers me too. The fact that, in floating point arithmetic, you can’t assume that (A+B)-B==A causes lots of subtle bugs, and the nonuniformity of floating-point values makes them a bad choice for CFAbsoluteTime.

On the other hand, C has no direct support for fixed-point types, which would be the Right Way ( ;-) ) to do a lot of the things done with floats here.

On the gripping hand, floats are almost-good-enough, almost-all-the-time…

Reinder

“Quickdraw used integer coordinates, and Quartz switched to floating point”.

You forget (or don’t know about) QuickDraw GX, which used fixed-point coordinates (16.16)

CoreGraphics prefers to manipulate graphics in vector format. To be useful (that is, to print or display onscreen), vectors must be rastered. However, once rastered, some precision is lost, so Quartz doesn’t raster until it absolutely must.

Likewise, maybe we need a numeric type which can be operated on, but doesn’t resolve precision until the very last moment. Its value would be stored as a series of equations which would not resolve until asked, at which point the asker will know the precision he needs. This must be how Mathematica works…?

I dub thee CoreMath™.

[...] ng Point Tuesday, September 27, 2005 Floating Point
Peter Ammon:

So, as a Mac programmer, I should really get a good handle on these
floating point th [...]

one interesting side note on floats and unexpected results. currently in ruby, (4.10 * 100).to_i returns 409
http://blog.leetsoft.com/articles/2005/09/27/wtf

Kevin:
One not so nice side effect in quartz graphics is, that little white lines on the edges between objects can easily be produced. This is not good. I don’t know if this is due to the late rasterisation, or simply due to bad anti-aliasing, but it is kind of unprofessional. You can try yourself in e.g. keynote. draw adjacent rectangles without lines…. and i doubt that the edge points have different coords. I’ll write a bug report soon with sample code that shows this….

John

I’ve just run across this blog… Very very interesting reading (the best explanation of 754 I’ve ever seen)

Nice work and publish more

“So the number 3 / 256 would be written in binary as .1 x 2-10000000, and we just remember that there’s another 1 in front of the decimal point.”

Well, that would of course be a “binary point”, rather than a decimal one.

Hm, my pseudo-html tag to show you all I am aware of my pedantry was stripped by the comment system. Yes, I know that what I said is an extremely pedantic point.

Float

Ridiculous Fish dissects floating point numbers.

Great work on the article. To answer an unasked question:

“(If you’re on a little-endian machine like Intel, you have to do some byte-swapping to make that work. I think.)”

No. No swapping necessary in your code. An int’s an int’s an int on any platform, until you start screwing with byte-streams.

I always think about it this way — (unsigned)( 1

Nice. My comment above totally broke.

that should finish off “(unsigned)( 1 shifted left 10 ) is equal to (unsigned)1024″. Could you imagine how hard it’d be to remember swapping things like that on different platforms?

I guess the use by Apple of floating point numbers may originate with their use of PowerPC chips, certainly floating point is faster on PPC than on x86, and you get the bonus of pipelining. There again it might also be due to support for multiple display types. I remember reading the QuickDraw docs years ago and being blown away by Apple’s approach to graphics support ( even if it was slower than writing to the raw device. )

Mac-arena the Bored Zo

a point about your conversion to float: I used to do the *(cast)&blah thing as well, until I realised that that’s what union types are for. adapting your code:

union {
unsigned uval; //replaces the 'val' variable
float fval;
} raw_float_assemblage = { .uval = 0U };
//replace 'val' with 'raw_float_assemblage.uval' in your code (I always have problems picking a name for my union variables)
printf("Our number is %f, and we wanted %f\n", raw_float_assemblage.fval, -358974.27f);

Anon

Thought I’d pass along that everything in Windows Presentation Foundation is based on floats http://www.charlespetzold.com/blog/0512210106.html.

Yoshchka

This was a really interesting article. Unfamiliar with the C code, (bit twiddling and the > stuff always caused a brain buffer overflow) but overall a very interesting discussion of an important gewgaw.

[...] l programmers should be conscious of when they choose to use them. Ridiculous Fish have a most wonderful article about how floats are stored and what numbers they can [...]

Anonymous

“If you’re on a little-endian machine like Intel, you have to do some byte-swapping to make that work. I think.”

well, i am on an x86 machine (AMD Athlon XP, not all x86 machines are Intel) and the first example (make a float) gives me what’s expected without any byte swapping.

IEEE 754 is the same for all machines, isn’t it?

Byte swapping is not needed on either platform, because ints and floats are swapped in the same way. That is, yes, an int on x86 is “backwards” from an int on ppc, but floats are “backwards” in the same exact way, so it all evens out in the end.

p mac

Floating point Time??? That’s insane–time addition should be exact. (Checks manual. Nope. Time is integer math; long seconds, long microseconds. Phew.)

From the manual, gettimeofday:

struct timeval {
long tv_sec; /* seconds since Jan. 1, 1970 */
long tv_usec; /* and microseconds */
};

[...] ] Some links Alexa numbers worse than useless LazyFOAF » Bookmark on del.icio.us Ridiculous_fish [...]

Float!

Float! And what you should have known about it long ago.

Damien

I would think the main reason that win32 mainly uses integers is just because it’s an older API compared to Cocoa. Back then, floats were really slow on x86 chips, floating point hardware was an optional add-on, and everyone used integers whenever possible. Even when floats were necessary, they were often emulated using fixed point arithmetic with integers.

I imagine nowadays, where backwards compatibility is not an issue, that Microsoft would use floats and doubles where it makes sense.

Sounds very interesting, Also very innovative Blog style here, looks very fun :)

Erick

And don’t forget that large parts of the display engine in OS X share a lot with the PDF spec, which is, as you might have guessed, based on floats. This might also be the origin of the bottom left origins in Quartz graphics versus upper left origins in QuickDraw graphics.

Jernej

The article is misinformation, MS does NOT do time in integers but INT64, which is 64 bit and represents 18446744073709551616 different states, which translates into:
18446744073709551.616 milliseconds
307445734561825.86026666666666667 minutes
5124095576030.4310044444444444444 days
14038618016.521728779299847792998 years…

data range:
INT64: -2^63..2^63-1, you get 14038618016 years of date representation

however, the so called “double” that apple uses:
DOUBLE: 5.0 x 10^-324 .. 1.7 x 10^308 – the range it represents with sufficient precision matches a 32 bit UINT – range 0..4294967295 , it is no match to precision of int64 that windows and linux use which can represent more numbers with FULL precision.

so basically, you can compare it like this, mac uses doubles which are precision equal to 32 bit INTEGERS, windows and linux use 64 bit INTEGERS, which are more precise.

i suggest everyone, especially the author to read THIS DOCUMENT: http://lua-users.org/wiki/FloatingPoint about FPU and precision.

Warrenb

The point of the article is to explain how floating point numbers work, not further the agenda of mac vs windows, fanboy.

“Jernej ” wrote: The article is misinformation, MS does NOT do time in integers but INT64

I’ll let this one speak for itself.

[...] ey are handled, what they are, etc. (tags: Computers math Numbers programming reference) ridiculous_fish » Blog Archive » Float Another useful article about floating point nu [...]

Anonymous

hello

[...] 5:15  | 

Leaving aside the discussion about the spelling of this difficult word, here is a fairly interesting article on the subject, and if anyone isn’t scared off by techno- [...]


[...] s, but NSSlider is floating point. MS does time in integers but Apple does it with doubles.read more | digg story No Comments N [...]

Tony

Not sure if anyone else caught this, but there’s a little bug where you take the first 23 bits (minus the first) for the mantissa… the first 24 bits are:
0101 1110 1000 1111 1001 0001 , while you use:
0101 1110 1000 1111 1001 001 for the mantissa.

What’s weird is why it still works.