A Q&A White Paper Regarding the Pentium's Flaw
___________________________________________________

Prepared by PC Magazine Labs, PC Magazine Labs U.K., and PC Week Labs
December 16, 1994

Q> What is this problem with the Pentium?

A> The problem exists in the hardware used to calculate floating point divides. The error mainly affects actual mathematical division operations that are executed with floating point instructions, but it can also affect reciprocal operations.

Q> How does the problem occur?

A> With the Pentium processor, Intel does not use a standard algorithm for division. The standard algorithm can be considered a binary equivalent of long division; it requires one processor clock cycle for each binary digit. Instead, Intel uses a faster method that deals with two binary digits in each cycle. Intel's method relies on a lookup table for estimating the range of intermediate results of the division operation, and, because some elements are missing from this table, errors can occur in the results.

Q> When do these errors occur?

A> The error occurs when you execute a division with certain floating point values. According to Intel, if you choose random floating point values, an error will occur once in every 9 billion divisions.

Q> Intel says that in the worst case the error will be seen from the fourth decimal digit. Should people be concerned over this level of inaccuracy?

A> Such an error can be part of a much larger calculation. Very often the error will be lost in subsequent calculations and become insignificant. But the reverse can also occur: the error can be magnified and make a large difference to the final results. The sample spreadsheet calculations that have been much publicized in the press demonstrate these kinds of errors.

Q> But Intel is stating that the error is much less likely to occur than, say, a hard disk crash or a memory parity error. Is this true?

A> For most users, Intel's statement is true.
However, if you get a memory error or your hard disk malfunctions, you get notice that these errors have occurred, and you know to remedy the situation and how to treat your most recent data. If you encounter the divide error, there is no way of knowing that it has occurred other than checking your calculations on a different machine. This is one of the factors contributing to the magnitude of the problem.

Q> IBM states that Intel is not depicting the magnitude of the problem accurately. They claim that errors occur much more frequently than Intel admits. Is this true?

A> When Intel claimed that the floating point error occurs only once in 9 billion divisions, it was looking at all possible floating point numbers. IBM contends that most users do not use random floating point numbers in the course of their normal computer operations. Even in financial work, where there is a good deal of number crunching, many users will primarily be dealing with numbers that have two decimal places. IBM claims to have demonstrated that simple arithmetic - primarily addition and subtraction - done in a spreadsheet with such numbers tends to produce values that are more likely to create errors when they are used in a subsequent divide operation.

This type of number has recently been called the "bruised integer". It is not in itself an error, but it has a greater likelihood of creating an error when used in a divide. A "bruised" integer is a number such as "4" which, in a floating point representation, will be held in the PC's memory as the binary equivalent of "3.999999...". This small rounding effect after a calculation is quite normal, and does not appear on your screen, but IBM is stating that these "bruised" integers make errors more likely when they are subsequently used in divisions on a Pentium.
Another IBM contention is that Intel's much-quoted likelihood of an error (once in 27,000 years for basic (sic) spreadsheet users) does not take into consideration financial analysts and users of other numerically intensive business applications - the sort of person who sits in front of 1-2-3 or Excel all day, every day. However, these users will only be at risk if their calculations contain significant numbers of floating point divide operations. IBM's conclusion is that, with some users having this higher frequency of actual divide instructions, combined with the greater risk from "bruised" integers, the overall risk for some business users is very much higher than Intel suggests.

Q> Who should users believe? Intel or IBM?

A> The analysis that both companies - and many others - have been doing is intended to describe, for a wide range of different types of users, the likelihood both of encountering the divide error and of that error producing serious results. In fairness to both Intel and IBM, this is very difficult to quantify.

Based on the analysis done both by ourselves and by many others, and on the conversations we have had with both Intel and IBM, we consider that Intel has understated the problem and IBM has overstated it. In the testing and analysis conducted by PC Magazine, PC Magazine UK and PC Week, we have concluded that the risk for serious business users is significantly higher than suggested by Intel, but not as high as the position taken by IBM.

Our findings are documented in a white paper called "What Does the Pentium Do, and When Does it Do It?". This is available on the PC Week World Wide Web page: http://www.ziff.com/~pcweek. It is also available on CompuServe by typing GO PCWEEK.

Q> Who will be most affected by this problem?

A> The vast majority of users are not affected.
For example, although there may be rare exceptions, applications such as word processing, presentation graphics, desktop publishing and databases do not use floating point mathematics, and so this problem is just not relevant to them. Other applications that do use floating point, such as graphics programs and games, may encounter errors, but the effects are almost always going to be insignificant.

Intel has made it clear that it considers users involved in intensive mathematical or scientific analysis and technical engineering applications to be at risk. It has been replacing processors for such people.

The gray area between these two extremes concerns users of spreadsheets and other financial software, and most will not be affected. Many spreadsheets simply do not contain divide instructions, and if they do, the divides are not executed repeatedly with a sufficient number of different values to make the likelihood of an error significant.

However, economists, stock traders and others who use complex models or statistical analysis are more likely to be at risk, and should consider themselves in a similar risk category to the scientific analysts. We have estimated that such users may encounter an error once in every 20 to 30 years. The most intensive spreadsheet users are at greater risk, and may experience an error every two months to ten years, though the tiny magnitude of these errors makes them unlikely to affect a business decision.

Q> How can I tell if my Pentium is one that causes these divide errors?

A> You can tell by executing calculations that are known to go wrong. Both PC Magazine and PC Week have utilities available that can test this for you.
PC Magazine
  Internet:
    Downloads: ftp://ftp.pcmag.ziff.com/pub/pcmag/special/pentst.zip
    Web info:  http://www.pcmag.ziff.com/~pcmag
  CompuServe:
    Downloads: ZNT:UTILFORUM, Library: Labs, File: pentst.zip

PC Week
  Internet:
    Downloads: ftp://www.ziff.com/pub/pcweek/fdv/fdv.zip
    Web info:  http://www.pcweek.ziff.com/~pcweek
  CompuServe:
    Downloads: ZNT:PCWEEK, Library: Labs/Netweek, File: fdv.zip

To check for errors on specific calculations, compare your answer with a program that does not use the affected portion of the Pentium. One such product is Derive from Soft Warehouse of Honolulu, Hawaii.

PC Week Labs Float Divide Verification Suite

This archive includes the C source code and a DOS executable for the PC Week Labs Float Divide Verification Suite and a white paper describing our findings on the Pentium floating point division bug. The suite is designed to demonstrate the frequency and magnitude of division errors when working with numbers in the neighborhood of 1, typical of many business analyses.

To run the software, simply type fdv at the command line. In order to examine all the results, you will have to redirect the output to a file (fdv > results).

So far, we have tested this source code on DOS, NT, OS/2, and Linux (using both 486 and Pentium processors). On other processor architectures, we have so far tested the program on SunOS 4.1.3 using a SPARC processor and on HP/UX 9.01 on an HP 710 workstation using the PA-RISC processor.

The source code has been successfully compiled using Microsoft Visual C++ (both 1.51 and 2.0), Watcom C (both 9.0 and 10.0), Zortech C++ 3.1, and the Free Software Foundation's GNU C++ version 2.5.8.

We found some problems when compiling that you will need to be aware of if you plan on modifying the source or compiling on different architectures. Both the Watcom and Visual C++ compilers will reduce precision somewhat when optimization is turned on, which will affect the number of errors found.
Be sure to turn floating point optimizations off during compiling. On Visual C++, for example, that requires adding a /Op option on the command line or choosing the "improve float consistency" setting in the optimizations dialog box.

To compile the source code on Unix, be sure to link the math libraries, since the source code uses two calls normally found there (fabs() and pow()). Using GNU C++, for example, this requires a -lm command line option. The command line we used on GNU C++ (on both Linux and SunOS) was:

    gcc fdv.c -lm -o fdv

or (for an optimized version):

    gcc -O fdv.c -lm -o fdv

In this test, we focused on the range from 0.1 through 9.9 at increments of 0.1. We chose groups of four values (calling them a, b, c, and d) from this range in every possible combination, and looked at every possible division of the form (a/b)/(c/d).

On the Pentium, and only on the Pentium, we found 2,184 significant errors in 96,059,601 cases (we ignored cases of dividing zero by anything else). The largest errors observed were in the 10th decimal place, representing roughly 10,000 times the inaccuracy that a user would reasonably expect from such calculations. This works out to 204,622 times as many errors as Intel implies with its statement that a random division operation has only a one-in-9 billion chance of producing an error.

The source code can easily be modified to look at other variations on the (a/b)/(c/d) problem. For example, problems of the form (a-b)/(c-d) yielded 624 errors (or fewer than one per 100,000), which is still roughly 60,000 times the frequency of error predicted by Intel.

For more information, read the enclosed white paper, WHITEP.TXT [Ed. Note: On this ftp server PBUG_PCWEEK_PAPER.TXT].

Any questions or comments can be directed to:

Peter Coffee: 3571756@mcimail.com
Eamonn Sullivan: esullivan@pcweek.ziff.com
David Berlind: dberlind@pcweek.ziff.com
___________________________________________________________