Path: math.fu-berlin.de!zib-berlin.de!prise.nz.dlr.de!news.dfn.de!swiss.ans.net!howland.reston.ans.net!usc!news.isi.edu!not-for-mail
From: carlton@darkstar.isi.edu (Mike Carlton)
Newsgroups: comp.sys.intel
Subject: Pentium Divide Bug FAQ
Date: 12 Dec 1994 19:46:17 -0800
Organization: USC Information Sciences Institute
Lines: 509
Distribution: world
Message-ID: <3cj5e9$nej@darkstar.isi.edu>

Here is an updated version of my FAQ. I've incorporated lots of information that Intel has released. As always, please email me any corrections or suggestions for additions--I cannot read all of the posts to comp.sys.intel. Also, I want to keep the FAQ to factual matters--there are already plenty of opinions on the net to choose from.

I would like to know how often people would like to see this posted. There still seem to be a lot of people who aren't aware of the basics. For now I'll probably stick to a weekly posting unless more frequent posting would be useful.

The first version of this didn't make it explicitly clear, but you may freely post or redistribute this FAQ. The current version is always available at ftp://www.isi.edu/pub/carlton/pentium/FAQ

cheers,
--mike
carlton@isi.edu

--------------------------------------------------------------------------------

Pentium Divide Bug Frequently Asked Questions
Version 2
12-Dec-94

Contents

0) Disclaimer
1) What is the bug?
2) How do I tell if my Pentium machine has the bug?
3) What do single-, double-, extended-precision, exponent and mantissa mean?
4) How many cases of the bug are there?
5) How often does the bug occur?
6) How big can the error get?
7) Will the bug affect my program?
8) What chips have the bug?
9) What is the cause of the bug?
10) When will the bug be fixed?
11) Are there any other bugs in the Pentium chip?
12) Will Intel replace my buggy chip?
13) What is the history of the bug's discovery?
14) Is there a way to deal with the bug in software?
15) Where can I get more information?
16) Where can I find more discussion of the bug?
17) Acknowledgments

Copyright 1994 Mike Carlton - Permission is granted to distribute or reprint this document freely.


0) Disclaimer

This document summarizes what I understand about the Pentium FDIV bug. It is based upon my experiments, what has been reported to me or that I have read on the net, and on my understanding of the technical information Intel has released so far. This document does not represent the position of the University of Southern California or of the USC Information Sciences Institute.

    Mike Carlton
    USC Information Sciences Institute
    4676 Admiralty Way; Marina del Rey, CA 90292-6695
    carlton@isi.edu
    (310) 822-1511  FAX: (310) 823-6714


1) What is the bug?

In some rare cases, when the Pentium chip divides two floating-point numbers it returns an answer that is slightly inaccurate (the precision of the result is less than expected). Intel has confirmed that the bug affects single-precision, double-precision and extended-precision divisions.

The bug occurs only for certain pairs of numbers. It is repeatable--i.e. if a pair of numbers is known to be affected by the bug, the pair will be affected every time it is tested on every chip with the bug. The bug is also independent of the speed of the chip and of any previously executed instructions. The bug is not affected by the rounding mode being used.

The actual bug is in the divide unit of the Pentium's floating-point unit. Thus any instructions that use the divide unit can potentially exhibit the bug. Intel reports that the divide and remainder instructions will exhibit the bug. Additionally, Intel reports that the following instructions use the divider and thus can also be affected: FPTAN, FPATAN, FYL2X, and FYL2XP1. There has been one error case reported for these instructions--Cleve Moler reported a case of the arctangent instruction returning an incorrect result.
Richard Wirt of Intel has reported (in a net post) that sine, cosine and the exponentials have been proven to not be affected by the bug. The two log functions are still being investigated.

The bug does not appear to affect any other instructions in the processor. In particular, it will not affect any integer arithmetic instructions.


2) How do I tell if my Pentium machine has the bug?

Here are two simple tests that you can try on any calculator program or spreadsheet running on a Pentium-based PC.

    1) Divide 5505001 by 294911
       The buggy answer is:   18.66600093
       The correct answer is: 18.66665197

    2) Divide 4195835 by 3145727
       The buggy answer is:   1.33373907
       The correct answer is: 1.33382045

If you get the buggy answer then that Pentium chip definitely has the bug. If you get the correct answer, the chip may or may not have the bug. Some programs use different methods which do not use the floating-point divide instruction and so they are not affected.

I've tried the cases above with the Microsoft Windows calculator and Microsoft Excel and they do show the bug. If you get the correct answer with the Windows calculator, with Excel or with any other program known to be using the floating-point divide instructions, then you definitely have a non-buggy chip.

You can also test for the bug by computing the value of the residual with the equation "x-(x/y)*y" (where x is the dividend given above and y the divisor). The result should always be 0; on a buggy Pentium the results will be 192 for the first case and 256 for the second. If you use this method you must make sure that floating-point division is used and that the division is performed first or else the error will not occur (even on a buggy chip).

Another method to test for the bug is to run p87test by Terje Mathisen. This is a small assembly language program which will report several details about the processor and whether or not it has the bug.
It is available on several FTP sites, including:
    ftp://math.ucdavis.edu/pub/fdiv/p87test.zip
and
    ftp://www.isi.edu/pub/carlton/pentium/other/p87test.zip


3) What do single-, double-, extended-precision, exponent and mantissa mean?

The Pentium (along with every other major microprocessor) uses the IEEE 754 standard to represent floating-point numbers. A floating-point number stores a real number as sign*1.XXX*2^YYY, i.e. one plus a fraction, multiplied by a power of two, along with the sign. The XXX part is the mantissa and the YYY part is the exponent.

The IEEE standard defines a single-precision number to have a 23-bit mantissa. This provides 24-bit precision (counting the leading one) and is equal to approximately 7 decimal digits. A double-precision number has a 52-bit mantissa, equal to about 15 decimal digits. Extended-precision has a 63-bit mantissa, equal to about 18 decimal digits.

If you are using the C programming language, a 'float' variable is single-precision, a 'double' variable is double-precision and a 'long double' is typically extended-precision (however, not all compilers support this last option).


4) How many cases of the bug are there?

If we limit our scope to cases involving single-precision numbers and where the accuracy of the result is less than single-precision then there appear to be 1738 unique cases. Of these, 87 have only approximately 4 decimal digit accuracy.

Due to the nature of the bug, either number of a pair can be multiplied (or divided) by any power of 2 and/or have its sign changed and still be affected by the bug in the same way. This is because the bug is triggered by bit patterns in the mantissas of the numbers and these operations do not change the mantissas. All numbers with the same mantissa are considered just one unique case.

Every unique single-precision dividend mantissa for each single-precision divisor mantissa believed susceptible to the bug has been checked.
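
The point that scaling by a power of two or flipping the sign leaves the mantissa untouched can be checked on any machine, buggy or not. The following is a minimal Python sketch, not one of the search programs mentioned in this FAQ; the helper name float32_fields is made up for illustration, and it only handles normal (non-zero, non-denormal) single-precision values.

```python
import struct

def float32_fields(x):
    """Split x (stored as IEEE 754 single precision) into (sign, exponent, mantissa)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127   # remove the exponent bias (the YYY part)
    mantissa = bits & 0x7FFFFF               # the 23 stored fraction bits (the XXX part)
    return sign, exponent, mantissa

divisor = 3145727.0   # divisor from one of the known failing pairs
for scaled in (divisor, divisor * 8, -divisor / 1024):
    sign, exponent, mantissa = float32_fields(scaled)
    print(f"{scaled:>16} sign={sign} exponent={exponent:>2} mantissa={mantissa:#08x}")
```

All three lines should show the same mantissa field; only the sign and exponent differ, which is why such variants are counted as a single case.
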
Assuming that Tim Coe's model of the divider bug is correct, then the 1738 cases mentioned above are all the single-precision pairs of numbers which are affected by the bug.

In double-precision there are many more cases that exhibit less than single-precision accuracy. Note that the single-precision numbers are a subset of the double-precision numbers, so the cases above all result in the same error when treated as double-precision.

Additionally, in double-precision a range of numbers can often exhibit the bug. This is because in many cases the dividend or divisor can be slightly changed (adding or subtracting a small fraction) and still be affected. This merely changes the least significant bits of the mantissa, while the most significant bits, which are triggering the bug, are the same.

If we expand our scope to double-precision cases with less than double-precision accuracy (but more than single-precision), the number of cases grows again. However, these errors are quite small (at least 7 digits are correct) and will affect even fewer users.


5) How often does the bug occur?

Intel's statement (see below for information on how to get this via automated fax) indicates that one in nine billion randomly-selected pairs of numbers will exhibit reduced precision when divided. Intel has confirmed this number with two different methods.

Here is a simple, back-of-the-envelope calculation of the single-precision probability which tends to agree with Intel's statement. There are 2^23 unique single-precision mantissas. Thus, if you pick 2 single-precision numbers at random there are 2^46 (about 70 trillion) possibilities. There are 1738 cases where two single-precision numbers produce less than single-precision accuracy. 1738 is a little less than 2^11, so this implies that there is about a 2^11/2^46 = one in 2^35 chance of hitting the bug.
2^35 is about 34 billion, so this agrees roughly with Intel's results (especially since Intel reports that the probability for single-precision numbers is lower than that of double-precision).

Keep in mind that there is an important distinction between randomly-occurring errors (such as memory parity errors) and deterministic errors that occur under specific conditions (such as the Pentium divide bug). If you run a program that picks 9 billion pairs of numbers and divides each pair, on a buggy chip you can expect 1 result among those to have reduced precision. However, if you do hit one of the affected pairs of numbers, it will return the wrong result every time you divide that pair. If you rerun the program with the same set of numbers you will always get exactly the same results as the first time.

In contrast, a randomly-occurring error will affect each run of a program independently and the probability of hitting the error on two successive runs is the square of the probability of hitting it once. For example, if the probability of some randomly-occurring error occurring in one run of a program is 1 in 9 billion, then the probability of the error occurring in two successive runs is 1 in 81 billion billion.

Another important consideration is that the numbers people use in practice are not necessarily random. Depending on the distribution of the numbers a particular problem and program uses, the probability of hitting an affected pair of numbers can be greater or less than the 1 in 9 billion probability and thus the results could be more likely or less likely to be affected by the bug.


6) How big can the error get?

Intel reports that the worst-case inaccuracy occurs in the 12th bit of the mantissa, or in the 4th significant decimal digit. Including the leading 1 bit, this implies that the 12 most-significant bits are always correct, but that the 13th can be incorrect. This confirms the earlier results reported by people outside of Intel.
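
To put the worst case in concrete terms without access to a flawed chip, the sketch below plugs the two buggy quotients quoted in section 2 into the residual formula x-(x/y)*y and then divides the residual by the dividend to get a relative error. It is a Python illustration only; the names CASES and error_report are invented here, and because the quoted quotients are rounded to eight decimals the residuals come out close to, not exactly, 192 and 256.

```python
# (dividend, divisor, quotient reported from a buggy chip), as quoted in section 2
CASES = [
    (5505001, 294911, 18.66600093),
    (4195835, 3145727, 1.33373907),
]

def error_report(x, y, buggy_quotient):
    """Return (residual, relative_error) for a quotient copied from a buggy chip."""
    residual = x - buggy_quotient * y   # would be 0 if the quotient were exact
    relative_error = residual / x       # error as a fraction of the dividend
    return residual, relative_error

for x, y, q in CASES:
    residual, rel = error_report(x, y, q)
    print(f"{x}/{y}: residual about {residual:.1f}, "
          f"relative error about 1 part in {1 / rel:,.0f}")
```

On a correct processor, replacing the quoted quotient with x / y computed locally should give a residual of 0 for both pairs.
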
Additionally, Intel reports that only 1 in 360 billion random pairs of numbers will have an error this large (i.e. an error in the 12th mantissa bit).

A good way to look at this question is to consider the relative error (i.e. the amount of error divided by the correct result). In the worst cases, such as the 4195835/3145727 example given above, the relative error can be as large as approximately 1 part in 16000. Although that error may seem large, it is just 0.006% (six-thousandths of a percent).

You can compute the relative error by dividing the residual value (described above) by the dividend, i.e. (x-(x/y)*y)/x. For example, 4195835-(4195835/3145727)*3145727 = 256 (on a buggy chip), and 256/4195835 = 0.000061.

As mentioned above, you can multiply either of the numbers by a power of two and still be affected by the bug. This will multiply the absolute error by the same amount, but will not change the relative error.

Additionally, when the bug affects a division, the magnitude of the result returned is always slightly less than the correct result. This reflects the fact that when the bug occurs one or two bits which should be set to one are instead set to zero.


7) Will the bug affect my program?

There is no simple answer to this question. It depends upon the problem you are solving and the methods the program uses. Some general conclusions can be drawn, but ultimately you must determine if the bug can affect your results.

If your program does not use floating-point instructions then it will not be affected. Most word processors, text editors, games and e-mail programs fall into this category.

For those programs which do use the floating-point instructions, an important consideration is that for most problems, a single division with reduced accuracy is unlikely to affect a final result. For example, most graphing programs will not need more than 3 digits of accuracy.
Consider printing on a 600-dpi laser printer--3 significant digits is enough to specify an individual printed dot, so an error in the 4th digit will not change the printout.

Spreadsheet programs typically use double-precision numbers, so they are susceptible. Many uses of spreadsheets will be unaffected; however, certain uses are more likely to be affected. If your spreadsheet doesn't compute any divisions (or use the other instructions mentioned above) it won't be affected. Additionally, many uses of division in a spreadsheet calculate only a single division (i.e. the result is not divided again). In these cases, the result will always be accurate to at least 3 significant figures.

In general, any program which produces critical results must be verified to determine if it could be affected by the bug. In particular, methods which repeatedly use the result of a division or which subtract nearly equal numbers can magnify any error which is introduced and produce incorrect results.


8) What chips have the bug?

The bug is only in Pentium chips. As of the time of this writing there have been some early reports of Pentium-based machines without the bug. It has been reported in the 60MHz, 66MHz, 75MHz, and 90MHz versions. I have not heard of any tests of 100MHz Pentiums.


9) What is the cause of the bug?

The ultimate cause is a few missing entries (due to human error) in a lookup table used by the divider.

Floating-point division in most microprocessors (including the Pentium) uses an algorithm analogous to the long division method we all learned in school. Typically this produces one bit (one binary digit) of the result per cycle. However, optimizations can be used to speed up the division. The Pentium uses a radix-4 SRT division which allows it to produce 2 result bits per cycle.
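
The difference between one and two result bits per step can be seen in a toy integer version of schoolbook long division. This Python sketch is plain restoring division, not the SRT algorithm: a real SRT divider selects each quotient digit by table lookup rather than by the trial subtraction used here, and nothing below models the Pentium's table. The function name long_divide and its parameters are invented for illustration.

```python
def long_divide(dividend, divisor, result_bits, bits_per_step):
    """Schoolbook long division on positive integers.

    Returns (quotient, steps), where quotient is
    floor(dividend * 2**result_bits / divisor) and steps is the number of
    iterations needed. result_bits must be a multiple of bits_per_step.
    """
    radix = 1 << bits_per_step           # 2 for one bit per step, 4 for two
    quotient, remainder = divmod(dividend, divisor)   # integer part first
    steps = 0
    for _ in range(0, result_bits, bits_per_step):
        remainder *= radix               # bring down the next digit position
        digit = 0
        while remainder >= divisor:      # trial subtraction picks the digit
            remainder -= divisor
            digit += 1
        quotient = quotient * radix + digit
        steps += 1
    return quotient, steps

x, y = 4195835, 3145727
q2, steps2 = long_divide(x, y, 24, 1)    # radix 2: one result bit per step
q4, steps4 = long_divide(x, y, 24, 2)    # radix 4: two result bits per step
print(q2 == q4, steps2, steps4)
```

Both calls produce the same 24 fraction bits of the quotient; the radix-4 version simply gets there in 12 steps instead of 24.
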
At one step in the division the floating-point unit uses the remaining bits of the dividend and divisor mantissas as indices into the lookup table to determine the next two bits of the result quotient. If any step of a division happens to use one of the missing entries then incorrect result bits are produced.


10) When will the bug be fixed?

It has been reported on the net and in a New York Times article that Intel states that they found and fixed the bug in June. The November 7 EE Times also reports that it was fixed mid-year.

The New York Times article states that Intel has only recently begun providing the fixed chips to their largest customers. It isn't clear yet when fixed Pentiums will appear in dealers' inventory. Several people have reported getting a replacement Pentium directly from Intel.


11) Are there any other bugs in the Pentium chip?

An unrelated bug in 100MHz Pentiums only, affecting multitasking, has been reported. Reports are that fixed versions of the 100MHz chips are shipping. There are no other publicly known bugs.


12) Will Intel replace my buggy chip?

Currently Intel is handling replacement requests on an individual basis. If you would like to have a buggy chip replaced you can call Intel Technical Support at (800) 628-8686 and speak to them directly.


13) What is the history of the bug's discovery?

Outside of Intel, the bug was first found by Dr. Thomas R. Nicely of Lynchburg College (nicely@acavax.lynchburg.edu). He posted a message to CompuServe on October 30, 1994 describing the case he had found. His example was 1/824633702441. He had had an unexpected result in one of his experiments and eventually tracked it down to a bug in the Pentium divider, which Intel confirmed.

It was first reported in the print media in the November 7 issue of EE Times.

Some early reports on the net, using random searches of double-precision numbers, found a couple dozen more cases of the bug.
Around November 10, Andreas Kaiser (ak@ananke.s.bawue.de) posted, in comp.sys.intel, a list of 23 cases where a number of the form 1/x failed. The smallest of these was 1/12884897291.

Tim Coe of Vitesse Semiconductor (coe@vitsemi.com) developed a model of how the divider was working and why it was failing. He posted a message to comp.sys.intel on November 16 describing his model. He included the case 4195835/3145727 which his model had predicted would fail. This was the first known case which had less than single-precision accuracy.

On November 21 Tim Coe posted another message to comp.sys.intel with a refined model of the divider and pointed out that the bug affected both single- and double-precision divisions. He included the prediction that between 50 and 2000 single-precision pairs would have less than single-precision accuracy.

Mike Carlton of USC/ISI (carlton@isi.edu) posted a program to comp.sys.intel on November 21 which generates 819 more examples with less than single-precision accuracy, 66 of which have just 14-bit accuracy. This program performed an exhaustive search limited to single-precision dividends and divisors of forms generally matching Tim Coe's model. This post included the example 5505001/294911.

More recent searching (independently by Coe and Carlton) has expanded this to 1738 single-precision cases and 87 with just 14-bit accuracy.

Intel has formally described the bug, its cause and effects and is now releasing the information as a white paper. This paper is currently being distributed directly by Intel, although at least one person at Intel is working on making it available on the net.


14) Is there a way to deal with the bug in software?

Yes, a simple and effective method has been developed that can be easily supported by software developers.
The developers of MATLAB (including Cleve Moler) originally devised a simple software workaround based on verifying the result of a division and repeating the division with scaled operands if the first result was inaccurate. This is a simple and efficient software solution. The total effect on the speed of a program will be negligible unless the program is doing a very large number of divisions.

Since then, Moler, Terje Mathisen and Tim Coe have developed an improved solution which is even more efficient. This solution relies on identifying divisors which could lead to an incorrect result and, if one is found, scaling the operands before dividing. It can be shown that the scaled operands cannot be affected by the bug and will return the intended answer.

Intel is now working with this group (along with Peter Tang of Argonne National Laboratory) on this solution. Intel is also working with compiler vendors to help them incorporate the software fix in their products. This means that as software developers incorporate the method into their programs, users will be able to use the revised software even on buggy Pentiums and can be confident in the accuracy of the divisions.

Unfortunately, it is not possible to create a binary patch (i.e. a program which a user could run to patch an arbitrary program to incorporate this fix). Thus users will have to rely on software developers to include the fix and release updated versions of their programs. Once compilers have included the fix, then all programs compiled with those compilers will similarly not be affected by the bug.

A copy of Cleve Moler's post to comp.sys.intel, complete with source code and a detailed explanation, is on the MathWorks' WWW server:
    http://www.mathworks.com/

Thanks to all those responsible for this solution for making it publicly available.


15) Where can I get more information?

Intel's WWW server is at:
    http://www.intel.com/

Intel also has an automated fax back service.
You can call them at (800) 525-3019 and request document #9788 for a statement regarding the bug. Their technical support can be reached at (800) 628-8686 or (916) 356-3551.

Additionally, I have been given the following international numbers to contact Intel, but have not confirmed them myself. In Germany try 0130 / 81 89 21 and in the UK, +44 / 17 93 43 11 55.

Information the author of this FAQ has collected (including lists of the known single-precision bug cases and programs to generate them) is available for anonymous ftp at:
    ftp://www.isi.edu/pub/carlton/pentium/

The latest version of this FAQ is available at:
    ftp://www.isi.edu/pub/carlton/pentium/FAQ

Bill Broadley of UC Davis has also collected information about the bug and made it available for anonymous ftp at:
    ftp://math.ucdavis.edu/pub/fdiv/

The MathWorks, Inc. has several documents related to the Pentium available on the WWW:
    http://www.mathworks.com/Pentium/README.html

EE Times is on the WWW at:
    http://www.wais.com/techweb/eet/current/hr.html

Edward Vielmetti of Msen Inc. has some documents available at:
    http://www.msen.com/~emv/pentium/


16) Where can I find more discussion of the bug?

It is beginning to be widely reported in the mass media. It has been covered on CNN and in several major newspapers.

The principal discussion of the bug on the Internet has taken place in the newsgroup comp.sys.intel. This will likely remain the center of discussion for a while.


17) Acknowledgments

Thanks to the following people for their efforts in finding, tracking down, documenting and understanding the bug:

    Dr. Thomas R. Nicely (nicely@acavax.lynchburg.edu)
    Andreas Kaiser (ak@ananke.s.bawue.de)
    Terje Mathisen (Terje.Mathisen@hda.hydro.com)
    Tim Coe (coe@vitsemi.com)
    Cleve Moler (cleve@mathworks.com)
    Edward Vielmetti (emv@Msen.com)
    and the many readers of comp.sys.intel

Thanks also to Intel Corporation for releasing a technical description of the bug and its causes.