Path: fu-berlin.de!math.fu-berlin.de!cs.tu-berlin.de!fauern!xlink.net! howland.reston.ans.net!swrinde!sgiblab!rpal.rockwell.com!news.Stanford.EDU! Sunburn.Stanford.EDU!pratt From: pratt@Sunburn.Stanford.EDU (Vaughan R. Pratt) Newsgroups: comp.sys.intel Subject: Re: PRESIDENT OF INTEL SPEAKS ABOUT BUG Date: 20 Jan 1995 09:30:47 GMT Organization: Computer Science Department, Stanford University. Lines: 183 Message-ID: <3fnvs7$qu4@Radon.Stanford.EDU> References: <3f9qlq$9ds@ornews.intel.com> <3fh47f$mgq@news.jf.intel.com> In article , Anil Das wrote: >In article <3fh47f$mgq@news.jf.intel.com>, proussel@pdx144.intel.com (Patrice >Roussel) writes: >> All the entries of the guess PLA of the Pentium were exercized (something >> you >> can do with few thousand (dividend,divisor) pairs). Unfortunately the >> problem was that some entries were missing :-(. > > What is meant by `missing' here? If dividend/divisor pairs >that use those entries where tested, they will come out as 0 instead >of 2, and the problem would have been found, right? This depends on how they came to be 0. The Intel white paper attributes the missing entries to an error in a script written to download entries into a PLA, in which case such a test would presumably have caught this. However at a lecture I gave on the Pentium bug a week ago at Stanford, William Kahan who was in the audience said that the five entries were missing because it had been proved (erroneously) that they were never accessed by the division algorithm, like all the entries above them in the table. Obviously one cannot test any of the inaccessible entries by running the algorithm, so if you believe these five are inaccessible then you will of course not provide a test case for them. Having heard such conflicting accounts from Intel and Kahan, neither of whom I am competent to contradict, I am now thoroughly mystified as to the real origin of the bug. At this point the only apparently relevant fact about the table that I am 100% certain of is the following. In the five erroneous columns of the PD-plot (these being the ones we can measure, thanks to the bug), the six thresholds (three each in the positive and negative halves) are set to values I have tabulated below. The relevant fact is that these values would be perfectly symmetric about -1 had the erroneous threshold (at the top of the positive half) been set correctly, namely one cell higher. This higher threshold would then not illegally penetrate the 8/3 D line (shown in the Intel white paper), the root cause of the problem. (The reason -1 is the appropriate midpoint is that sampling truncates the partial remainder towards -oo in both halves, by an amount between 0 and 2 thanks to the redundant carry-save representation of the partial remainder. Had an irredundant representation been used the truncation would have been by an amount between 0 and 1 and the appropriate midpoint would have been -0.5. Had the sampled partial remainder been truncated towards oo instead of -oo the midpoint would have been positive instead of negative.) In order for the bug to have been caused by a faulty proof, both those who wrote the proof and those who read it would need to believe that a correct table could have the inner two threshold pairs (between absolute-value quotient digits 0 and 1, and between 1 and 2) perfectly symmetric yet have the outer pair (between 2 and "3") asymmetric. I have been unable to come up with any plausible failure mode for the proof which would reasonably account for confidence in such a peculiarly isolated asymmetry. Perhaps none of the parties examining the table noticed that the inner two pairs of thresholds were symmetric. Or perhaps some reason was advanced for why there should be a single exception to the symmetry that somehow everyone involved found convincing. I find neither of these scenarios terribly plausible, and am therefore mystified as to what form a faulty proof could have taken. Here are the observable thresholds, originally told to me by Tim Coe and which I have since confirmed independently for myself by verifying that any other choice of thresholds would be inconsistent with the observed behavior of the buggy Pentium. I've laid this out here as a transposed PD-plot, with the five observable divisors (17,20,23,26,29) indexing five rows (oriented as columns in the white paper), and with the entries (quotient digits) in each region of the table indexing the columns. The entries in my table are the ranges of chopped (sampled) partial remainder which yield the indicated quotient digits. The six vertical bars in the table separate the seven regions and constitute the thresholds between the regions. (An unambiguous way to realize these as numeric thresholds might be as the mean of the two adjoining chopped partial remainders, e.g. at D=17 the 1-2 threshold, between 11 and 12, could be considered to be set at 11.5.) \ q = 0("-3") -2 -1 0 1 2 0("3") D_5\--------------------------------------------------------------------- 17=1.0001 ...-26 | -25..-14 | -13..-5 | -4..2 | 3..11 | 12..22 | 23... | | | | | | 20=1.0100 ...-30 | -29..-16 | -15..-6 | -5..3 | 4..13 | 14..26 | 27... | | | | | | 23=1.0111 ...-34 | -33..-18 | -17..-6 | -5..3 | 4..15 | 16..30 | 31... | | | | | | 26=1.1010 ...-38 | -37..-20 | -19..-7 | -6..4 | 5..17 | 18..34 | 35... | | | | | | 29=1.1101 ...-42 | -41..-22 | -21..-7 | -6..4 | 5..19 | 20..38 | 39... ok ok ok ok ok ouch The first five thresholds are marked as ok. The sixth, marked ouch, is one too low. Although the 11 other chopped divisors, namely D=16,18,19,21,22,24,25, 27,28,30,31, are not observable via the bug, Tim conjectures that their thresholds are those of their "ceilings" defined as the smallest higher buggy divisor, e.g. D=18 and 19 have the exact same thresholds as D=20. This leaves D=30 and 31 with unmeasured thresholds, but by extending the evident arithmetic progressions these are reasonably conjectured to be 31=1.1111 ...-46 | -45..-24 | -23..-8 | -7..5 | 6..21 | 22..42 | 43... With that regularity the table would only require 6 "fat" columns (one per three chopped divisors, less one at each end) instead of 16 (one per chopped divisor), shrinking the table to 6/16 its original size. Sampling the divisor to determine which of the 6 columns to use would only be done once, at setup. In contrast, sampling the partial remainder is done at every cycle of the algorithm, and it may well be preferable to keep separate positive and negative halves of the table to avoid the extra time required to . That the actual table is asymmetric is very strong evidence that the positive and negative halves are represented separately. (But not conclusive evidence---a simple logic circuit could in principle have been used to lower the top threshold by 1 for positive divisors. Such an ad-hoc addition to the logic is however almost surely excluded on the ground that it would serve no purpose even if the lower threshold *had* been correct.) Here is a slick and easily proved closed form formula for the entries of the positive half of a *correct* table. If D (between 16 and 31) is the chopped divisor, let d = D/3 + 1 (between 6 and 11, serving as an index to the 6 "fat" columns), and let p be the chopped partial remainder, p >= 0. Take the PD-plot table entry at row p and fat column d to be given by the closed form formula (d <= 2*p) + (2*d <= p) where Booleans are numerically 0 if false and 1 if true (as in C). Thus at D = 17 we have d = 17/3 + 1 = 5 + 1 = 6, and we then get for p=2,3,11,12 the values p d <= 2*p 2*d <= p (d<=2*p) + (2*d<=p) ---------------------------------------------------- 2 0 0 0 3 1 0 1 11 1 0 1 12 1 1 2 These four examples show that this closed form formula locates the two inner thresholds exactly where the Pentium puts them, at least for D=17; similar calculations confirm that this formula tracks the Pentium's inner two thresholds perfectly at the other divisors. One can imagine ways to evaluate this formula sufficiently fast during each division cycle as to obviate the need for a precomputed table of its values. This formula makes the inaccessible entries 2. If it is desired to reduce power by setting these to something else, e.g. 0 or 3, this can be safely accomplished using the easily proved fact that, with one exception, they are precisely those entries satisfying 4*d <= p. Note that this formula does *not* produce the notorious bug, which corresponds to using the more complicated formula 4*d <= p+1 (in the positive half), which compromises the rightmost third of each fat column. The one exception mentioned above is for d = 11, p = 43, where thanks to the absence of column D=32 this more complicated formula does not produce the bug, else there would be 6 groups of problem divisors instead of 5. However I would imagine the tiny economy of setting one additional entry to 0 is not worth the trouble and attendant risk of error; stick to 4*d <= p uniformly and let d=11, p=43 stay at 2. The negative half of the table is obtained simply by reflecting the positive half about -1, also easily proved correct (an outline of this proof is in the parenthetical paragraph near the beginning of this message starting "The reason"). Had these simple and easily derived formulas been known for the PD-plot table, all this trouble would have been avoided. There is *nothing* so effective as simplicity to avoid bugs, whether in computers, mathematics, or any other area where embarrassing and/or expensive mistakes are possible. -- Vaughan Pratt FTP: boole.stanford.edu:/pub/ABSTRACTS http://boole.stanford.edu/boole.html --------------------------------------------------------------------------------