Geoff Chappell - Software Analyst
The eax register is implied input to the cpuid instruction to select what Intel variously names a function or leaf. Possibly in combination with more input in ecx, different leaves produce different output in eax, ebx, ecx and edx to tell of different things about the processor. The question naturally arises of which inputs are valid. Given that the instruction is supported at all, the answer is that leaf 0 is always valid and its output in eax is the maximum valid leaf for the instruction’s further use.
Executing the cpuid instruction with eax beyond the maximum that’s reported by leaf 0 doesn’t cause it to fault—Intel and other manufacturers even document how their processors interpret such input—but programmers surely do better to execute cpuid leaf 0 once to obtain the maximum and then never execute cpuid without checking that the leaf they want is within range. An exception might be made for leaf 1, without which the instruction is barely useful: modern versions of the Windows kernel do indeed take as granted that leaf 1 is within range.
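The checked discipline described above can be sketched in Python, purely as a model. It assumes a hypothetical `cpuid(leaf)` callable that returns the four registers as a tuple `(eax, ebx, ecx, edx)`. Actually executing the instruction needs a compiler intrinsic or inline assembly, which is not shown. The class name `CheckedCpuid` is likewise invented for illustration.

```python
from typing import Callable, Optional, Tuple

Regs = Tuple[int, int, int, int]  # (eax, ebx, ecx, edx)


class CheckedCpuid:
    """Model of a caller that executes cpuid leaf 0 once and range-checks every later leaf.

    `cpuid` is a hypothetical callable standing in for the real instruction.
    """

    def __init__(self, cpuid: Callable[[int], Regs]) -> None:
        self._cpuid = cpuid
        # Leaf 0 is always valid; its eax is the maximum basic leaf.
        self.max_leaf = cpuid(0)[0]

    def query(self, leaf: int) -> Optional[Regs]:
        """Return the registers for a basic leaf, or None if it is out of range."""
        if leaf >= 0x80000000:
            # Extended leaves have their own maximum (from leaf 0x80000000),
            # which this sketch does not track.
            raise ValueError("extended leaves are not handled by this sketch")
        if leaf > self.max_leaf:
            return None  # never execute cpuid beyond the reported maximum
        return self._cpuid(leaf)
```

Caching the maximum from a single execution of leaf 0 is this sketch's own choice, not a claim about what any particular kernel does.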
Put aside the elaboration of extended leaves for which the input in eax starts at 0x80000000, and more generally of disjoint ranges of leaves starting at other high values, and you might think there can’t be more to say about eax from cpuid leaf 0. Perhaps the only point to note would be that the kernel doesn’t get each processor’s maximum just the once and keep it in the processor’s KPRCB to avoid ever re-executing cpuid just for leaf 0.
So why make a separate, let alone lengthy, page of what cpuid leaf 0 returns in eax? The answer is that some behaviour from the earliest days of both the instruction and of Windows is just the sort of curiosity that a writer with an interest in the archaeology of software can’t resist recording, not least because it had lasting effects. Even 64-bit Windows 10 must do things now because of decisions that were made a quarter-century ago about interpreting eax from cpuid leaf 0.
The cpuid instruction did not start as a generalised means of obtaining multiple sorts of information about the processor. Its origin was much more closely tied to its name as producing not CPU information but specifically a CPU identifier. The premise here is that at least on the drawing board the cpuid instruction was a simple thing that had no operand, implicit or not, but just loaded the eax register with the same processor identification signature that is the processor’s initial state for the edx register. This surely is the CPU ID that names the instruction. It is nowadays returned in eax from cpuid leaf 1, but because it was known to Microsoft’s programmers as having once been returned independently of eax on input, it presented a problem for how to interpret eax from cpuid leaf 0. The interpretation that Microsoft devised had consequences for at least the next two decades, not just within Windows but for BIOS manufacturers and for Intel’s processors (if not those of other manufacturers too).
That cpuid had been designed as loading the processor identification signature into eax with no use of eax for input was in plain sight for many years. Where the Pentium™ Processor User’s Manual from 1993 presents cpuid in its Instruction Set reference in Volume 3: Architecture and Programming Manual (order number 241430-001), its simplified description in a box at the top of the page is
EAX ← CPU identification information
This is Intel’s notation for only the simplest of instructions. As luck would have it, what was then the very next instruction in alphabetical order gives an example. The cwd instruction does nothing but sign-extend ax into dx, and gets the correspondingly simple description
DX ← sign-extend of AX
The ready explanation is that the simple description in the box for cpuid actually was correct at the time the documentation was prepared, perhaps long before its formal release, and then by oversight outlived the instruction’s development into very much more. The wonder is how long this oversight persisted. It survived to 1999 for the Intel Architecture Software Developer’s Manual, which by then had a separate Volume 2: Instruction Set Reference (order number 243191-002). It is gone, however, by 2000 for the Volume 2: Instruction Set Reference (order number 245471-001) in the slightly renamed IA-32 Intel Architecture Software Developer’s Manual. Its replacement is much more appropriate for an instruction whose reference documentation by then spread over 14 pages:
Returns processor identification and feature information to the EAX, EBX, ECX, and EDX registers, according to the input value entered initially in the EAX register.
Meanwhile, the separate Application Note AP-485 originally titled Intel Processor Identification With the CPUID Instruction (order number 241618) stated very clearly that the processor identification signature that “has been available at reset” and what’s returned in eax from cpuid leaf 1 are one and the same:
With processors that implement the CPUID instruction, the processor signature is available both upon reset and upon execution of the CPUID instruction.
but see that the phrasing doesn’t yet mention the need for 1 in eax as input. Indeed, in Revision 003 from October 1994 (which is the oldest I have yet found) the whole section (3.2) on the processor signature does not hint at dependence on any input. Perhaps this dependence is intended as understood from nearby diagrams, yet the sections immediately before and after (3.1 and 3.3) both open with what’s required in eax as input. Again, the ready explanation is that the section on the processor signature is original text from a time when the cpuid instruction truly didn’t take input.
The only known direct evidence that is credibly from Intel is source code in a file of assembly-language macros that Intel apparently published freely once upon a time and then rethought. The file, named p5masm.mac and dated October 1992, has comments that distinguish “Pre B0 steppings” whose cpuid takes no input and “B0 and later steppings” whose cpuid fits the formally released specification. Presumably, one or more A steppings were manufactured at least for Intel’s own testing. Whatever existed, it looks like Intel soon preferred not to talk of it. For the Pentium, in contrast to the 80386 and 80486, Intel became commendably open about listing steppings and errata, yet even an early Pentium® Processor Specification Update (order number 242480-002, dated March 1995) reads as if the first Pentium stepping is B1.
However thin and indirect may be the surviving public record from Intel, there certainly did exist a time when Microsoft’s knowledge of cpuid was that it loads the processor identification signature into eax without needing that anything particular have been loaded into eax first. This is knowable from pre-release builds of Windows NT 3.1 such as can nowadays be found readily on the Internet (at least in part because hobbyists rightly or wrongly treat them as abandonware). To the historian, such binaries are relatively public records of the software’s development. Unlike source code, which a company might reasonably contend is its private thinking that was never meant to be seen outside, these binaries plainly were intended to be used by outsiders, even if only by outsiders who were thought sufficiently friendly or self-interested for Microsoft to risk treating as insiders. Some of these binaries evidently got a limited release to help Microsoft generate public attention. No doubt there’d have been an expectation that the public attention would be favourable or that critical feedback would be discreet, but I treat them here as previews of coming attractions and thus as fair game for analysis.
Version 3.10.297.1 built on 28th June 1992 has no knowledge of cpuid, either in the kernel or in the loader (which is where this version does all the work of processor identification). Not four months later, in build 3.10.328.1 from 12th October 1992 (still most of a year before the formal release of Windows NT 3.1), the kernel tests that the processor has changeable AC and ID bits in the eflags. It then executes its one and only cpuid without preparing eax. The output in eax is interpreted as the processor identification signature (though only to learn the family and stepping). By 6th March 1993, build 3.10.397.1 has what has ever since been the familiar doubled execution of cpuid for leaves 0 and 1.
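The processor identification signature packs its fields into the low bits, whether it is found in edx at reset or in eax from cpuid leaf 1. A minimal Python decoder of the original fields, as Intel documents them (stepping in bits 0 to 3, model in bits 4 to 7, family in bits 8 to 11, processor type in bits 12 and 13), might look like the following. The names are hypothetical, and the extended model and family fields that Intel defined later in bits 16 to 27 are deliberately ignored.

```python
from typing import NamedTuple


class Signature(NamedTuple):
    stepping: int   # bits 0-3
    model: int      # bits 4-7
    family: int     # bits 8-11
    proc_type: int  # bits 12-13


def decode_signature(eax: int) -> Signature:
    """Split a processor identification signature into its original fields.

    The extended model and family fields (bits 16-27), defined later, are ignored.
    """
    return Signature(
        stepping=eax & 0xF,
        model=(eax >> 4) & 0xF,
        family=(eax >> 8) & 0xF,
        proc_type=(eax >> 12) & 0x3,
    )
```

For instance, an arbitrary signature of 0x543 decodes to family 5, model 4, stepping 3. Of these, build 3.10.328.1 would have wanted only the `family` and `stepping` fields.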
It seems a fair proposition that anyone in the early 90s who was contemplating what to do with cpuid would reasonably have looked at the accumulation of such things as the toggling of eflags bits, the changing specification of inputs and outputs, and later the anticipation of a processor exception, as inviting even more caution than Intel recommended. Early Windows versions have two defences that are Microsoft’s own:
if cpuid leaf 0 reports a maximum leaf greater than 3, cpuid is treated as unusable;
if cpuid leaf 0 reports a maximum leaf less than 1, cpuid is likewise treated as unusable.
The first presents the lasting trouble.
Starting all the way back at version 3.10—at least from the pre-release version 3.10.397.1 built on 6th March 1993—a cpuid instruction that reports a maximum leaf greater than 3 is disregarded entirely. The processor gets treated as a Pentium that does not have cpuid. It perhaps never will be known for sure, even inside Microsoft after all this time, whether this maximum-of-3 defence was written because any Pentium—perhaps one of those “Pre B0 steppings”—actually was known to return seemingly spurious values in eax or just because imposing some limit seemed like a reasonable precaution after the pre-history described above. Either way, the defence created a hostage for the future if new processors added leaves faster than Microsoft’s customers threw away their old Windows versions.
Unless Intel was somehow to be stopped from adding leaves for new processors, some future Windows version would have to give up the defence. Even then, customers who installed an old Windows on a new processor would find at best that the old Windows was hobbled by its seeing no features that the processor would report through cpuid leaves up to 3. Importantly, it wouldn’t know of instructions and registers whose existence is indicated by set bits in edx from cpuid leaf 1. Windows NT 4.0 already varied its behaviour according to eight such feature flags. Windows 2000 knew of yet more, including such desirable new functionality as SSE instructions and the 128-bit XMM registers, and also could use cpuid leaf 2 to prepare better for the processor’s several levels of caching. Were the maximum of 3 retained this far, these Windows versions would in some ways perform less well on new processors than on old ones.
As it happens, things never did get as far as affecting Windows 2000. Push came to shove late in 1999 and the maximum of 3 was removed for Windows NT 4.0 SP6. Even then, it looks like years were yet to pass before any real-world users were capable of being affected. When, after all, did a processor that reports a higher maximum first become available? Until a few years ago, the listing for cpuid in the Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2A: Instruction Set Reference, A-L included a table of the Highest CPUID Source Operand for successive models of processors. According to this, Intel’s first processor that has a basic cpuid leaf greater than 3 is the “Pentium 4 Processor supporting Hyper-Threading Technology”. Yet this processor wasn’t added to the table until some time in 2003 or 2004.[1] Although the availability of these processors was announced in November 2002 through Intel’s news release Intel Delivers Hyper-Threading Technology With Pentium® 4 Processor 3 Ghz Milestone, it’s scarcely credible that even Microsoft had any of these processors to test as early as 1999. Removing the limit in 1999 looks like one of those rare things that both users and manufacturers should be glad of: a fix that was made in advance of users reporting a problem.
This is not to say, though, that no Windows users ever were affected. Of course they were, just not for a few years. For all sorts of reasons, even relatively ordinary computer users sometimes install an old Windows version on a new computer. Although the only people nowadays who would install anything older than Windows NT 4.0 SP6 are hobbyists, software developers will have needed to test their work on Windows NT 4.0 for many years after 1999. Some will have kept old computers. More will have installed it onto new computers, including into virtual machines on new computers. What they will have found is not just that early versions of Windows under-perform on new processors but that Windows NT 4.0 in particular crashes.
To see why, look first at what exactly it means that a processor whose cpuid reports a maximum leaf greater than 3 is treated like a Pentium whose cpuid is unusable. Its meaning in versions 3.10 to 3.51 is straightforward. The processor is a Pentium, which means it has a cr4 register, which the kernel must include when saving and restoring processor state. But cpuid is never executed again. Version 4.0 changed this a little. One change is that it doesn’t infer existence of cr4 from the processor’s being a Pentium, but what’s immediately relevant is that it doesn’t completely ignore cpuid as unusable. Instead, it executes cpuid leaf 1 exactly once. The problem is that it acts on this very early and later gets confused when it executes cpuid leaf 0, sees what it thinks is an implausible response in eax, and concludes that cpuid is unusable.
Where the crash comes in is that if this first execution of cpuid just for leaf 1 finds that the processor has the cmpxchg8b instruction, then the kernel in version 4.0 commits to the instruction’s use and requires that all processors have the instruction. If the later, detailed test for any processor, including the boot processor, determines that cmpxchg8b is missing, then Windows stops with the bug check MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED (0x3E)—yes, even on a single-processor system.
This became a well-known problem because the later test looked for cmpxchg8b support only if the vendor string from cpuid leaf 0 showed the processor as being made by Intel, AMD or Cyrix. Other vendors’ processors that implemented cmpxchg8b looked to the early test as if the boot processor had it, but looked to the later test as if no processor had it, not even the boot processor. These other vendors and their customers will have been understandably unhappy with the consequent bug check. Microsoft attended to this in Windows NT 4.0 SP4 by making the later test no longer vendor-specific.
What evidently wasn’t noticed immediately is that the later test misses cmpxchg8b support not just for processors from the wrong vendors but also for new processors whose cpuid support extends beyond leaf 3. Any processor that was new enough to have a cpuid leaf greater than 3 would surely also be new enough to have the cmpxchg8b instruction. Support for this would be seen by the quick-and-early test and missed by the later test, triggering the bug check just as for an older processor from a lesser vendor. My suspicion is that Microsoft realised this, and corrected it in Windows NT 4.0 SP6, not in reaction to an observation but because the programmers revisited cmpxchg8b support in the lead-up to Windows 2000.
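The interplay of the two tests in version 4.0 can be modelled in a few lines of Python. This is a simplified sketch, not Microsoft’s code: the names `early_test`, `later_test` and `boot_check`, and the two switches that stand for the pre-SP4 vendor check and the pre-SP6 maximum of 3, are all hypothetical. The cmpxchg8b feature flag is bit 8 (CX8) of edx from cpuid leaf 1, and 0x3E is the code for MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED.

```python
from dataclasses import dataclass

CX8_BIT = 1 << 8  # edx bit 8 from cpuid leaf 1: cmpxchg8b supported
KNOWN_VENDORS = {"GenuineIntel", "AuthenticAMD", "CyrixInstead"}
MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED = 0x3E


@dataclass
class Processor:
    max_leaf: int   # eax from cpuid leaf 0
    vendor: str     # vendor string from cpuid leaf 0
    leaf1_edx: int  # feature flags in edx from cpuid leaf 1


def early_test(p: Processor) -> bool:
    """Quick test: one execution of leaf 1, with no look at leaf 0 at all."""
    return bool(p.leaf1_edx & CX8_BIT)


def later_test(p: Processor, vendor_check: bool, max_of_3: bool) -> bool:
    """Detailed test, as made for every processor including the boot processor."""
    if max_of_3 and p.max_leaf > 3:
        return False  # leaf 0 looks implausible, so cpuid is judged unusable
    if vendor_check and p.vendor not in KNOWN_VENDORS:
        return False  # feature flags are not consulted for other vendors
    return bool(p.leaf1_edx & CX8_BIT)


def boot_check(p: Processor, vendor_check: bool, max_of_3: bool) -> int:
    """Return 0 if this processor passes, else the bug check code that stops Windows."""
    committed = early_test(p)  # kernel commits to cmpxchg8b if the early test sees it
    if committed and not later_test(p, vendor_check, max_of_3):
        return MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED
    return 0
```

With `vendor_check=True` (before SP4), a processor from another vendor that has cmpxchg8b hits the bug check. With `vendor_check=False` but `max_of_3=True` (SP4 and SP5), a new processor that reports a maximum leaf above 3 still does. Only with both switches off (SP6) do both pass.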
However it got attended to, a lesson was learnt: Windows has never since imposed a maximum on the maximum for basic leaves. But the consequences of having done so all those years ago are still with Windows even now, and perhaps even more with Intel and with BIOS manufacturers.
So that new computers could run old Windows versions, Intel arranged that new processors can be configured to have cpuid report a maximum leaf of 3. This is arranged by setting bit 22 in the Model Specific Register IA32_MISC_ENABLE. Of course, this is only needed in processors that would anyway report a maximum leaf higher than 3. The first known documentation is the IA-32 Intel® Architecture Software Developer’s Manual Volume 3: System Programming Guide from 2004 (order number 253668-013).
Some computers offer this controlled reduction of capability as an option in the BIOS setup. It’s not a bad workaround. Version 4.0 anyway doesn’t use a cpuid leaf higher than 1. If you want to run an early Windows on a new computer, then this BIOS option not only lets you do it but lets your early Windows see all the processor features it ever would have. It does, however, create another hostage for the future.
A new Windows on the same computer will want—and, far enough into the future, may need—information from higher cpuid leaves and will be crippled if the BIOS option somehow remains enabled. Starting with version 6.0, Windows makes a point of clearing bit 22 in IA32_MISC_ENABLE on selected models of Intel processor before its initialisation of the processor has proceeded far enough to want higher cpuid leaves.
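Clearing the limit is simple bit arithmetic on the 64-bit value of IA32_MISC_ENABLE, which Intel documents as MSR 0x1A0. The Python below shows only that arithmetic. Actually reading and writing the MSR needs the privileged rdmsr and wrmsr instructions, and the choice of which processor models to touch is not modelled. The function names are hypothetical.

```python
IA32_MISC_ENABLE = 0x1A0      # MSR index of IA32_MISC_ENABLE
LIMIT_CPUID_MAXVAL = 1 << 22  # bit 22: make cpuid leaf 0 report a reduced maximum


def is_limited(misc_enable: int) -> bool:
    """True if the (typically BIOS-set) limit on the maximum cpuid leaf is in force."""
    return bool(misc_enable & LIMIT_CPUID_MAXVAL)


def unlimit(misc_enable: int) -> int:
    """Return the MSR value with bit 22 cleared and every other bit unchanged."""
    return misc_enable & ~LIMIT_CPUID_MAXVAL
```

A caller might read the MSR, test `is_limited`, and write back `unlimit(value)`, taking care to preserve every other bit since the register holds many unrelated controls.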
The other early defence was retained up to and including version 6.2. The wonder is that it isn’t original: it wasn’t introduced until version 3.50. What it seems to address is whether some Pentium, or more likely an 80486, implements the cpuid instruction in too early or reduced a state to use. If only from what Intel documents, the cpuid instruction cannot itself distinguish an 80486 from a Pentium except from the family field in the processor signature that leaf 1 produces in eax. Version 3.10 doesn’t check that cpuid has a leaf 1. Version 3.50 does. Whether this is because a Microsoft programmer got picky about not executing any cpuid leaf other than 0 without checking against the maximum leaf or because a processor with no leaf 1 ever was encountered may never be known, but a processor whose cpuid leaves do not reach 1 is a processor that not only has no usable cpuid but can only be an 80486 (or even 80386). Anything more that is learnt about it is from the methods of CPU Identification Before CPUID.
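The two defences combine into a simple predicate on the maximum leaf that cpuid leaf 0 reports in eax. The Python sketch below is hypothetical, with one switch per defence, and tabulates which switches apply in which versions as this page describes them. The label "after 6.2" assumes only what is stated here, namely that the second defence was not retained beyond version 6.2.

```python
def cpuid_usable(max_leaf: int, reject_above_3: bool, require_leaf_1: bool) -> bool:
    """Model of whether early Windows trusts cpuid, given eax from cpuid leaf 0."""
    if reject_above_3 and max_leaf > 3:
        return False  # first defence: implausibly high maximum
    if require_leaf_1 and max_leaf < 1:
        return False  # second defence: no leaf 1, so no processor signature
    return True


# (reject_above_3, require_leaf_1) for each range of versions, per the text.
ERAS = {
    "3.10": (True, False),
    "3.50 to 4.0 SP5": (True, True),
    "4.0 SP6 to 6.2": (False, True),
    "after 6.2": (False, False),
}
```

For example, a processor reporting a maximum of 4 is rejected only by the versions before 4.0 SP6, while one reporting a maximum of 0 is rejected only from 3.50 through 6.2.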
[1] It’s not in the IA-32 Intel® Architecture Software Developer’s Manual Volume 2A: Instruction Set Reference, A-M (order number 245471-012) from 2003 but is in what appears to be its immediate successor (order number 253666-013).