Geoff Chappell - Software Analyst
CURRENT WORK ITEM - PREVIEW ONLY
Executing the cpuid instruction with 2 in eax produces meaningful output in all four of the possible registers. The 32-bit Windows kernel uses cpuid leaf 2 in version 5.0 and higher for processors whose vendor string from cpuid leaf 0 is GenuineIntel and in version 6.2 and higher for processors whose vendor string is CentaurHauls. The 64-bit kernel has no code for using cpuid leaf 2.
The Revision History in Intel® Processor Identification and the CPUID Instruction (Application Note 485, apparently no longer available online from Intel) dates Intel’s documentation of cpuid leaf 2 to December 1995. This leaf was then described as producing Configuration Parameters, generally, and Cache Size and Format Information where the documentation gets to the details. This expanded to Cache Size, Format and TLB Information before giving way to Cache Descriptors. This last term is especially apt since the output in eax, ebx, ecx and edx is essentially a collection of single-byte descriptors of the processor’s various caches. Where the Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2A: Instruction Set Reference, A-L presents the cpuid instruction, it nowadays describes leaf 2 as producing TLB/Cache/Prefetch Information. This too is apt since the most plausible reason that 32-bit Windows persists with cpuid leaf 2 and 64-bit Windows has never bothered with it is to learn how much memory is moved closer to the processor by the prefetchnta instruction: 64-bit Windows has the luxury of taking this granularity to be fixed at 64 bytes, but 32-bit Windows accommodates 32 and 128 too.
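To make the single-byte layout concrete, here is a minimal sketch in C of splitting the four registers into descriptors. It is not the kernel's code. The function name is hypothetical, and the three filtering rules (the low byte of eax is not a descriptor, a register with bit 31 set holds no descriptors, and 0x00 is a null descriptor) are taken from Intel's description of the leaf rather than from anything observed in Windows.

```c
#include <stddef.h>
#include <stdint.h>

/* Collect the single-byte descriptors from one execution of cpuid leaf 2.
 * regs holds eax, ebx, ecx and edx, in that order. out needs room for 15
 * bytes (16 bytes of output less the low byte of eax). Returns the number
 * of descriptors stored in out.
 *
 * Assumptions, from Intel's documentation rather than from the kernel:
 *  - the low byte of eax is not a descriptor;
 *  - a register with bit 31 set contains no valid descriptors;
 *  - 0x00 is a null descriptor and can be skipped. */
size_t leaf2_descriptors(const uint32_t regs[4], uint8_t out[15])
{
    size_t count = 0;
    for (int r = 0; r < 4; r++) {
        if (regs[r] & 0x80000000u) {
            continue;                       /* whole register is reserved */
        }
        for (int b = 0; b < 4; b++) {
            if (r == 0 && b == 0) {
                continue;                   /* low byte of eax */
            }
            uint8_t d = (uint8_t)(regs[r] >> (8 * b));
            if (d != 0) {
                out[count++] = d;
            }
        }
    }
    return count;
}
```

The caller is expected to have executed cpuid with 2 in eax already. Keeping the instruction itself out of the sketch lets the decoding be exercised with saved register values on any machine.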
Of the many defined descriptors, Windows is interested only in those for caches. Descriptors for what early editions of Application Note 485 listed as an Instruction TLB, Data TLB, “Instruction cache” and even “Data cache” are ignored.
The kernel’s first interest in cpuid leaf 2 was only to find a SecondLevelCacheSize to keep in the processor’s KPCR. In version 5.0, whichever of the recognised cache descriptors comes last is the one whose cache size is saved. Version 5.0 is unconcerned about associativity. Version 5.1 also keeps a SecondLevelCacheAssociativity in the KPCR. If multiple cache descriptors are recognised, the one whose cache size and associativity are saved is the one for which the cache size divided by the associativity is the largest.
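The two selection rules can be compared side by side in a short C sketch. The structure and function names are hypothetical, and how the kernel resolves a tie under the version 5.1 rule is not stated above, so keeping the earlier descriptor on a tie is only an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of what one recognised cache descriptor encodes. */
struct cache_info {
    uint32_t size;           /* cache size in bytes */
    uint32_t associativity;  /* number of ways */
};

/* Version 5.0 rule: whichever recognised descriptor comes last wins. */
struct cache_info pick_last(const struct cache_info *c, size_t n)
{
    struct cache_info best = {0, 0};
    if (n > 0) {
        best = c[n - 1];
    }
    return best;
}

/* Version 5.1 rule: keep the descriptor whose cache size divided by its
 * associativity is the largest. Assumption: on a tie, the earlier
 * descriptor is kept. */
struct cache_info pick_largest_per_way(const struct cache_info *c, size_t n)
{
    struct cache_info best = {0, 0};
    uint32_t best_per_way = 0;
    for (size_t i = 0; i < n; i++) {
        if (c[i].associativity == 0) {
            continue;                       /* avoid dividing by zero */
        }
        uint32_t per_way = c[i].size / c[i].associativity;
        if (per_way > best_per_way) {
            best_per_way = per_way;
            best = c[i];
        }
    }
    return best;
}
```

Given a 2MB 8-way cache followed by a 256KB 8-way cache, the first rule saves the 256KB cache simply because it comes last, while the second saves the 2MB cache because 256KB per way exceeds 32KB per way.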
Windows 2000 SP3 adds an interest in how many bytes the processor pre-fetches for Non-Temporal Access (NTA). This prefetchnta granularity is 32 bytes by default. Three newly recognised descriptors for 1st-level data caches tell the kernel that the granularity is 64 bytes.
In version 5.1, the kernel starts attending to the line size for some cache descriptors so that it can determine the largest line size of any cache for any processor. This is made readily available to kernel-mode software as the return value of what was then the new KeGetRecommendedSharedDataAlignment function and less readily to user-mode software as output from the NtQuerySystemInformation function when given the information class SystemRecommendedSharedDataAlignment (0x3A). This reported line size is 32 bytes even if no processor’s cpuid leaf 2 reports any caches, and so the line size for any cache descriptor matters only if it is larger than 32.
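The effect of the 32-byte floor is easy to see in a minimal C sketch. The function name and the idea of passing the line sizes as an array are illustrative only. The kernel accumulates the maximum as it walks the descriptors of each processor.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest cache line size, in bytes, over all recognised descriptors on
 * all processors, starting from the 32-byte default. A line size of 32
 * or less therefore never changes the result. */
uint32_t largest_line_size(const uint32_t *line_sizes, size_t n)
{
    uint32_t largest = 32;
    for (size_t i = 0; i < n; i++) {
        if (line_sizes[i] > largest) {
            largest = line_sizes[i];
        }
    }
    return largest;
}
```

With no recognised descriptors at all the result is 32, and a single descriptor with a 128-byte line is enough to raise it to 128 however many smaller lines are also reported.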
For brevity, the tables below distinguish releases within some versions just as early or late. The cut-offs are:
A natural expectation not just for Intel and Microsoft but for computer users is that the latest Windows one can buy can get the most from the latest processors one can buy. This cannot happen without some cooperation between Intel and Microsoft, including advance knowledge as if Microsoft has what we might nowadays call an Insider Preview of developments at Intel. To see this at work, the table shows not just which Windows versions recognise each cache descriptor but also when each was added to Intel’s Application Note 485 (as best as seems knowable from comparing revisions and in some cases matching with the Note’s own Revision History). Beware that Windows version order is not chronological order:
Beware also that what Windows interprets for a cache descriptor is not always what Intel documents for what Intel eventually defined for that same cache descriptor. Where these differ, Intel’s are in parentheses.
Descriptor | Cache Size | Associativity | Line Size | Versions | Intel |
---|---|---|---|---|---|
0x22 | 512KB | 4 | 128 bytes | 5.1 and higher | added for 020/021 revision March/May 2002 |
0x23 | 1MB | 8 | 128 bytes | 5.1 and higher | added for 020/021 revision March/May 2002 |
0x24 | 0 (1MB) | 8 (16) | 128 bytes (64) | 5.1 and higher | documented in 2013 |
0x25 | 2MB | 8 | 128 bytes | 5.1 and higher | added for 020/021 revision March/May 2002 |
0x26 | 0 | 8 | 128 bytes | 5.1 and higher | not listed |
0x27 | 0 | 8 | 128 bytes | 5.1 and higher | not listed |
0x28 | 0 | 8 | 128 bytes | 5.1 and higher | not listed |
0x29 | 4MB | 8 | 128 bytes | 5.1 and higher | added for 020/021 revision March/May 2002 |
When version 5.1 was built in August 2001 with a newly recognised range of descriptors from 0x22 to 0x29, Intel was still more than six months from documenting them. The initially defined descriptors, i.e., 0x22, 0x23, 0x25 and 0x29, are all said by Intel to indicate a 3rd-level cache with a 64-byte line size but with two lines per sector, so that the line size as Windows thinks of it is 128 bytes.
The undefined descriptors in the range, i.e., 0x24, 0x26, 0x27 and 0x28, are left to a default of indicating no cache. This is not inconsequential, however, for the kernel does take all the descriptors in the range as indicating a 128-byte cache line. Eventually, though not until 2013, Intel did define cache descriptor 0x24. Perhaps inevitably, Intel did not then follow Microsoft’s pattern. To Intel, cache descriptor 0x24 is for a 16-way set-associative cache of 1MB with just a 64-byte line size.
Descriptor | NTA Granularity | Versions | Intel |
---|---|---|---|
0x2C | 64 bytes | late 5.1 and higher | added for 023 revision March 2003 |
Chronologically, the first recognition of cache descriptor 0x2C is in March 2003 for version 5.2, timed very closely with Intel’s documentation of it. Intel has 0x2C as encoding an 8-way set-associative 1st-level data cache of 32KB with a 64-byte cache line. The only interest that the Windows kernel has in such a small cache is for efficient prefetching.
Descriptor | Cache Size | Associativity | Line Size | Versions | Intel |
---|---|---|---|---|---|
0x41 | 128KB | 4 | | 5.0 and higher | added for 004 revision December 1995 |
0x42 | 256KB | 4 | | 5.0 and higher | added for 004 revision December 1995 |
0x43 | 512KB | 4 | | 5.0 and higher | added for 004 revision December 1995 |
0x44 | 1MB | 4 | | 5.0 and higher | added for 005 revision December 1996 |
0x45 | 2MB | 4 | | 5.0 and higher | added for 008 revision January 1998 |
0x46 | 4MB | 4 | (64 bytes) | 5.0 and higher | added for 029 revision March 2005 |
0x47 | 8MB | 4 (8) | (64 bytes) | 5.0 and higher | added for 029 revision March 2005 |
0x48 | 16MB (3MB) | (12) | (64 bytes) | 5.0 only | added for 033 revision November 2008 |
0x49 | 32MB (4MB) | (16) | (64 bytes) | 5.0 only | added for 030 revision January 2006 |
Cache descriptors 0x41 to 0x45 were well established in Intel’s literature long before Windows, for version 5.0, first thought to care. Each is said to encode a 4-way set-associative 2nd-level cache with a 32-byte cache line. They differ in the cache size, which doubles from one to the next. Microsoft apparently anticipated that this doubling might continue for another four descriptors. Version 5.1 dialed back the optimism to just two more descriptors.
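The doubling amounts to a shift by the descriptor's distance from 0x41. The following C sketch reproduces the cache sizes that the table gives for Windows, with the range ending at 0x49 for version 5.0 and at 0x47 for later versions. The function name and the flag parameter are hypothetical, and whether the kernel computes the size or looks it up in a table is not something this sketch claims to know.

```c
#include <stdint.h>

/* Cache size in bytes that Windows assumes for a descriptor in the
 * sequence that starts at 0x41, or 0 if the descriptor is outside the
 * range recognised by the given version. The size is 128KB for 0x41 and
 * doubles for each successive descriptor. */
uint32_t size_in_0x41_sequence(uint8_t descriptor, int is_version_5_0)
{
    uint8_t last = is_version_5_0 ? 0x49 : 0x47;
    if (descriptor < 0x41 || descriptor > last) {
        return 0;
    }
    return (128u * 1024u) << (descriptor - 0x41);
}
```

For 0x45 this gives 2MB, matching Intel. For 0x49 under version 5.0 it gives 32MB, which is the extrapolation that Intel's eventual definition did not follow.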
Of course Intel’s much later definitions of these descriptors do not follow the pattern. Both 0x46 and 0x47 encode 3rd-level caches with the cache sizes that Microsoft expected, but they have 64-byte cache lines and 0x47 is 8-way set-associative, not 4. It is perhaps just as well that cache descriptors 0x48 and 0x49 are recognised only by version 5.0. The corresponding caches have very much the “wrong” sizes—3MB and 4MB (12- and 16-way set-associative), respectively—and have 64-byte cache lines. The cache that’s represented by 0x49 is said by Intel to be a 2nd-level cache ordinarily but 3rd-level for family 15 model 6.
Descriptor | Cache Size | Associativity | Line Size | Versions | Intel |
---|---|---|---|---|---|
0x4A | 4MB (6MB) | 8 (12) | 64 bytes | late 5.2 and higher | added for 030 revision January 2006 |
0x4B | 6MB (8MB) | 12 (16) | 64 bytes | late 5.2 and higher | added for 030 revision January 2006 |
0x4C | 8MB (12MB) | 16 (12) | 64 bytes | late 5.2 and higher | added for 030 revision January 2006 |
This sequence of descriptors for 3rd-level caches with 64-byte cache lines does not appear in Intel’s Application Note 485 until almost a year after recognition was built into the kernel for Windows Server 2003 SP1.
Descriptor | NTA Granularity | Versions | Intel |
---|---|---|---|
0x66 | 64 bytes | late 5.0 and higher | present in 017 revision February 2001 |
0x67 | 64 bytes | late 5.0 and higher | present in 017 revision February 2001 |
0x68 | 64 bytes | late 5.0 and higher | present in 017 revision February 2001 |
These 1st-level data caches represented by cache descriptors 0x66 to 0x68 are the first from which the kernel infers a granularity for prefetching. All are 4-way set-associative with varying sizes 8KB, 16KB and 32KB. All that matters to the kernel is that prefetching a cache line will move 64 bytes.
Descriptor | Cache Size | Associativity | Line Size | Versions | Intel |
---|---|---|---|---|---|
0x78 | 1MB | 4 | 64 bytes | late 5.2 and higher | added for 026 revision May 2004 |
0x79 | 128KB | 8 | 128 bytes | 5.1 and higher | present in 017 revision February 2001 |
0x7A | 256KB | 8 | 128 bytes | 5.1 and higher | present in 017 revision February 2001 |
0x7B | 512KB | 8 | 128 bytes | 5.1 and higher | present in 017 revision February 2001 |
0x7C | 1MB | 8 | 128 bytes | 5.1 and higher | present in 017 revision February 2001 |
0x7D | 2MB | 8 | 64 bytes | late 5.2 and higher | added for 026 revision May 2004 |
0x7F | 512KB | 2 | 64 bytes | late 5.2 and higher | added for 026 revision May 2004 |
The caches that Windows takes as having 128-byte cache lines are again said by Intel to have a “64-byte line size, two lines per sector”.
Descriptor | Cache Size | Associativity | Line Size | Versions | Intel |
---|---|---|---|---|---|
0x81 | 128KB | 8 | | 5.0 and higher | not listed |
0x82 | 256KB | 8 | | 5.0 and higher | present in 017 revision February 2001 |
0x83 | 512KB | 8 | | 5.0 and higher | added for 018 revision June 2001 |
0x84 | 1MB | 8 | | 5.0 and higher | added for 015 revision May 2000 |
0x85 | 2MB | 8 | | 5.0 and higher | added for 015 revision May 2000 |
0x86 | 4MB | 8 | | 5.0 to early 5.2 | see note below |
0x86 | 512KB | 4 | 64 bytes | late 5.2 and higher | added for 023 revision March 2003 |
0x87 | 8MB | 8 | | 5.0 to early 5.2 | see note below |
0x87 | 1MB | 8 | 64 bytes | late 5.2 and higher | added for 023 revision March 2003 |
0x88 | 16MB | | | 5.0 only | not listed |
0x89 | 32MB | | | 5.0 only | not listed |
As with the sequence that starts at 0x41, Microsoft seems to have anticipated for this sequence at 0x81 that the doubling of cache size for successive descriptors might continue well beyond 2MB, which was the largest size that is even roughly contemporaneous with the Windows version that first recognises any. Indeed, none of the descriptors in this range were yet documented by Intel when Windows 2000 was released, let alone while it was being written. Also like the sequence at 0x41, version 5.1 accepts that to anticipate 16MB and 32MB is to let the doubling go too far. Unlike for the sequence at 0x41, Windows does notice that the last two descriptors remaining in the sequence, 0x86 and 0x87, were eventually defined by Intel without continuing the doubling.
Descriptor | NTA Granularity | Versions | Intel |
---|---|---|---|
0xF0 | 64 bytes | late 5.1 and higher | added for 026 revision May 2004 |
0xF1 | 128 bytes | late 5.1 and higher | added for 026 revision May 2004 |
Newly defined descriptors that are explicitly dedicated to telling how much data the processor prefetches are known first to Windows Server 2003, more than a year ahead of Intel’s documentation.
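Collecting the prefetch tables above into one place, a lookup for the latest versions might be sketched in C as follows. The function name is hypothetical. The sketch deliberately handles one descriptor at a time, since nothing above says how the kernel chooses when a processor reports more than one of these descriptors.

```c
#include <stdint.h>

/* prefetchnta granularity, in bytes, implied by one cache descriptor
 * according to the tables above, or 0 if the descriptor implies none.
 * A caller would start from the 32-byte default and apply any nonzero
 * result. */
uint32_t nta_granularity_for(uint8_t descriptor)
{
    switch (descriptor) {
    case 0x2C:      /* 1st-level data cache, 32KB, 64-byte line */
    case 0x66:      /* 1st-level data caches, 8KB to 32KB */
    case 0x67:
    case 0x68:
    case 0xF0:      /* dedicated prefetch descriptor */
        return 64;
    case 0xF1:      /* dedicated prefetch descriptor */
        return 128;
    default:
        return 0;
    }
}
```

Earlier versions recognise only a subset: 0x66 to 0x68 from late 5.0, with 0x2C, 0xF0 and 0xF1 added from late 5.1.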