Geoff Chappell - Software Analyst
If only for now, this article is specific to 32-bit Windows (i386 or x86).
Intel’s processors have long provided for the cpuid instruction to return information about the various caches that are built into the processor. The Windows kernel starts collecting this information as of version 5.0.
The particular interest of this article, if only for now, is what the kernel learns about the size and associativity of the second-level (L2) cache while initialising the processor. The results go into the SecondLevelCacheAssociativity and SecondLevelCacheSize members of the KPCR structure. How the kernel gets these results from multiple calls to cpuid is a small lesson in the practical difficulties an operating system’s manufacturer has with supporting processors that evolve rapidly and come from multiple vendors.
Version 5.0 bothered only with Intel. The algorithm remains essentially unchanged, even to version 10.0. As the possible cases for interpretation grew, however, the interpretations changed. Although they were settled ahead of version 6.0, they don’t all match Intel’s literature (which also changes).
The cpuid instruction is here understood as taking a leaf number in the eax register and returning results in eax, ebx, ecx and edx. Given that earlier CPU identification has established from leaf 0 that the vendor is GenuineIntel, the Windows kernel expects cache characteristics from leaf 2. The general idea is that:
To proceed, the kernel re-executes leaf 0 to check that it sets eax to at least 2. Otherwise, leaf 2 is not supported and there can be no cache characteristics to learn. Given that leaf 2 is supported, its first execution is special for returning in al the number of times that cpuid must be executed, each time with 2 in eax, for complete retrieval. Except for that case, each non-zero byte of any register that is returned with the 0x80000000 bit clear is a descriptor. The kernel examines the registers in the order eax, ebx, ecx, edx, and the bytes of each register from least significant to most.
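The scan of the returned registers can be sketched in C. This is a minimal illustration under stated assumptions, not the kernel’s code: the name collect_leaf2_descriptors and its parameters are hypothetical, and the caller is assumed to have executed cpuid with 2 in eax already and to pass the four returned registers in the order eax, ebx, ecx, edx.

```c
#include <stddef.h>
#include <stdint.h>

/* Collects the descriptors from one execution of cpuid leaf 2.
 * regs holds eax, ebx, ecx, edx in that order. out needs room for
 * 16 bytes. Returns the number of descriptors written to out. */
size_t collect_leaf2_descriptors(const uint32_t regs[4], int first_execution,
                                 uint8_t out[16])
{
    size_t count = 0;
    for (int r = 0; r < 4; r++) {
        uint32_t value = regs[r];
        if (value & 0x80000000u)
            continue;           /* bit 31 set: no descriptors in this register */
        for (int b = 0; b < 4; b++) {   /* least significant byte first */
            uint8_t byte = (uint8_t)(value >> (8 * b));
            if (r == 0 && b == 0 && first_execution)
                continue;       /* al from the first execution is the repeat count */
            if (byte != 0)
                out[count++] = byte;
        }
    }
    return count;
}
```

As a hypothetical example, if the first execution returns 0x665B5001 in eax, the low byte 0x01 is the count and the descriptors taken from eax are 0x50, 0x5B and 0x66, in that order. Repeating cpuid for as many times as al asks is left to the caller.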
Descriptor | Cache Size | Associativity | Line Size | NTA Granularity | Applicable Versions |
---|---|---|---|---|---|
0x22 | 512KB | 4 | 128 bytes | | 5.1 and higher |
0x23 | 1MB | 8 | 128 bytes | | 5.1 and higher |
0x24 | 0 | 8 | 128 bytes | | 5.1 and higher |
0x25 | 2MB | 8 | 128 bytes | | 5.1 and higher |
0x26 | 0 | 8 | 128 bytes | | 5.1 and higher |
0x27 | 0 | 8 | 128 bytes | | 5.1 and higher |
0x28 | 0 | 8 | 128 bytes | | 5.1 and higher |
0x29 | 4MB | 8 | 128 bytes | | 5.1 and higher |
0x2C | | | | 64 bytes | 5.1 from Windows XP SP2, and higher |
0x41 | 128KB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
0x42 | 256KB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
0x43 | 512KB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
0x44 | 1MB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
0x45 | 2MB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
0x46 | 4MB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
0x47 | 8MB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
0x48 | 16MB | | | | 5.0 only |
0x49 | 32MB | | | | 5.0 only |
0x4A | 4MB | 8 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
0x4B | 6MB | 12 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
0x4C | 8MB | 16 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
0x66 | | | | 64 bytes | 5.0 from Windows 2000 SP3, and higher |
0x67 | | | | 64 bytes | 5.0 from Windows 2000 SP3, and higher |
0x68 | | | | 64 bytes | 5.0 from Windows 2000 SP3, and higher |
0x78 | 1MB | 4 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
0x79 | 128KB | 8 | 128 bytes | | 5.1 and higher |
0x7A | 256KB | 8 | 128 bytes | | 5.1 and higher |
0x7B | 512KB | 8 | 128 bytes | | 5.1 and higher |
0x7C | 1MB | 8 | 128 bytes | | 5.1 and higher |
0x7D | 2MB | 8 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
0x7F | 512KB | 2 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
0x81 | 128KB | 8 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
0x82 | 256KB | 8 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
0x83 | 512KB | 8 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
0x84 | 1MB | 8 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
0x85 | 2MB | 8 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
0x86 | 4MB | | | | 5.0 only |
 | | 8 | | | 5.1 to 5.2 before Windows Server 2003 SP1 |
 | 512KB | 4 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
0x87 | 8MB | | | | 5.0 only |
 | | 8 | | | 5.1 to 5.2 before Windows Server 2003 SP1 |
 | 1MB | 8 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
0x88 | 16MB | | | | 5.0 only |
0x89 | 32MB | | | | 5.0 only |
0xF0 | | | | 64 bytes | 5.1 from Windows XP SP2, and higher |
0xF1 | | | | 128 bytes | 5.1 from Windows XP SP2, and higher |
Version 5.0 is unconcerned about associativity or line size. Early builds interpret the descriptors only to learn the cache size. The build for Windows 2000 SP3 recognises additional descriptors as telling just of the processor’s capacity to prefetch for Non-Temporal Access (NTA). The NTA granularity affects what was then the newly introduced (but not yet documented) RtlPrefetchMemoryNonTemporal function. At any given time, the granularity that the kernel uses is whatever it last learnt for any processor, defaulting to 32 bytes.
Different versions deal differently with the occurrence of more than one descriptor that is interpreted as specifying a cache size. In version 5.0, the cache size that is adopted for the processor is simply the last that’s found. In version 5.1 and higher, the descriptor that counts for both cache size and associativity is the one for which the size divided by the associativity is greatest.
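The two selection policies can be put side by side in a short C sketch. The struct and function names are hypothetical, sizes are assumed to be in KB, and associativity is assumed non-zero. How the kernel rounds the division or breaks ties is not stated here, so the comparison by cross-multiplication, which keeps the earlier candidate on a tie, is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

struct l2_candidate {           /* hypothetical type, for illustration only */
    uint32_t size_kb;           /* cache size in KB */
    uint32_t associativity;     /* number of ways, assumed non-zero */
};

/* Policy described for version 5.0: the last cache size found wins. */
const struct l2_candidate *pick_last(const struct l2_candidate *c, size_t n)
{
    return n ? &c[n - 1] : NULL;
}

/* Policy described for version 5.1 and higher: the candidate with the
 * greatest size divided by associativity wins. Comparing a/b > c/d as
 * a*d > c*b avoids integer division (an assumption of this sketch). */
const struct l2_candidate *pick_greatest_way_size(const struct l2_candidate *c,
                                                  size_t n)
{
    const struct l2_candidate *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (best == NULL ||
            (uint64_t)c[i].size_kb * best->associativity >
            (uint64_t)best->size_kb * c[i].associativity)
            best = &c[i];
    }
    return best;
}
```

Given candidates of 512KB 4-way followed by 256KB 8-way, the first policy picks the 256KB entry because it comes last, and the second picks the 512KB entry because 128KB per way exceeds 32KB per way.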
Of descriptors that are interpreted as specifying a line size, the one that counts is whichever gives the largest line size, provided that it exceeds 64 bytes. The size of the largest cache line on any processor is made readily available to all kernel-mode software as the result of the documented KeGetRecommendedSharedDataAlignment function and less readily to all user-mode software as information produced by the NtQuerySystemInformation function when given the undocumented information class SystemRecommendedSharedDataAlignment.
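The rule for line sizes amounts to a running maximum. A minimal C sketch follows, assuming that the running value starts at 64 bytes, which is how the wording about sizes larger than 64 bytes is read here. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the largest line size among those given, but never less than
 * the assumed starting value of 64 bytes. */
uint32_t largest_line_size(const uint32_t *line_sizes, size_t count)
{
    uint32_t largest = 64;      /* assumed starting value */
    for (size_t i = 0; i < count; i++) {
        if (line_sizes[i] > largest)
            largest = line_sizes[i];
    }
    return largest;
}
```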
Processors that have the vendor string AuthenticAMD are supported in version 5.1 and higher. For these, the kernel expects cache characteristics from cpuid leaves 0x80000005 and 0x80000006.
If executing cpuid leaf 0x80000000 returns at least 0x80000005 in eax, then leaf 0x80000005 is supported and executing it returns the NTA granularity in cl.
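As a sketch of this check in C, assuming the caller has already executed the two leaves and passes in the relevant registers: the function name is hypothetical, and returning 0 for an unsupported leaf is a convention of the sketch, not something the kernel is known to do.

```c
#include <stdint.h>

/* max_extended_leaf is eax from cpuid leaf 0x80000000.
 * ecx_leaf_80000005 is ecx from cpuid leaf 0x80000005.
 * Returns the NTA granularity in bytes, or 0 if the leaf is unsupported
 * (the 0 is this sketch's own convention). */
uint32_t amd_nta_granularity(uint32_t max_extended_leaf,
                             uint32_t ecx_leaf_80000005)
{
    if (max_extended_leaf < 0x80000005u)
        return 0;
    return ecx_leaf_80000005 & 0xFFu;   /* cl, the low byte of ecx */
}
```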
If executing cpuid leaf 0x80000000 returns at least 0x80000006 in eax, then leaf 0x80000006 is supported and executing it returns cache characteristics in ecx:
For family 6, model 3, stepping 0, the cache size is taken to be 64KB whatever the result from cpuid.
Associativity is encoded according to the following table:
Encoding | Associativity | Applicable Versions |
---|---|---|
0x02 | 2 | 5.1 and higher |
0x04 | 4 | 5.1 and higher |
0x06 | 8 | 5.1 and higher |
0x08 | 16 | 5.1 and higher |
0x0F | 16 | 5.2 and higher |
else | 1 | 5.1 and higher |
Recognition of 0x0F for the high four bits is prevented until version 5.2 because of a coding error (which did not get fixed in chronologically later service packs of version 5.1).
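The decoding can be sketched in C as follows. The type and function names are hypothetical. The field positions within ecx (size in KB in bits 16 to 31, associativity encoding in bits 12 to 15) are taken from AMD’s documentation of leaf 0x80000006, not from observation of the kernel. The recognise_0x0f parameter stands for the difference between versions, on the assumption that an unrecognised 0x0F is treated like any other unlisted encoding.

```c
#include <stdint.h>

struct amd_l2_info {            /* hypothetical type, for illustration only */
    uint32_t size_kb;
    uint32_t associativity;
};

/* Maps the 4-bit encoding to an associativity as in the table above.
 * recognise_0x0f is non-zero to model version 5.2 and higher. */
uint32_t decode_amd_associativity(uint32_t encoding, int recognise_0x0f)
{
    switch (encoding) {
    case 0x02: return 2;
    case 0x04: return 4;
    case 0x06: return 8;
    case 0x08: return 16;
    case 0x0F: return recognise_0x0f ? 16 : 1;  /* assumed fall-through to 1 */
    default:   return 1;
    }
}

/* Decodes ecx from cpuid leaf 0x80000006, assuming AMD's documented layout. */
struct amd_l2_info decode_amd_l2(uint32_t ecx, int recognise_0x0f)
{
    struct amd_l2_info info;
    info.size_kb = ecx >> 16;
    info.associativity =
        decode_amd_associativity((ecx >> 12) & 0xFu, recognise_0x0f);
    return info;
}
```

For a hypothetical ecx of 0x02006140, this gives 512KB and an associativity of 8. The override for family 6, model 3, stepping 0 is not modelled.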
Not until version 6.2 are processors whose vendor string is CentaurHauls even considered for any sort of determination of their cache characteristics. Now that they are, they’re treated exactly as if they’re GenuineIntel.
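The vendor and version dependence described in this article can be summarised as a small dispatch in C. The enumeration and function names are hypothetical, and the sketch assumes that nothing is collected for any vendor before version 5.0 or for vendors other than the three named.

```c
#include <string.h>

enum cache_scheme {             /* hypothetical names, for illustration only */
    SCHEME_NONE,
    SCHEME_INTEL_LEAF2,         /* descriptors from cpuid leaf 2 */
    SCHEME_AMD_EXTENDED         /* cpuid leaves 0x80000005 and 0x80000006 */
};

static int version_at_least(unsigned major, unsigned minor,
                            unsigned want_major, unsigned want_minor)
{
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/* vendor is the 12-character vendor string from cpuid leaf 0. */
enum cache_scheme scheme_for_vendor(const char *vendor,
                                    unsigned major, unsigned minor)
{
    if (strcmp(vendor, "GenuineIntel") == 0)
        return version_at_least(major, minor, 5, 0) ? SCHEME_INTEL_LEAF2
                                                    : SCHEME_NONE;
    if (strcmp(vendor, "AuthenticAMD") == 0)
        return version_at_least(major, minor, 5, 1) ? SCHEME_AMD_EXTENDED
                                                    : SCHEME_NONE;
    if (strcmp(vendor, "CentaurHauls") == 0)    /* treated as Intel from 6.2 */
        return version_at_least(major, minor, 6, 2) ? SCHEME_INTEL_LEAF2
                                                    : SCHEME_NONE;
    return SCHEME_NONE;
}
```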