If only for now, this article is specific to 32-bit Windows (i386 or x86).

Second-Level Cache Support in the Windows Kernel

Intel’s processors have long provided for the cpuid instruction to return information about the various caches that are built into the processor. The Windows kernel starts collecting this information as of version 5.0.

The particular interest of this article, if only for now, is what the kernel learns about the size and associativity of the second-level (L2) cache while initialising the processor. The results go into the SecondLevelCacheAssociativity and SecondLevelCacheSize members of the KPCR structure. How the kernel gets these results from multiple calls to cpuid is a small lesson in the practical difficulties an operating system’s manufacturer has with supporting processors that evolve rapidly and come from multiple vendors.

Intel

Version 5.0 bothered only with Intel. The algorithm remains essentially unchanged, even to version 10.0, but the interpretations changed as the possible cases grew. Even though the interpretations were settled ahead of version 6.0, they don’t all match Intel’s literature (which also changes).

The cpuid instruction is here understood as taking a leaf number in the eax register and returning results in eax, ebx, ecx and edx. Given that earlier CPU identification has established from leaf 0 that the vendor is GenuineIntel, the Windows kernel expects cache characteristics from leaf 2. The general idea is that:

To proceed, the kernel re-executes leaf 0 to check that it sets eax to at least 2. Otherwise, leaf 2 is not supported and there can be no cache characteristics to learn. Given that leaf 2 is supported, its first execution is special for returning in al the number of times that cpuid must be executed, each time with 2 in eax, for complete retrieval. Except for al on that first execution, each non-zero byte of any register that is returned with its highest bit (0x80000000) clear is a descriptor. The kernel examines the registers in the order eax, ebx, ecx, edx, and the bytes of each register from least significant to most.
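The scan just described can be sketched in C. This is an illustration only, not the kernel’s code: the function name, its parameters and the output buffer are all hypothetical, and the register values are passed in rather than obtained by executing cpuid, so that the sketch runs on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: gathers the descriptor bytes from one execution
 * of cpuid leaf 2. regs holds eax, ebx, ecx, edx in that order.
 * first_call is non-zero for the first execution, whose al is an
 * iteration count rather than a descriptor. Returns the number of
 * descriptors written to out (at most 16). */
size_t collect_leaf2_descriptors(const uint32_t regs[4], int first_call,
                                 uint8_t out[16])
{
    size_t count = 0;

    assert(regs != NULL && out != NULL);
    for (int r = 0; r < 4; r++) {
        uint32_t value = regs[r];
        if (value & 0x80000000u)
            continue;                   /* high bit set: no descriptors */
        for (int b = 0; b < 4; b++) {   /* least significant byte first */
            uint8_t byte = (uint8_t)(value >> (8 * b));
            if (first_call && r == 0 && b == 0)
                continue;               /* al on first call is the count */
            if (byte != 0)
                out[count++] = byte;
        }
    }
    return count;
}
```

A caller would execute cpuid with 2 in eax as many times as the first al asks, passing each set of results through this helper and interpreting each collected byte as a descriptor.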

| Descriptor | Cache Size | Associativity | Line Size | NTA Granularity | Applicable Versions |
|---|---|---|---|---|---|
| 0x22 | 512KB | 4 | 128 bytes | | 5.1 and higher |
| 0x23 | 1MB | 8 | 128 bytes | | 5.1 and higher |
| 0x24 | 0 | 8 | 128 bytes | | 5.1 and higher |
| 0x25 | 2MB | 8 | 128 bytes | | 5.1 and higher |
| 0x26 | 0 | 8 | 128 bytes | | 5.1 and higher |
| 0x27 | 0 | 8 | 128 bytes | | 5.1 and higher |
| 0x28 | 0 | 8 | 128 bytes | | 5.1 and higher |
| 0x29 | 4MB | 8 | 128 bytes | | 5.1 and higher |
| 0x2C | | | | 64 bytes | 5.1 from Windows XP SP2, and higher |
| 0x41 | 128KB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
| 0x42 | 256KB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
| 0x43 | 512KB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
| 0x44 | 1MB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
| 0x45 | 2MB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
| 0x46 | 4MB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
| 0x47 | 8MB | 4 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
| 0x48 | 16MB | | | | 5.0 only |
| 0x49 | 32MB | | | | 5.0 only |
| 0x4A | 4MB | 8 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
| 0x4B | 6MB | 12 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
| 0x4C | 8MB | 16 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
| 0x66 | | | | 64 bytes | 5.0 from Windows 2000 SP3, and higher |
| 0x67 | | | | 64 bytes | 5.0 from Windows 2000 SP3, and higher |
| 0x68 | | | | 64 bytes | 5.0 from Windows 2000 SP3, and higher |
| 0x78 | 1MB | 4 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
| 0x79 | 128KB | 8 | 128 bytes | | 5.1 and higher |
| 0x7A | 256KB | 8 | 128 bytes | | 5.1 and higher |
| 0x7B | 512KB | 8 | 128 bytes | | 5.1 and higher |
| 0x7C | 1MB | 8 | 128 bytes | | 5.1 and higher |
| 0x7D | 2MB | 8 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
| 0x7F | 512KB | 2 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
| 0x81 | 128KB | 8 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
| 0x82 | 256KB | 8 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
| 0x83 | 512KB | 8 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
| 0x84 | 1MB | 8 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
| 0x85 | 2MB | 8 | | | 5.0 and higher (cache size); 5.1 and higher (associativity) |
| 0x86 | 4MB | | | | 5.0 only |
| 0x86 | | 8 | | | 5.1 to 5.2 before Windows Server 2003 SP1 |
| 0x86 | 512KB | 4 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
| 0x87 | 8MB | | | | 5.0 only |
| 0x87 | | 8 | | | 5.1 to 5.2 before Windows Server 2003 SP1 |
| 0x87 | 1MB | 8 | 64 bytes | | 5.2 from Windows Server 2003 SP1, and higher |
| 0x88 | 16MB | | | | 5.0 only |
| 0x89 | 32MB | | | | 5.0 only |
| 0xF0 | | | | 64 bytes | 5.1 from Windows XP SP2, and higher |
| 0xF1 | | | | 128 bytes | 5.1 from Windows XP SP2, and higher |
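One natural way to hold such interpretations in code is a lookup table keyed by descriptor. The sketch below is only an illustration of that idea, not a reconstruction of the kernel’s data: the type and function names are hypothetical, and the table carries just a handful of rows, each as interpreted by version 5.2 from Windows Server 2003 SP1 and higher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one leaf 2 descriptor. */
typedef struct {
    uint8_t  descriptor;
    uint32_t size_kb;        /* cache size in KB */
    uint8_t  associativity;  /* ways */
    uint8_t  line_size;      /* bytes; 0 where the table gives none */
} L2Descriptor;

/* A few sample rows only, taken from the table above. */
static const L2Descriptor l2_table[] = {
    { 0x22,  512,  4, 128 },
    { 0x41,  128,  4,   0 },
    { 0x4C, 8192, 16,  64 },
    { 0x7D, 2048,  8,  64 },
    { 0x86,  512,  4,  64 },
};

/* Returns the matching row, or NULL for an unrecognised descriptor. */
const L2Descriptor *find_l2_descriptor(uint8_t descriptor)
{
    for (size_t i = 0; i < sizeof l2_table / sizeof l2_table[0]; i++) {
        if (l2_table[i].descriptor == descriptor)
            return &l2_table[i];
    }
    return NULL;
}

/* Converts a row's size to bytes. */
uint32_t l2_size_in_bytes(const L2Descriptor *entry)
{
    assert(entry != NULL);
    return entry->size_kb * 1024u;
}
```

Descriptors that are absent from the table simply produce NULL, which corresponds to the kernel learning nothing from them.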

Version 5.0 is unconcerned about associativity or line size. Early builds interpret the descriptors only to learn the cache size. The build for Windows 2000 SP3 recognises additional descriptors as telling just of the processor’s capacity to prefetch for Non-Temporal Access (NTA). The NTA granularity affects what was then the newly introduced (but not yet documented) RtlPrefetchMemoryNonTemporal function. At any given time, the granularity that the kernel uses is whatever it last learnt for any processor, defaulting to 32 bytes.

Different versions deal differently with the occurrence of more than one descriptor that is interpreted as specifying a cache size. In version 5.0, the cache size that is adopted for the processor is simply the last that’s found. In version 5.1 and higher, the descriptor that counts for both cache size and associativity is the one for which the size divided by the associativity is greatest.
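The two selection rules can be contrasted in a short C sketch. The type and function names are hypothetical. The rule for breaking ties (the earlier candidate is kept) and the skipping of candidates with no known associativity are assumptions made here to keep the example well-defined, not behaviour established for the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one descriptor that specifies a cache size. */
typedef struct {
    uint32_t size;           /* cache size in bytes */
    uint32_t associativity;  /* ways; 0 if not known */
} L2Candidate;

/* Version 5.0 as described: the last size found is the one adopted. */
uint32_t pick_size_v50(const L2Candidate *c, size_t n)
{
    uint32_t size = 0;

    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        size = c[i].size;
    return size;
}

/* Version 5.1 and higher as described: the candidate with the greatest
 * size divided by associativity wins. Returns its index, or n if no
 * candidate qualifies. */
size_t pick_index_v51(const L2Candidate *c, size_t n)
{
    size_t best = n;
    uint32_t best_ratio = 0;

    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (c[i].associativity == 0)
            continue;        /* assumption: avoid dividing by zero */
        uint32_t ratio = c[i].size / c[i].associativity;
        if (best == n || ratio > best_ratio) {
            best = i;
            best_ratio = ratio;
        }
    }
    return best;
}
```

Given candidates of 1MB 8-way, 512KB 2-way and 256KB 4-way in that order, the first rule adopts 256KB because it comes last, but the second picks the 512KB 2-way candidate because 256KB per way is the greatest ratio.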

Of descriptors that are interpreted as specifying a line size, the one that counts is whichever gives the largest size, and then only if this is larger than 64 bytes. The size of the largest cache line on any processor is made readily available to all kernel-mode software as the result of the documented KeGetRecommendedSharedDataAlignment function and less readily to all user-mode software as information produced by the NtQuerySystemInformation function when given the undocumented information class SystemRecommendedSharedDataAlignment.

AMD

Processors that have the vendor string AuthenticAMD are supported in version 5.1 and higher. For these, the kernel expects cache characteristics from cpuid leaves 0x80000005 and 0x80000006.

If executing cpuid leaf 0x80000000 returns at least 0x80000005 in eax, then leaf 0x80000005 is supported and executing it returns the NTA granularity in cl.

If executing cpuid leaf 0x80000000 returns at least 0x80000006 in eax, then leaf 0x80000006 is supported and executing it returns cache characteristics in ecx:

For family 6, model 3, stepping 0, the cache size is taken to be 64KB whatever the result from cpuid.

Associativity is encoded according to the following table:

| Encoding | Associativity | Applicable Versions |
|---|---|---|
| 0x02 | 2 | 5.1 and higher |
| 0x04 | 4 | 5.1 and higher |
| 0x06 | 8 | 5.1 and higher |
| 0x08 | 16 | 5.1 and higher |
| 0x0F | 16 | 5.2 and higher |
| else | 1 | 5.1 and higher |

Recognition of 0x0F for the high four bits is prevented until version 5.2 because of a coding error (which did not get fixed in chronologically later service packs of version 5.1).
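The AMD decoding can be sketched in C under stated assumptions. AMD documents that leaf 0x80000006 returns the L2 cache size in KB in bits 16 to 31 of ecx and the encoded associativity in bits 12 to 15. That the kernel reads exactly these fields is assumed here. So is the treatment of 0x0F before version 5.2, which the sketch lets fall through to the default of 1. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint32_t size_bytes;
    uint32_t associativity;
} AmdL2Info;

/* Decodes ecx from cpuid leaf 0x80000006. recognise_0f is non-zero to
 * model version 5.2 and higher, zero to model version 5.1. */
void decode_amd_l2(uint32_t ecx, unsigned family, unsigned model,
                   unsigned stepping, int recognise_0f, AmdL2Info *out)
{
    assert(out != NULL);

    /* Family 6, model 3, stepping 0: 64KB whatever cpuid reports. */
    if (family == 6 && model == 3 && stepping == 0)
        out->size_bytes = 64u * 1024u;
    else
        out->size_bytes = (ecx >> 16) * 1024u;

    switch ((ecx >> 12) & 0xFu) {
    case 0x2: out->associativity = 2;  break;
    case 0x4: out->associativity = 4;  break;
    case 0x6: out->associativity = 8;  break;
    case 0x8: out->associativity = 16; break;
    case 0xF: out->associativity = recognise_0f ? 16 : 1; break;
    default:  out->associativity = 1;  break;
    }
}
```

For example, an ecx of 0x02006140 decodes to 512KB with associativity 8 (size field 0x0200, associativity field 0x6).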

Centaur

Not until version 6.2 are processors whose vendor string is CentaurHauls even considered for any sort of determination of their cache characteristics. Now that they are, they’re treated exactly as if they’re GenuineIntel.