The Emon Profile Interface

If the HAL’s initialisation of Hardware Performance Counters establishes that the boot processor is from Intel, has at least some support for Performance Monitoring Counters, and that this support is not masked from Windows by a Microsoft-compatible hypervisor, the HAL chooses the Emon profile interface. The name Emon appears to come from Intel’s manuals, apparently standing for Event Monitoring.

Initialisation

The cpuid leaf for learning about profiling is 0x0A. The low byte returned in eax is the “version ID of architectural performance monitoring” and is known already to be at least 1. The HAL saves this but is not known to make any use of it. Nor does it yet make any use of the fixed-function performance counters that Intel documents as being supported if the version is greater than 1. The second byte in eax (bits 8 to 15) tells how many general-purpose performance monitoring counters are supported by each logical processor. The third byte (bits 16 to 23) tells how wide, in bits, those counters are. The high byte in eax tells how many bits are meaningful in ebx. Each bit that is both meaningful and clear confirms that a corresponding performance event is available. It is already known that at least the first bit is meaningful and clear. Depending on those bits in ebx, some performance events that the Emon profile interface might support instead become unsupported.
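The field layout just described can be sketched in a few lines of Python. This is only an illustration of the decoding, not the HAL's code: the function name and the dictionary keys are made up for the example, and the eax and ebx inputs are supplied by the caller rather than read from the processor.

```python
def decode_cpuid_leaf_0a(eax, ebx):
    """Split the eax and ebx values from cpuid leaf 0x0A into their fields."""
    version = eax & 0xFF            # version ID of architectural performance monitoring
    counters = (eax >> 8) & 0xFF    # general-purpose counters per logical processor
    width = (eax >> 16) & 0xFF      # width of each counter, in bits
    ebx_bits = (eax >> 24) & 0xFF   # number of meaningful low bits in ebx

    # An event is available only if its ebx bit is both meaningful and clear.
    available = [bit for bit in range(ebx_bits) if not (ebx >> bit) & 1]

    return {
        "version": version,
        "counters": counters,
        "width": width,
        "ebx_bits": ebx_bits,
        "available_bits": available,
    }
```

For a hypothetical eax of 0x07300404, this reports version 4, four counters that are each 48 bits wide, and seven meaningful bits in ebx. If ebx were 0x18, the events for bits 3 and 4 would be the ones reported as unavailable.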

Profile Sources

For the purpose of interaction with the kernel, and indeed all the way to user mode through such functions as NtCreateProfile and NtCreateProfileEx, these performance events are abstracted as profile sources, represented numerically by a KPROFILE_SOURCE enumeration. Microsoft’s public definition of this enumeration goes only to 0x18 as ProfileMaximum, presumably having covered sources that are anticipated for arbitrary processor architectures. The Emon profile interface in the HAL from the original release of Windows 10 supports the following:

Value  Name                             EBX Bit  Event Select
0x00   ProfileTime                               0x0003003C
0x02   ProfileTotalIssues               1        0x000300C0
0x06   ProfileBranchInstructions        5        0x000300C4
0x0A   ProfileCacheMisses               4        0x0003412E
0x0B   ProfileBranchMispredictions      6        0x000300C5
0x13   ProfileTotalCycles               0        0x0003003C
0x19   ProfileUnhaltedCoreCycles        0        0x0003003C
0x1A   ProfileInstructionRetired        1        0x000300C0
0x1B   ProfileUnhaltedReferenceCycles   2        0x0003013C
0x1C   ProfileLLCReference              3        0x00034F2E
0x1D   ProfileLLCMisses                 4        0x0003412E
0x1E   ProfileBranchInstructionRetired  5        0x000300C4
0x1F   ProfileBranchMispredictsRetired  6        0x000300C5

Microsoft’s names for values below 0x19 are known from the enumeration’s C-language definition in WDM.H from the Windows Driver Kit (WDK). Presumably, the values from 0x19 and higher are omitted from that definition because they are processor-specific and the definition is meant to be general. Names for the Emon-specific profile sources are inferred from descriptive strings in the HAL, which can be obtained even from user mode through ZwQuerySystemInformation when given the information class SystemPerformanceTraceInformation (0x1F) and the secondary information class EventTraceProfileSourceListInformation (0x0D) as the first dword in the information buffer. For the values that Microsoft names in KPROFILE_SOURCE, each name is this descriptive string but with Profile as a prefix. Extrapolation of this relationship to the extra values seems at least a reasonable guess.

For each profile source other than ProfileTime, which is handled by a separate mechanism, the column headed EBX Bit shows the bit that governs support. If this bit is not meaningful according to the high byte that cpuid leaf 0x0A returned in eax, or if it is set in what the same cpuid leaf returned in ebx, then the profile source is regarded as unsupported.
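As a sketch of this rule, the following Python takes the EBX Bit column from the table and filters it against given eax and ebx values. The names EBX_BIT and supported_sources are inventions for the example, and the inputs are assumed to have been obtained from cpuid leaf 0x0A by some other means.

```python
# Profile source value -> EBX bit, copied from the table.
# ProfileTime (0x00) is omitted because it is handled by a separate mechanism.
EBX_BIT = {
    0x02: 1,  # ProfileTotalIssues
    0x06: 5,  # ProfileBranchInstructions
    0x0A: 4,  # ProfileCacheMisses
    0x0B: 6,  # ProfileBranchMispredictions
    0x13: 0,  # ProfileTotalCycles
    0x19: 0,  # ProfileUnhaltedCoreCycles
    0x1A: 1,  # ProfileInstructionRetired
    0x1B: 2,  # ProfileUnhaltedReferenceCycles
    0x1C: 3,  # ProfileLLCReference
    0x1D: 4,  # ProfileLLCMisses
    0x1E: 5,  # ProfileBranchInstructionRetired
    0x1F: 6,  # ProfileBranchMispredictsRetired
}

def supported_sources(eax, ebx):
    """Return the profile source values that survive the cpuid check."""
    meaningful = (eax >> 24) & 0xFF  # high byte of eax: meaningful bits in ebx
    supported = []
    for source, bit in sorted(EBX_BIT.items()):
        if bit >= meaningful:
            continue  # bit is not meaningful: unsupported
        if (ebx >> bit) & 1:
            continue  # bit is set: event not available
        supported.append(source)
    return supported
```

With a hypothetical ebx of 0x18, which sets bits 3 and 4, the sources 0x0A, 0x1C and 0x1D drop out and the other nine remain.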

There also corresponds to each profile source a value that must be loaded into a Performance Event Select Register to, well, select the corresponding performance event. Each Performance Event Select Register is a model-specific register beginning at 0x0186, one for each counter that cpuid leaf 0x0A declared. The counters themselves are the model-specific registers beginning at 0xC1. Initially, the Emon profile interface loads zero into each of the declared Performance Event Select Registers.
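To make the register numbering and the select values concrete, here is a small Python sketch. The field positions it assumes for a Performance Event Select Register (event number in bits 0 to 7, unit mask in bits 8 to 15, user-mode flag at bit 16, kernel-mode flag at bit 17) are taken from Intel's documentation, not from anything stated above, and the function names are hypothetical.

```python
PERFEVTSEL_BASE = 0x0186  # first Performance Event Select Register
PMC_BASE = 0x00C1         # first general-purpose counter

def counter_msrs(counter_count):
    """Pair each select register with its counter, for the declared counters."""
    return [(PERFEVTSEL_BASE + i, PMC_BASE + i) for i in range(counter_count)]

def decode_select(value):
    """Split a select value into fields, assuming Intel's documented layout."""
    return {
        "event": value & 0xFF,           # event number
        "umask": (value >> 8) & 0xFF,    # unit mask
        "usr": bool(value & (1 << 16)),  # count in user mode
        "os": bool(value & (1 << 17)),   # count in kernel mode
    }
```

On these assumptions, the select value 0x0003412E for ProfileLLCMisses decodes as event 0x2E with unit mask 0x41, counted in both user mode and kernel mode. All the select values in the table share the same 0x0003 in their high word.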

Note that the Emon-specific profile sources 0x19 to 0x1F are the complete set, one for each EBX Bit from 0 to 6, and are even arranged in ascending order of that bit. The generally defined profile sources, numbered below 0x19, that the Emon profile interface can support are just those that map to Emon-specific profile sources. The mapping is not one-to-one. Though most of the apparently Emon-specific profile sources are more readily available as generally defined sources below 0x19, there are two exceptions: whatever use 0x1B and 0x1C may have, they are available only to those in the know. LLC, by the way, stands for Last Level Cache.
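The mapping can be recovered mechanically from the table by pairing sources that share a select value. The Python below is one way to do it; it illustrates the relationship and is not a description of how the HAL stores it, and the names SELECT and generic_to_emon are invented for the example. ProfileTime is left out because it is handled by a separate mechanism.

```python
# Profile source value -> select value, copied from the table.
SELECT = {
    0x02: 0x000300C0, 0x06: 0x000300C4, 0x0A: 0x0003412E,
    0x0B: 0x000300C5, 0x13: 0x0003003C,
    0x19: 0x0003003C, 0x1A: 0x000300C0, 0x1B: 0x0003013C,
    0x1C: 0x00034F2E, 0x1D: 0x0003412E, 0x1E: 0x000300C4,
    0x1F: 0x000300C5,
}

def generic_to_emon():
    """Map each generally defined source to the Emon-specific source it aliases."""
    by_select = {sel: src for src, sel in SELECT.items() if src >= 0x19}
    return {src: by_select[sel] for src, sel in SELECT.items() if src < 0x19}

def emon_only():
    """List the Emon-specific sources that have no generally defined alias."""
    aliased = set(generic_to_emon().values())
    return sorted(src for src in SELECT if src >= 0x19 and src not in aliased)
```

Here emon_only() returns 0x1B and 0x1C, the two sources that have no counterpart below 0x19.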

For the sake of completeness, note that the Emon profile interface requires 8 bytes of memory per counter per processor. The number of counters per processor is known from cpuid, as explained above. The number of processors is not known at the time and anyway can change. The HAL allows for the maximum possible number of registered processors, the meaning of which is a small topic in itself. Failure to get the memory, which is almost unthinkable, causes all profile sources to be treated as unsupported.