Geoff Chappell, Software Analyst
This function creates an object for profiling a process’s execution within a specified range of addresses.
NTSTATUS
NtCreateProfile (
    HANDLE *ProfileHandle,
    HANDLE Process,
    PVOID ProfileBase,
    SIZE_T ProfileSize,
    ULONG BucketSize,
    ULONG *Buffer,
    ULONG BufferSize,
    KPROFILE_SOURCE ProfileSource,
    KAFFINITY Affinity);
in version 3.51 and higher, but
NTSTATUS
NtCreateProfile (
    HANDLE *ProfileHandle,
    HANDLE Process,
    PVOID ProfileBase,
    SIZE_T ProfileSize,
    ULONG BucketSize,
    ULONG *Buffer,
    ULONG BufferSize);
in versions 3.10 and 3.50.
The ProfileHandle argument is the address of a variable that is to receive a handle to the created profile object. This handle can then be given to the NtStartProfile and NtStopProfile functions to start and stop the profiling that this function sets up.
The Process argument limits the profiling to a specified process. This argument can be NULL to profile globally.
The ProfileBase and ProfileSize arguments are respectively the address and size, in bytes, of a region of address space to profile. The 32-bit builds allow a special case in which the ProfileBase is instead a segment address: this applies if the BucketSize argument is zero.
The BucketSize argument selects a granularity for the profiling. Think of the profiled region as an array of buckets. Profiling produces a count of executions that are discovered within each bucket. The function supports buckets whose size in bytes is a power of two. As an argument, the BucketSize is not in bytes but is instead the logarithm base 2 of the size in bytes.
The Buffer and BufferSize arguments are respectively the address and size, in bytes, of a buffer that is to receive the ULONG execution counts for successive buckets while profiling is started but not stopped.
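The relationship between the profiled region, the bucket size and the buffer can be illustrated with a little arithmetic. What follows is a minimal sketch, not the kernel's code. The helper names are hypothetical, and it assumes one 32-bit counter per bucket. Whether the kernel rounds a partial last bucket up or down is not stated here; the sketch rounds up so that the buffer is never too small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only. bucket_log2 is the BucketSize
   as the function expects it: the base-2 logarithm of the size in bytes. */

/* Number of buckets that cover profile_size bytes, counting a partial
   last bucket as a whole one (an assumption of this sketch). */
static size_t bucket_count(size_t profile_size, unsigned bucket_log2)
{
    assert(bucket_log2 < sizeof(size_t) * 8);
    size_t bucket_bytes = (size_t)1 << bucket_log2;
    size_t whole = profile_size >> bucket_log2;
    size_t partial = (profile_size & (bucket_bytes - 1)) != 0;
    return whole + partial;
}

/* Bytes of buffer needed for one 32-bit execution count per bucket. */
static size_t buffer_bytes(size_t profile_size, unsigned bucket_log2)
{
    return bucket_count(profile_size, bucket_log2) * sizeof(uint32_t);
}

/* Index of the execution count that an address in the region belongs to. */
static size_t bucket_index(uintptr_t profile_base, uintptr_t address,
                           unsigned bucket_log2)
{
    assert(address >= profile_base);
    return (size_t)(address - profile_base) >> bucket_log2;
}
```

For instance, a 0x1000-byte region with 4 as the BucketSize has 16-byte buckets, giving 0x100 buckets and a 0x400-byte buffer in this model.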
The ProfileSource argument limits the profiling to the specified source.
The Affinity argument limits the profiling to the specified processors in the current processor group. Modern versions require that the specified processors all be active, except that if this argument is -1 it stands for all the active processors in the current processor group, whichever they happen to be.
The function returns STATUS_SUCCESS if successful, else a negative error code.
The NtCreateProfile function and its alias ZwCreateProfile are exported by name from NTDLL in version 3.10 and higher. In kernel mode, where ZwCreateProfile is a stub and NtCreateProfile is the implementation, neither is exported.
Neither NtCreateProfile nor its alias is documented. As ZwCreateProfile, it is declared in the ZWAPI.H file from an Enterprise edition of the Windows Driver Kit (WDK) for Windows 10.
Unusually for native API functions, no repackaging of NtCreateProfile, documented or not, is known in any higher-level user-mode module that is distributed as standard with Windows.
In version 6.1 and higher, the NtCreateProfile function is superseded by NtCreateProfileEx, which is explicitly aware of processor groups. The old function is essentially the new but with the given KAFFINITY translated to a single-element GROUP_AFFINITY array.
The preceding description of the old NtCreateProfile in terms of the new NtCreateProfileEx might pass as complete for versions 6.1 and higher, except for a quirk concerning the interpretation of -1 for the Affinity in a 32-bit call on 64-bit Windows.
From its beginning in version 6.1, the new NtCreateProfileEx requires that the caller specify only active processors (else the function fails, returning STATUS_INVALID_PARAMETER). Before version 6.1, the old NtCreateProfile allows that bits in the Affinity can be set for processors that are not active. The translation from old to new in version 6.1 and higher accommodates this difference to some extent by recognising that -1, as the mask in which all bits are set, will most likely have been intended by callers not as specifying exactly the first 32 or 64 processors but as standing for all processors whether active or not. When the kernel translates the KAFFINITY to a GROUP_AFFINITY for the common implementation, it recognises -1 as having this special meaning and looks up the active processors, via KeQueryGroupAffinity, on behalf of the caller.
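The translation just described can be modelled in a few lines. This is a sketch under stated assumptions, not the kernel's code. The types are hypothetical stand-ins for a 64-bit KAFFINITY and for GROUP_AFFINITY, and the lookup of active processors is represented by a parameter rather than by a call to KeQueryGroupAffinity.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a 64-bit KAFFINITY and for GROUP_AFFINITY. */
typedef uint64_t affinity_mask;

typedef struct {
    affinity_mask mask;
    uint16_t group;
} group_affinity_model;

/* Model of the old-to-new translation: -1 is replaced by whatever
   processors are active in the current group; any other value passes
   through unchanged for the new function to validate. */
static group_affinity_model translate_affinity(affinity_mask affinity,
                                               uint16_t current_group,
                                               affinity_mask active_in_group)
{
    group_affinity_model ga;
    ga.group = current_group;
    ga.mask = (affinity == (affinity_mask)-1) ? active_in_group : affinity;
    return ga;
}

/* Model of the new function's rule that only active processors may be
   selected. Other validation is deliberately not modelled. */
static int selects_only_active(group_affinity_model ga,
                               affinity_mask active_in_group)
{
    return (ga.mask & ~active_in_group) == 0;
}
```

With eight active processors (mask 0xFF), -1 translates to 0xFF and passes the check in this model, while 0xFFFFFFFF passes through unchanged and fails it.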
However, this interpretation of -1 for the Affinity argument is not built in to WOW64.DLL. When it translates a 32-bit caller’s NtCreateProfile for the 64-bit kernel, it merely widens the 32-bit KAFFINITY to a 64-bit KAFFINITY. The 32-bit 0xFFFFFFFF, intended as all processors, becomes a 64-bit 0x00000000FFFFFFFF, interpreted by the kernel as specifically the first 32 processors. Unless the current processor group actually does have those 32 active processors, the 32-bit NtCreateProfile with -1 for Affinity fails though a 64-bit call would have succeeded.
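The loss of meaning in the widening is plain integer arithmetic and can be shown in isolation. The function names below are hypothetical. The sketch models only the zero-extension that the text attributes to WOW64.DLL and the kernel's test for -1.

```c
#include <stdint.h>

/* Zero-extension of a 32-bit affinity mask to 64 bits, as described
   above for the 32-bit caller's argument. */
static uint64_t widen_affinity(uint32_t affinity32)
{
    return (uint64_t)affinity32;
}

/* Whether a 64-bit mask is the all-bits-set value that the kernel
   treats as standing for all active processors. */
static int is_all_processors_64(uint64_t affinity64)
{
    return affinity64 == UINT64_MAX;
}
```

Here widen_affinity(0xFFFFFFFF) is 0x00000000FFFFFFFF, which is_all_processors_64 does not recognise, so the special meaning of -1 is lost before the kernel sees the argument.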
Before version 6.1, the function behaves differently from the later NtCreateProfileEx in several respects. (For behaviour that is the same, look to the separate documentation of NtCreateProfileEx and read as if the latter existed earlier.)
As noted above, the original NtCreateProfile does not require that the set bits in Affinity select only active processors. To call the function with -1 for Affinity to mean all processors is only natural, and is explicitly supported in later versions when translating the KAFFINITY to a GROUP_AFFINITY for NtCreateProfileEx. However, versions before 6.1 can end up profiling all processors for other values of Affinity too—most notably zero, which might otherwise be rejected immediately as leaving no profiling to be done.
This happens because the NtCreateProfile function does not itself interpret Affinity but merely transfers it to the created profile object. It is not acted on until each subsequent NtStartProfile. Originally, a special case was made only for zero, as meaning that profiling starts for all processors that are active at the time. This had the defect that if a non-zero Affinity selected no active processor, profiling would start but uselessly, with no chance of incrementing any execution count. The correction, in the version 4.0 from Windows NT 4.0 SP4, is that any Affinity that selects no active processor is instead interpreted as selecting all.
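The two interpretations can be contrasted in a small model. It is only a sketch. The function names are hypothetical, a 64-bit integer stands in for KAFFINITY, and the return value is taken to mean the set of active processors on which profiling can have any effect.

```c
#include <stdint.h>

/* Original behaviour, as described above: only zero is special and
   stands for all the processors that are active at the time. A
   non-zero mask that misses every active processor yields nothing. */
static uint64_t effective_affinity_original(uint64_t requested,
                                            uint64_t active)
{
    return requested == 0 ? active : (requested & active);
}

/* Behaviour from Windows NT 4.0 SP4: any mask that selects no active
   processor, zero included, is interpreted as selecting all. */
static uint64_t effective_affinity_sp4(uint64_t requested,
                                       uint64_t active)
{
    uint64_t selected = requested & active;
    return selected == 0 ? active : selected;
}
```

With two active processors (mask 0x3), a requested mask of 0x4 gives an empty set in the original model but 0x3 in the corrected one. Masks that do select an active processor behave the same in both.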
The version 4.0 from Windows NT 4.0 SP4 also corrected two coding oversights in parameter validation.
Earlier versions do not check that BufferSize is non-zero. Implications are not known.
When zero is given as the BucketSize so that the ProfileBase is instead interpreted as a segment address, a non-zero BucketSize is computed from the ProfileSize and BufferSize. This computation depends on BufferSize being at least four. The early versions, however, do not check this. A smaller BufferSize induces a divide-by-zero.
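The exact computation is not given here. One plausible shape, offered only as an assumption that is consistent with the stated symptom, divides the ProfileSize by the number of ULONG counts that fit in the buffer. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of whole 32-bit execution counts that fit in the buffer.
   Integer division makes this zero for any BufferSize below four. */
static uint32_t counters_in_buffer(uint32_t buffer_size)
{
    return buffer_size / (uint32_t)sizeof(uint32_t);
}

/* The check that the text says the early versions omit. */
static int buffer_size_is_safe(uint32_t buffer_size)
{
    return counters_in_buffer(buffer_size) != 0;
}

/* Assumed form of the computation: bytes of region per execution count.
   Without the preceding check, a BufferSize below four would make the
   divisor zero. */
static size_t segment_bucket_bytes(size_t profile_size, uint32_t buffer_size)
{
    assert(buffer_size_is_safe(buffer_size));
    return profile_size / counters_in_buffer(buffer_size);
}
```

In this model a BufferSize of 3 gives a divisor of zero, which is what the added check guards against, while a 0x10000-byte region with a 0x400-byte buffer gives 0x100 bytes per bucket.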