Geoff Chappell - Software Analyst
The PERFINFO_CCSWAP_BUFFER is one of many types of fixed-size header that begin the data for an event as held in the trace buffers or flushed to an Event Trace Log (ETL) file for an NT Kernel Logger session. The event is specifically PERFINFO_LOG_TYPE_CONTEXTSWAP_BATCH (0x0525). It was introduced for Windows Vista.
The PERFINFO_LOG_TYPE_CONTEXTSWAP_BATCH event exists to trace context swaps, of course, but to do so without the expense of logging an event on every context swap. Here, context swap means a change of thread: a processor switches from running an old thread to a new thread.
For each processor, the kernel accumulates data on successive thread switches that occur on that processor and writes this batch as one event if the latest thread switch satisfies any of several conditions. How these might vary through successive versions is not (presently) accounted here. For the original Windows 10, the conditions are
This anyway is just a summary. The implementation is a little more complicated since the kernel not only tracks each processor separately but also may record each context switch in multiple batches to account for different clock types that are in use by trace sessions that are enabled for the event.
For any particular NT Kernel Logger session to be sent this event, the group masks PERF_CONTEXT_SWITCH (0x20000004) and PERF_COMPACT_CSWITCH (0x20000100) must both be enabled.
The PERFINFO_CCSWAP_BUFFER is not documented. A C-language definition is published in the NTWMI.H header from some editions of the Windows Driver Kit (WDK) for Windows 10.
Data for the PERFINFO_LOG_TYPE_CONTEXTSWAP_BATCH event comprises:
In the PERFINFO_TRACE_HEADER, the Size is the total in bytes of the trace header and all the event data. The HookId is PERFINFO_LOG_TYPE_CONTEXTSWAP_BATCH, which identifies the event.
The Marker is, at its most basic, 0xC0100002 (32-bit) or 0xC0110002 (64-bit). Additional flags may be set to indicate that extended data items are inserted between the trace header and the event data. Ordinarily, however, the event data follows as the trace header’s Data array.
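As a small illustration of the two basic Marker values, the following sketch classifies a marker as coming from 32-bit or 64-bit Windows. The macro and function names are hypothetical, and the sketch deliberately reports anything else as unrecognised, since the additional flags for extended data items are not enumerated here.

```c
#include <stdint.h>

/* The two basic Marker values quoted in the text. The names are
   illustrative only; they are not taken from any header. */
#define MARKER_BASIC_32BIT 0xC0100002u
#define MARKER_BASIC_64BIT 0xC0110002u

/* Returns 32 or 64 for the two basic markers and 0 for any other value,
   including markers that carry extra flags for extended data items. */
static int perfinfo_marker_bits(uint32_t marker)
{
    if (marker == MARKER_BASIC_32BIT) return 32;
    if (marker == MARKER_BASIC_64BIT) return 64;
    return 0;
}
```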
The event data itself begins with a fixed-size header. This PERFINFO_CCSWAP_BUFFER is 0x58 bytes in both 32-bit and 64-bit Windows:
Offset | Definition |
---|---|
0x00 | LONGLONG FirstTimeStamp; |
0x08 | ULONG TidTable [0x10]; |
0x48 | SCHAR ThreadBasePriority [0x10]; |
The FirstTimeStamp tells when this batch started. The unit of measurement depends on the trace session’s clock type. Data for each thread switch records only the difference in time from the preceding thread switch.
The TidTable lists the thread ID for every thread that has been seen as the old thread in any thread switch since this batch started. Data for each thread switch identifies the old thread by indexing into this list. When a thread switch occurs and the old thread is not the idle thread, it is added to the list. If the list is full, the existing batch becomes an event and a new batch is started.
The ThreadBasePriority array gives the base priority of each thread at the time it was first switched away from. Data for each thread switch may indicate the old thread’s priority as an increment from this base priority.
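The header can be sketched in portable C as below, with fixed-width types standing in for LONGLONG, ULONG and SCHAR. The type name CcswapBuffer and the two lookup helpers are hypothetical, not the NTWMI.H definition. The static assertions check the offsets and the 0x58-byte size given above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PERFINFO_CCSWAP_BUFFER, following the tabulated layout. */
typedef struct {
    int64_t  FirstTimeStamp;            /* 0x00: when the batch started    */
    uint32_t TidTable[0x10];            /* 0x08: IDs of old threads seen   */
    int8_t   ThreadBasePriority[0x10];  /* 0x48: base priority per thread  */
} CcswapBuffer;

static_assert(offsetof(CcswapBuffer, TidTable) == 0x08, "TidTable offset");
static_assert(offsetof(CcswapBuffer, ThreadBasePriority) == 0x48,
              "ThreadBasePriority offset");
static_assert(sizeof(CcswapBuffer) == 0x58, "header size");

/* Resolve a 4-bit index from a thread-switch record to a thread ID. */
static uint32_t ccswap_old_thread_id(const CcswapBuffer *hdr, unsigned index)
{
    return hdr->TidTable[index & 0xF];
}

/* Look up the base priority recorded for the same index. */
static int ccswap_base_priority(const CcswapBuffer *hdr, unsigned index)
{
    return hdr->ThreadBasePriority[index & 0xF];
}
```

Masking the index with 0xF keeps the lookup inside the 16-entry arrays even if a record is corrupt.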
The fixed-size header is followed by however much data has accumulated about thread switches since the last batch was logged as an event. The total size allowed for a batch is 0x0400 bytes. When a thread switch occurs and there is not at least eight bytes remaining, the existing batch becomes an event and a new batch is started.
The full form for the data that describes each thread switch is the 8-byte PERFINFO_CCSWAP:
Offset | Definition |
---|---|
0x00 | ULONG DataType : 2; // 0x00000003<br>ULONG TimeDelta : 30; // 0xFFFFFFFC |
0x04 | ULONG OldThreadIdIndex : 4; // 0x0000000F<br>ULONG OldThreadStateWr : 6; // 0x000003F0<br>ULONG OldThreadPriority : 5; // 0x00007C00<br>ULONG NewThreadWaitTime : 17; // 0xFFFF8000 |
To save space, however, the data can be present in any of three reduced forms (see below), distinguished by the DataType:
The TimeDelta tells how much time, in the units of the trace session’s clock type, has passed since the preceding thread switch. If too much time passes between thread switches, such that the delta will not fit the allowed 30 bits, the existing batch becomes an event and a new batch is started.
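Because each record carries only a delta, a reader has to accumulate the deltas to recover absolute times. The sketch below does this under the assumption, not confirmed above, that the first record's delta is measured from the header's FirstTimeStamp. The function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest delta that fits the 30-bit TimeDelta of the full form. */
#define CCSWAP_MAX_TIME_DELTA 0x3FFFFFFFu

/* Turn per-record deltas into absolute timestamps, in the units of the
   trace session's clock type. ASSUMPTION: the first delta is relative
   to first_time_stamp. Writes count values to out. */
static void ccswap_absolute_times(int64_t first_time_stamp,
                                  const uint32_t *deltas, size_t count,
                                  int64_t *out)
{
    int64_t t = first_time_stamp;
    for (size_t i = 0; i < count; i++) {
        t += (int64_t)deltas[i];   /* each delta is from the previous switch */
        out[i] = t;
    }
}
```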
The OldThreadIdIndex identifies the outgoing thread indirectly. It is the thread’s 0-based index into the header’s TidTable. Note that a thread can come and go multiple times in one batch.
The OldThreadStateWr is a compound of the outgoing thread’s WaitReason and State, as read from the KTHREAD. The former tells why the outgoing thread is to wait. It takes its values from the documented KWAIT_REASON enumeration, from zero up to but not including MaximumWaitReason, which is currently 0x27. Values of OldThreadStateWr that are not below this are instead a biased State, specifically the State plus MaximumWaitReason. The State takes its values from the undocumented KTHREAD_STATE enumeration (with a current maximum of 9). Note that for a WaitReason to be shown, the old thread’s State must be Waiting (5).
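The biasing described above can be undone as in the following sketch. The constant takes the value of MaximumWaitReason quoted in the text, which may differ between Windows versions. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* MaximumWaitReason as quoted in the text; version-dependent. */
#define CCSWAP_MAXIMUM_WAIT_REASON 0x27u

typedef struct {
    bool     is_wait_reason;  /* true: value is a KWAIT_REASON        */
    uint32_t value;           /* otherwise: value is a KTHREAD_STATE  */
} CcswapStateWr;

/* Split the 6-bit OldThreadStateWr into a WaitReason or an unbiased State. */
static CcswapStateWr ccswap_decode_state_wr(uint32_t state_wr)
{
    CcswapStateWr r;
    if (state_wr < CCSWAP_MAXIMUM_WAIT_REASON) {
        r.is_wait_reason = true;    /* implies the State was Waiting (5) */
        r.value = state_wr;
    } else {
        r.is_wait_reason = false;
        r.value = state_wr - CCSWAP_MAXIMUM_WAIT_REASON;  /* remove bias */
    }
    return r;
}
```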
The NewThreadWaitTime tells how long the incoming thread was waiting, in timer ticks.
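To avoid depending on how a particular compiler lays out bit fields, the 8-byte form can be decoded with the masks shown in the table's comments. This is a minimal sketch that reads the record as two little-endian 32-bit words. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Unpacked fields of one 8-byte PERFINFO_CCSWAP record. */
typedef struct {
    uint32_t data_type;
    uint32_t time_delta;
    uint32_t old_thread_id_index;
    uint32_t old_thread_state_wr;
    uint32_t old_thread_priority;
    uint32_t new_thread_wait_time;
} CcswapFull;

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the full form using the bit positions from the table. */
static CcswapFull ccswap_decode_full(const uint8_t *record)
{
    uint32_t lo = read_le32(record);      /* offset 0x00 */
    uint32_t hi = read_le32(record + 4);  /* offset 0x04 */
    CcswapFull r;
    r.data_type            = lo & 0x3u;           /* 0x00000003 */
    r.time_delta           = lo >> 2;             /* 0xFFFFFFFC */
    r.old_thread_id_index  = hi & 0xFu;           /* 0x0000000F */
    r.old_thread_state_wr  = (hi >> 4) & 0x3Fu;   /* 0x000003F0 */
    r.old_thread_priority  = (hi >> 10) & 0x1Fu;  /* 0x00007C00 */
    r.new_thread_wait_time = hi >> 15;            /* 0xFFFF8000 */
    return r;
}
```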
When the new thread has been waiting no more than 1 tick and the TimeDelta will fit in 17 bits and the old thread (as will almost always be true) has not increased its priority by more than 7 from the base priority that is recorded for it in the header’s ThreadBasePriority array, all that might go into the 8-byte PERFINFO_CCSWAP can instead fit in the 4-byte PERFINFO_CCSWAP_LITE:
Offset | Definition |
---|---|
0x00 | ULONG DataType : 2; // 0x00000003<br>ULONG OldThreadIndex : 4; // 0x0000003C<br>ULONG OldThreadPriInc : 3; // 0x000001C0<br>ULONG OldThreadStateWr : 6; // 0x00007E00<br>ULONG TimeDelta : 17; // 0xFFFF8000 |
The OldThreadPriInc is the increase in the old thread’s priority over the recorded base priority.
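A sketch of the lite form follows. It unpacks the 4-byte word with the masks from the table, rebuilds the old thread's priority from the header's ThreadBasePriority array, and expresses the three conditions for using this form as a predicate. All names are hypothetical, and treating a priority below the base as not fitting is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unpacked fields of one 4-byte PERFINFO_CCSWAP_LITE record. */
typedef struct {
    uint32_t data_type;
    uint32_t old_thread_index;
    uint32_t old_thread_pri_inc;
    uint32_t old_thread_state_wr;
    uint32_t time_delta;
} CcswapLite;

static CcswapLite ccswap_decode_lite(uint32_t word)
{
    CcswapLite r;
    r.data_type           = word & 0x3u;           /* 0x00000003 */
    r.old_thread_index    = (word >> 2) & 0xFu;    /* 0x0000003C */
    r.old_thread_pri_inc  = (word >> 6) & 0x7u;    /* 0x000001C0 */
    r.old_thread_state_wr = (word >> 9) & 0x3Fu;   /* 0x00007E00 */
    r.time_delta          = word >> 15;            /* 0xFFFF8000 */
    return r;
}

/* Old thread's priority: recorded base priority plus the increment. */
static int ccswap_lite_old_priority(const int8_t base_priority[0x10],
                                    CcswapLite r)
{
    return base_priority[r.old_thread_index] + (int)r.old_thread_pri_inc;
}

/* Whether a switch meets the stated conditions for the lite form.
   ASSUMPTION: a priority below the base does not qualify. */
static bool ccswap_fits_lite(uint32_t wait_ticks, uint32_t time_delta,
                             int priority, int base_priority)
{
    int inc = priority - base_priority;
    return wait_ticks <= 1u && time_delta <= 0x1FFFFu && inc >= 0 && inc <= 7;
}
```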
A different saving applies when the old thread is the idle thread, i.e., the one whose Thread ID is zero. This thread does not figure in the header’s TidTable array. Data for a thread switch away from the idle thread is in general a 4-byte PERFINFO_CCSWAP_IDLE:
Offset | Definition |
---|---|
0x00 | ULONG DataType : 2;<br>ULONG TimeDelta : 30; |
However, it too can be compressed, to the 2-byte PERFINFO_CCSWAP_IDLE_SHORT, if only a little time has passed since the last thread switch:
Offset | Definition |
---|---|
0x00 | USHORT DataType : 2;<br>USHORT TimeDelta : 14; |
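The two idle forms can be unpacked as sketched below. The bit positions are an assumption: the tables for these forms give no masks, so the sketch supposes that DataType occupies the low two bits, as in the other forms. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* TimeDelta from a 4-byte PERFINFO_CCSWAP_IDLE word.
   ASSUMPTION: DataType is the low 2 bits, TimeDelta the high 30. */
static uint32_t ccswap_idle_time_delta(uint32_t word)
{
    return word >> 2;
}

/* TimeDelta from a 2-byte PERFINFO_CCSWAP_IDLE_SHORT word.
   ASSUMPTION: DataType is the low 2 bits, TimeDelta the high 14. */
static uint32_t ccswap_idle_short_time_delta(uint16_t word)
{
    return (uint32_t)word >> 2;
}

/* Whether a delta is small enough for the 14-bit short form. */
static bool ccswap_idle_fits_short(uint32_t time_delta)
{
    return time_delta <= 0x3FFFu;
}
```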