RtlPrefetchMemoryNonTemporal

This function suggests to the processor that specified memory be moved into the processor’s cache in anticipation of Non Temporal Access (NTA).

Declaration

VOID
FASTCALL
RtlPrefetchMemoryNonTemporal (
    PVOID Source,
    SIZE_T Length);

Behaviour

If activated for non-trivial behaviour, the RtlPrefetchMemoryNonTemporal function feeds successive addresses in the given buffer to the prefetchnta instruction, advancing each time by the number of bytes that has been determined as the Prefetch NTA Granularity. For the 32-bit kernel, the granularity is 32 bytes by default but is overridden if suitable second-level cache support is discovered when initialising the kernel for a processor. For the 64-bit kernel, the granularity is hard-coded as 64 bytes.
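
The loop just described can be sketched in user-mode C. This is only an illustration under stated assumptions, not the kernel's code: the name sketch_prefetch_nta is hypothetical, the granularity is passed as an argument rather than read from a kernel variable, and the GCC/Clang builtin __builtin_prefetch with locality 0 stands in for prefetchnta (on x86 these compilers typically emit prefetchnta for it). The sketch returns a count of hints issued purely so that the stepping can be checked; the real function returns VOID.

```c
#include <stddef.h>

/* Hypothetical user-mode model of the stepping behaviour.
 * Assumption: one prefetch hint per granularity-sized step, starting at
 * the buffer's first byte; nothing here claims to match the kernel's
 * actual instructions or any alignment it may perform. */
static size_t sketch_prefetch_nta(const void *source, size_t length,
                                  size_t granularity)
{
    const char *base = (const char *)source;
    size_t hints = 0;
    size_t offset;

    if (granularity == 0) {
        return 0;               /* guard against an endless loop */
    }

    for (offset = 0; offset < length; offset += granularity) {
        /* rw = 0 (read), locality = 0 (non-temporal hint) */
        __builtin_prefetch(base + offset, 0, 0);
        hints++;
    }
    return hints;
}
```

With a granularity of 64, a 256-byte buffer gets four hints. With the 32-bit default of 32, a 100-byte buffer also gets four, at offsets 0, 32, 64 and 96. A prefetch is only a hint, so the sketch has no observable effect on the buffer's contents.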

In 32-bit versions before 6.2, RtlPrefetchMemoryNonTemporal is initially trivial. Its first instruction is a ret. If all processors support the FXSR and SSE features, the kernel activates the function by patching the ret to a nop. Later versions do not play at this, since both these CPU features are sure to be present: without them, the kernel would have stopped, raising the bug check UNSUPPORTED_PROCESSOR (0x5D). As an aside, note that the kernel goes to no special trouble to patch itself: the kernel’s executable image, even pages that contain only code, is writable.

Availability

The RtlPrefetchMemoryNonTemporal function is exported by name from the Windows kernel in the build of version 5.0 from Windows 2000 SP3, and in all higher versions.

The function is in the kernel’s .text section and is therefore not liable to be paged out. Indeed, it is callable at any IRQL.

Documentation Status

The function is documented, but was not immediately so. It is not mentioned in documentation from any known Device Driver Kit (DDK) for Windows XP or Windows Server 2003, though it is declared in NTDDK.H from these kits. Documentation in the Windows Driver Kit (WDK) for Windows Vista states explicitly that the function is “available in Windows Server 2003 and later versions of Windows”, though the C-language declaration, now in WDM.H, allows it if targeting Windows 2000 SP3.

Not only does the kernel from Windows 2000 SP3 export the function, it is also the first to try determining the NTA granularity by executing the cpuid instruction, specifically to recognise that some GenuineIntel processors have NTA granularity of 64 or 128 bytes. Microsoft is not known to use the function until Windows XP, in which TCPIP.SYS calls it to prefetch addresses for a receive buffer.