Geoff Chappell - Software Analyst
Windows XP introduced support for the SYSENTER instruction as a fast way to ring 0 and SYSEXIT as a fast way back to ring 3.
In the original implementation, code for entering the system is assembled by the kernel in memory that is shared with user mode. This memory is the page that is addressed at 0xFFDF0000 in kernel mode and at 0x7FFE0000 in user mode. The kernel-mode address is defined for programming (in WDM.H) as SharedUserData and the layout of the shared data is given programmatically (in NTDDK.H) as a KUSER_SHARED_DATA structure. A substantial extension to this structure for Windows XP provided 32 bytes for a SystemCall member (at offset 0x0300) to which the kernel copies suitable code.
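The arithmetic behind these addresses can be sketched in C. This is only an illustration of the numbers given above: the macro and function names are hypothetical and are not taken from WDM.H or NTDDK.H.

```c
#include <stdint.h>

/* Addresses of the shared page, as given in the text. */
#define SHARED_DATA_KERNEL_BASE 0xFFDF0000u   /* kernel-mode view */
#define SHARED_DATA_USER_BASE   0x7FFE0000u   /* user-mode view */

/* Offset and size of the SystemCall member in the Windows XP layout. */
#define SYSTEM_CALL_OFFSET 0x0300u
#define SYSTEM_CALL_SIZE   32u

/* Hypothetical helper: where SystemCall sits for a given mapping of the page. */
uint32_t system_call_address(uint32_t shared_base)
{
    return shared_base + SYSTEM_CALL_OFFSET;
}

/* Hypothetical helper: first address past the 32 bytes reserved for code. */
uint32_t system_call_end(uint32_t shared_base)
{
    return system_call_address(shared_base) + SYSTEM_CALL_SIZE;
}
```

Passing the user-mode base gives the address 0x7FFE0300 that user-mode code calls, and the same offset applied to the kernel-mode base gives the address to which the kernel writes.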
Any user-mode software can call a kernel function simply by putting the function’s C-style arguments on the stack, loading EAX with the function number and calling the address 0x7FFE0300, as long as it is done such that the arguments are 8 bytes above ESP when the instruction at 0x7FFE0300 executes. The kernel, through the little bit of code it placed at that address, returns with EIP addressing the instruction after the CALL and with ESP again pointing 8 bytes below the stacked arguments. Other registers may have changed depending on the calling convention of the function. The function numbers, as is well known, vary with the Windows version and are plainly not intended for general use. In practice, the only user-mode software that is involved so closely with calling the kernel is NTDLL, which wraps this magic into stub functions that other user-mode software may import. For instance, the stub
    mov     eax,funcnum
    mov     edx,7FFE0300h
    call    edx
    ret     argbytes
(which is what NTDLL actually codes) looks to its caller like a __stdcall function that takes argbytes of arguments on the stack and returns with those arguments removed from the stack. How the code at 0x7FFE0300 gets to the kernel, how the kernel gets back to user mode, and how the code at 0x7FFE0300 gets back to the stub, is nobody’s business but the kernel’s.
Though neat in the sense of interface design, the original implementation did not last long. A change came with version 5.1 from Windows XP SP2 and version 5.2 from Windows Server 2003 SP1, and continues in version 6.0.
Instead of the shared user data containing as much as 32 bytes of code, it provides for two pointers, named SystemCall (at offset 0x0300) and SystemCallReturn (at offset 0x0304). Instead of the kernel copying its choice of its own code to the shared user data, it chooses from codings in NTDLL and sets the pointers accordingly. Whatever is chosen, the way to call the kernel is now a little different: put the function’s arguments on the stack, load EAX with the function number and call whatever address is stored at 0x7FFE0300. The following very slightly different stub
    mov     eax,funcnum
    mov     edx,dword ptr [7FFE0300h]
    call    edx
    ret     argbytes
(which is again how NTDLL actually codes it) has exactly the same effect as before, as far as concerns its callers.
NTDLL’s code for calling the kernel and for handling the return is knowable to the kernel because the routines are named exports from NTDLL. There are KiFastSystemCall and KiIntSystemCall as a choice of codings for calling the kernel. The KiFastSystemCall function may use SYSENTER, and the kernel chooses it (indeed, requires it) if SYSENTER is available on all processors. The KiIntSystemCall function must be able to work without support for SYSENTER. Whichever is chosen, its absence (e.g., from an old NTDLL) is fatal to Windows, causing the bug check PROCESS1_INITIALIZATION_FAILED. Otherwise, the address of the chosen entry function goes in the SystemCall member of the shared user data. In case KiFastSystemCall does use SYSENTER, it has a companion, named KiFastSystemCallRet. If the kernel chooses to use KiFastSystemCall, then KiFastSystemCallRet must also be exported and its address goes in the SystemCallReturn member of the shared user data.
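The selection just described can be modelled in a few lines of C. This is a sketch of the rules as stated here, not the kernel's code: the types and the function choose_system_call are hypothetical, a NULL pointer stands for a missing export, and a false return stands for the bug check. Treating a missing KiFastSystemCallRet as equally fatal is an assumption drawn from the word "must".

```c
#include <stdbool.h>
#include <stddef.h>

/* Addresses of the three named exports; NULL models a missing export. */
typedef struct {
    const void *KiFastSystemCall;
    const void *KiFastSystemCallRet;
    const void *KiIntSystemCall;
} NtdllExports;            /* hypothetical type, for illustration only */

/* The two pointers at offsets 0x0300 and 0x0304 of the shared user data. */
typedef struct {
    const void *SystemCall;
    const void *SystemCallReturn;
} SystemCallPointers;      /* hypothetical type, for illustration only */

/* Returns false where the text has Windows stop with
   PROCESS1_INITIALIZATION_FAILED, true otherwise. */
bool choose_system_call(bool sysenter_on_all_processors,
                        const NtdllExports *exports,
                        SystemCallPointers *shared)
{
    if (sysenter_on_all_processors) {
        /* The fast routine is not merely preferred but required. */
        if (exports->KiFastSystemCall == NULL)
            return false;
        /* Assumption: a missing companion is treated as fatal too. */
        if (exports->KiFastSystemCallRet == NULL)
            return false;
        shared->SystemCall = exports->KiFastSystemCall;
        shared->SystemCallReturn = exports->KiFastSystemCallRet;
        return true;
    }

    if (exports->KiIntSystemCall == NULL)
        return false;
    shared->SystemCall = exports->KiIntSystemCall;
    /* SystemCallReturn is left untouched: the text does not say what,
       if anything, the kernel stores there in this case. */
    return true;
}
```

Note that the model never falls back from the fast routine to the interrupt routine: an NTDLL that lacks KiFastSystemCall fails on a machine with SYSENTER even if it exports KiIntSystemCall.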
Note that KiFastSystemCall is not required to use SYSENTER, and that even if it does use SYSENTER, the kernel does not necessarily return by executing SYSEXIT. What concerns the kernel is only that if it is entered at the address it has programmed into the machine-specific registers as the ring 0 target address for SYSENTER, then it returns to user mode at whatever address is in the SystemCallReturn member. How it gets there is nobody’s business but the kernel’s. If it wants to get there by executing an IRET, it may.
That last remark is not just theoretical speculation, nor even an attempt at inferring the design of an interface from inspection of its implementation. There is a significant problem for the kernel’s actual practice: the SYSENTER instruction does not clear the trap flag on the way to ring 0. If an attempt to trace through SYSENTER from user mode is not to frustrate kernel-mode debugging, then the kernel needs to defend against a set trap flag at the instruction that first executes in ring 0. The original implementation does not notice this unless it causes a double fault. The defence in later versions acts earlier, in the debug exception handler, and is correspondingly tidier. Whichever method is used for clearing the trap flag for the kernel’s execution, there remains the problem of restoring it for the debugger in user mode. The original implementation attends to this on returning to user mode, in the code that the kernel has copied to the shared user data. The new implementation does not have this luxury, short of having NTDLL either export yet another function just for this case or vary KiFastSystemCallRet to distinguish whether the trap flag is to be restored. Instead, the kernel actually does return with an IRET, which may indeed be the only way to do it without executing more code in user mode.
Note that the new implementation requires NTDLL to know something of the machinery for reaching the kernel (as it did for Windows 2000 and earlier). The old-fashioned way is to execute interrupt 0x2E, with the function number in EAX and with EDX addressing the stacked arguments. When called from stubs such as those shown above, suitable code is
    KiIntSystemCall PROC NEAR STDCALL PUBLIC
            lea     edx,[esp+8]
            int     2Eh
            ret
    KiIntSystemCall ENDP
If calling through SYSENTER instead of INT, some means is needed for the kernel to know what was in ESP when SYSENTER was executed and what should be again in ESP when execution resumes in user mode. The convention is to pass this in EDX with the understanding that the stacked arguments begin 8 bytes above. Suitable code for calling from the same stubs is:
    KiFastSystemCall PROC NEAR STDCALL PUBLIC
            mov     edx,esp
            sysenter
    KiFastSystemCallRet PROC NEAR STDCALL PUBLIC
            ret
    KiFastSystemCallRet ENDP
    KiFastSystemCall ENDP
Note that the RET instruction for KiFastSystemCallRet could be anywhere. The nesting within KiFastSystemCall is just a neatness, to model that the return will appear to have come from KiFastSystemCall.
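The two conventions for EDX differ by the same 8 bytes that separate ESP from the stacked arguments, so the kernel finds the arguments at the same place either way. A small C model of the arithmetic may make this plain. The function names are hypothetical, and the code only restates what the two routines above load into EDX.

```c
#include <stdint.h>

/* Two return addresses lie between ESP and the arguments on entry to
   either routine: one into the stub, one into the stub's caller. */
#define ARGS_ABOVE_ESP 8u

/* KiIntSystemCall: lea edx,[esp+8] makes EDX address the arguments. */
uint32_t edx_for_int2e(uint32_t esp)
{
    return esp + ARGS_ABOVE_ESP;
}

/* KiFastSystemCall: mov edx,esp passes the stack pointer itself. */
uint32_t edx_for_sysenter(uint32_t esp)
{
    return esp;
}

/* Where the kernel would find the arguments on entry through INT 2Eh. */
uint32_t args_from_int2e(uint32_t edx)
{
    return edx;
}

/* Where the kernel would find the arguments on entry through SYSENTER,
   given the understanding that they begin 8 bytes above EDX. */
uint32_t args_from_sysenter(uint32_t edx)
{
    return edx + ARGS_ABOVE_ESP;
}
```

For any value of ESP at entry, args_from_int2e(edx_for_int2e(esp)) and args_from_sysenter(edx_for_sysenter(esp)) agree. Only the SYSENTER convention also tells the kernel what ESP should be on return.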
It probably does not escape the attention of hackers that accommodation of SYSENTER means that each process’s system calls are all made from one place and all return to one other place, and that both places are either at fixed addresses or are easily learnt from fixed addresses.
Of the two implementations, the original is more secure in one sense. The places for call and return are at fixed addresses but both are in the shared user data, which is read-only to user-mode code. Moreover, the kernel defends against attempts to change this protection: a user-mode debugger, for instance, cannot set breakpoints on this code. The downside is that this implementation has the machine execute code on a page that is otherwise all data.
Executing data is generally not desirable and there is hardware support for protecting against it. In introducing Data Execution Prevention (DEP) as a feature for Windows XP SP2 and Windows Server 2003 SP1, Microsoft will have faced a choice: exempt the page of shared user data from DEP or recode how user mode calls kernel mode. So, now, the shared user data is not just read-only in user mode but also no-execute (if this feature is enabled). But the places for all calls and returns are both in the NTDLL code, where they have no particular protection and can be discovered very easily from the pointers in the shared user data.
In the rush for security that consumed Microsoft’s attention in the mid-2000s, did Microsoft actually create an opportunity for hackers? It might have been better had Microsoft kept more to the first implementation but with a new page of shared user code.
Testing for the SYSENTER and SYSEXIT instructions is not quite the simple matter of executing the CPUID instruction with 1 in EAX and testing for the SEP bit (0x0800) in the feature flags that are returned in EDX. Intel’s literature is plain that more is required, though vague about why. (See for instance the Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2B: Instruction Set Reference N-Z or section 3.1.2.4 of the application note Intel Processor Identification and the CPUID Instruction.) What is stated is that the Pentium Pro may indicate support for the feature without actually having it, and a test is recommended which suggests that the problem also affects early steppings of the Pentium II. It may now be long past mattering, but what isn’t said is whether these processors mean the set SEP bit to indicate some other feature (which was perhaps subsequently dropped) or whether the feature is implemented but defective. Either way, the Windows kernel does not use SYSENTER and SYSEXIT on a GenuineIntel processor that is not at least family 6, model 3, stepping 3.
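A test along these lines might be sketched in C as follows. This is not the kernel's code. The function name sysenter_usable is hypothetical, and it takes the vendor string and the CPUID leaf 1 results as arguments rather than executing CPUID itself. Reading "at least family 6, model 3, stepping 3" as an ordered comparison of family, then model, then stepping is an assumption, as is taking the SEP bit at its word for vendors other than GenuineIntel, about which nothing is said above. The extended family and model fields are combined in the way Intel documents for the displayed family and model.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CPUID_EDX_SEP 0x0800u   /* SEP feature flag, bit 11 of EDX */

/* vendor:        12-character vendor string from CPUID leaf 0
   eax_signature: EAX from CPUID leaf 1
   edx_features:  EDX from CPUID leaf 1 */
bool sysenter_usable(const char *vendor,
                     uint32_t eax_signature,
                     uint32_t edx_features)
{
    if ((edx_features & CPUID_EDX_SEP) == 0)
        return false;

    /* Assumption: only GenuineIntel processors need the extra test. */
    if (strcmp(vendor, "GenuineIntel") != 0)
        return true;

    uint32_t stepping = eax_signature & 0xFu;
    uint32_t model    = (eax_signature >> 4) & 0xFu;
    uint32_t family   = (eax_signature >> 8) & 0xFu;

    /* Displayed family and model, so that newer processors whose base
       model field happens to be small are not mistaken for old ones. */
    if (family == 6 || family == 15)
        model |= ((eax_signature >> 16) & 0xFu) << 4;
    if (family == 15)
        family += (eax_signature >> 20) & 0xFFu;

    /* Assumed reading: (family, model, stepping) >= (6, 3, 3). */
    if (family != 6)
        return family > 6;
    if (model != 3)
        return model > 3;
    return stepping >= 3;
}
```

With the signature 0x0633 and the SEP bit set, the function accepts the processor. With 0x0632, or with any Pentium Pro signature such as 0x0619, it rejects the processor even though the SEP bit is set.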