RtlInitUnicodeString

The RtlInitUnicodeString function initialises a UNICODE_STRING structure as describing a given null-terminated Unicode string.

Declaration

VOID 
RtlInitUnicodeString (
    UNICODE_STRING *DestinationString, 
    PCWSTR SourceString);

Parameters

The required DestinationString argument provides the address of a UNICODE_STRING structure that the function is to initialise.

The optional SourceString argument is the address of a null-terminated Unicode string that is to be represented by the structure. This argument can be NULL to initialise the structure as representing no string.

Availability

The RtlInitUnicodeString function is exported by name from the kernel and from NTDLL in all known versions, i.e., 3.10 and higher.

Documentation Status

The RtlInitUnicodeString function is documented in all known editions of the Device Driver Kit (DDK) or Windows Driver Kit (WDK) since at least the DDK for Windows NT 3.51. Though this documentation is of the kernel-mode function as an export from the kernel, it is mostly applicable to the user-mode implementation too, both being plausibly compiled from the same source file.

Starting with the WDK for Windows 7, Microsoft documents the availability of RtlInitUnicodeString as “Windows 2000 and later versions of Windows.”

Documentation of RtlInitUnicodeString explicitly for user mode was added to the Software Development Kit (SDK) in 2002, concurrently with its declaration in WINTERNL.H, apparently for Microsoft’s compliance with a settlement concerning unfair use of internal Windows APIs by “middleware” products such as Internet Explorer. (For instance, it is linked to by the WININET.DLL version 6.0 from Windows versions 5.1 and 5.2 and Internet Explorer version 6.0.)

Behaviour

The intention of the UNICODE_STRING structure is to keep together both the address and size of a Unicode string, presumably to save on passing them as separate arguments for subsequent work with the string and to save on repeated re-reading of the whole string to rediscover its size. Indeed, the structure keeps two sizes. The Length member is the size in bytes of the array of Unicode characters at Buffer. If this array is null-terminated, which it explicitly need not be, then Length does not count the null terminator. The MaximumLength member is the size in bytes of the memory from Buffer onwards.

The RtlInitUnicodeString function initialises a UNICODE_STRING so that it describes a buffer that contains exactly the given null-terminated Unicode string. Microsoft’s names DestinationString and SourceString for the function’s arguments are unfortunate for suggesting some sort of data transfer. No source string is copied, only its address.

Initialisation

If the SourceString argument is NULL, the DestinationString gets a minimal initialisation to represent nothing: Length and MaximumLength are zero and Buffer is NULL. Ordinarily, SourceString is not NULL, and DestinationString gets initialised such that Length and MaximumLength are respectively the sizes in bytes of the string not counting and counting its null terminator, and Buffer is the string’s address as given.
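The initialisation just described can be modelled in portable C. What follows is only a sketch, not Microsoft's code: the names MODEL_WCHAR, MODEL_UNICODE_STRING and ModelInitUnicodeString are hypothetical stand-ins so that the example builds without Windows headers, and the treatment of long strings is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Windows types (assumptions of this sketch). */
typedef uint16_t MODEL_WCHAR;               /* one 16-bit Unicode character */

typedef struct {
    uint16_t Length;                        /* bytes in use, terminator not counted */
    uint16_t MaximumLength;                 /* bytes available from Buffer onwards */
    const MODEL_WCHAR *Buffer;              /* the real member is not const-qualified */
} MODEL_UNICODE_STRING;

/* Models the ordinary case only: strings short enough for 16-bit sizes. */
void ModelInitUnicodeString(MODEL_UNICODE_STRING *dst, const MODEL_WCHAR *src)
{
    assert(dst != NULL);                    /* the destination is required */

    dst->Length = 0;
    dst->MaximumLength = 0;
    dst->Buffer = src;                      /* address only; a NULL source gives NULL */

    if (src != NULL) {
        size_t chars = 0;
        while (src[chars] != 0) {           /* count characters up to the terminator */
            chars++;
        }
        dst->Length = (uint16_t)(chars * sizeof(MODEL_WCHAR));
        dst->MaximumLength = (uint16_t)((chars + 1) * sizeof(MODEL_WCHAR));
    }
}
```

For the six-character string "String", this model produces a Length of 12 and a MaximumLength of 14, with Buffer equal to the address that was passed in.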

Long Strings

A complication to the RtlInitUnicodeString function is that Length and MaximumLength are 16-bit. This is not a practical worry when initialising a UNICODE_STRING for a string literal, but is when the SourceString is received as input from some unknown caller and may be unexpectedly large, whether by accident, design or mischief. Early versions do not anticipate this at all. They simply set Length and MaximumLength to the low 16 bits of the computed 32-bit sizes. Starting with version 5.2, if the string including its null terminator exceeds 0xFFFE bytes, then Length and MaximumLength become 0xFFFC and 0xFFFE respectively. Either way, the string is not faithfully represented. The function has no success or failure to signify this misrepresentation.
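The arithmetic of the two behaviours is easily sketched. The functions below are hypothetical models that take a character count rather than a string, assuming 2-byte characters and 32-bit intermediate sizes as described above. They are illustrations of the description, not reproductions of any version's code.

```c
#include <stdint.h>

/* Hypothetical pair of sizes as they would be stored in a UNICODE_STRING. */
typedef struct {
    uint16_t Length;
    uint16_t MaximumLength;
} MODEL_SIZES;

/* Early behaviour as described: keep only the low 16 bits of each size. */
MODEL_SIZES ModelSizesTruncated(uint32_t chars)
{
    uint32_t length = chars * 2;            /* bytes, terminator not counted */
    uint32_t maximum = length + 2;          /* bytes, terminator counted */
    MODEL_SIZES sizes;
    sizes.Length = (uint16_t)length;
    sizes.MaximumLength = (uint16_t)maximum;
    return sizes;
}

/* Behaviour described for version 5.2 and higher: clamp what does not fit. */
MODEL_SIZES ModelSizesClamped(uint32_t chars)
{
    uint32_t length = chars * 2;
    uint32_t maximum = length + 2;
    MODEL_SIZES sizes;
    if (maximum > 0xFFFE) {                 /* too long, terminator included */
        sizes.Length = 0xFFFC;
        sizes.MaximumLength = 0xFFFE;
    } else {
        sizes.Length = (uint16_t)length;
        sizes.MaximumLength = (uint16_t)maximum;
    }
    return sizes;
}
```

In this model, a string of 0x8000 characters truncates to a Length of zero and a MaximumLength of 2, as if the string were empty. A string of 0x7FFF characters truncates to a Length of 0xFFFE but a MaximumLength of zero. Clamping gives 0xFFFC and 0xFFFE in both cases, which is at least self-consistent but still does not describe the string.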

Although a UNICODE_STRING in general need not have a null terminator after Length bytes of Unicode characters, programmers evidently do sometimes work with those characters as if they belong to a null-terminated string, especially if they know the UNICODE_STRING was prepared by RtlInitUnicodeString. It is true that such preparation means there is a null character somewhere after Buffer, but if the SourceString is arbitrary, then this null terminator is not certainly Length bytes from Buffer and neither is it certainly within MaximumLength bytes of Buffer.

Microsoft started dealing with this problem in version 5.1. Rather than store an incorrect Length and MaximumLength, as do all versions of RtlInitUnicodeString when given too long a string, a new function named RtlInitUnicodeStringEx fails. In version 5.1, this is a user-mode export only. The kernel has it in version 5.2 and higher. Microsoft’s own kernel-mode programming makes extensive use of it as an RtlInitUnicodeString replacement, but the new function (itself) seems never to have been documented.
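A checked initialisation in the spirit of RtlInitUnicodeStringEx might look like the following sketch. Everything in it is an assumption made for illustration: the names are hypothetical, the return value is a plain int rather than whatever status the real function returns, the threshold is taken to be the same 0xFFFE bytes at which RtlInitUnicodeString clamps, and leaving the structure empty on failure is a choice of this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Windows types (assumptions of this sketch). */
typedef uint16_t MODEL_WCHAR;

typedef struct {
    uint16_t Length;
    uint16_t MaximumLength;
    const MODEL_WCHAR *Buffer;
} MODEL_UNICODE_STRING;

/* Returns 0 on success, -1 if the string cannot be described in 16-bit sizes. */
int ModelInitUnicodeStringChecked(MODEL_UNICODE_STRING *dst, const MODEL_WCHAR *src)
{
    size_t chars = 0;
    size_t bytes;

    assert(dst != NULL);

    dst->Length = 0;                        /* start from "no string" */
    dst->MaximumLength = 0;
    dst->Buffer = NULL;

    if (src == NULL) {
        return 0;                           /* nothing to describe is not an error */
    }

    while (src[chars] != 0) {               /* count characters up to the terminator */
        chars++;
    }
    bytes = (chars + 1) * sizeof(MODEL_WCHAR);

    if (bytes > 0xFFFE) {                   /* assumed threshold, terminator included */
        return -1;                          /* refuse rather than misrepresent */
    }

    dst->Length = (uint16_t)(bytes - sizeof(MODEL_WCHAR));
    dst->MaximumLength = (uint16_t)bytes;
    dst->Buffer = src;
    return 0;
}
```

The point of the design is that the caller learns of the problem immediately, instead of carrying around a structure whose sizes do not match the string at Buffer.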

IRQL

The kernel-mode implementation is in a non-paged section in all versions. Provided that both the UNICODE_STRING and the Unicode string that are addressed through the given arguments are in non-paged memory, the RtlInitUnicodeString function can safely be called at DISPATCH_LEVEL and higher. That it can has been documented by Microsoft since at least the DDK for Windows NT 3.51, but with only the one condition that “the DestinationString buffer is nonpageable.”

Alternatives

For a string whose address and size are known at compile time, as with a string literal, representation by a UNICODE_STRING may better be arranged at compile-time rather than by calling RtlInitUnicodeString at run time. Starting with the DDK for Windows XP, Microsoft supplies helpful macros in NTDEF.H (and reproduces them in some other headers).

RTL_CONSTANT_STRING

When given a compile-time constant Unicode string, the RTL_CONSTANT_STRING macro expands to an aggregate initialiser for a UNICODE_STRING to represent the string. By providing just the right-hand side of a declarator, the macro leaves to the programmer the maximum flexibility for the left side. For example,

DECLSPEC_SELECTANY
extern UNICODE_STRING const UnicodeString = RTL_CONSTANT_STRING (L"String");

and

static WCHAR const String [] = L"String";
static UNICODE_STRING const UnicodeString = RTL_CONSTANT_STRING (String);

or

DECLSPEC_SELECTANY 
extern WCHAR const String [] = L"String";
static UNICODE_STRING UnicodeString = RTL_CONSTANT_STRING (String);

all have their uses, and their pros and cons both for questions of taste and for implications regarding string pooling.

Quirks

Beware that RTL_CONSTANT_STRING is not without surprises. For instance,

PCWSTR Pointer = L"String";
UNICODE_STRING UnicodeString = RTL_CONSTANT_STRING (Pointer);

seems plausible as a mistake by a real-world programmer, yet it compiles without warning. The less plausible

UNICODE_STRING UnicodeString = RTL_CONSTANT_STRING ((PWSTR) NULL);

compiles too. Neither produces anything a programmer seems likely to want.
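Why these compile, and what they produce, is plain from a model of the macro. The sketch below assumes that the macro computes both sizes by applying sizeof to its argument. MODEL_CONSTANT_STRING and the other names are hypothetical, and wchar_t stands in for WCHAR, so that the sizes are stated in terms of sizeof (wchar_t is 2 bytes on Windows but may be wider elsewhere).

```c
#include <stddef.h>

/* Hypothetical stand-in for UNICODE_STRING (an assumption of this sketch). */
typedef struct {
    unsigned short Length;
    unsigned short MaximumLength;
    const wchar_t *Buffer;
} MODEL_UNICODE_STRING;

/* Hypothetical sizeof-based initialiser: correct only when s is an array. */
#define MODEL_CONSTANT_STRING(s) \
    { (unsigned short)(sizeof(s) - sizeof((s)[0])), (unsigned short)sizeof(s), (s) }

static const wchar_t Text[] = L"String";

/* An array argument: sizeof sees all seven characters, terminator included. */
MODEL_UNICODE_STRING ModelFromArray(void)
{
    MODEL_UNICODE_STRING u = MODEL_CONSTANT_STRING(Text);
    return u;
}

/* A pointer argument: sizeof sees only the pointer, whatever it points to. */
MODEL_UNICODE_STRING ModelFromPointer(const wchar_t *Pointer)
{
    MODEL_UNICODE_STRING u = MODEL_CONSTANT_STRING(Pointer);
    return u;
}

/* A null pointer: the same pointer-sized lengths, with nothing at Buffer. */
MODEL_UNICODE_STRING ModelFromNull(void)
{
    MODEL_UNICODE_STRING u = MODEL_CONSTANT_STRING((wchar_t *) NULL);
    return u;
}
```

Under this assumption, the pointer cases set MaximumLength to the size of a pointer and Length to one character less, regardless of the string. On 64-bit Windows that would be 8 and 6, which describes three characters of any string at all.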

Documentation Status

Formal documentation of RTL_CONSTANT_STRING is known first from the WDK for Windows Vista. It is there said to replace RtlInitUnicodeString “when passing a constant string.”

DECLARE_CONST_UNICODE_STRING

Another macro, DECLARE_CONST_UNICODE_STRING, is less flexible since it doesn’t provide just the initialiser but the whole declarator. Its first argument names the UNICODE_STRING variable. Its second is a string literal. The expansion of

DECLARE_CONST_UNICODE_STRING (UnicodeString, L"String");

is much like

WCHAR const UnicodeString_buffer [] = L"String";
UNICODE_STRING const UnicodeString = RTL_CONSTANT_STRING (UnicodeString_buffer);

including to append “_buffer” to compose a name for the variable that acts as the buffer. The reverse engineer who encounters such names in public symbol files may not unreasonably infer that they signify the use of this macro.

Note, by the way, that “much like” hides that the real expansion does not use RTL_CONSTANT_STRING but has its own aggregate initialiser. It is here thought that DECLARE_CONST_UNICODE_STRING is older, even though both macros are known first from the DDK for Windows XP.
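A macro of this shape can be sketched as follows. This is a hypothetical model written to match the description above, not a reproduction of Microsoft's definition: the names MODEL_UNICODE_STRING and MODEL_DECLARE_CONST_STRING are invented, and wchar_t stands in for WCHAR.

```c
#include <stddef.h>

/* Hypothetical stand-in for UNICODE_STRING (an assumption of this sketch). */
typedef struct {
    unsigned short Length;
    unsigned short MaximumLength;
    const wchar_t *Buffer;
} MODEL_UNICODE_STRING;

/* Declares var_buffer to hold the characters, then var to describe them.
   The aggregate initialiser is the macro's own, with sizes from the literal. */
#define MODEL_DECLARE_CONST_STRING(var, literal)                    \
    const wchar_t var##_buffer[] = literal;                         \
    const MODEL_UNICODE_STRING var = {                              \
        (unsigned short)(sizeof(literal) - sizeof(wchar_t)),        \
        (unsigned short)sizeof(literal),                            \
        var##_buffer }

/* Produces two variables: UnicodeString_buffer and UnicodeString. */
MODEL_DECLARE_CONST_STRING(UnicodeString, L"String");
```

Because the macro supplies the whole declaration, the programmer chooses only the name and the literal. Storage class, linkage and placement are whatever the macro dictates.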

DECLARE_GLOBAL_CONST_UNICODE_STRING

By the time that the WDK for Windows Vista was released, RTL_CONSTANT_STRING was not just documented, and improved, but was apparently established well enough that yet another macro, named DECLARE_GLOBAL_CONST_UNICODE_STRING, does use RTL_CONSTANT_STRING. What

DECLARE_GLOBAL_CONST_UNICODE_STRING (UnicodeString, L"String");

expands to is very close to

DECLSPEC_SELECTANY 
extern UNICODE_STRING const UnicodeString = RTL_CONSTANT_STRING (L"String");

Either way, this is in most cases the best initialisation of a UNICODE_STRING in read-only data to describe a string literal that is also in read-only data. See especially that with string pooling and COMDAT folding, two UNICODE_STRING variables for the same Unicode string end up in the executable as only one structure describing one string. Kernel-mode programmers still care about such efficiencies!