Geoff Chappell - Software Analyst
This function queries a value in an open registry key.
DWORD SHQueryValueEx (
    HKEY hKey,
    LPCTSTR pszValue,
    LPDWORD pdwReserved,
    LPDWORD pdwType,
    LPVOID pvData,
    LPDWORD pcbData);
The function exists in ANSI and Unicode forms.
The hKey argument provides a handle to an open key.
The pszValue argument provides the address of a null-terminated string that names the value to query within the key, or is NULL to query the key’s default value.
The pdwType argument provides the address of a variable that is to receive the data type, e.g., REG_SZ or REG_DWORD. This argument can be NULL to mean that the data type is not wanted.
The pvData argument provides the address of a buffer that is to receive the data. This argument can be NULL to mean that there is no buffer and that the data is not wanted.
The pcbData argument provides the address of a variable that plays different roles on input and output. On input, the variable provides the size of the buffer, in bytes. This size is ignored if pvData is NULL, because there is then no buffer. On output, the variable receives the size of the data, in bytes. This argument can be NULL to mean that the size of the buffer is zero (if a buffer is even given) and that the size of the data is not wanted.
The function returns zero for success, else an error code.
Of particular interest are the cases in which the value is accessible but has more data than can fit in the given buffer, including because there is no buffer:
Although this function and the standard API function RegQueryValueEx have exactly the same prototype, the two do not behave identically. A call to the SHLWAPI function is essentially a call to the standard function but with post-processing, presumably with the idea of improving or even correcting the standard function. There are two general aims, both elaborated below: to ensure that string data is properly null-terminated, and to expand environment variables in REG_EXPAND_SZ data (which is then reported as REG_SZ).
The ANSI form SHQueryValueExA post-processes RegQueryValueExA. The Unicode form SHQueryValueExW post-processes RegQueryValueExW, if running on an NT version of Windows. The other versions have no functioning RegQueryValueExW. When running on these, SHQueryValueExW translates through SHQueryValueExA, converting the value name to ANSI beforehand and converting REG_SZ, REG_EXPAND_SZ and REG_MULTI_SZ data to Unicode afterwards. Constraints arising from these conversions are beyond the scope of this article.
Since the function is very much interested in the type and size of available data, the internal call to RegQueryValueEx is made with local variables to receive this type and size, even if the SHQueryValueEx caller does not want them, i.e., even if either or both of pdwType or pcbData are NULL. This has a possibly unintended consequence for the case in which pvData is not NULL but pcbData is NULL. Though this case is not completely meaningless, it is at best redundant. The standard API function, RegQueryValueEx, would reject it, returning ERROR_INVALID_PARAMETER. This is not discovered by SHQueryValueEx, which tells RegQueryValueEx of a zero-byte buffer at pvData. If data exists for the given value, it is too much for the zero-byte buffer and the function returns ERROR_MORE_DATA despite having no means to communicate how much data.
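The consequence can be sketched in portable C without touching a real registry. This is only a minimal model under stated assumptions: model_value, model_std_query and model_sh_query are hypothetical names invented for the sketch, the first function imitates nothing more than the buffer rules of RegQueryValueEx as described above, and the second imitates the substitution of a local size variable. Only the numeric error codes are the real ones from WINERROR.H.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same numeric values as ERROR_SUCCESS, ERROR_INVALID_PARAMETER and
   ERROR_MORE_DATA in WINERROR.H. */
#define MODEL_ERROR_SUCCESS             0u
#define MODEL_ERROR_INVALID_PARAMETER  87u
#define MODEL_ERROR_MORE_DATA         234u

/* Hypothetical stand-in for a registry value: just bytes in memory. */
typedef struct {
    const unsigned char *data;
    unsigned long cb;
} model_value;

/* Models only the buffer rules of the standard function. */
unsigned long model_std_query(const model_value *v, void *pvData,
                              unsigned long *pcbData)
{
    assert(v != NULL);
    if (pvData != NULL && pcbData == NULL)
        return MODEL_ERROR_INVALID_PARAMETER;  /* a buffer but no size */
    if (pcbData == NULL)
        return MODEL_ERROR_SUCCESS;            /* neither data nor size wanted */

    unsigned long cbBuf = *pcbData;
    *pcbData = v->cb;                          /* report the available size */
    if (pvData == NULL)
        return MODEL_ERROR_SUCCESS;            /* size query only */
    if (cbBuf < v->cb)
        return MODEL_ERROR_MORE_DATA;
    memcpy(pvData, v->data, v->cb);
    return MODEL_ERROR_SUCCESS;
}

/* Models the wrapper: the inner call always gets a local size variable,
   which is zero when the caller supplies no size. */
unsigned long model_sh_query(const model_value *v, void *pvData,
                             unsigned long *pcbData)
{
    unsigned long cbLocal = (pcbData != NULL) ? *pcbData : 0;
    unsigned long rc = model_std_query(v, pvData, &cbLocal);
    if (pcbData != NULL)
        *pcbData = cbLocal;
    return rc;
}
```

Given a four-byte value, a buffer and NULL for the size, model_std_query returns 87 but model_sh_query returns 234, with nowhere to report the four bytes that are needed.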
If the call to RegQueryValueEx succeeds at getting REG_SZ data, then the function follows the advice that the official documentation gives to all callers of RegQueryValueEx: it checks whether the returned data is properly null-terminated. It does this by inspecting the last whole character in the data. In the Unicode case, this is just the last aligned word, i.e., the last word that is a whole number of words into the buffer. In the ANSI case, the sense meant is simply the last byte of data (which is presumably reasonable because zero is not a trail byte in any multi-byte character).
If the last whole character is not a null, including because the data is not large enough for a whole character, then provided that the buffer has sufficient capacity, the function discards any partial character at the end of the data (applicable only to SHQueryValueExW), appends a null and arranges that the size returned at pcbData increases to include what is now the terminating null. If the buffer does not have the extra one or two bytes of space in which to make this correction, the function changes from succeeding to failing, with ERROR_MORE_DATA as the error code. Presumably as an oversight, the function does not then increase the size returned at pcbData to report what extra space is needed.
For SHQueryValueExW only, if the last whole character is a null but the data continues for another byte, then the function adjusts the size returned at pcbData to lose the extra byte.
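The termination rules for the Unicode form can be summarised as a portable C sketch. It is a model of the behaviour described above, not the SHLWAPI code: fix_unicode_termination is a hypothetical name, the data is treated as raw bytes so that odd sizes can be represented, and the function is assumed to run after a successful query has put *pcbData bytes into a buffer of cbBuf bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Same numeric values as ERROR_SUCCESS and ERROR_MORE_DATA. */
#define FIX_ERROR_SUCCESS     0u
#define FIX_ERROR_MORE_DATA 234u

/* Hypothetical helper: post-process *pcbData bytes of supposed Unicode
   string data held in a buffer of cbBuf bytes. */
unsigned long fix_unicode_termination(unsigned char *buf, unsigned long cbBuf,
                                      unsigned long *pcbData)
{
    assert(buf != NULL && pcbData != NULL && *pcbData <= cbBuf);

    /* Bytes occupied by whole 16-bit characters. */
    unsigned long cbWhole = *pcbData & ~1ul;

    /* Last whole character is already a null: at most trim an odd byte. */
    if (cbWhole >= 2 && buf[cbWhole - 2] == 0 && buf[cbWhole - 1] == 0) {
        *pcbData = cbWhole;
        return FIX_ERROR_SUCCESS;
    }

    /* No terminator. Space is needed for a null after the whole characters.
       If there is none, fail and leave the reported size unchanged. */
    if (cbBuf - cbWhole < 2)
        return FIX_ERROR_MORE_DATA;

    /* Discard any partial character, append the null, report the new size. */
    buf[cbWhole] = 0;
    buf[cbWhole + 1] = 0;
    *pcbData = cbWhole + 2;
    return FIX_ERROR_SUCCESS;
}
```

For four bytes of “AB” without a terminator, a five-byte buffer gives ERROR_MORE_DATA with the size still four, while a six-byte buffer gives success with the size raised to six.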
If the call to RegQueryValueEx succeeds at getting REG_EXPAND_SZ data, the function first fixes any improper termination of the string, as above. It then expands the environment variables in the data. Whether or not the buffer is large enough to hold the whole expansion, the function adjusts the size at pcbData to match whatever is needed for the expansion, and sets the data type at pdwType as REG_SZ (if pdwType is not NULL). If the expansion would be too large for the given buffer, the function changes from succeeding to failing, with ERROR_MORE_DATA as the error code.
If the call to RegQueryValueEx reveals that there is REG_EXPAND_SZ data to get but there is too much of it to fit the given buffer, including because no buffer is given, then if pcbData is not NULL, the function aims to discover how big the buffer needs to be for the expanded data. It repeats the registry query but using a temporary buffer that is large enough for the expected size of data plus a null terminator. If this second query succeeds at getting data, the function expands the environment variables in this data. If the size needed for the expansion is greater than the size of the unexpanded data, the function sets the expanded size as the size to be returned at pcbData.
In all cases, if the given value turns out to have REG_EXPAND_SZ data and pdwType is not NULL, this function reports the data type as REG_SZ.
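The size that version 6.0 reports for REG_EXPAND_SZ data, as described above, reduces to a small piece of arithmetic. The following C sketch is a simplification with hypothetical names: it assumes that pcbData is not NULL, that the raw data is already properly terminated (so that the termination fix adds nothing), and that the second query for the temporary buffer succeeds.

```c
#include <assert.h>

/* Hypothetical model of the size returned at pcbData for REG_EXPAND_SZ data.
   haveBuffer: whether pvData is not NULL
   cbBuf:      size of the caller's buffer (ignored without a buffer)
   cbRaw:      size of the data as stored in the registry
   cbExpanded: size needed once environment variables are expanded */
unsigned long model_expand_sz_size(int haveBuffer, unsigned long cbBuf,
                                   unsigned long cbRaw, unsigned long cbExpanded)
{
    /* The raw data fits: the size is that of the expansion, whether or not
       the expansion itself fits. */
    if (haveBuffer && cbBuf >= cbRaw)
        return cbExpanded;

    /* The raw data does not fit, or there is no buffer: the expanded size
       replaces the raw size only if it is greater. */
    return (cbExpanded > cbRaw) ? cbExpanded : cbRaw;
}

/* A few worked cases. */
void model_expand_sz_examples(void)
{
    assert(model_expand_sz_size(1, 100, 20, 60) == 60);  /* all fits */
    assert(model_expand_sz_size(1, 30, 20, 60) == 60);   /* expansion too big */
    assert(model_expand_sz_size(0, 0, 20, 60) == 60);    /* no buffer, grows */
    assert(model_expand_sz_size(0, 0, 20, 12) == 20);    /* no buffer, shrinks */
}
```

The last case shows why the larger of the two sizes is the useful answer: a buffer sized for the shorter expansion would be too small for the raw data that must be fetched first.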
The preceding implementation details are from inspection of SHLWAPI version 6.0 from Windows XP SP1. The following variations are known from earlier versions. Note that the function was born fully formed in terms of its intended functionality. All its history is of fixing bugs. There are more than might be expected (and I do not swear that the list below is complete), and most of them did not get attended to until a substantial recoding for version 6.0.
In version 4.70, string-type data retrieved by SHQueryValueExW when not running on NT is still in ANSI, having been retrieved that way through SHQueryValueExA but then not converted to Unicode.
In the case where pvData is not NULL but pcbData is NULL, versions before 5.0 fault because the function dereferences a NULL pointer when learning the buffer’s size. The first builds of version 5.0, from before Windows 2000, do not fix this correctly. When the function calls RegQueryValueEx, the local variable it uses for passing the buffer’s size and learning the amount of available data is uninitialised. A more or less random amount of memory at the address given by pvData may be corrupted.
Versions before 6.0 have a very simple notion of ensuring that REG_SZ data is properly null-terminated. If the buffer has space for one more character, the function appends a null to the data. In the Unicode case, the appended null is aligned, so that any odd byte in the data is discarded. In no case is the size returned at pcbData affected.
In version 4.70, if SHQueryValueExW succeeds at getting REG_EXPAND_SZ data, then the expansion of environment variables in the data is defective. For its call to ExpandEnvironmentStringsW, the function passes the buffer’s size as a number of bytes (instead of characters) and interprets the return value as counting bytes (instead of characters). Among the consequences may be an overflow of the temporary buffer or copying only half the expansion to the given buffer.
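The unit confusion is easy to see as arithmetic. This portable C sketch uses a 16-bit type as a stand-in for WCHAR and hypothetical helper names. It shows only the conversions that a caller of a character-counting function such as ExpandEnvironmentStringsW needs when its own bookkeeping is in bytes.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t model_wchar;   /* stand-in for the two-byte WCHAR */

/* Capacity in characters to declare for a buffer of cb bytes. */
unsigned long model_cch_from_cb(unsigned long cb)
{
    return cb / (unsigned long) sizeof(model_wchar);
}

/* Bytes occupied by a result reported as cch characters. */
unsigned long model_cb_from_cch(unsigned long cch)
{
    return cch * (unsigned long) sizeof(model_wchar);
}

/* Worked case: a 64-byte temporary buffer and a 20-character result. */
void model_units_example(void)
{
    /* Correct: the buffer holds 32 characters. Declaring 64, as when the
       byte count is passed unconverted, invites an overflow. */
    assert(model_cch_from_cb(64) == 32);

    /* Correct: 20 characters occupy 40 bytes. Copying only 20 bytes, as when
       the return value is taken for a byte count, delivers half the string. */
    assert(model_cb_from_cch(20) == 40);
}
```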
Versions before 6.0 assume that REG_EXPAND_SZ data is null-terminated, i.e., does not need a null appended for safety before passing to ExpandEnvironmentStrings.
In versions before 6.0, if the function succeeds at getting REG_EXPAND_SZ data but the expansion would be too large for the given buffer, the return value is the last error code set by ExpandEnvironmentStrings. This is ERROR_INSUFFICIENT_BUFFER, not ERROR_MORE_DATA.
In versions before 6.0, if pvData is not NULL but there is too much REG_EXPAND_SZ data to fit the given buffer, then the size returned at pcbData is the size of the unexpanded data. No attempt is made to determine the size that would be needed for the data with environment variables expanded.
In versions before 5.0, if pvData is NULL and REG_EXPAND_SZ data is available, then the size returned at pcbData is the size needed for the data with environment variables expanded. If expansion actually shortens the data (as when variables have long names but short evaluations), then this returned size would not in fact have been large enough for the function’s success, which requires the given buffer to be large enough first for the data as obtained from the registry and then for the data after expanding environment variables.
In version 6.0 from Windows XP SP2, and higher, SHQueryValueEx is re-implemented in terms of the new function SHRegGetValue:
SHRegGetValue (hKey, NULL, pszValue, SRRF_RT_ANY | SRRF_RM_ANY, pdwType, pvData, pcbData);
Since SHQueryValueEx has exactly the same prototype as the standard API function RegQueryValueEx, either it is superfluous or the whole point to its existence is to behave a little differently from the standard function, presumably in some way that someone thought useful. Yet the documentation does not even hint at what this different behaviour might be. The most detail that Microsoft seems to offer on the point is in SHLWAPI.H, specifically in comments immediately before the declarations of SHQueryValueExA and SHQueryValueExW:
// These functions behave just like RegQueryValueEx(), except if the data
// type is REG_SZ, REG_EXPAND_SZ or REG_MULTI_SZ then the string is
// guaranteed to be properly null terminated.
//
// Additionally, if the data type is REG_EXPAND_SZ these functions will
// go ahead and expand out the string, and "massage" the returned *pdwType
// to be REG_SZ.
Why is this left as an obscurity?
Even after finding these comments, what is the programmer to make of this talk that a string found by RegQueryValueEx might not be properly terminated? Microsoft’s documentation of RegQueryValueEx notes that string data “may not have been stored with the proper null-terminating characters”, but it doesn’t explain how this happens, let alone advise how one should compensate for it.
For some indication of how this is non-trivial, consider the following experiment (with the Unicode versions of the functions, running on Windows XP SP1).
Imagine that the string “AB” has been set into the registry (by someone else) and that you have the job of querying it (without knowing for sure what to expect). With the terminating null, as Unicode, this string runs to six bytes.
Ordinarily, the value will have been set by calling RegSetValueExW and providing all six bytes as the REG_SZ data. If you query for fewer than six bytes, you are told that six bytes are available. If you query for six bytes or more, you get the six bytes.
It also works if the value was set by calling RegSetValueExW with a buffer containing all six bytes but with the size specified as only four bytes. This ought not to have got done. Microsoft has long documented that “if the data is of type REG_SZ, REG_EXPAND_SZ, or REG_MULTI_SZ, cbData must include the size of the terminating null character.” However, all NT versions of ADVAPI32 since at least the original Windows NT 4.0 provide for RegSetValueExW to correct a size that does not include the expected terminating null for string-type data. In this case, though only four bytes were specified, a terminating null did follow them and all six bytes got set into the registry. If you query for fewer than six bytes, you are told that six bytes are available. If you query for six bytes or more, you get the six bytes.
So far, so good. Where can it go wrong?
Suppose that whoever set the value never intended the “AB” as a null-terminated string, such that now the A and B are followed by something other than a null character, perhaps by random data or even by nothing (as when the address after the B is invalid). This would be mischievous, but the fact is that RegSetValueExW will have accepted it and set just the four bytes into the registry. What happens when you query the value?
If you call RegQueryValueExW and provide no buffer, the function reports that there are four bytes of data it could deliver to you. If you repeat the call to RegQueryValueExW but now provide four or even five bytes for the data, the four bytes you get are the A and the B. It’s good that you get back whatever it was that someone put in, but it is not null-terminated.
If you call RegQueryValueExW and allow six bytes or more for the data, then although the size is still returned as four, the buffer actually receives six bytes for the A, the B and a terminating null. What happens is that RegQueryValueExW itself notices that the data is not null-terminated but the buffer has enough unused space for slipping in a null character. True, the null character isn’t formally part of the string data, but at least the buffer does contain a null-terminated string. This safety provision is made by all NT versions of ADVAPI32 since the original Windows NT 4.0.
Suppose now that the value was set mistakenly by calling RegSetValueExW with the six-byte string “AB” in the buffer but providing the byte count as three. This may seem exotic, but it is neither mischievous nor unrealistic: it just needs that the programmer carelessly specified the size as a count of characters instead of bytes. Believe it or not, although an odd number is surely and obviously implausible as a byte count for a Unicode string, RegSetValueExW will have accepted it and succeeded. Moreover, because the two bytes that follow the last whole character within the given size were not both zero, no correction will have been attempted: the string data will have been set into the registry as three bytes.
If you call RegQueryValueExW and provide no buffer, the function reports that there are three bytes of data it could deliver to you. If you repeat the call but now provide three or even four bytes for the data, the three bytes you get are the whole of the A and the low byte of the B. Again, it’s good that you get back whatever it was that someone put in, but you demonstrably cannot rely on registry data of type REG_SZ to be a whole number of characters, let alone for the last to be a null.
If you call RegQueryValueExW and allow five bytes or more for the data, then the three bytes you receive are now the whole of the A and a null byte. Again, RegQueryValueExW has noticed that the data has no terminating null but that one can be appended since the buffer contains enough unused space. Indeed, the buffer contains more than enough unused space, since the appending is aligned to follow the last whole character. Though the size is returned as three, the buffer actually receives four bytes and does contain a null-terminated string. But there is now an even worse problem: is there a partial B in the string data or not?
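The safety provision observed in these experiments can be modelled in portable C. The sketch below is inferred only from the observations above, not from any ADVAPI32 code. The name model_reg_query_sz is hypothetical, and the rule it assumes is that a null is slipped in after the last whole character whenever the data lacks a terminator and the buffer has at least two bytes beyond the data, with the reported size left alone.

```c
#include <assert.h>
#include <string.h>

/* Same numeric values as ERROR_SUCCESS and ERROR_MORE_DATA. */
#define RQ_ERROR_SUCCESS     0u
#define RQ_ERROR_MORE_DATA 234u

/* Hypothetical model: deliver cbStored bytes of stored REG_SZ data to a
   caller's buffer of cbBuf bytes (buf may be NULL to ask for the size). */
unsigned long model_reg_query_sz(const unsigned char *stored,
                                 unsigned long cbStored,
                                 unsigned char *buf, unsigned long cbBuf,
                                 unsigned long *pcbData)
{
    assert(stored != NULL && pcbData != NULL);

    *pcbData = cbStored;                 /* the size is always the stored size */
    if (buf == NULL)
        return RQ_ERROR_SUCCESS;
    if (cbBuf < cbStored)
        return RQ_ERROR_MORE_DATA;
    memcpy(buf, stored, cbStored);

    /* Is the last whole 16-bit character a null? */
    unsigned long cbWhole = cbStored & ~1ul;
    int terminated = cbWhole >= 2
                  && buf[cbWhole - 2] == 0 && buf[cbWhole - 1] == 0;

    /* Unused space for one character: append a null, aligned to follow the
       last whole character, which may overwrite a partial character. */
    if (!terminated && cbBuf - cbStored >= 2) {
        buf[cbWhole] = 0;
        buf[cbWhole + 1] = 0;
    }
    return RQ_ERROR_SUCCESS;
}
```

With three stored bytes, a four-byte buffer receives the A and the low byte of the B, but a five-byte buffer receives the A and a null, and the size is reported as three either way.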
Something similar, though less problematic, occurs if the string “AB” was passed to RegSetValueExW with the size specified as five bytes. This too is most likely an error in counting, but again RegSetValueExW is not troubled by the odd size. In this case however, the two bytes immediately after the last whole character within the given size were both zero, and the function will have corrected the size by adding two. What actually will have got set into the registry is the six bytes of the string, plus whatever byte followed.
If you call RegQueryValueExW and provide no buffer, the function reports that there are seven bytes of data available. If you call RegQueryValueExW and provide seven bytes or more to receive the data, you do indeed get seven bytes: the A, the B, a null Unicode character and an undefined extra byte. A well-formed string is in there, but ought a careful programmer to trust data that is supposedly a Unicode string but has an odd size?
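The sizes that end up in the registry in these experiments are all consistent with one simple rule on the setting side, sketched here in portable C. The name model_stored_size is hypothetical and the rule is inferred only from the cases above. In particular, the sketch assumes that no correction is attempted when the last whole character within the given size is already a null, and it takes an extra cbReadable parameter so that the model never reads past valid memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: how many bytes get stored when string data is set
   with a size of cbGiven bytes. cbReadable is how many bytes at data are
   valid to read, which the model needs in order to look past cbGiven. */
unsigned long model_stored_size(const unsigned char *data,
                                unsigned long cbGiven,
                                unsigned long cbReadable)
{
    assert(data != NULL && cbGiven <= cbReadable);

    /* Bytes occupied by whole 16-bit characters within the given size. */
    unsigned long cbWhole = cbGiven & ~1ul;

    /* Assumption: a size that already includes a terminating null is kept. */
    if (cbWhole >= 2 && data[cbWhole - 2] == 0 && data[cbWhole - 1] == 0)
        return cbGiven;

    /* If the two bytes after the last whole character are both zero, the
       size is corrected by adding two. */
    if (cbReadable - cbWhole >= 2 && data[cbWhole] == 0 && data[cbWhole + 1] == 0)
        return cbGiven + 2;

    return cbGiven;                      /* stored exactly as given */
}
```

For the six-byte string “AB” followed by one stray byte, given sizes of six, four, three and five produce stored sizes of six, six, three and seven respectively, matching the cases above.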
The first problem case is perhaps the easiest. If you call SHQueryValueExW and provide no buffer, you are told that four bytes are available. Unfortunately, if you call SHQueryValueExW and provide four or five bytes, you get ERROR_MORE_DATA and a suggestion to provide a buffer of four bytes, which is hardly helpful: this is the bug noted above. If you call SHQueryValueExW and provide six bytes or more, you get the six bytes for “AB” as a properly null-terminated string.
In the second problem case above, the data is plainly ill-defined. Inasmuch as the coding in SHQueryValueExW anticipates this case, it opts for believing the data that RegQueryValueExW returns when given more than the minimum space. If you call SHQueryValueExW and provide no buffer, you are told that three bytes are available. If you call SHQueryValueExW and provide exactly three bytes, you get told to provide a buffer of three bytes, again because of an oversight in coding. If you provide at least four bytes, then what you get is the A and a terminating null. In no case do you ever see the incomplete B.
The third problem case is handled best. If you call SHQueryValueExW and provide no buffer, you are told that seven bytes of data are available. If you provide seven bytes or more to receive the data, then SHQueryValueExW gives you six bytes of unambiguously well-formed string data: the A, the B and the terminating null.
Some may say that the best solution to this mess is that all users of RegQueryValueExW should reject any supposedly successful result that seems implausible, e.g., because the size is odd or because the last whole Unicode character is not null. The purpose of this article is not to advise on this, but to point out that retrieving Unicode strings from the registry is problematic in ways that Microsoft’s documentation hints at only vaguely, and that Microsoft has put SHQueryValueEx through several bug fixes and recodings in some attempt to standardise some sort of solution, only to have it all pass without comment in Microsoft’s documentation.
Both ANSI and Unicode forms of the SHQueryValueEx function are exported by name from SHLWAPI version 4.70 and higher. In other words, it is an original SHLWAPI function, exported from all known SHLWAPI versions.
The SHQueryValueEx function has long been documented, though tersely and with the claim that it is available only from version 4.71.
SHLWAPI version 6.0 from Windows XP SP2, and higher, have a function named SHRegGetValue, again with ANSI and Unicode forms, which arguably supersedes SHQueryValueEx.