ParseURL

This function identifies the protocol in a supposed URL.

Declaration

HRESULT
ParseURL (
    LPCTSTR pszUrl,
    PARSEDURL *ppu);

The function exists in ANSI and Unicode forms.

Since the PARSEDURL structure appears to be used only for this function, its format is given here as well:

typedef struct tagPARSEDURL {
    DWORD cbSize;
    LPCTSTR pszProtocol;
    UINT cchProtocol;
    LPCTSTR pszSuffix;
    UINT cchSuffix;
    UINT nScheme;
} PARSEDURL, *PPARSEDURL;

Though the nScheme member is formally typed as a UINT, its values are drawn from the URL_SCHEME enumeration. Since the only exposure of this enumeration through exported interfaces seems to be for this function, it too is given here:

typedef enum {
    URL_SCHEME_INVALID          = -1,
    URL_SCHEME_UNKNOWN          = 0,
    URL_SCHEME_FTP,             // 0x01
    URL_SCHEME_HTTP,            // 0x02
    URL_SCHEME_GOPHER,          // 0x03
    URL_SCHEME_MAILTO,          // 0x04
    URL_SCHEME_NEWS,            // 0x05
    URL_SCHEME_NNTP,            // 0x06
    URL_SCHEME_TELNET,          // 0x07
    URL_SCHEME_WAIS,            // 0x08
    URL_SCHEME_FILE,            // 0x09
    URL_SCHEME_MK,              // 0x0A
    URL_SCHEME_HTTPS,           // 0x0B
    URL_SCHEME_SHELL,           // 0x0C
    URL_SCHEME_SNEWS,           // 0x0D
    URL_SCHEME_LOCAL,           // 0x0E
    URL_SCHEME_JAVASCRIPT,      // 0x0F
    URL_SCHEME_VBSCRIPT,        // 0x10
    URL_SCHEME_ABOUT,           // 0x11
    URL_SCHEME_RES,             // 0x12
    URL_SCHEME_MSSHELLROOTED,   // 0x13
    URL_SCHEME_MSSHELLIDLIST,   // 0x14
    URL_SCHEME_MSHELP,          // 0x15
    URL_SCHEME_MSSHELL_DEVICE,  // 0x16
    URL_SCHEME_WILDCARD,        // 0x17
    URL_SCHEME_SEARCH_MS        // 0x18
} URL_SCHEME;

Parameters

The pszUrl argument is the address of the null-terminated string that is to be parsed as a URL.

The ppu argument is the address of a PARSEDURL structure that is to receive details of the parsing. The cbSize member should be set in advance to the size of the structure.

Return Value

The function returns zero (S_OK) for success, else an error code.

Behaviour

If the pszUrl argument is NULL, there is no URL to parse, and the function returns E_INVALIDARG.

If the ppu argument is NULL or the cbSize member is not equal to the size of the expected structure (0x18 bytes), then there is no way to return details of the parsing, and the function returns E_INVALIDARG.

If the supposed URL given at pszUrl does not fit the syntax that the function expects of a URL (described below under URL Syntax), the function returns URL_E_INVALID_SYNTAX, having changed the PARSEDURL structure only by setting the pszProtocol member to NULL.

Otherwise, the function succeeds, returning zero and setting meaningful values into all members of the PARSEDURL structure other than cbSize.
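The expected size of 0x18 is consistent with six 4-byte members, which is what the declaration gives when pointers are 32 bits wide. The following portable sketch models that layout and the equality test on cbSize. The names PARSEDURL32_SKETCH, expected_cb_size and cb_size_ok are hypothetical, chosen for illustration only, and the pointer members are modelled as 32-bit integers so that the arithmetic can be checked on any platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PARSEDURL as laid out with 32-bit pointers.
   Each pointer member is modelled as a 32-bit integer (an assumption made
   so that the sizes do not depend on the machine that builds the sketch). */
typedef struct {
    uint32_t cbSize;
    uint32_t pszProtocol;   /* stands in for a 32-bit LPCTSTR */
    uint32_t cchProtocol;
    uint32_t pszSuffix;     /* stands in for a 32-bit LPCTSTR */
    uint32_t cchSuffix;
    uint32_t nScheme;
} PARSEDURL32_SKETCH;

/* Size that cbSize would be compared against in this model. */
size_t expected_cb_size(void)
{
    /* Six 4-byte members with no padding: the last one starts at 5 * 4. */
    assert(offsetof(PARSEDURL32_SKETCH, nScheme) == 5 * sizeof(uint32_t));
    return sizeof(PARSEDURL32_SKETCH);
}

/* The text describes an equality test, so a zero or oversized cbSize fails. */
int cb_size_ok(uint32_t cbSize)
{
    return cbSize == expected_cb_size();
}
```

In this model a caller that forgets to set cbSize, leaving it zero or uninitialised, would be rejected in the same way as one that passes NULL for ppu.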

URL Syntax

The function recognises two types of element in a URL, namely a URL prefix and a URL protocol. In each, the only valid characters are the ordinary alphanumeric ASCII characters, the plus sign, the minus sign and the period. A URL prefix consists of the characters of “url” in any mixture of case, followed by any number of valid characters, terminated by a colon. A URL protocol consists of any two or more valid characters, terminated by a colon. The function parses the supposed URL as zero or more URL prefixes, then exactly one URL protocol, and then whatever remains.
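The syntax just described can be sketched in portable C. This is an illustration of the description above for the ANSI form only, not a reconstruction of the SHLWAPI code. The names is_valid_char, valid_run, starts_with_url and find_protocol are hypothetical. One point is an assumption of the sketch: any colon-terminated element that begins with "url" is always consumed as a prefix, without backtracking, which the description neither confirms nor rules out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Valid characters per the text: ASCII alphanumerics, '+', '-' and '.'. */
static int is_valid_char(char c)
{
    unsigned char u = (unsigned char)c;
    return (u < 0x80 && isalnum(u)) || c == '+' || c == '-' || c == '.';
}

/* Length of the run of valid characters at s, not counting a terminator. */
static size_t valid_run(const char *s)
{
    size_t n = 0;
    while (is_valid_char(s[n]))
        n++;
    return n;
}

/* Nonzero if s begins with "url" in any mixture of case. */
static int starts_with_url(const char *s)
{
    return tolower((unsigned char)s[0]) == 'u'
        && tolower((unsigned char)s[1]) == 'r'
        && tolower((unsigned char)s[2]) == 'l';
}

/* Hypothetical helper. Returns 1 and reports the start and length of the
   protocol if the string fits the syntax, else returns 0. */
int find_protocol(const char *url, const char **protocol, size_t *length)
{
    assert(url != NULL && protocol != NULL && length != NULL);

    const char *p = url;
    for (;;) {
        size_t n = valid_run(p);
        if (p[n] != ':')
            return 0;           /* no colon-terminated element here */
        if (starts_with_url(p)) {
            p += n + 1;         /* assumption: always taken as a prefix */
            continue;
        }
        if (n < 2)
            return 0;           /* a protocol needs two or more characters */
        *protocol = p;
        *length = n;
        return 1;
    }
}
```

Under these rules a drive-letter path such as "c:\temp" is rejected, because a single character before the colon is too short to be a protocol.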

The pszProtocol member is pointed into the given URL, to the first character of the protocol. The number of characters that form the protocol, up to but not including the terminating colon, is set into the cchProtocol member.

The pszSuffix member is pointed into the given URL, to the first character of what is considered to be the remainder of the URL, somewhere after the protocol. The number of characters that form this remainder, up to but not including the terminating null, is set into the cchSuffix member.
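Because pszProtocol points into the caller's string, the protocol it addresses is not null-terminated: the terminating colon and the rest of the URL follow it directly. A caller that wants the protocol as a separate string must copy cchProtocol characters. The following is a minimal sketch of that copy, using a stand-in structure so that it builds without the Windows headers. PARSEDURL_SKETCH and copy_protocol are hypothetical names, not part of SHLWAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the ANSI form of PARSEDURL, so the sketch builds anywhere. */
typedef struct {
    unsigned int cbSize;        /* DWORD is 32 bits on Windows */
    const char  *pszProtocol;   /* points into the caller's URL */
    unsigned int cchProtocol;   /* excludes the terminating colon */
    const char  *pszSuffix;     /* points into the caller's URL */
    unsigned int cchSuffix;     /* excludes the terminating null */
    unsigned int nScheme;
} PARSEDURL_SKETCH;

/* Copies the protocol into buf as a null-terminated string.
   Returns 1 on success, or 0 if buf cannot hold it with its terminator. */
int copy_protocol(const PARSEDURL_SKETCH *pu, char *buf, size_t cap)
{
    assert(pu != NULL && pu->pszProtocol != NULL && buf != NULL);

    if (pu->cchProtocol >= cap)
        return 0;
    memcpy(buf, pu->pszProtocol, pu->cchProtocol);
    buf[pu->cchProtocol] = '\0';
    return 1;
}
```

The same consideration does not arise for pszSuffix, since the suffix runs to the null that terminates the original string.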

The following protocols are recognised specifically. Comparison is insensitive to case. Each has a corresponding value in the URL_SCHEME enumeration, to be set into the nScheme member:

Protocol          nScheme
about             URL_SCHEME_ABOUT (0x11)
file              URL_SCHEME_FILE (0x09)
ftp               URL_SCHEME_FTP (0x01)
gopher            URL_SCHEME_GOPHER (0x03)
hcp               URL_SCHEME_MSHELP (0x15)
http              URL_SCHEME_HTTP (0x02)
https             URL_SCHEME_HTTPS (0x0B)
javascript        URL_SCHEME_JAVASCRIPT (0x0F)
local             URL_SCHEME_LOCAL (0x0E)
mailto            URL_SCHEME_MAILTO (0x04)
mk                URL_SCHEME_MK (0x0A)
ms-shell-idlist   URL_SCHEME_MSSHELLIDLIST (0x14)
ms-shell-rooted   URL_SCHEME_MSSHELLROOTED (0x13)
news              URL_SCHEME_NEWS (0x05)
nntp              URL_SCHEME_NNTP (0x06)
res               URL_SCHEME_RES (0x12)
search-ms         URL_SCHEME_SEARCH_MS (0x18)
shell             URL_SCHEME_SHELL (0x0C)
snews             URL_SCHEME_SNEWS (0x0D)
telnet            URL_SCHEME_TELNET (0x07)
vbscript          URL_SCHEME_VBSCRIPT (0x10)
wais              URL_SCHEME_WAIS (0x08)

If the URL fits the expected syntax but the protocol is not among those listed, nScheme is set to URL_SCHEME_UNKNOWN (zero). Not all protocols in the preceding list are recognised by all implementations of the function, i.e., in all SHLWAPI versions. The list here is for version 6.00 from Windows Vista.
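The mapping from protocol to nScheme can be sketched as a table lookup with a case-insensitive comparison over exactly the number of characters in the protocol. The table below is copied from the list above. SCHEME_ENTRY, scheme_table and lookup_scheme are hypothetical names, and nothing here is meant to suggest how SHLWAPI itself stores or searches its list.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char  *name;      /* lower-case protocol name */
    unsigned int scheme;    /* corresponding URL_SCHEME value */
} SCHEME_ENTRY;

static const SCHEME_ENTRY scheme_table[] = {
    { "about", 0x11 },           { "file", 0x09 },
    { "ftp", 0x01 },             { "gopher", 0x03 },
    { "hcp", 0x15 },             { "http", 0x02 },
    { "https", 0x0B },           { "javascript", 0x0F },
    { "local", 0x0E },           { "mailto", 0x04 },
    { "mk", 0x0A },              { "ms-shell-idlist", 0x14 },
    { "ms-shell-rooted", 0x13 }, { "news", 0x05 },
    { "nntp", 0x06 },            { "res", 0x12 },
    { "search-ms", 0x18 },       { "shell", 0x0C },
    { "snews", 0x0D },           { "telnet", 0x07 },
    { "vbscript", 0x10 },        { "wais", 0x08 },
};

/* Looks up a protocol given as a pointer and a length, since the protocol
   is not null-terminated. Returns 0 (URL_SCHEME_UNKNOWN) if not listed. */
unsigned int lookup_scheme(const char *protocol, size_t length)
{
    assert(protocol != NULL);

    for (size_t i = 0; i < sizeof scheme_table / sizeof scheme_table[0]; i++) {
        const char *name = scheme_table[i].name;
        if (strlen(name) != length)
            continue;           /* lengths must match, so http != https */
        size_t j = 0;
        while (j < length && tolower((unsigned char)protocol[j]) == name[j])
            j++;
        if (j == length)
            return scheme_table[i].scheme;
    }
    return 0;
}
```

Comparing lengths first keeps "http" and "https", or "news" and "snews", from being mistaken for one another when the protocol is only a slice of a longer string.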

In the particular case where the protocol is “file”, if the colon that terminates the protocol is followed immediately by either two or three forward slashes, these do not count as part of the suffix.
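As a sketch of this special case, the following hypothetical helper, file_slashes_to_skip, takes the text immediately after the colon and reports how many forward slashes would be left out of the suffix. The treatment of four or more slashes is an assumption of the sketch (it skips three and keeps the rest), since the text above speaks only of two or three.

```c
#include <assert.h>
#include <stddef.h>

/* Given the text that immediately follows the colon of a "file" protocol,
   returns the number of leading forward slashes excluded from the suffix. */
size_t file_slashes_to_skip(const char *after_colon)
{
    assert(after_colon != NULL);

    if (after_colon[0] != '/' || after_colon[1] != '/')
        return 0;               /* fewer than two slashes: skip nothing */

    /* Two slashes are present. A third, if any, is skipped too. Anything
       beyond the third is kept in the suffix (assumption of this sketch). */
    return after_colon[2] == '/' ? 3 : 2;
}
```

Applied to "file:///C:/dir/name.txt", the helper reports three, so that the suffix would begin at "C:/dir/name.txt".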

Availability

The ParseURL function is exported from SHLWAPI.DLL as ordinals 1 and 2 (for ANSI and Unicode forms respectively) in version 4.70 and higher.

The ANSI and Unicode forms have parallel implementations.

Documentation Status

Though this function dates from as long ago as 1996, it was still not documented by Microsoft in the MSDN Library at least as late as the CD edition dated January 2004.

However, the function does seem to have been documented later in 2004. This article now conforms to Microsoft’s nomenclature. Even while the function was undocumented, the URL_SCHEME enumeration was semi-documented, being defined in SHLWAPI.H from the Platform SDK (e.g., the edition dated July 2002).