SKETCH OF HOW RESEARCH MIGHT CONTINUE AND RESULTS BE PRESENTED

The INDEX.DAT File Format

The large-scale structure of an INDEX.DAT file is a header followed by an array of fixed-sized blocks. The header is 0x4000 bytes. The blocks are 0x80 bytes. Blocks are the allocation units for file-map entries. The name “file-map entry” is taken here from the WININET symbol file, which shows FILEMAP_ENTRY as Microsoft’s name for a structure that every file-map entry begins with. A file-map entry can have any size (less than 64KB) but necessarily consumes as many consecutive whole blocks as are needed to contain it. As more space is required for file-map entries, blocks are added to the file, always in multiples of 0x4000 bytes. There is an upper limit because the allocation state of the blocks is recorded only in the header. Indeed, the header is mostly a bitmap in which successive bits represent successive blocks. Ignoring other data in the header gives 16MB as an approximate maximum size for an INDEX.DAT file.
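
The block arithmetic can be made concrete. The following Python sketch assumes only the sizes just stated (a 0x4000-byte header and 0x80-byte blocks). The function names are hypothetical, not WININET’s.

```python
HEADER_SIZE = 0x4000  # bytes of file header before the first block
BLOCK_SIZE = 0x80     # allocation unit for file-map entries


def blocks_for_entry(entry_size: int) -> int:
    """Number of whole blocks consumed by a file-map entry of entry_size bytes."""
    if not 0 < entry_size < 0x10000:
        raise ValueError("a file-map entry is less than 64KB")
    return (entry_size + BLOCK_SIZE - 1) // BLOCK_SIZE  # round up to whole blocks


def block_offset(block_index: int) -> int:
    """File offset of the 0-based block block_index, counting from after the header."""
    return HEADER_SIZE + block_index * BLOCK_SIZE


# A page-sized entry, such as a page of the hash table, takes 0x20 blocks.
print(hex(blocks_for_entry(0x1000)))  # 0x20
```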

File-map entries fall into two broad categories. The sort of entry that the file exists for holds a URL and associates it with other information. Some of this cached information for the URL is stored in the entry itself, but a significant provision is that information to be saved about a URL may be stored as a separate file, called the local file, such that the entry in INDEX.DAT needs only to save some record of where to find the local file. Indeed, the collection of these local files is the cache. What needs to be saved in INDEX.DAT for any one URL is therefore rarely large. It is typically a few blocks, the size being dominated by the lengths of strings such as the URL itself and the filename of the local file. Each URL, together with its associated information, is usefully saved as its own file-map entry.

The other broad category exists because the file has to support efficient searching for URL entries and also allows for grouping of URL entries. Both purposes need a multitude of small structures that would be wasteful to store in the file as entries in their own right and which may anyway need to be kept together for ready access. Since the file is memory-mapped, a good scale for the notion of ready access is the CPU page size (0x1000 bytes). The INDEX.DAT file therefore has entries that are always page-sized and which hold in turn a collection of small control structures of one sort or another. As an aside, it seems most plausible that 0x80 is chosen as the block size so that a CPU page helpfully corresponds to one dword in the allocation bitmap (0x1000 / 0x80 = 32 blocks, hence 32 bits). Since the header is itself a whole number of pages, each dword of the bitmap then describes one page-aligned page, and free space for a page-aligned page-sized entry could be found just by scanning the bitmap for the first clear dword. File-map entries that are at least a page big are always page-aligned.
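
The speculated scan for a free page is easy to sketch. This Python fragment is only an illustration of that idea, not WININET’s actual allocator. The function name is hypothetical, and the bitmap’s bit order does not matter here because only wholly clear dwords are sought.

```python
import struct

BITMAP_OFFSET = 0x0250  # allocation bitmap within the header
BITMAP_SIZE = 0x3DB0
HEADER_SIZE = 0x4000
PAGE_SIZE = 0x1000      # 32 blocks of 0x80 bytes, i.e. one dword of the bitmap


def find_free_page(header: bytes, total_blocks: int):
    """Return the file offset of the first page whose 32 blocks are all free, else None.

    Dword k of the bitmap covers blocks 32k to 32k+31, which occupy the page at
    HEADER_SIZE + k * PAGE_SIZE, so a zero dword marks a free, page-aligned page.
    """
    dwords = min(BITMAP_SIZE // 4, total_blocks // 32)  # only blocks that exist in the file
    bitmap = header[BITMAP_OFFSET:BITMAP_OFFSET + dwords * 4]
    for k, (word,) in enumerate(struct.iter_unpack("<I", bitmap)):
        if word == 0:
            return HEADER_SIZE + k * PAGE_SIZE
    return None
```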

The File Header

Microsoft’s name for the file header is not recorded in the public symbols for WININET.

Offset Size Description
0x00 0x1C bytes signature, necessarily “Client UrlCache MMF Ver 5.2”, including null terminator
0x1C dword file size, in bytes
0x20 dword file offset of first page in hash table, else zero
0x24 dword total number of blocks following header
0x28 dword number of allocated blocks
0x2C 4 bytes apparently unused
0x30 qword cache limit, in bytes
0x38 qword cache size, in bytes
0x40 qword cache usage exempt from scavenging, in bytes
0x48 dword number of subdirectories in cache
0x4C 0x0180 bytes array of 0x20 structures, each of 0x0C bytes, to describe subdirectories in cache
0x01CC 0x80 bytes array of 0x20 dwords, apparently called header data
0x024C 4 bytes apparently unused
0x0250 0x3DB0 bytes allocation bitmap for blocks following header
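
For readers who want to inspect a file, the fixed members in the table above can be decoded with Python’s struct module. This is a sketch under the layout as tabulated. The class and field names are invented for illustration, since Microsoft’s names for these members are not known.

```python
import struct
from typing import NamedTuple

SIGNATURE = b"Client UrlCache MMF Ver 5.2\x00"  # 0x1C bytes including terminator


class IndexHeader(NamedTuple):
    file_size: int           # 0x1C
    hash_table_offset: int   # 0x20, zero if no hash table yet
    total_blocks: int        # 0x24
    allocated_blocks: int    # 0x28
    cache_limit: int         # 0x30
    cache_size: int          # 0x38
    exempt_usage: int        # 0x40
    subdirectory_count: int  # 0x48


def parse_header(data: bytes) -> IndexHeader:
    """Decode the fixed fields of an INDEX.DAT header."""
    if len(data) < 0x4000:
        raise ValueError("file is shorter than the 0x4000-byte header")
    if data[:0x1C] != SIGNATURE:
        raise ValueError("signature is not 'Client UrlCache MMF Ver 5.2'")
    dwords = struct.unpack_from("<4I", data, 0x1C)  # 0x1C through 0x2B
    qwords = struct.unpack_from("<3Q", data, 0x30)  # 0x30 through 0x47 (0x2C skipped)
    (subdirs,) = struct.unpack_from("<I", data, 0x48)
    return IndexHeader(*dwords, *qwords, subdirs)
```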

The two members that are marked above as unused are plausibly just artefacts of the programming. This would be directly so for the unused dword at offset 0x2C, which could be compiler-generated padding for the 64-bit alignment of the next member. That the dword at offset 0x024C is unused may indicate that Microsoft’s definition of the header as a formal structure does not include the bitmap. Instead, following a typical practice at Microsoft, the programmer may have defined a single-element array (of bytes or dwords) at offset 0x024C to mark the intention of following the header with something else, even though other code then places the something else not at the marker but after the structure.

Signature

The signature at offset 0x00 is required for an existing INDEX.DAT file to be considered valid and is entered into any INDEX.DAT file that is initialised or re-initialised by this WININET version. It also appears in the registry, in each of several possible keys:

Key: HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Internet Settings\5.0\Cache
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\5.0\Cache
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\5.0\LowCache
Value: Signature
Type: REG_SZ
Data: Client UrlCache MMF Ver 5.2

Its purpose there is to mark that WININET has already determined whether the current user, at the current process’s integrity level, has a per-user Content cache or must use a shared cache. This and other matters of cache configuration are left for a separate article.

Block Allocation

Several fields in the header help with bookkeeping for the allocation bitmap at the header’s end. The size of a valid INDEX.DAT file, as saved at offset 0x1C in the header, is 0x4000 for the header plus 0x80 for each block that’s counted at offset 0x24 in the header. The dword at offset 0x28 tells how many of these blocks are allocated to file-map entries. A block is allocated if the corresponding bit in the bitmap at offset 0x0250 is set. The bitmap’s 0x3DB0 bytes have space for at most 0x0001ED80 such bits, for a maximum file size of 0x00F70000 bytes (0x4000 + 0x0001ED80 × 0x80).
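
These relationships give a simple consistency check. In the Python sketch below, the helper names are hypothetical, and so is the bit order assumed by is_block_allocated (least significant bit first). Counting set bits, as check_allocation does, works whatever the bit order.

```python
import struct

HEADER_SIZE = 0x4000
BLOCK_SIZE = 0x80
BITMAP_OFFSET = 0x0250
BITMAP_SIZE = 0x3DB0
MAX_BLOCKS = BITMAP_SIZE * 8                           # 0x0001ED80 bits
MAX_FILE_SIZE = HEADER_SIZE + MAX_BLOCKS * BLOCK_SIZE  # 0x00F70000 bytes


def is_block_allocated(header: bytes, block: int) -> bool:
    # Assumption: block n is bit (n % 8) of byte (n // 8), i.e. least significant
    # bit first, which is the same as LSB-first order within little-endian dwords.
    return bool((header[BITMAP_OFFSET + block // 8] >> (block % 8)) & 1)


def check_allocation(header: bytes) -> list:
    """Cross-check the header's bookkeeping fields against each other and the bitmap."""
    problems = []
    file_size, _, total, allocated = struct.unpack_from("<4I", header, 0x1C)
    if total > MAX_BLOCKS:
        problems.append("more blocks than the bitmap can describe")
    if file_size != HEADER_SIZE + total * BLOCK_SIZE:
        problems.append("file size disagrees with block count")
    bitmap = header[BITMAP_OFFSET:BITMAP_OFFSET + BITMAP_SIZE]
    if sum(bin(b).count("1") for b in bitmap) != allocated:  # bit order is irrelevant here
        problems.append("allocated count disagrees with bitmap")
    return problems
```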

Incidentally, some parts of the WININET code allow for a configurable block size. Other parts have hard-coded assumptions about the block size and won’t work correctly unless the block size is 0x80 bytes.

Directories

Local files for the cache are potentially numerous. They can be distributed among as many as 32 randomly named subdirectories of whichever directory holds the INDEX.DAT file. The number of subdirectories created so far is saved at offset 0x48 in the file header. The subdirectories themselves are described at offset 0x4C in an array of unnamed structures:

Offset Size Description
0x00 dword number of files in this subdirectory
0x04 8 bytes name of subdirectory, without null terminator

With these descriptions in the file header, each URL entry that has a local file in the cache need not hold a complete pathname for the local file, nor even reproduce the name of the subdirectory that contains it. The filename suffices, together with a one-byte index into this array (at offset 0x38 in the URL entry, described later).
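
A Python sketch of how a reader might rebuild a local file’s full path from these pieces follows. It assumes the 8-byte subdirectory names are ASCII. Both function names, and the cache_dir parameter (the directory that holds INDEX.DAT), are hypothetical.

```python
import ntpath
import struct

SUBDIR_ARRAY_OFFSET = 0x4C
SUBDIR_ENTRY_SIZE = 0x0C


def subdirectories(header: bytes) -> list:
    """Return (name, file_count) for each cache subdirectory recorded in the header."""
    (count,) = struct.unpack_from("<I", header, 0x48)
    result = []
    for i in range(min(count, 0x20)):
        offset = SUBDIR_ARRAY_OFFSET + i * SUBDIR_ENTRY_SIZE
        files, raw_name = struct.unpack_from("<I8s", header, offset)
        result.append((raw_name.decode("ascii"), files))  # 8 characters, no terminator
    return result


def local_file_path(cache_dir: str, header: bytes, dir_index: int, filename: str) -> str:
    """Join the INDEX.DAT directory, the indexed subdirectory and the local filename."""
    name, _ = subdirectories(header)[dir_index]
    return ntpath.join(cache_dir, name, filename)
```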

Header Data

The array at offset 0x01CC provides for indexed storage of an arbitrary dword whose interpretation varies with the index. Although this header data takes space in every INDEX.DAT file, most of it is meaningful in the Content container only.

Index Symbolic Name Interpretation
0x00 CACHE_HEADER_DATA_CURRENT_SETTINGS_VERSION number of changes to any of many WININET settings
0x01 CACHE_HEADER_DATA_CONLIST_CHANGE_COUNT number of changes to container list for same registry set
0x02 CACHE_HEADER_DATA_COOKIE_CHANGE_COUNT number of changes to Cookies container
0x03 CACHE_HEADER_DATA_NOTIFICATION_HWND window handle for cache notifications
0x04 CACHE_HEADER_DATA_NOTIFICATION_MESG window message for cache notifications
0x05 CACHE_HEADER_DATA_ROOTGROUP_OFFSET file offset of first GROUP_ENTRY, else zero
0x06 CACHE_HEADER_DATA_GID_LOW low 32 bits for generation of most recently allocated GROUPID, else zero
0x07 CACHE_HEADER_DATA_GID_HIGH high 32 bits for generation of most recently allocated GROUPID, else zero
0x0E CACHE_HEADER_DATA_SSL_STATE_COUNT potted description needed here!
0x15 CACHE_HEADER_DATA_NOTIFICATION_FILTER bit flags to filter cache notifications
0x16 CACHE_HEADER_DATA_ROOT_LEAK_OFFSET file offset of first leak entry
0x1B CACHE_HEADER_DATA_ROOT_GROUPLIST_OFFSET file offset of first GROUP_LIST_ENTRY, else zero

This list covers all the indices that are meaningful to WININET. The header data cannot be defined exhaustively from inspection of WININET, because it is exposed to other software through the exported functions GetUrlCacheHeaderData, IncrementUrlCacheHeaderData and SetUrlCacheHeaderData. Since these functions are undocumented, external users may be few. The only ones supplied with Windows (Vista) are INETCPL.CPL and MSDRM.DLL, and they access only CACHE_HEADER_DATA_SSL_STATE_COUNT, for purposes not yet studied. These functions anyway affect only the Content container.
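
Reading the header data directly from the file, rather than through those functions, is a matter of indexing dwords from offset 0x01CC. In this Python sketch the dictionary drops the CACHE_HEADER_DATA_ prefix for brevity. The helper names are hypothetical. Combining the two GID dwords into one 64-bit value is an assumption suggested by their low and high descriptions.

```python
import struct

HEADER_DATA_OFFSET = 0x01CC

# Indices from the table above, without the CACHE_HEADER_DATA_ prefix.
HEADER_DATA_INDEX = {
    "CURRENT_SETTINGS_VERSION": 0x00,
    "CONLIST_CHANGE_COUNT": 0x01,
    "COOKIE_CHANGE_COUNT": 0x02,
    "NOTIFICATION_HWND": 0x03,
    "NOTIFICATION_MESG": 0x04,
    "ROOTGROUP_OFFSET": 0x05,
    "GID_LOW": 0x06,
    "GID_HIGH": 0x07,
    "SSL_STATE_COUNT": 0x0E,
    "NOTIFICATION_FILTER": 0x15,
    "ROOT_LEAK_OFFSET": 0x16,
    "ROOT_GROUPLIST_OFFSET": 0x1B,
}


def header_data(header: bytes, index: int) -> int:
    """Read dword number index (0 to 0x1F) of the header data array."""
    if not 0 <= index < 0x20:
        raise IndexError("header data has 0x20 dwords")
    (value,) = struct.unpack_from("<I", header, HEADER_DATA_OFFSET + 4 * index)
    return value


def last_group_id_generator(header: bytes) -> int:
    """Assumed combination of GID_HIGH and GID_LOW into one 64-bit value."""
    return (header_data(header, 0x07) << 32) | header_data(header, 0x06)
```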

Perusal of earlier WININET versions confirms that more of the header data used to be meaningful, even as recently as version 6.0.

Indices 0x05, 0x16 and 0x1B point to the start of one or another chain of structures in file-map entries. Much like the dword at offset 0x20 in the header, which points to the hash table, they are essential for navigating the file. This is not so for the other indices. They appear to be kept in the file header because the file can be accessed from multiple processes concurrently and its memory-mapped image is conveniently to hand as shared memory.

The first three indices and 0x0E are global counters. Each counter governs some state that may be maintained by multiple processes but is invalidated for all if changed by one. For index zero, the relevant state is a large collection of settings, mostly loaded from the registry, that have no obvious or direct connection with URL caching.

Indices 0x03, 0x04 and 0x15 support the (undocumented) RegisterUrlCacheNotification function. No software supplied with Windows Vista imports it. That the header data is more valuable as shared memory than as persistent storage is especially marked for index 0x03: it holds a window handle, whose persistence in a file from one Windows session to another really can’t be much use.

File Map Entries

All file-map entries, meaning entries allocated as whole blocks of an INDEX.DAT file, begin with an 8-byte FILEMAP_ENTRY structure:

Offset Size Description
0x00 dword signature
0x04 dword number of blocks allocated to entry

Some types of file-map entry are distinguished by their signature:

“HASH” 0x48534148 page in the hash table
“LEAK” 0x4B41454C leak entry, actually a modified URL entry
“REDR” 0x52444552 redirection entry
“URL ” 0x204C5255 standard URL entry

Here, for easier reading, each dword signature is presented first as characters, starting from the least significant. No signature is set explicitly for the page-sized entries that hold the several structures for supporting groups of URL entries. For these, the signature is 0xDEADBEEF, this being what all dwords in all the blocks for any entry are filled with when the entry is newly allocated (before the count of blocks is recorded at offset 0x04).
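
The correspondence between the characters and the dword values is just little-endian byte order. This Python sketch classifies the start of a file-map entry on that basis. The function names are hypothetical, and the “no explicit signature” case relies only on the 0xDEADBEEF fill described above.

```python
import struct

FILEMAP_SIGNATURES = {b"HASH", b"LEAK", b"REDR", b"URL "}
FRESH_FILL = 0xDEADBEEF  # fill pattern of a newly allocated entry


def signature_dword(text: bytes) -> int:
    """Dword value of a four-character signature, first character least significant."""
    (value,) = struct.unpack("<I", text)
    return value


def classify_entry(data: bytes, offset: int) -> tuple:
    """Read a FILEMAP_ENTRY at offset and return (kind, block_count)."""
    raw_sig, blocks = struct.unpack_from("<4sI", data, offset)
    if raw_sig in FILEMAP_SIGNATURES:
        return raw_sig.decode("ascii"), blocks
    if signature_dword(raw_sig) == FRESH_FILL:
        return "no explicit signature (e.g. page of group structures)", blocks
    return "unknown", blocks
```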

Incidentally, the WININET code provides for filling blocks with 0x0BADF00D when they are deallocated from a file-map entry, but the option to do this is never exercised. Code for deleting URL entries is called with either “DEL ” or “UPD ” as an argument, presumably so that this signature can be set into the deleted entry. However, the called code never acts on this argument. Perusal of earlier WININET versions confirms that the code for both these cases used to be active.

The Hash Table

A typical problem for accessing the cache is that a URL is known and information about this URL is either to be retrieved from the cache or saved into the cache. Of course, WININET does not search the whole INDEX.DAT file, nor even just where URLs are known to be stored. Instead, a 32-bit hash is computed of the URL and a small portion of the file is searched for matching hash items. This portion is here called the hash table. It is built as page-sized file-map entries, which are typically scattered through the file.

Each page in the hash table has a 0x10-byte LIST_FILEMAP_ENTRY structure as a header. This begins in turn as a FILEMAP_ENTRY, with “HASH” as its signature and 0x20 as its block count:

Offset Size Description
0x08 dword file offset to next page of hash table, else zero
0x0C dword 0-based serial number of this page within hash table

Pages that are allocated to the hash table are never deallocated. The file offset of the first page is recorded in the file header, at offset 0x20. The hash table is always examined from the first-allocated page to the last, following the links at offset 0x08 and checking each for the correct serial number at offset 0x0C.
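
The walk through the hash table’s pages can be sketched in Python as a generator. The name hash_table_pages is hypothetical. Checking the serial numbers, as described above, has the side benefit of stopping a corrupt chain that loops back on itself.

```python
import struct

HASH_SIGNATURE = b"HASH"


def hash_table_pages(data: bytes):
    """Yield the file offset of each hash-table page, first-allocated to last."""
    (offset,) = struct.unpack_from("<I", data, 0x20)  # zero if there is no hash table
    expected_serial = 0
    while offset:
        sig, blocks, next_offset, serial = struct.unpack_from("<4sIII", data, offset)
        if sig != HASH_SIGNATURE or blocks != 0x20:
            raise ValueError(f"no hash-table page at {offset:#x}")
        if serial != expected_serial:  # also stops a corrupt chain from looping forever
            raise ValueError(f"page at {offset:#x} has serial {serial}, expected {expected_serial}")
        yield offset
        offset = next_offset
        expected_serial += 1
```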

Hash Items

The LIST_FILEMAP_ENTRY on each page of the hash table is followed immediately by an array of 8-byte HASH_ITEM structures:

Offset Size Description
0x00 5 bits flags
0x00 1 bit apparently unused
0x00 26 bits high 26 bits of hash
0x04 dword file offset of corresponding file-map entry, else 3

The array of hash items on the page is two-dimensional. Specifically, the hash items are arranged as 64 sets of 7. The 64 comes about because the sets are indexed by the low 6 bits of the hash. There are 7 items per set because that is the most that lets all 64 sets fit on the page after the 0x10-byte header: 64 sets of 7 items of 8 bytes take 0x0E00 bytes, whereas 8 items per set would take the whole 0x1000. Space after the hash items, i.e., from offset 0x0E10, is unused. To enumerate all the URLs about which information is cached in an INDEX.DAT file, the hash table is examined by working upwards through all the hash items on each page of the hash table, from the first-allocated page to the last. To look up a particular URL is also to search the hash table from the first page to the last, but looking only at the seven hash items per page that are selected by the low 6 bits of the URL’s hash.

Since the low 6 bits of the hash are implied by the hash item’s position on the page, the whole hash need not be held in the hash item itself, just the high 26 bits. That frees the low 6 bits of the first dword: the low 5 serve as flags and the sixth appears to be unused.
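
Locating and decoding a hash item can be sketched in Python as follows. The placement of item slot in set s at 0x10 + (s × 7 + slot) × 8 is an assumption, namely that the sets lie consecutively as in a C array items[64][7]. It is consistent with the hash items ending at offset 0x0E10, but it is not confirmed from the code. The function names are hypothetical.

```python
import struct

ITEMS_START = 0x10  # hash items follow the 0x10-byte LIST_FILEMAP_ENTRY
ITEMS_PER_SET = 7
ITEM_SIZE = 8


def hash_item_offset(page_offset: int, url_hash: int, slot: int) -> int:
    """File offset of item number slot (0 to 6) in the set selected by url_hash.

    Assumption: the sets are laid out consecutively, as for a C array items[64][7].
    """
    if not 0 <= slot < ITEMS_PER_SET:
        raise IndexError("each set has 7 items")
    set_index = url_hash & 0x3F  # low 6 bits of the hash choose the set
    return page_offset + ITEMS_START + (set_index * ITEMS_PER_SET + slot) * ITEM_SIZE


def decode_hash_item(data: bytes, item_offset: int) -> tuple:
    """Split a HASH_ITEM into (flags, hash with low 6 bits clear, file offset)."""
    first, target = struct.unpack_from("<II", data, item_offset)
    return first & 0x1F, first & 0xFFFFFFC0, target
```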

The hash algorithm for URL caching is presented separately. For most practical purposes, it suffices to know just that the input for the computation is the URL exactly as given, i.e., without case conversion, up to but not including the null terminator, except to ignore at most one trailing forward slash. This last point helps when an original URL is redirected simply by appending a forward slash: with or without the slash, the lookup is the same.
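
The hash computation itself is not reproduced here, but preparing its input can be. This Python sketch, with the hypothetical name hash_input, shows only which bytes would be hashed: the URL as given, without case conversion, cut at any null terminator, and minus at most one trailing forward slash.

```python
def hash_input(url: bytes) -> bytes:
    """Bytes that would be fed to the URL hash: the URL as given (no case folding),
    without its null terminator and minus at most one trailing forward slash."""
    url = url.split(b"\0", 1)[0]  # stop at the terminator if one is present
    return url[:-1] if url.endswith(b"/") else url
```

With this preparation, a URL and the same URL with one forward slash appended produce identical input, and therefore the same hash.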

Hash Item Flags

The hash table has a HASH_ITEM for every file-map entry that might be sought from a URL. Such entries come in two types: URL entries and redirection entries. What type a hash item represents is recorded in the flags, along with a few other properties that might usefully be known immediately from the hash item without following the file offset to the file-map entry (which would have to be validated and interpreted).

The low 3 bits of the flags in a HASH_ITEM are more or less, but not formally, a single field. They are sometimes examined for equality after masking by 0x07, but are sometimes tested individually. The interpretation adopted here is that if the 0x01 bit is clear, then the hash item represents a URL entry and the other bits are independent:

0x01 clear: file offset in hash item is of URL entry and hash is of URL
0x02 corresponding URL entry is locked
0x04 corresponding URL entry has trivial redirection

A URL entry is locked while its local file is being accessed, most notably for the RetrieveUrlCacheEntryFile and RetrieveUrlCacheEntryStream functions, which require a subsequent call to UnlockUrlCacheEntryFile or UnlockUrlCacheEntryStream. Nested locking is supported through a count in the URL entry itself, at offset 0x58 (see later). An entry cannot be deleted while locked. An attempt at such deletion may appear to succeed, but the entry is not actually deleted until the final unlock.

The 0x04 flag eases a common case of URL redirection. It means simply that the redirection appended a forward slash. Put another way, the URL that is saved in the URL entry is the original URL plus a forward slash. With such a simple relationship, there’s no need to save the original URL separately as a redirection entry (see later). A search for either URL produces the one hash item for the one URL entry.

When the 0x01 flag is set, the hash item represents something other than a URL entry and the low 3 bits are better interpreted as one field:

0x01 hash item is free;
whole first dword of hash item should be 1
0x03 hash item is unused;
whole first dword of hash item should be 3
0x05 file offset is of redirection entry;
hash is of original URL

Hash items in a new page for the hash table are initialised with 3 in both dwords, presumably just as a quick way to set 0x03 for the flags in the first dword. When searching the hash table, seven items per page, for a particular URL, finding an item that has 0x03 as its first dword means that the search is over and the URL is not in the hash table. If the URL is to be entered into the hash table, then it is allocated the first free hash item (with 0x01 as its first dword) that was noticed on the way. If there was no free hash item, then the unused item is allocated to the URL. If the search reached the end of the last page of the hash table without finding either a free or an unused item, then the hash table gets a new page.
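
The search just described can be sketched in Python. The function names and return values are hypothetical, and the item layout is the same items[64][7] assumption as before. One simplification is that the sketch stops at the first item whose hash matches, whether for a URL entry or a redirection entry. Real code would still have to compare the URL in the entry and carry on searching after a mere hash collision.

```python
import struct

FREE_ITEM = 0x01    # whole first dword of a free hash item
UNUSED_ITEM = 0x03  # whole first dword of a never-used hash item
HASH_MASK = 0xFFFFFFC0


def item_offset(page: int, url_hash: int, slot: int) -> int:
    # Assumed layout: items[64][7] following the 0x10-byte page header.
    return page + 0x10 + ((url_hash & 0x3F) * 7 + slot) * 8


def search_hash_table(data: bytes, pages: list, url_hash: int) -> tuple:
    """Return ("match", offset), ("insert", offset) or ("new_page", None)."""
    first_free = None
    for page in pages:
        for slot in range(7):
            offset = item_offset(page, url_hash, slot)
            (first,) = struct.unpack_from("<I", data, offset)
            if first == UNUSED_ITEM:  # search is over: the URL is not in the table
                return "insert", first_free if first_free is not None else offset
            if first == FREE_ITEM:
                if first_free is None:
                    first_free = offset  # remember only the first free item
            elif ((first & 0x01) == 0 or (first & 0x07) == 0x05) and \
                    (first & HASH_MASK) == (url_hash & HASH_MASK):
                return "match", offset  # URL entry or redirection entry with same hash
    if first_free is not None:
        return "insert", first_free
    return "new_page", None
```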

Group Status

Other flags in a hash item are particular to the grouping of URL entries, and appear to be meaningful only in hash items for URL entries:

0x08 corresponding URL entry belongs to a group
0x10 corresponding URL entry belongs to a list of groups

These flags are not independent: the 0x10 flag is never set unless the 0x08 flag is also set. The 0x10 flag must, of course, be set if the URL entry belongs to more than one group. However, if the URL entry belongs to exactly one group, then this flag can be either set or clear. The difference is in the linkage from entry to group. If the flag is clear, the entry links directly to its group. If the flag is set, the entry links to a list of groups which happens to be a list of one.

The descriptions above are anyway offered only as interpretations of what seems to be intended, not what actually is coded. It can happen that the 0x08 flag is set even though the URL entry does not belong to any group, but this is here taken to be the consequence of a coding error—indeed, of two coding errors. The essential point for both is that when a group is deleted, any entries that belong to the group but which will not be deleted with the group would better not be left still referring to the group. The main such reference is the dword at offset 0x28 in the IE6_URL_FILEMAP_ENTRY (see later). In one case, code that supports the DeleteUrlCacheGroup function clears this dword but does nothing about the 0x08 flag in the hash item. This might not matter—indeed, the intended meaning of the 0x08 flag could be just that the corresponding URL entry may belong to a group—except that some code for enumerating entries takes for granted that if the 0x08 flag is set in the hash item then the dword at offset 0x28 in the URL entry is a meaningful file offset. It just doesn’t defend against this dword being zero. One or other piece of code is faulty, though I must admit I can find no serious consequence, just a quirk:

  1. create a group;
  2. create a URL entry (but avoid any cache entry type that is not in URLCACHE_FIND_DEFAULT_FILTER or INCLUDE_BY_DEFAULT_CACHE_ENTRY);
  3. assign the URL entry to the group;
  4. delete the group (without setting CACHEGROUP_FLAG_FLUSHURL_ONDELETE);
  5. search for URL entries in a fake group whose group ID is 0x5520746E65696C43 and be surprised to see the entry from step 2.
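
The group ID at step 5 is not arbitrary. Presumably, the zero left at offset 0x28 in the URL entry is followed as if it were the file offset of a GROUP_ENTRY, which places the supposed group entry at the very start of the file. The group ID read from there is then the first eight bytes of the signature, as this Python arithmetic confirms:

```python
import struct

signature = b"Client UrlCache MMF Ver 5.2\x00"
(fake_group_id,) = struct.unpack_from("<Q", signature, 0)  # first eight bytes, little-endian
print(hex(fake_group_id))  # 0x5520746e65696c43
```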

The cumbersome parenthesis at step 2 is not just a necessary condition for triggering the coding error. It hints at a second coding error. For entries whose cache entry type has any set bit that is not in either of the collections URLCACHE_FIND_DEFAULT_FILTER or INCLUDE_BY_DEFAULT_CACHE_ENTRY, the cleanup of links from entry to group isn’t even attempted. Both the dword and the flag persist as if the entry still belongs to the deleted group. Creating this anomalous state is as easy as:

  1. create a group;
  2. create a URL entry of type EDITED_CACHE_ENTRY or SPARSE_CACHE_ENTRY (for instance);
  3. assign the URL entry to the group;
  4. delete the group (with or without setting CACHEGROUP_FLAG_FLUSHURL_ONDELETE).

For confirmation that the state created by these steps genuinely is anomalous, create another group immediately and enumerate it for entries of whatever type you created at step 2. The entry from step 2 magically appears in the new group (whose creation reuses memory that held the definition of the old group, such that the entry’s stale references to those definitions are picked up for the new group).

URL Entries

The main type of file-map entry in an INDEX.DAT file is one that associates a URL with information that is cached for that URL. Each such entry has a fixed-sized header which is followed by variable-sized data, typically strings. The header is an IE6_URL_FILEMAP_ENTRY structure, based on FILEMAP_ENTRY. The signature is typically “URL ” but can be modified to “LEAK” as a special case.

Offset Size Description
0x08 8 bytes last modified time, as FILETIME structure
0x10 8 bytes last access time, as FILETIME structure
0x18 dword expiry time, as DOS time
0x1C dword potted description needed here!
0x20 dword size of local file, in bytes
0x24 dword apparently unused, except for explicit initialisation to zero
0x28 dword file offset of GROUP_ENTRY or LIST_GROUP_ENTRY
0x2C dword in URL entry: exempt delta
in leak entry: file offset of next leak entry
0x30 dword size of structure in excess of FILEMAP_ENTRY, in bytes
0x34 dword offset from start of structure to URL, as saved in entry after header
0x38 byte index of directory containing local file
0x39 byte synchronisation count
0x3A byte potted description needed here!
0x3B byte potted description needed here!
0x3C dword offset from start of structure to name of local file, as saved in entry after header
0x40 dword cache entry type, as bit flags
0x44 dword offset from start of structure to header information, as saved in entry after header
0x48 dword size of header information, in bytes
0x4C dword offset from start of structure to file extension, as saved in entry after header
0x50 dword last synchronisation time, as DOS time
0x54 dword number of times entry has been locked
0x58 dword nesting level of locks on entry
0x5C dword creation time, as DOS time
0x60 dword potted description needed here!
0x64 4 bytes apparently unused

That the last dword may truly be unused is again plausible as a programming artefact. The structure is perhaps defined with a one-element character array at the end as an allowance for variable-sized data to follow the structure, even though the data actually gets placed after the structure.

Perusal of symbol files for earlier WININET versions confirms the earlier definition of an IE5_URL_FILEMAP_ENTRY and, before that, of a plain URL_FILEMAP_ENTRY. The byte at offset 0x3A appears to exist, nowadays, only to distinguish an IE6_URL_FILEMAP_ENTRY from an IE5_URL_FILEMAP_ENTRY. It and the byte at offset 0x3B are set to 0x10 for the newer structure and 0x00 for the older. The two structures have the same layout except that the older is only 0x60 bytes. The member at offset 0x60 is not present unless the byte at offset 0x3A is at least 0x10, and is anyway barely used in version 7.0. Indeed, the dword at offset 0x60, the bytes at offsets 0x3A and 0x3B, and even the dwords at offsets 0x24 and 0x30 are all so little used, but with the look of having been more used, that meaningful description ought not to be attempted without closer inspection of earlier WININET versions.
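
To pull the pieces of a URL entry together, the following Python sketch decodes selected members from the table above. It assumes that a zero string offset means the string is absent and that strings are single-byte text (decoded here as Latin-1). Neither assumption is confirmed from the WININET code, and the function and key names are hypothetical. The DOS-time members are left undecoded.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ft: int):
    """FILETIME counts 100-nanosecond intervals since 1 January 1601 (UTC)."""
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10) if ft else None


def parse_url_entry(data: bytes, offset: int) -> dict:
    """Decode selected IE6_URL_FILEMAP_ENTRY members at the given file offset."""
    sig, blocks, modified, accessed = struct.unpack_from("<4sIQQ", data, offset)
    if sig not in (b"URL ", b"LEAK"):
        raise ValueError(f"no URL entry at {offset:#x}")
    (url_rel,) = struct.unpack_from("<I", data, offset + 0x34)
    dir_index = data[offset + 0x38]
    file_rel, entry_type, info_rel, info_size, ext_rel = struct.unpack_from(
        "<5I", data, offset + 0x3C)
    hit_rate, use_count = struct.unpack_from("<II", data, offset + 0x54)

    def string_at(rel):
        # Assumption: a zero offset means the string is absent.
        if not rel:
            return None
        start = offset + rel
        return data[start:data.index(b"\0", start)].decode("latin-1")

    return {
        "signature": sig.decode("ascii"),
        "blocks": blocks,
        "last_modified": filetime_to_datetime(modified),
        "last_access": filetime_to_datetime(accessed),
        "url": string_at(url_rel),
        "directory_index": dir_index,
        "local_file": string_at(file_rel),
        "cache_entry_type": entry_type,
        "header_info": data[offset + info_rel:offset + info_rel + info_size] if info_rel else b"",
        "extension": string_at(ext_rel),
        "hit_rate": hit_rate,
        "use_count": use_count,
        "has_member_0x60": data[offset + 0x3A] >= 0x10,  # IE6 rather than IE5 layout
    }
```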

Cache Entry Type

The cache entry type at offset 0x40 is a collection of bit flags. The following are generally meaningful:

0x00000001 NORMAL_CACHE_ENTRY set initially for all entries in Content container
0x00000004 STICKY_CACHE_ENTRY entry is exempt from scavenging
0x00000008 EDITED_CACHE_ENTRY local file need not be in cache
0x00010000 SPARSE_CACHE_ENTRY potted description needed here!
0x00100000 COOKIE_CACHE_ENTRY set initially for all entries in Cookies container
0x00200000 URLHISTORY_CACHE_ENTRY set initially for all entries in History container
0x00400000 PENDING_DELETE_CACHE_ENTRY set when deletion is attempted while entry is locked
0x10000000 INSTALLED_CACHE_ENTRY potted description needed here!
0x80000000 IDENTITY_CACHE_ENTRY potted description needed here!

These are the bits that are interpreted, set or cleared by WININET itself while managing URL entries as a file-format feature. All bits in the cache entry type are exposed to external interpretation and control, even at the risk of conflicts with WININET’s own bookkeeping. See especially that the SetUrlCacheEntryInfo function can set the cache entry type in a URL entry to anything (exactly as given in the CacheEntryType member of the INTERNET_CACHE_ENTRY_INFO structure, when either CACHE_ENTRY_ATTRIBUTE_FC or CACHE_ENTRY_TYPE_FC is specified in the dwFieldControl argument).
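
Because outside callers can put any bits into the cache entry type, a decoder is best written to name the bits listed above and report anything else as a remainder. The Python names below are hypothetical, apart from the symbolic flag names taken from the table.

```python
CACHE_ENTRY_TYPES = {
    0x00000001: "NORMAL_CACHE_ENTRY",
    0x00000004: "STICKY_CACHE_ENTRY",
    0x00000008: "EDITED_CACHE_ENTRY",
    0x00010000: "SPARSE_CACHE_ENTRY",
    0x00100000: "COOKIE_CACHE_ENTRY",
    0x00200000: "URLHISTORY_CACHE_ENTRY",
    0x00400000: "PENDING_DELETE_CACHE_ENTRY",
    0x10000000: "INSTALLED_CACHE_ENTRY",
    0x80000000: "IDENTITY_CACHE_ENTRY",
}


def describe_entry_type(entry_type: int) -> list:
    """Name the bits that WININET interprets; report any others as a hex remainder."""
    names = [name for bit, name in CACHE_ENTRY_TYPES.items() if entry_type & bit]
    remainder = entry_type & ~sum(CACHE_ENTRY_TYPES)  # bits not in the table
    if remainder:
        names.append(f"{remainder:#010x}")
    return names
```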

Perusal of earlier WININET versions suggests that some of these flags have meant more. The INSTALLED_CACHE_ENTRY and IDENTITY_CACHE_ENTRY types look to be particularly affected by a reduction of support in version 7.0, such that description ought not to be attempted without closer inspection of earlier versions.

Retrieval Counts

The dwords at offsets 0x54 and 0x58 can be inspected through the GetUrlCacheEntryInfo function, in the dwHitRate and dwUseCount members of the INTERNET_CACHE_ENTRY_INFO structure. Both the hit rate and use count are incremented in the URL entry each time the entry is locked for retrieval of its local file. Only the use count is decremented each time the entry is unlocked.

Interpretation of the hit rate as the number of times the entry has been locked is, however, not strictly justified. The SetUrlCacheEntryInfo function can set the hit rate to anything (from the dwHitRate member of the INTERNET_CACHE_ENTRY_INFO structure, when CACHE_ENTRY_HITRATE_FC is specified in the dwFieldControl argument).

Leak Entries

When a URL entry is deleted, the corresponding local file, if any, would ideally be deleted too. If it happens that the local file cannot be deleted because of an error that may just be temporary, which means specifically ERROR_ACCESS_DENIED or ERROR_SHARING_VIOLATION, then the URL entry is converted to a leak entry and is removed from the hash table. Leak entries are essentially URL entries with “LEAK” as the signature. Though they can no longer be found as URL entries, they are kept in a list so that deletion of the local file can eventually be re-attempted. The file offset of the leak entry at the head of the list is found from the header data, in the dword indexed by CACHE_HEADER_DATA_ROOT_LEAK_OFFSET. In each leak entry, the member at offset 0x2C is unnecessary (else there would have been no attempt to delete the leak entry while it was a URL entry) and is reused for linking to the next leak entry.
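
Walking the leak list can be sketched in Python as below. The generator name is hypothetical. So is the treatment of a zero link as the end of the list, which is assumed rather than confirmed. A guard against loops is added for safety with damaged files.

```python
import struct

ROOT_LEAK_SLOT = 0x01CC + 4 * 0x16  # CACHE_HEADER_DATA_ROOT_LEAK_OFFSET


def leak_entries(data: bytes):
    """Yield the file offset of each leak entry, following the links at offset 0x2C.

    Assumption: a zero link ends the list.
    """
    (offset,) = struct.unpack_from("<I", data, ROOT_LEAK_SLOT)
    seen = set()
    while offset:
        if offset in seen:
            raise ValueError("leak list loops")
        seen.add(offset)
        (sig,) = struct.unpack_from("<4s", data, offset)
        if sig != b"LEAK":
            raise ValueError(f"entry at {offset:#x} is not a leak entry")
        yield offset
        (offset,) = struct.unpack_from("<I", data, offset + 0x2C)
```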

Redirection Entries

When a URL is entered into the cache, as through the CommitUrlCacheEntry function, it can be given together with a URL that it was redirected from, i.e., the original URL. Either URL can be searched for. When the redirection is just a matter of appending a forward slash, the redirection is accommodated by ignoring the forward slash when computing the hash and marking the URL entry’s hash item by setting its 0x04 flag. In general however, both URLs are represented in the hash table. The URL that actually is entered into the cache has a hash item which links to an IE6_URL_FILEMAP_ENTRY. The original URL has a separate hash item that links to a structure which is not named in the public symbol file but is here called a redirection entry. It too is a file-map entry, based on FILEMAP_ENTRY, but with “REDR” as its signature:

Offset Size Description
0x08 dword file offset of hash item for URL entry
0x0C dword hash of (target) URL, but with low 6 bits clear
0x10 varies original URL

The WININET code for creating a redirection entry computes the size of entry as a header of 0x14 bytes plus the original URL as a null-terminated string. Presumably, the structure is defined with a single-element character array at offset 0x10 (which, padded to dword alignment, makes 0x14 bytes in all), and in this case the programmer actually does copy the characters to that placeholder instead of to the end of the structure.

Any number of redirection entries may link to one URL entry, to model that any number of original URLs redirect to the same target URL. Perhaps because of this, there is no link back from the URL entry. When a URL entry is deleted, the redirection entries that link to it are left alone. They retain the file offset of a hash item that may be reused, sooner or later, for a different URL entry or even for a redirection entry. The defence is provided by the saved hash at offset 0x0C. A redirection entry is invalid unless the hash item pointed to from offset 0x08 is plausibly still the one the redirection entry was created for. Specifically, the first dword of the supposed hash item must have the 0x01 flag clear (as expected of a hash item for a URL entry) and must have the same hash as saved at offset 0x0C in the redirection entry.
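
The validity test for a redirection entry translates directly into code. In this Python sketch the function name is hypothetical and the Latin-1 decoding of the original URL is an assumption. The comparison masks the hash item’s first dword to its high 26 bits, matching the low-6-bits-clear hash saved at offset 0x0C.

```python
import struct

HASH_MASK = 0xFFFFFFC0  # hash items keep only the high 26 bits of the hash


def resolve_redirection(data: bytes, offset: int):
    """Return (original_url, url_entry_offset) for a REDR entry. The offset is None
    if the hash item it points to no longer belongs to the target URL entry."""
    sig, _blocks, item_offset, saved_hash = struct.unpack_from("<4sIII", data, offset)
    if sig != b"REDR":
        raise ValueError(f"no redirection entry at {offset:#x}")
    end = data.index(b"\0", offset + 0x10)
    original_url = data[offset + 0x10:end].decode("latin-1")  # assumed encoding
    first, target = struct.unpack_from("<II", data, item_offset)
    if first & 0x01:                       # not the hash item of a URL entry
        return original_url, None
    if (first & HASH_MASK) != saved_hash:  # item reused for some other URL
        return original_url, None
    return original_url, target
```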

Group Entries

URL entries in a Content container can be grouped. Since groups are not much used nowadays, at least not by Microsoft in software supplied with Windows, a brief review may help. An empty group is created through the exported function CreateUrlCacheGroup, which returns a 64-bit group ID to represent the group in calls to other functions. There is also a built-in group with a preset group ID (used most notably by IEFRAME when caching FAVICON.ICO files). Properties can be set for a group by calling the exported function SetUrlCacheGroupAttribute. URL entries can be assigned to a group through the exported function SetUrlCacheEntryGroup. The most prominent merit of grouping URL entries is that enumeration of URL entries can be refined by supplying the group ID as a search parameter. A less prominent but conceivably very useful feature is that URL entries can be made sticky simply by assigning them to a sticky group. Another is that URL entries can be deleted en masse by assigning them to a group and then deleting the group (with a suitable flag specified). To delete a group, call the DeleteUrlCacheGroup function.

In the INDEX.DAT file format, each group is represented by a 0x28-byte GROUP_ENTRY structure:

Offset Size Description
0x00 qword group ID, else zero in a free entry, or -1 in an index entry
0x08 dword in allocated entry: group flags
in index entry: file offset of first GROUP_ENTRY on next page of such structures, else zero
0x0C dword group type
0x10 qword disk usage, in bytes
0x18 dword disk quota, in kilobytes
0x1C dword in allocated entry: file offset of GROUP_DATA_ENTRY structure containing optional attributes, else zero
in first index entry: file offset of first free GROUP_DATA_ENTRY structure, else zero
0x20 8 bytes apparently unused

The unused space at offset 0x20 may be an alignment artefact. For instance, in anticipation of variable-sized data at the end of the structure, a programmer may have thought to mark the spot with a one-element byte array. A wasteful side-effect, because of members that demand 64-bit alignment, would be that the structure acquires eight more bytes.

Note that a group entry is not a file-map entry. It is too small to justify consuming a whole block. Group entries are instead prepared collectively in page-sized file-map entries. Each such page is a FILEMAP_ENTRY followed immediately by an array of as many GROUP_ENTRY structures as fit the page. The file offset of the first group entry on the first page of group entries is saved in the file header, as the CACHE_HEADER_DATA_ROOTGROUP_OFFSET index in the header data. The last group entry on each page is marked specially as an index entry. It can never represent a group but instead provides the link to the next page of group entries. Group entries are always scanned from the first on a page up to but not including the index entry on that page, repeating for each page, starting from the first page that was ever allocated, proceeding to the most recently allocated.
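
The scan of group entries can be sketched in Python. The count of 102 GROUP_ENTRY structures per page (the last being the index entry) is arithmetic from the sizes given here, (0x1000 − 8) / 0x28 rounded down, not a value read from WININET. The generator and constant names are hypothetical.

```python
import struct

GROUP_ENTRY_SIZE = 0x28
PAGE_SIZE = 0x1000
# As many GROUP_ENTRY structures as fit after the 8-byte FILEMAP_ENTRY: 102,
# of which the last is the index entry.
ENTRIES_PER_PAGE = (PAGE_SIZE - 8) // GROUP_ENTRY_SIZE
INDEX_ENTRY_ID = 0xFFFFFFFFFFFFFFFF  # group ID of -1
ROOTGROUP_SLOT = 0x01CC + 4 * 0x05   # CACHE_HEADER_DATA_ROOTGROUP_OFFSET


def groups(data: bytes):
    """Yield (offset, group_id, flags, group_type, disk_usage, quota_kb, data_offset)
    for each allocated group entry, page by page in order of allocation."""
    (first_entry,) = struct.unpack_from("<I", data, ROOTGROUP_SLOT)
    seen = set()
    while first_entry:
        if first_entry in seen:
            raise ValueError("pages of group entries loop")
        seen.add(first_entry)
        for i in range(ENTRIES_PER_PAGE - 1):  # stop short of the index entry
            offset = first_entry + i * GROUP_ENTRY_SIZE
            fields = struct.unpack_from("<QIIQII", data, offset)
            if fields[0]:                      # a group ID of zero means a free entry
                yield (offset, *fields)
        index = first_entry + (ENTRIES_PER_PAGE - 1) * GROUP_ENTRY_SIZE
        group_id, next_first = struct.unpack_from("<QI", data, index)
        if group_id != INDEX_ENTRY_ID:
            raise ValueError(f"no index entry at {index:#x}")
        first_entry = next_first
```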

A group entry is free, i.e., available to represent a new group, simply by having zero as its group ID. Deleting a group frees the corresponding group entry for reallocation to a subsequently created group. (Indeed, deleting a group clears all the bytes of the group entry.) Deleting all the groups that are defined on a page of group entries merely leaves a page of free group entries: once a file-map entry is allocated to hold group entries, it stays allocated.
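Finding a free group entry and freeing one on deletion then reduce to the following sketch. That the first free entry in scan order is the one reused is an assumption, not something established above, and the function names are hypothetical.

```python
import struct

GROUP_ENTRY_SIZE = 0x28
INDEX_ENTRY_ID = 0xFFFFFFFFFFFFFFFF


def find_free_group_entry(buf, first_offset):
    """Return the file offset of the first group entry whose ID is zero, else None.

    Assumption: the first free entry in scan order is the one that gets reused.
    """
    offset = first_offset
    while offset:
        group_id, = struct.unpack_from("<Q", buf, offset)
        if group_id == INDEX_ENTRY_ID:
            offset, = struct.unpack_from("<I", buf, offset + 0x08)  # next page
        elif group_id == 0:
            return offset
        else:
            offset += GROUP_ENTRY_SIZE
    return None  # every page is full, so a new page of group entries is needed


def clear_group_entry(buf, offset):
    """Deleting a group zeroes all 0x28 bytes, which is what marks the entry free."""
    buf[offset:offset + GROUP_ENTRY_SIZE] = bytes(GROUP_ENTRY_SIZE)
```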

Flags

The flags at offset 0x08 are acquired only from the dwFlags argument of the CreateUrlCacheGroup function. It would seem then that only two bits can ever be set:

0x01 CACHEGROUP_FLAG_NONPURGEABLE
0x02 CACHEGROUP_FLAG_FLUSHURL_ONDELETE

Neither is directly meaningful. The former records that the group was created to be sticky, but whether a group actually is sticky depends on whether the 0x1000000000000000 bit is set in its group ID. The other flag is useful as an argument to the DeleteUrlCacheGroup function, but whether it is set or clear in the group entry appears to make no difference. Useful or not, the flags as recorded in the group entry can be retrieved through the GetUrlCacheGroupAttribute function, in the dwGroupFlags member of the INTERNET_CACHE_GROUP_INFO structure.
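The distinction between what the flags record and what actually makes a group sticky can be shown in a few lines. The flag values are those of the CACHEGROUP_FLAG_ constants listed above; the name for the sticky bit in the group ID and the function name are hypothetical.

```python
CACHEGROUP_FLAG_NONPURGEABLE = 0x01
CACHEGROUP_FLAG_FLUSHURL_ONDELETE = 0x02
STICKY_GROUP_ID_BIT = 0x1000000000000000  # hypothetical name for the bit in the group ID


def describe_group(group_id, flags):
    """Interpret the group ID and the flags from offset 0x08 of a GROUP_ENTRY."""
    return {
        # Only a record of how the group was created:
        "created_nonpurgeable": bool(flags & CACHEGROUP_FLAG_NONPURGEABLE),
        "flush_on_delete_recorded": bool(flags & CACHEGROUP_FLAG_FLUSHURL_ONDELETE),
        # What actually decides stickiness is the group ID, not the flags:
        "sticky": bool(group_id & STICKY_GROUP_ID_BIT),
    }
```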

Disk Usage

The disk usage at offset 0x10 is maintained by WININET as the total size of local files for all URL entries that belong to the group. Its current value, converted to KB, can be retrieved through the GetUrlCacheGroupAttribute function, in the dwDiskUsage member of the INTERNET_CACHE_GROUP_INFO structure.
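The conversion from the stored byte count to the kilobytes reported in dwDiskUsage is sketched below. Whether WININET truncates or rounds up is not established here, so both are offered; the function name is hypothetical.

```python
import struct


def disk_usage_kb(raw_entry, round_up=False):
    """Convert the qword byte count at offset 0x10 of a GROUP_ENTRY to kilobytes.

    The choice between truncating and rounding up is an assumption either way.
    """
    usage, = struct.unpack_from("<Q", raw_entry, 0x10)
    return (usage + 1023) // 1024 if round_up else usage // 1024
```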

Not Exactly Unused

The type member at offset 0x0C is exactly as accessed through the dwGroupType member of the INTERNET_CACHE_GROUP_INFO structure given to the GetUrlCacheGroupAttribute and SetUrlCacheGroupAttribute functions. Neither function interprets this member in any way. Except for access through these functions, the type appears to be unused.

Group Data Entries

The remaining attributes that can be set for a group through the SetUrlCacheGroupAttribute function are, or can be, relatively substantial. Since they are anyway optional, it would be wasteful to provide for storing them in every group entry. If they ever are set for a group, they are held separately, in a GROUP_DATA_ENTRY structure:

Offset   Size                              Description
0x00     GROUPNAME_MAX_LENGTH bytes        group name
0x78     GROUP_OWNER_STORAGE_SIZE dwords   owner storage
0x88     dword                             in an allocated entry: zero;
                                           in a free entry: file offset of the next free GROUP_DATA_ENTRY, else zero

Again, GROUP_DATA_ENTRY structures are not file-map entries but are instead prepared collectively in page-sized file-map entries. Each such page is a FILEMAP_ENTRY followed immediately by an array of as many GROUP_DATA_ENTRY structures as fit the page. No page of group data entries is allocated until either a group name or owner storage is set for some group. The dword at offset 0x1C in an allocated GROUP_ENTRY is the file offset of its associated group data entry. Group data entries that are not allocated to a group, i.e., the free entries, are kept in a chain, linked through the member at offset 0x88. The current head of the chain is found from offset 0x1C in the index entry on the first page of group entries. Group data entries are allocated from the head of the chain. When a group data entry is freed, all its bytes are cleared and it is then returned to the head of the chain of free entries.
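The chain of free group data entries behaves as a simple stack, which the following sketch models on a buffer of the file's bytes. The constants GROUPNAME_MAX_LENGTH (120) and GROUP_OWNER_STORAGE_SIZE (4) agree with the offsets in the table; the entry size of 0x8C is inferred from the last member ending at 0x8C, and the function names are hypothetical.

```python
import struct

GROUPNAME_MAX_LENGTH = 120      # 0x78 bytes of name
GROUP_OWNER_STORAGE_SIZE = 4    # dwords: 0x78 + 4 * 4 = 0x88
GROUP_DATA_ENTRY_SIZE = 0x8C    # inferred: name, owner storage, link dword
NEXT_FREE = 0x88                # link member within a GROUP_DATA_ENTRY
FREE_HEAD = 0x1C                # member of the index entry on the first page of group entries


def alloc_group_data_entry(buf, first_index_entry):
    """Pop the head of the free chain; return its file offset, or None if empty."""
    head, = struct.unpack_from("<I", buf, first_index_entry + FREE_HEAD)
    if head == 0:
        return None  # a new page of group data entries would have to be allocated
    nxt, = struct.unpack_from("<I", buf, head + NEXT_FREE)
    struct.pack_into("<I", buf, first_index_entry + FREE_HEAD, nxt)
    struct.pack_into("<I", buf, head + NEXT_FREE, 0)  # zero in an allocated entry
    return head


def free_group_data_entry(buf, first_index_entry, offset):
    """Clear all bytes of the entry, then push it back onto the head of the chain."""
    buf[offset:offset + GROUP_DATA_ENTRY_SIZE] = bytes(GROUP_DATA_ENTRY_SIZE)
    head, = struct.unpack_from("<I", buf, first_index_entry + FREE_HEAD)
    struct.pack_into("<I", buf, offset + NEXT_FREE, head)
    struct.pack_into("<I", buf, first_index_entry + FREE_HEAD, offset)
```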

The name and owner storage at offsets 0x00 and 0x78 are exactly as accessed through the szGroupName and dwOwnerStorage members of the INTERNET_CACHE_GROUP_INFO structure as given to the GetUrlCacheGroupAttribute and SetUrlCacheGroupAttribute functions. The only interpretation of either member by either function is that SetUrlCacheGroupAttribute checks that a proposed group name is not too large. Except for access through these functions, the group name and owner storage appear to be unused.

List Group Entries

Importantly, granted that groups have any importance at all, a URL entry may be assigned to multiple groups. When this happens, the dword at offset 0x28 in the URL entry no longer shows the way directly to a single GROUP_ENTRY but to a list of them (and the change is marked by setting the 0x10 flag in the URL entry’s hash item). Each element of the list is a LIST_GROUP_ENTRY:

Offset Size Description
0x00 dword file offset of GROUP_ENTRY structure, else zero
0x04 dword file offset of next LIST_GROUP_ENTRY, else zero

These LIST_GROUP_ENTRY structures are prepared collectively in page-sized file-map entries. Each such page is a FILEMAP_ENTRY structure followed immediately by an array of as many LIST_GROUP_ENTRY structures as fit the page. Note that no page of list group entries is allocated until at least one URL entry is assigned to more than one group.

Each list group entry is intended to be always (for all practical purposes) in exactly one list, linked through the dword at offset 0x04. It can be in a list for a URL, in which case the dword at offset 0x28 in the URL entry gives the file offset of the first entry in the list. Otherwise, the list group entry should be in a list of free entries. This free list, once it exists, has a permanent head. The file offset of this head entry is maintained in the file header, as the CACHE_HEADER_DATA_ROOT_GROUPLIST_OFFSET index in the header data.
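Resolving the groups of a URL entry, and taking an entry from the free list, might be sketched as follows. The multi_group argument stands for the 0x10 flag in the URL entry's hash item. Treating the permanent head as a sentinel whose own link at offset 0x04 starts the chain of free entries, and taking free entries from just after that head, are assumptions; the function names are hypothetical.

```python
import struct


def groups_for_url(buf, url_entry_offset, multi_group):
    """Return file offsets of the GROUP_ENTRY structures a URL entry belongs to."""
    link, = struct.unpack_from("<I", buf, url_entry_offset + 0x28)
    if not multi_group:
        return [link] if link else []  # points directly at a single GROUP_ENTRY
    groups = []
    while link:
        # Each LIST_GROUP_ENTRY: group entry offset at 0x00, next element at 0x04.
        group_off, link = struct.unpack_from("<II", buf, link)
        if group_off:
            groups.append(group_off)
    return groups


def take_free_list_group_entry(buf, free_head):
    """Unlink the first free LIST_GROUP_ENTRY after the permanent head (assumed)."""
    first, = struct.unpack_from("<I", buf, free_head + 4)
    if first == 0:
        return None
    nxt, = struct.unpack_from("<I", buf, first + 4)
    struct.pack_into("<I", buf, free_head + 4, nxt)
    struct.pack_into("<II", buf, first, 0, 0)  # detached and cleared for reuse
    return first
```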

Applicability

Except where otherwise noted, this article is specific to the 32-bit WININET.DLL version 7.0.6000.16386 from the original Windows Vista.