Code Pages and the Windows 95 IFSMGR

The Windows 95 IFSMGR uses Unicode characters for its internal representation of pathnames. The IFSMGR can also take Unicode characters as input (though this may require use of undocumented extensions, as for instance with the IFSMgr_Ring0_FileIO service), but it is more likely that IFSMGR’s clients will provide pathnames using a more traditional character set, here called a base character set.

In every character set relevant to this discussion, each character in the set can be described by giving a 16-bit value. In the Unicode scheme, that 16-bit value is also the way the character is encoded: Unicode characters are words.

The traditional character sets allow some or possibly all of their characters to be encoded as bytes. A character whose 16-bit value in the character set is less than 0100h can be represented just as a byte and is represented as a byte. A character whose 16-bit value in the character set is ≥ 0100h is represented as a sequence of two bytes. First comes the lead byte, formed from the high 8 bits of the 16-bit value, and then the trail byte (from the low 8 bits). Note that if a character set includes double-byte characters, then some values—specifically, those that can be lead bytes for double-byte characters—become impossible as single-byte characters. Note also that 00h cannot be used as a lead byte. In practice, ASCII compatibility means that no values less than 80h are used as lead bytes.
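The byte encoding just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated above. The function name encode_base_char is hypothetical and is not anything that the IFSMGR provides.

```python
def encode_base_char(value: int) -> bytes:
    """Encode a 16-bit base character value as one or two bytes."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("base character values are 16-bit")
    if value < 0x0100:
        return bytes([value])              # single-byte character
    lead = value >> 8                      # high 8 bits form the lead byte
    trail = value & 0xFF                   # low 8 bits form the trail byte
    return bytes([lead, trail])            # lead byte comes first


print(encode_base_char(0x0041).hex())      # prints 41
print(encode_base_char(0x8140).hex())      # prints 8140
```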

For any given language, it is desirable that all the characters in frequent use should be representable as single bytes. There are many different base character sets, typically for different languages, often differing most significantly in the choice of the characters that can be represented as single bytes between 80h and FFh. Each of these character sets has a code page ID. A given character may be represented differently depending on the code page, and it may even have no representation at all in some code pages. Conversely, a given 16-bit value may denote different characters depending on the code page ID, and may be invalid for some code pages.

In very broad terms, the base character sets in use under Windows can be classed as ANSI or OEM according to typical usage. ANSI code pages are the ones that can be used by programs running under the Windows graphical environment. Anything else is an OEM code page. It is helpful to regard some of the ANSI code pages as OEM code pages also, since they are often used by software other than Windows programs.

To the IFSMGR, however, there is nothing very fundamental about any distinction between ANSI and OEM character sets. All that counts is that at any one time, the IFSMGR supports at most two code pages, and it seems intended that one should be an ANSI code page and the other an OEM code page. The particular code pages are taken from information in either the registry or the SYSTEM.INI file, under the names ACP and OEMCP respectively. The defaults are 437 for the OEM code page and 1252 for the ANSI code page.

Character Set Conversion

The IFSMGR has six tables for the various conversions that it may need while working with characters. The roles of these tables are documented under IFSMgr_GetConversionTablePtrs. The first four tables define the primitive conversions from Unicode to the ANSI or OEM character sets and back. The last two tables support the mapping of Unicode characters to upper case.

The tables concerned with conversion to and from Unicode may be loaded from a file called UNICODE.BIN, which the IFSMGR expects to find in the Windows SYSTEM directory (that is, the directory reported by the Get_Exec_Path service). If the UNICODE.BIN file is not present, the IFSMGR loads default tables for the default code pages, loading from copies that are hard-coded in the IFSMGR initialisation segment.

If the UNICODE.BIN file is present, the IFSMGR determines the code pages it will use henceforth as the OEM and ANSI character sets by looking for the OEMCP and ACP values under the key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Nls\Codepage in the registry or in the [Intl] section of SYSTEM.INI. Both entries must be found in the registry or both in SYSTEM.INI, else the IFSMGR reverts to the default tables for the default code pages.

Note that the choice of code pages for the interpretation of ANSI and OEM characters is fixed at initialisation. The IFSMGR provides no interface through which it may be informed that a pathname is specified in terms of characters from some different code page. This may surprise, especially since the various drivers and utilities that support the changing of code pages under DOS continue to be supplied with Windows 95.

UNICODE.BIN File Format

The UNICODE.BIN file begins with a count of code pages that the file supports:

Offset  Size   Description
00h     dword  number of code page headers that follow

followed immediately by an array of code page headers. Each has the form:

Offset  Size   Description
00h     dword  code page ID
04h     dword  file offset to table for conversion to Unicode
08h     dword  size of table for conversion to Unicode, measured in bytes
10h     dword  file offset to table for conversion from Unicode
14h     dword  size of table for conversion from Unicode, measured in bytes

Tables are read into memory obtained from whichever system heap is currently selected by the HEAPLOCKEDIFDP flag. If the OEMCP and ACP code page IDs are not both supported by the UNICODE.BIN file or if any error occurs while loading the tables from the file, then the IFSMGR reverts to its default tables for the default code pages. (Incidentally, should a file error occur, memory already obtained to hold the tables is not released.)
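As an illustration, the header array might be read as in the following Python sketch. It assumes little-endian dwords, as is natural on x86, and it assumes that each header occupies 1Ch bytes, which is inferred from the IFSMGR reading 18 headers at a time in blocks of 01F8h bytes. The dwords at offsets 0Ch and 18h are not described in these notes, so the sketch ignores them. The function name parse_headers and the dictionary keys are hypothetical.

```python
import struct

HEADER_SIZE = 0x1C        # assumption: 01F8h bytes divided by 18 headers


def parse_headers(data: bytes) -> list:
    """Return the code page headers found at the start of UNICODE.BIN data."""
    (count,) = struct.unpack_from("<I", data, 0)
    headers = []
    for i in range(count):
        # Seven dwords per header; those at 0Ch and 18h are skipped.
        fields = struct.unpack_from("<7I", data, 4 + i * HEADER_SIZE)
        headers.append({
            "code_page": fields[0],             # offset 00h
            "to_unicode_offset": fields[1],     # offset 04h
            "to_unicode_size": fields[2],       # offset 08h
            "from_unicode_offset": fields[4],   # offset 10h
            "from_unicode_size": fields[5],     # offset 14h
        })
    return headers
```

A caller would then look for the two headers whose code_page values match OEMCP and ACP, and treat the absence of either as a reason to fall back to the default tables.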

Note that the IFSMGR in the US edition and in some international releases of Windows 95 assumes that the UNICODE.BIN file provides tables for no more than 18 code pages. The problem is that the IFSMGR reads the code page headers from the UNICODE.BIN file in multiples of 01F8h bytes (that is, in sets of 18 code page headers). When the IFSMGR searches the code page headers for one that matches an OEMCP or ACP value, it saves only the address where the header has been read into a buffer in the IFSMGR initialisation segment. If the code page headers for different OEMCP and ACP values lie in different blocks of 01F8h bytes within the file, then the algorithm will misbehave because reading another set of headers makes nonsense of the address already saved for the first matching header. Some international releases of Windows 95 correct the problem by having the IFSMGR save copies of each matching code page header (so that the information in the header won’t be lost if more headers are read from the file).
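The condition under which the defect arises can be expressed as simple arithmetic. The following Python sketch assumes that headers are counted from zero in file order, so that header number i is read as part of block number i divided by 18. The function name headers_share_block is hypothetical.

```python
HEADERS_PER_BLOCK = 18    # one read of 01F8h bytes covers 18 headers


def headers_share_block(oem_index: int, ansi_index: int) -> bool:
    """True if both matching headers are read in the same 01F8h-byte block.

    When this is False, the uncorrected algorithm saves an address for the
    first match that no longer refers to that header once the next block
    has been read into the same buffer.
    """
    return oem_index // HEADERS_PER_BLOCK == ansi_index // HEADERS_PER_BLOCK


print(headers_share_block(0, 17))   # prints True
print(headers_share_block(17, 18))  # prints False
```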

The UNICODE.BIN supplied with the US edition of Windows 95 provides conversion tables for just the three code pages 437, 850 and 1252.

Conversion From Unicode

A Unicode character is necessarily two bytes. An ANSI or OEM character may be one byte or two. At least some, and typically many, Unicode characters will have no corresponding characters in a given ANSI or OEM code page. To deal with this efficiently, a table for conversion of Unicode characters into ANSI or OEM characters defines ranges of Unicode characters. A Unicode character that lies outside these ranges has no representation in the given ANSI or OEM character set.

Note however that a range is permitted to have holes. A character may lie within a defined range but be intended to have no corresponding character in the base character set. In simple terms, this is indicated by having the conversion table seem to map the Unicode character to an underscore.

The conversion table starts with a range definition from which a conversion routine may find more range definitions as needed. Note that the ranges can be linked in a way that supports a search by binary separation (so that for a table with 31 ranges, a character need be checked against no more than five range definitions).

Offset  Size   Description
00h     word   offset from start of table to definition for a lower range; or zero
02h     word   offset from start of table to definition for a higher range; or zero
04h     word   first character in range
06h     word   last character in range
08h     dword  mask and bit flag:
               00000001h  Unicode characters in range map to single-byte characters or double-byte characters if bit is set or clear respectively
               FFFFFFFEh  offset from start of table to look-up array

Conversion of characters included in a given range proceeds by look-up. The offset to the look-up array is the numerical value inside the masked bits of the dword at offset 08h in the range definition. (Shifting the dword at offset 08h to the right by one bit gives the offset to the look-up array. Masking the dword at offset 08h by FFFFFFFEh gives twice the offset to the look-up array.)

Note that all the Unicode characters in a given range map to single-byte characters in the base character set or they all map to double-byte characters in the base character set. The look-up array consists of bytes if the Unicode characters in the range map to single-byte characters. When the Unicode character is in a range of characters that map to double-byte characters, the look-up array consists of words in which the high byte is the lead byte and the low byte is the trail byte.

Except for the underscore itself (character 005Fh), any Unicode character that lies in a range of characters that map to single-byte characters in the base character set is treated as having no conversion if it maps to the value 5Fh. Any Unicode character that lies in a range of characters that map to double-byte characters in the base character set is deemed to have no conversion if it maps to the value 5F5Fh (that is, to a pair of underscores, interpreted as one double-byte character).
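The search and look-up can be modelled in Python as below. This is a sketch under several assumptions that the notes above do not state outright: that words and dwords are little-endian, that the link at offset 00h is followed when the character is below the range and the link at offset 02h when it is above, that a zero link means there is no further range to try, and that the look-up array is indexed by the character's position within its range. The function name uni_to_base is hypothetical.

```python
import struct


def uni_to_base(table: bytes, ch: int):
    """Return the 16-bit base character value for ch, or None if none."""
    offset = 0                                   # first range definition
    while True:
        lower, higher, first, last, info = struct.unpack_from(
            "<4HI", table, offset)
        if ch < first:
            offset = lower                       # try a lower range
        elif ch > last:
            offset = higher                      # try a higher range
        else:
            break                                # ch lies in this range
        if offset == 0:
            return None                          # outside every range
    array = info >> 1                            # offset to look-up array
    index = ch - first                           # assumed indexing
    if info & 1:                                 # single-byte range
        value = table[array + index]
        if value == 0x5F and ch != 0x005F:
            return None                          # hole in the range
    else:                                        # double-byte range
        (value,) = struct.unpack_from("<H", table, array + 2 * index)
        if value == 0x5F5F:
            return None                          # hole in the range
    return value
```

For a double-byte result, the high byte of the returned value is the lead byte and the low byte is the trail byte.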

Conversion to Unicode

A character in an ANSI or OEM code page may be encoded as one byte or two. A table for conversion from an ANSI or OEM code page to Unicode begins with a 20h-byte bit string to define which values between 00h and FFh represent single-byte characters and which are lead bytes for double-byte characters. The bit is set if the corresponding value is valid as a single-byte character.

The treatment of single-byte and double-byte characters in the base set can be unified by forming a 16-bit character value for use in lookup tables. For a single-byte character, the 16-bit value is the unsigned extension of the single byte. For a double-byte character, the 16-bit value is formed by taking the lead byte for the high 8 bits and the trail byte for the low 8 bits.

Immediately after the 20h-byte bit string is a table very much like the one used for conversion from Unicode. Each range definition in the table has the form:

Offset  Size   Description
00h     word   offset from start of table to definition for a lower range; or zero
02h     word   offset from start of table to definition for a higher range; or zero
04h     word   first character in range
06h     word   last character in range
08h     dword  offset from start of table to look-up array, times two

Offsets are measured from the first range definition, not from the bit string. The look-up array for each range is an array of Unicode characters (that is, of words).

If a 16-bit character value is not covered by one of the range definitions in this table, then the corresponding OEM or ANSI character has no conversion to Unicode. If an OEM or ANSI character other than the single-byte character 5Fh maps to the Unicode character 005Fh, then it is also treated as having no conversion to Unicode.
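A Python sketch of this conversion follows. The notes above do not say how bits are ordered within the 20h-byte bit string, so the sketch assumes that the bit for value n is bit (n mod 8) of byte (n div 8). It also assumes little-endian words, that a zero link ends the search, and that the look-up array is indexed by position within the range. The names is_single_byte and base_to_uni are hypothetical.

```python
import struct

BITMAP_SIZE = 0x20       # bit string that precedes the range definitions


def is_single_byte(table: bytes, byte: int) -> bool:
    """True if byte is valid as a single-byte character (assumed bit order)."""
    return bool(table[byte >> 3] & (1 << (byte & 7)))


def base_to_uni(table: bytes, value: int):
    """Return the Unicode character for a 16-bit base value, or None."""
    ranges = table[BITMAP_SIZE:]                 # offsets start here
    offset = 0
    while True:
        lower, higher, first, last, info = struct.unpack_from(
            "<4HI", ranges, offset)
        if value < first:
            offset = lower
        elif value > last:
            offset = higher
        else:
            break
        if offset == 0:
            return None                          # not covered by any range
    array = info >> 1                            # dword holds offset times two
    (uni,) = struct.unpack_from("<H", ranges, array + 2 * (value - first))
    if uni == 0x005F and value != 0x5F:
        return None                              # treated as no conversion
    return uni
```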

Case Mapping

The conversion of Unicode characters to upper case uses two tables. First, the Unicode character is used as an offset into an array of bytes. The byte that results from this first look-up is in turn an offset into a second table, called the delta table. The second table is an array of words. The word that results from this second look-up is a value to subtract from the given Unicode character in order to convert that character to upper case.

The IFSMGR itself provides the contents of both the tables that are used in converting Unicode characters to upper case. The delta table is hard-coded. For the first look-up table, the IFSMGR may use a hard-coded default version or it may load one of two alternative tables (loading from copies that are hard-coded in the initialisation segment). These alternatives provide for different capitalisations of certain Greek characters that would be capitalised trivially under the default table. One variant is adopted if a Turkish code page is used, that is, if either the OEM code page is 857 or the ANSI code page is 1254. Failing that, a second variant is chosen if any of the Greek code pages are used, that is, if the OEM code page is 737 or 869 or the ANSI code page is 1253.

The IFSMGR tables for upper-case mapping are 0587h bytes long. As far as the IFSMGR is concerned, Unicode characters must be ≤ 0586h if they are to convert to upper case non-trivially.
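The two-stage look-up can be sketched in Python as follows. The sketch assumes that the byte obtained from the first table selects a word in the delta table by index rather than by byte offset, and that the subtraction wraps to 16 bits. Neither point is stated above. The function name uni_to_upper and the sample table contents are hypothetical.

```python
FIRST_TABLE_SIZE = 0x0587     # characters above 0586h are left unchanged


def uni_to_upper(ch: int, first_table: bytes, delta_table: list) -> int:
    """Map one Unicode character to upper case using the two tables."""
    if ch >= len(first_table):
        return ch                             # beyond the first table
    delta = delta_table[first_table[ch]]      # assumed: byte is an index
    return (ch - delta) & 0xFFFF              # subtract to get upper case


# Toy tables: delta entry 0 is "no change", entry 1 subtracts 20h.
first = bytearray(FIRST_TABLE_SIZE)
for lower_case in range(0x61, 0x7B):
    first[lower_case] = 1
deltas = [0x0000, 0x0020]

print(hex(uni_to_upper(0x0061, bytes(first), deltas)))   # prints 0x41
```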

Reference: IFSMGR Services

The notes that follow assume some entry conditions and exit states. In particular, all services are expected to be called with DS and ES selecting a ring 0 data segment with read and write access to all of the 32-bit address space. All services are assumed to corrupt the CPU status flags, except where noted explicitly. By contrast, changes to CPU control flags (such as the direction flag) are noted.

Take care to understand that the notes presented here describe implementations as presently understood by the author of this document. These notes do not describe interfaces. For details of the interfaces, refer to Microsoft. Never depend on implementation details that are not assured by the interface documentation—and never depend on an assurance given in interface documentation if the implementation behaves differently.

IFSMGR Service 0040h: UniToBCS

Input on stack (bottom to top, left to right, as C call):

dword address of output buffer
dword address of sequence of Unicode characters given as input for conversion
dword size of input, measured in bytes
dword size of output buffer, measured in bytes
dword flags:
00000000h (BCS_WANSI) convert to Windows ANSI characters
00000001h (BCS_OEM) convert to OEM characters

Output:

eax size of output, measured in bytes
edx flags:
00000001h (MAP_FLAG_LOSS) one or more Unicode characters in input do not convert to base character set
ecx corrupt

The UniToBCS service takes as input a sequence of Unicode characters and generates a corresponding sequence of characters in the designated base character set. A choice is offered between ANSI and OEM for the base character set, but for each, the code page is implied, being whatever code page the IFSMGR has conversion tables for.

At most, the input consists of as many whole Unicode characters as fit within the given number of bytes. (Thus, in effect, an odd input length is rounded down to a multiple of two.) The input sequence need not be terminated by a null character, but if the input sequence contains a null character, then the input is deemed to terminate with the character immediately before the null character. In no cases is the output given a null terminator.

The service stops converting if the output buffer fills. If a double-byte character is to be stored as output when space for only one byte remains, then the output buffer is treated as being already full. Contrary to the documentation, no flag is returned in edx to indicate exhaustion of the output buffer before the whole input sequence could be converted.

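Some Unicode characters in the input string might not convert acceptably to the base character set. This applies to any character that lies outside every range defined in the conversion table, to any character other than the underscore itself (Unicode character 005Fh) that lies in a range of characters that map to single-byte characters but which maps to 5Fh, and to any character that lies in a range of characters that map to double-byte characters but which maps to 5F5Fh.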

The presence of a character in any of these classes is not fatal to conversion of the string. Instead, the character is represented as an underscore in the output, or by a pair of underscores if the character is in a range of characters that map to double-byte characters. The service continues with the conversion of subsequent characters, but indicates the problem by setting the MAP_FLAG_LOSS bit in edx when returning.
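The behaviour described for this service can be summarised as a Python model. It is a sketch of the observable behaviour only, not of the implementation. The per-character conversion is supplied by the caller as a hypothetical function convert that returns the bytes to store and whether the conversion was faithful. That MAP_FLAG_LOSS is set only for characters actually stored is an assumption of the model.

```python
MAP_FLAG_LOSS = 0x00000001


def uni_to_bcs(chars, input_bytes, output_bytes, convert):
    """Model of UniToBCS: returns (output bytes, flags as in edx)."""
    out = bytearray()
    flags = 0
    for ch in chars[: input_bytes // 2]:      # odd length rounds down
        if ch == 0:
            break                             # null ends the input
        encoded, faithful = convert(ch)       # one or two bytes
        if len(out) + len(encoded) > output_bytes:
            break                             # buffer treated as full
        out += encoded
        if not faithful:
            flags |= MAP_FLAG_LOSS            # underscore was substituted
    return bytes(out), flags                  # no null terminator added


def toy_convert(ch):
    """Hypothetical conversion: ASCII passes, all else is an underscore."""
    if ch < 0x80:
        return bytes([ch]), True
    return b"_", False


print(uni_to_bcs([0x41, 0x263A, 0x42, 0, 0x43], 10, 16, toy_convert))
# prints (b'A_B', 1)
```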

Once a character has been read from the input string into a register, the generation of the corresponding output does not depend on being able to retrieve that character from the input or any character from earlier in the input. Moreover, each character of input is a word and can generate no more than two bytes of output. If the one address is given for both the input sequence and the output buffer, then at no stage in the conversion can the output ever overwrite input that is still needed: the service will work correctly. Note however that documentation states explicitly that the buffers for input and output must not overlap.

The service’s code lies in a pageable segment. Also, the service consults tables that are either in pageable segments or in heap space that may be pageable if the system does not use DOS for paging. Note however that while a level 1 lock is applied on any volume and also on receipt of the Sys_VM_Terminate control, the IFSMGR locks all pages that contain memory used for the conversion tables or for the service’s code. Curiously, the pointers that the service uses to find these tables are not guaranteed to get locked in these cases (though they usually will get locked, due to their very close proximity to the delta table for converting Unicode characters to upper case).

The direction flag is assumed to be clear on entry. The flags that specify the character set to use for output are not checked for validity.

IFSMGR Service 0041h: UniToBCSPath

Input on stack (bottom to top, left to right, as C call):

dword address of output buffer
dword address of null-terminated sequence of PathElement structures
dword size of output buffer, measured in bytes
dword flags:
00000000h (BCS_WANSI) convert to Windows ANSI characters
00000001h (BCS_OEM) convert to OEM characters

Output:

eax size of output, measured in bytes
edx flags:
00000001h (MAP_FLAG_LOSS) one or more Unicode characters in input do not convert to base character set
ecx corrupt

The UniToBCSPath service takes as input a sequence of PathElement structures and generates a string that represents the corresponding path using characters from the designated base character set. A choice is offered between ANSI and OEM for the base character set, but for each, the code page is implied, being whatever code page the IFSMGR has conversion tables for.

The sequence of PathElement structures given as input is deemed to terminate with a null word, or equivalently, with a trivial PathElement (that is, one that gives its length as zero). Since the documentation describes the input as beginning with the pp_elements field of a ParsedPath, there is an implication that ParsedPath structures constructed by IFSMGR are sure to be followed by a null word.

For each non-trivial PathElement in the input sequence, the service produces as output one backslash character (the single-byte character 5Ch, whatever the code page) followed by a conversion of the Unicode characters in the path element. The service terminates the output with a null byte if space remains in the output buffer.

The service stops converting if the output buffer fills. If a double-byte character is to be stored as output when space for only one byte remains, then the output buffer is treated as being already full. Contrary to the documentation, no flag is returned in edx to indicate exhaustion of the output buffer before the whole input sequence could be converted.

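Some Unicode characters in the input string might not convert acceptably to the base character set. This applies to any character that lies outside every range defined in the conversion table, to any character other than the underscore itself (Unicode character 005Fh) that lies in a range of characters that map to single-byte characters but which maps to 5Fh, and to any character that lies in a range of characters that map to double-byte characters but which maps to 5F5Fh.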

The presence of a character in any of these classes is not fatal to conversion of the string. Instead, the character is represented as an underscore in the output, or by a pair of underscores if the character is in a range of characters that map to double-byte characters. The service continues with the conversion of subsequent characters, but indicates the problem by setting the MAP_FLAG_LOSS bit in edx when returning.

Once a character has been read from the input string into a register, the generation of the corresponding output does not depend on being able to retrieve that character from the input or any character from earlier in the input. Moreover, each character of input is a word and can generate no more than two bytes of output. If the one address is given for both the input sequence and the output buffer, then at no stage in the conversion can the output ever overwrite input that is still needed: the service will work correctly. Note however that documentation states explicitly that the buffers for input and output must not overlap.

The service’s code lies in a pageable segment. Also, the service consults tables that are either in pageable segments or in heap space that may be pageable if the system does not use DOS for paging. Note however that while a level 1 lock is applied on any volume and also on receipt of the Sys_VM_Terminate control, the IFSMGR locks all pages that contain memory used for the conversion tables or for the service’s code. Curiously, the pointers that the service uses to find these tables are not guaranteed to get locked in these cases (though they usually will get locked, due to their very close proximity to the delta table for converting Unicode characters to upper case).

The direction flag is assumed to be clear on entry. The flags that specify the character set to use for output are not checked for validity.

A PathElement is taken to consist of one word followed by as many whole Unicode characters as fit in the number of bytes obtained by subtracting two from that first word. An odd value in that first word of a PathElement is effectively rounded down to a multiple of two, except that a value of one would induce the service to consider the next FFFFFFFFh bytes as Unicode characters for the path element.

The service takes for granted that the output buffer is non-trivial: to give zero as the output buffer’s size is effectively to give one as the output buffer’s size.
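The layout of the input can be illustrated with a Python sketch that walks a sequence of PathElement structures and builds the corresponding path. It assumes little-endian words and assumes that each element is followed immediately by the next, so that the walk advances by the length word. The sketch rejects odd lengths and lengths below two rather than imitate the service's treatment of them. The names split_path_elements and build_path are hypothetical, and the sketch leaves the names as Unicode instead of converting them to a base character set.

```python
import struct


def split_path_elements(data: bytes) -> list:
    """Return the names held in a null-terminated run of PathElements."""
    elements = []
    offset = 0
    while True:
        (length,) = struct.unpack_from("<H", data, offset)
        if length == 0:
            return elements                   # null word ends the sequence
        if length < 2 or length % 2:
            raise ValueError("unsupported PathElement length")
        count = (length - 2) // 2             # characters after length word
        chars = struct.unpack_from(f"<{count}H", data, offset + 2)
        elements.append("".join(chr(c) for c in chars))
        offset += length                      # assumed: next element follows


def build_path(elements: list) -> str:
    """One backslash before each element, as the service produces."""
    return "".join("\\" + name for name in elements)
```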

IFSMGR Service 0042h: BCSToUni

Input on stack (bottom to top, left to right, as C call):

dword address of output buffer
dword address of sequence of characters in base character set given as input for conversion
dword size of input, measured in bytes
dword flags:
00000000h (BCS_WANSI) input consists of Windows ANSI characters
00000001h (BCS_OEM) input consists of OEM characters

Output:

eax size of output, measured in bytes
edx flags:
00000001h (MAP_FLAG_LOSS) one or more base characters in input do not convert to Unicode
00000002h (MAP_FLAG_TRUNCATE) input sequence incomplete
ecx corrupt

The BCSToUni service takes as input a sequence of characters in the designated base character set and generates a corresponding sequence of Unicode characters. A choice is offered between ANSI and OEM for the base character set, but for each, the code page is implied, being whatever code page the IFSMGR has conversion tables for.

The service converts however many base characters lie in the given number of bytes at the address given for input. Each base character may be one byte or two. If the last byte of input is the lead byte of a double-byte character, the service terminates the output with the invalid Unicode character FFFDh and returns with the MAP_FLAG_TRUNCATE bit set in edx.

The output buffer is not bounded explicitly by the caller. The service assumes that the output buffer is sufficiently large to hold one Unicode character for each character of input.

Some ANSI or OEM characters in the input string might not convert acceptably to Unicode characters. If the conversion table does not map a given base character to Unicode, then the service represents the character as FFFDh in the output. If a base character maps to the underscore (namely, Unicode character 005Fh) but is not originally the underscore (that is, the single-byte character 5Fh, whatever the code page), then it is treated as invalid but the mapping to Unicode character 005Fh is respected in the output. In both cases, the service continues with the conversion of subsequent characters, but indicates the problem by setting the MAP_FLAG_LOSS bit in edx when returning.
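The treatment of lead bytes, truncation and loss can be modelled in Python as below. This is a sketch of the described behaviour only. The caller supplies two hypothetical functions: is_lead_byte, which says whether a byte value is a lead byte in the code page, and to_unicode, which returns the raw mapping from the conversion table or None where the table has no mapping.

```python
MAP_FLAG_LOSS = 0x00000001
MAP_FLAG_TRUNCATE = 0x00000002
NO_CONVERSION = 0xFFFD


def bcs_to_uni(data: bytes, is_lead_byte, to_unicode):
    """Model of BCSToUni: returns (list of Unicode characters, flags)."""
    out = []
    flags = 0
    i = 0
    while i < len(data):
        value = data[i]
        i += 1
        if is_lead_byte(value):
            if i == len(data):                # lead byte with no trail byte
                out.append(NO_CONVERSION)
                flags |= MAP_FLAG_TRUNCATE
                break
            value = (value << 8) | data[i]    # lead byte high, trail low
            i += 1
        uni = to_unicode(value)
        if uni is None:
            uni = NO_CONVERSION               # no mapping in the table
            flags |= MAP_FLAG_LOSS
        elif uni == 0x005F and value != 0x5F:
            flags |= MAP_FLAG_LOSS            # mapping kept, but flagged
        out.append(uni)
    return out, flags
```

Since every input byte can produce a two-byte Unicode character, a caller who follows this model would allow an output buffer of twice the input size.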

If the one address is given for both the input sequence and the output buffer, then because the service can generate more output than input, it may overwrite input that is still needed: the service will typically not behave correctly. Documentation states explicitly that the buffers for input and output must not overlap.

The service’s code lies in a pageable segment. Also, the service consults tables that are either in pageable segments or in heap space that may be pageable if the system does not use DOS for paging. Note however that while a level 1 lock is applied on any volume and also on receipt of the Sys_VM_Terminate control, the IFSMGR locks all pages that contain memory used for the conversion tables or for the service’s code. Curiously, the pointers that the service uses to find these tables are not guaranteed to get locked in these cases (though they usually will get locked, due to their very close proximity to the delta table for converting Unicode characters to upper case).

The direction flag is assumed to be clear on entry. The flags that specify the character set of the input are not checked for validity.

IFSMGR Service 0043h: UniToUpper

Input on stack (bottom to top, left to right, as C call):

dword address of output buffer
dword address of sequence of Unicode characters given as input for conversion
dword size of input, measured in bytes

Output:

eax size of output, measured in bytes
ecx corrupt
edx corrupt

The UniToUpper service takes as input a given number of Unicode characters and generates as output a corresponding sequence in upper case.

The input consists of as many whole Unicode characters as fit within the given number of bytes. (Thus, in effect, an odd input length is rounded down to a multiple of two.)

The output buffer is not bounded explicitly by the caller. The service assumes that the output buffer is sufficiently large to hold one Unicode character for each Unicode character taken as input. Documentation states explicitly that the one address may be given for both the input sequence and the output buffer.

As far as this service is concerned, only characters ≤ 0586h are capable of mapping to upper case non-trivially.

The service’s code lies in a pageable segment. Also, the service consults tables that are either in pageable segments or in heap space that may be pageable if the system does not use DOS for paging. Note however that while a level 1 lock is applied on any volume and also on receipt of the Sys_VM_Terminate control, the IFSMGR locks all pages that contain memory used for the conversion tables or for the service’s code. Curiously, the pointers that the service uses to find these tables are not guaranteed to get locked in these cases (though they usually will get locked, due to their very close proximity to the delta table for converting Unicode characters to upper case).

The direction flag is assumed to be clear on entry.

IFSMGR Service 0044h: UniCharToOEM

Input on stack (as C call):

dword Unicode character

Output:

eax OEM character
edx corrupt

The UniCharToOEM service maps the given Unicode character to the OEM character set. The code page is implied, being whatever code page the IFSMGR has conversion tables for.

If ah is returned as zero, then the given Unicode character maps to the single-byte character whose value is in al. If ah is returned as non-zero, then the given Unicode character corresponds to a double-byte character in the OEM set: ah gives the lead byte and al the trail byte.

The service returns 0000005Fh for any Unicode character that does not map to the OEM character set. Some Unicode characters map to the single-byte OEM character 5Fh—and the service also returns 0000005Fh for these. The string conversion services UniToBCS and UniToBCSPath treat as invalid any Unicode character other than the underscore itself (Unicode character 005Fh) that does not map to the OEM character set or which maps to either the single-byte character 5Fh or the double-byte character 5F5Fh. The return of either 0000005Fh or 00005F5Fh should therefore be regarded as a conversion failure except when the Unicode character given as input is the underscore.
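A caller's interpretation of the returned value might look like the following Python sketch, which applies the rule just given. The function name decode_oem_result is hypothetical. The sketch looks only at the low 16 bits of eax.

```python
def decode_oem_result(eax: int, unicode_char: int):
    """Return the OEM bytes for a UniCharToOEM result, or None on failure."""
    value = eax & 0xFFFF
    if value in (0x005F, 0x5F5F) and unicode_char != 0x005F:
        return None                            # regard as conversion failure
    if value >> 8:                             # ah non-zero: double-byte
        return bytes([value >> 8, value & 0xFF])
    return bytes([value])                      # ah zero: single byte in al


print(decode_oem_result(0x00000041, 0x0041))   # prints b'A'
print(decode_oem_result(0x0000005F, 0x263A))   # prints None
```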

The service’s code lies in a pageable segment. Also, the service consults tables that are either in pageable segments or in heap space that may be pageable if the system does not use DOS for paging. Note however that while a level 1 lock is applied on any volume and also on receipt of the Sys_VM_Terminate control, the IFSMGR locks all pages that contain memory used for the conversion tables or for the service’s code. Curiously, the pointers that the service uses to find these tables are not guaranteed to get locked in these cases (though they usually will get locked, due to their very close proximity to the delta table for converting Unicode characters to upper case).

IFSMGR Service 0051h: IFSMgr_GetConversionTablePtrs

Output:

eax address of structure (see below for format)

The IFSMgr_GetConversionTablePtrs service returns the address of a structure in which the IFSMGR keeps pointers to tables that the IFSMGR uses in various character conversions. The format of the structure is:

Offset  Size   Description
00h     dword  number of pointers that follow
04h     dword  address of table for conversion from Unicode to ANSI
08h     dword  address of table for conversion from Unicode to OEM
0Ch     dword  address of table for conversion from ANSI to Unicode
10h     dword  address of table for conversion from OEM to Unicode
14h     dword  address of delta table for conversion of Unicode to upper case
18h     dword  address of first look-up table for conversion of Unicode to upper case

The structure whose address is returned by this service is the one that the IFSMGR uses to locate the right table when converting between character sets or mapping to upper case. Note however that the IFSMGR has a second structure in which it keeps the addresses (and sizes) of these conversion tables. The IFSMGR uses this other list of addresses if ever it wants to lock the pages that hold the conversion tables.
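For illustration, a copy of this structure captured as raw bytes could be decoded with the Python sketch below. It assumes little-endian dwords. The function name parse_table_ptrs and the key names are hypothetical labels for the six pointers in the order given above.

```python
import struct

POINTER_NAMES = (
    "unicode_to_ansi",     # offset 04h
    "unicode_to_oem",      # offset 08h
    "ansi_to_unicode",     # offset 0Ch
    "oem_to_unicode",      # offset 10h
    "upper_delta",         # offset 14h
    "upper_first_lookup",  # offset 18h
)


def parse_table_ptrs(raw: bytes) -> dict:
    """Decode the count and the pointers that follow it."""
    (count,) = struct.unpack_from("<I", raw, 0)
    pointers = struct.unpack_from(f"<{count}I", raw, 4)
    # Pairs names with pointers in order; extra pointers are ignored.
    return dict(zip(POINTER_NAMES, pointers))
```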

IFSMGR Service 0070h: BcsToBcs

Input on stack (bottom to top, left to right, as C call):

dword address of output buffer
dword address of input string
dword flags for character set of output:
00000000h (BCS_WANSI) convert to Windows ANSI characters
00000001h (BCS_OEM) convert to OEM characters
dword flags for character set of input:
00000000h (BCS_WANSI) input consists of Windows ANSI characters
00000001h (BCS_OEM) input consists of OEM characters
dword size of input buffer, measured in bytes

Output:

eax size of output, measured in bytes
edx flags:
00000001h (MAP_FLAG_LOSS) one or more characters do not convert between base character sets
00000002h (MAP_FLAG_TRUNCATE) input string incomplete
ecx corrupt

The BcsToBcs service takes as input a string of characters in one base character set and generates as output a corresponding string in a possibly different base character set. A choice is offered between ANSI and OEM for the base character sets, but for each, the code page is implied, being whatever code page the IFSMGR has conversion tables for.

The service converts between base character sets by converting to Unicode and then from Unicode, character by character. In this, its effect is much like combining the BCSToUni and UniToBCS services without needing to provide an intermediate buffer.

The string given as input terminates with a null character. However, the input buffer is also given an explicit bound: if no null character is found in the given number of bytes at the address given for input, then the service returns with the MAP_FLAG_TRUNCATE bit set in edx. If a null byte is found to terminate the input, then the service terminates the output with a null byte.

The output buffer is not bounded explicitly by the caller. The service assumes that the output buffer is sufficiently large to hold however many characters are generated as output. A weak upper limit is that for an input string consisting entirely of single-byte characters that all map to double-byte characters in the other set, the service may generate as output twice the number of bytes it takes as input.
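Since the caller alone is responsible for the output buffer, one cautious approach is to size it from the input bound using the factor of two given above. The helper below is a hypothetical convenience, not part of any IFSMGR interface. It relies only on the stated limit that each input byte yields at most two output bytes; a terminating null, converting to a single null byte, fits within the same allowance.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns a buffer size that cannot be exceeded under the two-bytes-per-
 * input-byte limit described in the text, or 0 if doubling would overflow. */
static size_t bcs_worst_case_output(size_t in_size)
{
    if (in_size > SIZE_MAX / 2)
        return 0;               /* caller should reject such a request */
    return 2 * in_size;
}
```

For example, an input bound of 260 bytes calls for an output buffer of 520 bytes under this rule.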

Some characters taken as input might not map to Unicode at all, and some characters that do map to Unicode might not map further to the base character set specified for output. The service represents each such character as an underscore in the output. The service continues with the conversion of subsequent characters, but indicates the problem by setting the MAP_FLAG_LOSS bit in edx when returning.
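The two ways in which a character can fail to convert, and the underscore that results either way, can be illustrated with a toy model. Everything here is an assumption made for illustration: the tables are invented and hold only a few entries, the NO_MAPPING sentinel is hypothetical, and only single-byte characters are handled. The real service works from the IFSMGR's own conversion tables.

```c
#include <stddef.h>

#define MAP_FLAG_LOSS 0x00000001u   /* value as listed for edx above */
#define NO_MAPPING    0xFFFFu       /* hypothetical "no Unicode equivalent" marker */

/* Toy input table: ASCII plus two high bytes; all else is unmapped. */
static unsigned short toy_to_unicode(unsigned char c)
{
    if (c < 0x80) return c;
    if (c == 0x80) return 0x00C7;   /* maps on to the output set */
    if (c == 0xB0) return 0x2591;   /* has Unicode, but no output byte */
    return NO_MAPPING;              /* no Unicode equivalent at all */
}

/* Toy output table: returns the output byte, or -1 if there is none. */
static int toy_from_unicode(unsigned short u)
{
    if (u < 0x80) return u;
    if (u == 0x00C7) return 0xC7;
    return -1;
}

/* Converts a null-terminated string character by character through
 * Unicode. Returns the MAP_FLAG_* bits; *out_size excludes the null. */
static unsigned toy_bcs_to_bcs(unsigned char *out, size_t *out_size,
                               const unsigned char *in)
{
    unsigned flags = 0;
    size_t n = 0;

    for (; *in != 0; in++) {
        unsigned short u = toy_to_unicode(*in);
        int b = (u == NO_MAPPING) ? -1 : toy_from_unicode(u);
        if (b < 0) {                /* failed at either stage */
            b = '_';
            flags |= MAP_FLAG_LOSS;
        }
        out[n++] = (unsigned char)b;   /* conversion carries on regardless */
    }
    out[n] = 0;
    *out_size = n;
    return flags;
}
```

With these toy tables, the input bytes 'a', 80h, B0h, FFh, 'b' produce 'a', C7h, '_', '_', 'b' with MAP_FLAG_LOSS set.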

The MAP_FLAG_LOSS bit should not be thought reliable for this service. A base character that maps to an underscore (that is, to the Unicode character 005Fh) would induce the BCSToUni service to set the MAP_FLAG_LOSS bit in edx when returning. A Unicode character that maps to an underscore (namely, the single-byte character 5Fh) or to a pair of underscores (taken as a double-byte character) would induce the UniToBCS service to set the MAP_FLAG_LOSS bit in edx when returning. This service, however, does not notice these cases should they occur in the intermediate mappings to and from Unicode.

If the last byte of input is the lead byte of a double-byte character, the service treats the character as one that cannot be converted faithfully. The service represents the character as an underscore in the output and returns with both the MAP_FLAG_LOSS and MAP_FLAG_TRUNCATE bits set in edx.
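The case of a lead byte stranded at the input bound can be sketched as follows. The lead-byte range is hypothetical (real ranges depend on the code page), double-byte characters are copied rather than converted, and the names are invented for the example. What the service does with a null byte in the trail position is not covered by the description above, so the sketch does not treat that case specially.

```c
#include <stddef.h>

#define MAP_FLAG_LOSS     0x00000001u
#define MAP_FLAG_TRUNCATE 0x00000002u

/* Hypothetical lead-byte test; a real one would consult the code page. */
static int is_lead_byte(unsigned char c)
{
    return c >= 0x81 && c <= 0x9F;
}

/* Copies a bounded string that may hold double-byte characters.
 * Returns the MAP_FLAG_* bits; *out_size excludes any terminating null. */
static unsigned copy_dbcs(unsigned char *out, size_t *out_size,
                          const unsigned char *in, size_t in_size)
{
    size_t i = 0, n = 0;

    while (i < in_size) {
        unsigned char c = in[i];
        if (c == 0) {                       /* proper end of string */
            out[n] = 0;
            *out_size = n;
            return 0;
        }
        if (is_lead_byte(c)) {
            if (i + 1 == in_size) {         /* lead byte is the last byte */
                out[n++] = '_';
                *out_size = n;
                return MAP_FLAG_LOSS | MAP_FLAG_TRUNCATE;
            }
            out[n++] = c;                   /* lead byte */
            out[n++] = in[i + 1];           /* trail byte */
            i += 2;
        } else {
            out[n++] = c;
            i++;
        }
    }
    *out_size = n;
    return MAP_FLAG_TRUNCATE;               /* bound reached without a null */
}
```

A caller can therefore distinguish the stranded lead byte (both bits set) from a bound reached on a character boundary (MAP_FLAG_TRUNCATE alone), at least in this model.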

If the same address is given for both the input string and the output buffer, the service will typically not behave correctly: because it can generate more output than input, it may overwrite input that it has yet to read.

The service’s code lies in a pageable segment. The service also consults tables that are either in pageable segments or in heap space that may be pageable if the system does not use DOS for paging. Note, however, that while a level 1 lock is applied on any volume, and also on receipt of the Sys_VM_Terminate control, the IFSMGR locks all pages that contain memory used for the conversion tables or for the service’s code. Curiously, the pointers that the service uses to find these tables are not guaranteed to get locked in these cases (though they usually will get locked, because they lie very close to the delta table for converting Unicode characters to upper case).

The direction flag is assumed to be clear on entry.

IFSMGR Service 0074h: BcsToBcsUpper

Input on stack (bottom to top, left to right, as C call):

dword address of output buffer
dword address of input string
dword flags for character set of output:
00000000h (BCS_WANSI) convert to Windows ANSI characters
00000001h (BCS_OEM) convert to OEM characters
dword flags for character set of input:
00000000h (BCS_WANSI) input consists of Windows ANSI characters
00000001h (BCS_OEM) input consists of OEM characters
dword size of input buffer, measured in bytes

Output:

eax size of output, measured in bytes
edx flags:
00000001h (MAP_FLAG_LOSS) one or more characters do not convert between base character sets
00000002h (MAP_FLAG_TRUNCATE) input string incomplete
ecx corrupt

The BcsToBcsUpper service takes as input a string of characters in one base character set and generates as output a corresponding string in a possibly different base character set, having also mapped the characters to upper case. A choice is offered between ANSI and OEM for the base character sets, but for each, the code page is implied, being whatever code page the IFSMGR has conversion tables for.

The service converts between base character sets by converting to Unicode, then to upper case, and then from Unicode, character by character. In this, its effect is much like combining the BCSToUni, UniToUpper and UniToBCS services without needing to provide an intermediate buffer.
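The point of doing the case mapping in Unicode is that it works even for characters whose byte values differ between the two base character sets. The toy model below shows the three stages per character. All tables are invented for illustration and hold only a few entries, the function names and the NO_MAPPING sentinel are hypothetical, and only single-byte characters are handled.

```c
#include <stddef.h>

#define NO_MAPPING 0xFFFFu   /* hypothetical "no Unicode equivalent" marker */

/* Stage 1: toy input table (ASCII plus one accented lower-case letter). */
static unsigned short toy_to_unicode(unsigned char c)
{
    if (c < 0x80) return c;
    if (c == 0x87) return 0x00E7;   /* c with cedilla, lower case */
    return NO_MAPPING;
}

/* Stage 2: toy upper-case mapping, applied to the Unicode value. */
static unsigned short toy_upper(unsigned short u)
{
    if ((u >= 'a' && u <= 'z') || u == 0x00E7)
        return (unsigned short)(u - 0x20);
    return u;
}

/* Stage 3: toy output table; returns the output byte, or -1 if none. */
static int toy_from_unicode(unsigned short u)
{
    if (u < 0x80 || u == 0x00C7 || u == 0x00E7) return u;
    return -1;
}

/* Runs the three stages on each character of a null-terminated string.
 * Returns the number of bytes written, not counting the null. */
static size_t toy_bcs_to_bcs_upper(unsigned char *out, const unsigned char *in)
{
    size_t n = 0;

    for (; *in != 0; in++) {
        unsigned short u = toy_to_unicode(*in);
        int b = (u == NO_MAPPING) ? -1 : toy_from_unicode(toy_upper(u));
        out[n++] = (unsigned char)(b < 0 ? '_' : b);
    }
    out[n] = 0;
    return n;
}
```

With these toy tables, the input bytes 'a', 87h, 'z' come out as 'A', C7h, 'Z': the accented letter changes both its case and its byte value in one pass.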

The string given as input is expected to terminate with a null byte. The input buffer also has an explicit bound, however: if no null byte is found within the given number of bytes at the address given for input, then the service returns with the MAP_FLAG_TRUNCATE bit set in edx. If a null byte is found to terminate the input, then the service terminates the output with a null byte too.

The output buffer is not bounded explicitly by the caller. The service assumes that the output buffer is sufficiently large to hold however many characters are generated as output. A weak upper limit is that for an input string consisting entirely of single-byte characters that all map to double-byte characters in the other set, the service may generate as output twice the number of bytes it takes as input.

Some characters taken as input might not map to Unicode at all, and some characters that do map to Unicode might not map further to the base character set specified for output. The service represents each such character as an underscore in the output. The service continues with the conversion of subsequent characters, but indicates the problem by setting the MAP_FLAG_LOSS bit in edx when returning.

The MAP_FLAG_LOSS bit should not be thought reliable for this service. A base character that maps to an underscore (that is, to the Unicode character 005Fh) would induce the BCSToUni service to set the MAP_FLAG_LOSS bit in edx when returning. A Unicode character that maps to an underscore (namely, the single-byte character 5Fh) or to a pair of underscores (taken as a double-byte character) would induce the UniToBCS service to set the MAP_FLAG_LOSS bit in edx when returning. This service, however, does not notice these cases should they occur in the intermediate mappings to and from Unicode.

If the last byte of input is the lead byte of a double-byte character, the service treats the character as one that cannot be converted faithfully. The service represents the character as an underscore in the output and returns with both the MAP_FLAG_LOSS and MAP_FLAG_TRUNCATE bits set in edx.

If the same address is given for both the input string and the output buffer, the service will typically not behave correctly: because it can generate more output than input, it may overwrite input that it has yet to read.

The service’s code lies in a pageable segment. The service also consults tables that are either in pageable segments or in heap space that may be pageable if the system does not use DOS for paging. Note, however, that while a level 1 lock is applied on any volume, and also on receipt of the Sys_VM_Terminate control, the IFSMGR locks all pages that contain memory used for the conversion tables or for the service’s code. Curiously, the pointers that the service uses to find these tables are not guaranteed to get locked in these cases (though they usually will get locked, because they lie very close to the delta table for converting Unicode characters to upper case).

The direction flag is assumed to be clear on entry.