Geoff Chappell - Software Analyst
This function gets the DOS version and related identifiers.
The function is implemented in version 2.00 and higher.
The source code for version 2.11 that Microsoft published in 2014 at the Computer History Museum and in 2018 on GitHub tells something of the function’s development in apparently unreleased builds of version 1. According to a comment in GETSET.ASM, the function was introduced for a version 1.28. A comment in MSHEAD.ASM suggests the function number was first brought into use for a version 1.27 but with “Assign CON AUX LIST” as its very different purpose.
The function uses registers for both input and output.
ah | 30h |
al | 00h to query OEM number; 01h to query for flags |
That querying the OEM number is selected by zero in al is only what Microsoft documents: the implementation accepts anything other than 01h.
Earlier versions do not interpret al on input:
ah | 30h |
Since Microsoft is not known ever to have documented a meaning for al on input before DOS 5.00, older software exists that does not explicitly set al on input and which may therefore get flags for what they expected would be an OEM number.
ax | version number; major version in low byte |
bl:cx | user serial number |
bh | OEM number; or, depending on input in version 5.00 and higher, flags |
Before version 4.00, the version number is returned from a corresponding word in the kernel’s data. The published source code for version 2.11 names this word as MSVERS, defined as two bytes named MSMAJOR and MSMINOR. No callable interface is known for changing this variable, which is in effect the kernel’s hard-coded version number.
Starting with version 4.00, the version number reported by this function can be just about anything. The scheme for version 4.00 in particular may fairly be described as bizarre. In version 5.00 and higher, the function reliably returns the version number from the word at offset 40h in the current Program Segment Prefix (PSP).
All known implementations return bx:cx from four consecutive bytes in the kernel’s data. The published source code for version 2.11 names these as USERNUM, defined as a word with an unlabelled byte to follow, and then OEMNUM for the last byte. No callable interface is known for changing any of these four bytes. For the last to be meaningful as distinguishing one OEM build from any other of the same version, it might be set by the OEM’s editing of a source file, but this would be impractical for setting the low 24 bits as some sort of serial number, its point being presumably that it varies from copy to copy. In the published source code, a file named DOSPATCH.TXT has a paragraph that directs OEMs to patch the binary. History shows that many OEMs didn’t bother patching in an OEM number. Whether any OEM ever patched different serial numbers into different copies, I have no idea.
Starting with version 5.00, the function can be called as if it has two subfunctions: 3000h to return the OEM number in bh, but 3001h to return flags instead. These flags look to be a hard-coded description of different types of build of what is otherwise the same kernel. Notably, the kernel can be built slightly differently so that it can be loaded from ROM and its code segment can execute in ROM (all data within it being read-only). For such a build, the flags in bh have the 08h bit set.
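The register layout described above can be made concrete with a small decoder. This is only a sketch of how a caller might split the returned values once it has them; it does not issue the interrupt, and the names `DosVersionReply`, `decode_reply` and `is_rom_build` are hypothetical, chosen here for illustration.

```python
from dataclasses import dataclass


@dataclass
class DosVersionReply:
    major: int   # low byte of ax
    minor: int   # high byte of ax
    bh: int      # OEM number, or flags if al was 01h on input (version 5.00 and higher)
    serial: int  # 24-bit user serial number from bl:cx

    @property
    def implemented(self) -> bool:
        # Versions before 2.00 do not implement the function and return zero in al.
        return self.major != 0


def decode_reply(ax: int, bx: int, cx: int) -> DosVersionReply:
    """Split the registers returned by int 21h function 30h."""
    return DosVersionReply(
        major=ax & 0xFF,
        minor=(ax >> 8) & 0xFF,
        bh=(bx >> 8) & 0xFF,
        serial=((bx & 0xFF) << 16) | (cx & 0xFFFF),
    )


ROM_BUILD_FLAG = 0x08  # bit in bh when flags are requested with ax = 3001h


def is_rom_build(flags: int) -> bool:
    """Test the flag that the text associates with a ROM-executable build."""
    return bool(flags & ROM_BUILD_FLAG)
```

For instance, `decode_reply(0x0B02, 0xFF00, 0x0000)` describes a default build of version 2.11: major 2, minor 11, OEM number FFh and a zero serial number.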
Early versions do not implement int 21h function 30h and thus return zero in al:
al | 00h |
This behaviour for all unimplemented int 21h functions is conveniently consistent with indicating a major version that is less than any that ever existed (assuming a start at version 1.0).
Whatever may have motivated the configurability that PC DOS 4.00 introduced to what this function returns as the DOS version number, it was done, it’s a fact of history, and it can’t be avoided. In version 5.00 and higher, callers who want a DOS version number that’s hard-coded into the kernel can instead call int 21h function 3306h.
In version 4.00, the version number that int 21h function 30h returns in ax ordinarily does come from the unchanging internal variable MSVERS, as with earlier versions, but it can instead come from a different variable that can have been changed and can keep changing. Moreover, whether the function returns this fake version number or the true version can change too, including to depend on how often the function is called.
The function does still default to returning the true version. It may never learn of a fake version to return. If a fake version was known but is now cleared, the function returns the true version. (The function cannot be made to return version zero.) The kernel learns of a fake version in two ways. One is when loading a program or overlay. If the filename is in a table (see below), then the kernel adopts the corresponding fake version and a duration for its use, which is either a count of queries or until the next process termination.
The other way of learning a fake version number is through int 2Fh function 122Fh. Curiously, this sets the fake version but cannot change the duration. The only known use of this is by the system initialisation in IO.SYS to clear the fake version after loading device drivers (which, remember, are overlays).
A table of filenames, fake versions and durations is hard-coded at the end of the kernel, immediately after a null-terminated string “ADD SPECIAL ENTRIES” which is here thought to be some sort of marker for finding the table in the kernel as a file:
File Name | Version | Duration |
---|---|---|
IBMCACHE.COM | 3.40 | until next process termination |
IBMCACHE.SYS | 3.40 | until next process termination |
DXMA0MOD.SYS | 3.40 | until next process termination |
WIN200.BIN | 3.40 | until next process termination |
PSCPG.COM | 3.40 | until next process termination |
DCJSS02.EXE | 3.40 | until next process termination |
ISAM.EXE | 3.40 | until next process termination |
ISAM2.EXE | 3.40 | until next process termination |
DFIA0MOD.SYS | 3.40 | until next process termination |
Yes, all are set to show the same fake version, but they needn’t have been. This configuration is only what’s hard-coded into known releases of MS-DOS and PC DOS that report as version 4.00. Whether Microsoft or IBM ever provided a tool for reconfiguring the fake version, I don’t know, but certainly the feature’s design allows for reconfiguration.
The kernel’s initialisation moves the table of fake version numbers from the end of the kernel as loaded to the end of what memory the kernel retains. It then keeps the address in the SYSINITVAR structure that the kernel exposes through int 21h function 52h. Specifically, the table is pointed to by the word at offset 35h. See that this is only a near pointer. Most pointers in this structure are far pointers because the structures and tables that they point to are built by the system initialisation in IO.SYS after interpreting the CONFIG.SYS file for configurable settings. That the table of fake version numbers is addressed relative to the kernel’s code and data limits its configurability after the kernel’s initialisation. The only gain from keeping its address in this structure, in contrast to an internal variable, is that it can be consulted by an external tool but not changed for immediate effect.
More likely perhaps is that the table is intended to be changed, if at all, either by an OEM when building the kernel (including to patch it after assembling and linking) or by some tool that patches the table at the end of the kernel as a file (to take effect when DOS next starts). This is here thought to be the reason that the table is introduced by what looks to be an identifying string.
Whether the table is in memory for active use or is still in the file to be inspected or patched, it is a sequence of variable-size entries, each of the form:
Offset | Size | Description |
---|---|---|
00h | byte | number of characters, cch, in name that follows; else zero to end the table |
01h | cch | name of program or overlay |
cch + 01h | word | fake version number, major version in low byte; but zero to stand for true version |
cch + 03h | byte | count of queries to be answered with fake version; but FFh to use fake version until next process termination; 00h to use true version |
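As an illustration of this layout, the following sketch walks a table image and yields its entries. The names `FakeVersionEntry` and `parse_table` are hypothetical, and decoding names as ASCII is an assumption. The sample bytes are constructed by hand from the layout above, not copied from any kernel.

```python
import struct
from typing import Iterator, NamedTuple


class FakeVersionEntry(NamedTuple):
    name: str
    major: int
    minor: int
    count: int  # FFh means until next process termination; 00h means true version


def parse_table(data: bytes, offset: int = 0) -> Iterator[FakeVersionEntry]:
    """Yield entries until the zero length byte that ends the table."""
    while True:
        cch = data[offset]
        if cch == 0:
            return
        name = data[offset + 1 : offset + 1 + cch].decode("ascii")
        # The version word is little-endian with the major version in the
        # low byte, so the bytes in memory are major, minor, then the count.
        major, minor, count = struct.unpack_from("<BBB", data, offset + 1 + cch)
        yield FakeVersionEntry(name, major, minor, count)
        offset += cch + 4  # length byte + name + version word + count byte


sample = (
    b"\x0aWIN200.BIN\x03\x28\xff"  # 3.40 until next process termination
    b"\x08ISAM.EXE\x03\x28\xff"
    b"\x00"                        # end of table
)
entries = list(parse_table(sample))
```

A truncated image raises `IndexError` or `struct.error` rather than returning a partial entry, which seems the safer behaviour for a tool that might go on to patch the file.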
Whether the scheme is configurable or not, it’s defective even for a single-tasking operating system that only ever runs on single-processor computers. Because the countdown is not stopped by process termination, a fake version that’s picked up for one process can apply also to arbitrary other processes just in their ordinary execution. A count is unsafe to specify except for a program that is known to call the function exactly that number of times in all the possible circumstances of its execution. Perhaps the best that can be said is that the configuration as hard-coded into the kernel, which is perhaps the only configuration in real-world use, does not specify any countdown.
Even the specification of a fake version to last only until the next process termination risks making nonsense of int 21h function 30h unless this function is only ever queried in ordinary execution in contrast to intercepted and interrupted execution. In practice, of course, programs that intercept or interrupt the execution of other programs plausibly always do confine their use of int 21h function 30h to their own initialisation, and so perhaps nothing was ever seen to go very wrong.
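The defect with counted durations is easier to see in a toy model. What follows is not a reconstruction of the kernel's code, only a sketch of the behaviour as described above. It assumes that loading a file that is not in the table leaves the state alone, and that process termination clears the fake version only when the duration is FFh. The class name `FakeVersionModel` and the file names in the usage are hypothetical.

```python
TRUE_VERSION = 0x0004     # version 4.00 as a word, major version in low byte
UNTIL_TERMINATION = 0xFF


class FakeVersionModel:
    def __init__(self, table):
        self.table = table  # maps file name to (version word, count)
        self.fake = 0       # zero stands for no fake version
        self.count = 0

    def load(self, filename):
        """Loading a program or overlay adopts any matching table entry."""
        entry = self.table.get(filename.upper())
        if entry is not None:
            self.fake, self.count = entry

    def set_fake(self, version):
        """Models int 2Fh function 122Fh: sets the version, not the duration."""
        self.fake = version

    def terminate(self):
        """Process termination ends only the FFh kind of duration."""
        if self.count == UNTIL_TERMINATION:
            self.fake = 0
            self.count = 0

    def query(self):
        """Models the version word returned by int 21h function 30h."""
        if self.fake == 0 or self.count == 0:
            return TRUE_VERSION
        if self.count != UNTIL_TERMINATION:
            self.count -= 1  # the countdown runs whoever is asking
        return self.fake


model = FakeVersionModel({"COUNTED.EXE": (0x2803, 3)})  # 3.40 for three queries
model.load("COUNTED.EXE")
first = model.query()      # the intended program asks once
model.terminate()          # and exits with two queries still owed
model.load("OTHER.EXE")    # an unrelated program starts
leaked = [model.query(), model.query(), model.query()]
```

In this model the unrelated program is told 3.40 twice before it sees the true version, which is the leak that the text describes.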
Whether anything was seen to go wrong in version 4.00, the next version reworked the design in several ways.
As noted above, version 5.00 added int 21h function 3306h which the kernel implements by returning the hard-coded true version—not from a variable but from immediate data in the code that handles the function. The choice of this function number may or may not be significant. Returning the true DOS version might have been built into DOS as a new function or into any old function that already has subfunctions for returning simple information—37h would seem as good a candidate as 33h—but as a subfunction of function 33h specifically, it can be called at essentially any time in any situation, including from code that intercepts or interrupts other execution.
The most notable change for int 21h function 30h is that if a fake version is determined for a program or for an overlay (though now specialised to the loading of device drivers), then instead of remembering the fake version as kernel data and trying to judge how long to keep reporting it, the fake version is placed in the current Program Segment Prefix (PSP) as per-process data. It is held specifically at offset 40h. If there is no fake version for the process, then this word at offset 40h in the PSP is defaulted to the true DOS version. In version 5.00 and higher, the version number that int 21h function 30h returns in ax is always this word from offset 40h in the current PSP. It couldn’t be simpler.
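In terms of data, the version 5.00 scheme reduces to reading one little-endian word. A minimal sketch, assuming a 256-byte copy of a PSP is already in hand as bytes (the function name `version_from_psp` is hypothetical):

```python
import struct
from typing import Tuple

PSP_VERSION_OFFSET = 0x40  # per-process version word in version 5.00 and higher


def version_from_psp(psp: bytes) -> Tuple[int, int]:
    """Return (major, minor) from the version word in a copy of a PSP."""
    (word,) = struct.unpack_from("<H", psp, PSP_VERSION_OFFSET)
    return word & 0xFF, word >> 8


# A blank PSP image with the word set as it would be for a fake version of 3.40.
psp = bytearray(256)
struct.pack_into("<H", psp, PSP_VERSION_OFFSET, 0x2803)
major, minor = version_from_psp(bytes(psp))
```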
Or could it? There may have been a bit of a bother over where exactly to place this version number in the PSP. Roughly, the defined members near the start of the PSP had been growing organically. Version 2.00 had extended the defined members to a far pointer at offset 2Eh. Version 3.00 added a word and a far pointer at offsets 32h and 34h. Version 3.10 added a far pointer at offset 38h. According to the Interrupt List, the byte at offset 3Ch is managed by int 21h function 6301h in version 4.00 or even some 3.xx, in editions that have Double Byte Character Set (DBCS) support, not that I have any to check. The word at offset 3Dh is used by Microsoft’s APPEND.EXE tool in version 4.00 and 5.00, if not also higher. Only the lowest bit is used and it’s not impossible that what’s defined for APPEND is just one byte and that accessing the whole word is a side-effect of some macro. Development for version 5.00 thus looks to have started with either offset 3Eh or 3Fh as next in line for definition. Yet the per-process version number was placed at offset 40h. It could be that some other prior definition by Microsoft is not yet known. Perhaps skipping offset 3Fh signifies nothing more than the neatness of word alignment for a word. But there is a plausible alternative explanation.
As related in Undocumented DOS, Second Edition, ISBN 0-201-63287-X, by Andrew Schulman and others for Addison-Wesley in 1994, on pages 199 to 204, the NETX.EXE program from Novell Netware treats the two bytes at offset 3Eh as its own. At least one of the book’s authors and presumably also a programmer at Novell thought “this is all fine and dandy”, not just for claiming seemingly free space in a system structure but even for writing to it while intercepting other software’s use of the re-entrant int 21h function 50h that is plainly designed to be free of side-effects. The book speculates that the word at offset 3Eh “was apparently reserved for Novell by Microsoft” (the book’s emphasis), this being inferred from Microsoft’s name PDB_Novell_Used and a corresponding comment, as known from an OEM Adaptation Kit (OAK). A different interpretation is that “Used” is Microsoft recording the fact of Novell’s use, not a reservation by Microsoft. However it came about, it was done, and the per-process version could not safely have been defined any lower than offset 40h.
More visible to users was the configurability’s elevation to a documented tool, named SETVER.EXE, that was supplied in the DOS package to load as a device driver with a device statement in CONFIG.SYS and to run as a program with command-line options for inspecting or editing the version table. Underneath, the implementation is not so very much different from version 4.00. Instead of having the table in the kernel, perhaps to be edited with a separate tool, version 5.00 has the table in the separate tool and makes a point of giving this tool to users. Running SETVER.EXE to edit the version table rewrites SETVER.EXE. Within SETVER.EXE, the table has the same form as in the kernel for version 4.00 except that each entry is one byte smaller for not specifying a count.
What’s coded into the SETVER.EXE for MS-DOS 5.00 on the installation discs is:
File Name | Version |
---|---|
WIN200.BIN | 3.40 |
WIN100.BIN | 3.40 |
WINWORD.EXE | 4.10 |
EXCEL.EXE | 4.10 |
HITACHI.SYS | 4.00 |
MSCDEX.EXE | 4.00 |
REDIR4.EXE | 4.00 |
NET.EXE | 4.00 |
NET.COM | 3.30 |
NETWKSTA.EXE | 4.00 |
DXMA0MOD.SYS | 3.30 |
BAN.EXE | 4.00 |
BAN.COM | 4.00 |
MSREDIR.EXE | 4.00 |
METRO.EXE | 3.31 |
IBMCACHE.SYS | 3.40 |
REDIR40.EXE | 4.00 |
DD.EXE | 4.01 |
DD.BIN | 4.01 |
LL3.EXE | 4.01 |
REDIR.EXE | 4.00 |
SYQ55.EXE | 4.00 |
SSTDRIVE.SYS | 4.00 |
ZDRV.SYS | 4.01 |
ZFMT.SYS | 4.01 |
TOPSRDR.EXE | 4.00 |
Whatever’s configured for the fake versions, the kernel learns of it when SETVER.EXE executes as a device driver. Wherever SETVER gets loaded is where its table remains in memory. SETVER itself sets the address into the SYSINITVAR structure that the kernel exposes through int 21h function 52h. Now, however, the address is kept as a far pointer at offset 37h, not a near pointer at offset 35h.
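The difference between the two pointers can be sketched as follows for a tool that has a copy of the structure as bytes. Offsets are taken relative to the start of the structure as given in the text. The function name is hypothetical, and so is the `kernel_segment` parameter: for version 4.00 the text says only that the pointer is near, so the caller must supply whichever segment it is relative to. The linear address is the usual real-mode calculation of segment times 16 plus offset.

```python
import struct


def version_table_address(sysinitvar: bytes, dos_major: int, kernel_segment: int = 0) -> int:
    """Return the real-mode linear address of the fake version table."""
    if dos_major == 4:
        # Near pointer: an offset only, relative to a segment the caller supplies.
        (offset,) = struct.unpack_from("<H", sysinitvar, 0x35)
        segment = kernel_segment
    elif dos_major >= 5:
        # Far pointer: stored as offset word first, then segment word.
        offset, segment = struct.unpack_from("<HH", sysinitvar, 0x37)
    else:
        raise ValueError("no version table before version 4.00")
    return (segment << 4) + offset
```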
Depending on the version, an Original Equipment Manufacturer (OEM) who supplied MS-DOS to use with the computer equipment had more or less freedom to modify this MS-DOS. Most obviously, the OEM could re-badge MS-DOS as the OEM’s own. Famously, the first version was only ever distributed by IBM, not as MS-DOS but as IBM Personal Computer DOS, widely called PC DOS. This rebadging extends to the system files too, which are IBMBIO.COM and IBMDOS.COM instead of the generic IO.SYS and MSDOS.SYS. These IBM names were used by other OEMs. Some rebadging kept the MS-DOS name for the product but invented yet more names for the system files, as with Toshiba’s TBIOS.SYS and TDOS.SYS.
Less obviously to modern eyes, early MS-DOS versions needed IO.SYS to be specially written for computer hardware that was not PC-compatible. Above this hardware-specific layer, the kernel ideally needs little or no change, but adaptation here and even at higher levels certainly did happen, at least for branding and even for adding value in the form of utility programs. Each OEM got source code to the DOS kernel to assemble and link. How much this provision of source code licensed an OEM to edit the source code before building the kernel is unclear. IBM was a special case from the start: the kernel in PC DOS 1.00 has code that is specific to the IBM PC. For other OEMs, the published source code for early versions confirms that Microsoft provided for some variation through macros for conditional assembly, but some OEMs plainly did go further and edit the source code, even substantially, as did COMPAQ for its version 2.11.
The plan for distinguishing DOS not just by its version but by which OEM it was built for was that Microsoft would assign to each OEM an 8-bit identifier to stamp into all the OEM’s adaptations of the kernel. This is confirmed by a DOSPATCH.TXT file in the directory of binaries that comes with the published source code for version 2.11:
The user number is 3 bytes starting at debug location 683, The OEM number is one byte at debug location 686. The user number is initialized to 0, the OEM number to -1 and they immediatly follow the Microsoft Copyright message. If these bytes are not zero, look for the four bytes following the Copyright message which should be in the vacinity of 683. OEMs should request an OEM number from Microsoft if they want one of their very own, this prevents selecting one someone else already has.
Put aside that the highly variable line length suggests the text was edited, as if there may be context for historians to find. See that the directions for patching an OEM number into the binary take as their starting point that FFh is the OEM number in the binary as assembled and linked by Microsoft or by the OEM who has source-code access but doesn’t edit it. The published source code for version 2.11 gives OEMs a choice of just 00h or FFh through conditional assembly. With IBM defined as non-zero, assembling GETSET.ASM sets 00h for the OEM number. Otherwise, the OEM number defaults to FFh. Plainly, IBM builds the kernel (or has it built for them) with a non-zero IBM. Every known build of PC DOS has 00h as the OEM number: DOSPATCH.TXT is not for IBM. Everyone else gets FFh unless they edit the source code or build it like IBM or patch the binary.
Among the outcomes is that 00h for the OEM number does not imply PC DOS from IBM. At least some OEMs understood a non-zero IBM as doing double duty to signify that the OEM’s computers are PC-compatible. Kernels from COMPAQ at least as far back as version 2.11 are so built, and thus have 00h as the OEM number.
Though the DOSPATCH.TXT directions quoted above were revealed to the world with source code for version 2.11, they were written for version 2.00. They match exactly the version 2.00 MSDOS.SYS binary that’s in the same directory. Its USERNUM is at file offset 0583h, which would show in DEBUG as 0683h from loading the kernel at offset 0100h as if to execute a .COM program. That this variable and MSVERS “immediately follow the Microsoft Copyright message” continued to version 2.10 but not to version 2.11, which places both variables immediately in front of the code for handling int 21h function 30h. (Comparison of the binaries shows that this code had only recently been separated from MSCODE.ASM to GETSET.ASM. The variables moved with the code, breaking the relationship with the copyright notice that was left behind.)
What the directions changed to for version 2.11 and beyond is not known with certainty. Among the possibilities is that they didn’t change, no reason having been noticed. That OEMs would then have been left with directions that did not in fact identify the patch site would go some way to explaining why so many OEM builds were left with FFh as their OEM number. Still, that directions for patching did get updated remains likely, if not immediately then eventually. It will have become easier in version 3.10, since the sharing of kernel data with extensions such as SHARE.EXE stabilised the first several KB of kernel data, including USERNUM, which is reliably at offset 0352h through the rest of version 3 and then changes to being just as reliably at offset 03B2h.
For the intervening versions, USERNUM and MSVERS are kept together, and so a default build will have produced a sequence to search for, e.g., 00 00 00 FF 02 0B for version 2.11. That such a search-and-patch was done for version 2.11 is supported by inspection of OEM builds such as can be found as abandon-ware at WinWorld. I haven’t examined exhaustively, but at least six OEM packages have an identical kernel with FFh for the OEM number in a USERNUM at offset 0BEFh. This kernel looks to be the default build, produced from assembling and linking Microsoft’s source code exactly as given, without a non-zero IBM. A few others differ by only one or two bytes, just as would be expected if a handful of OEMs followed directions in an updated DOSPATCH.TXT, for patching not just the OEM number but the so-called template definition too.
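Such a search-and-patch is simple enough to sketch. The following is an illustration of the idea only, not a record of what any OEM or Microsoft tool did. It looks for the default sequence (three zero bytes of serial number, FFh for the OEM number, then the major and minor version) and patches the OEM byte in a copy of the image. The function names are hypothetical and the usage runs against a synthetic image, not a real kernel.

```python
from typing import List


def find_usernum(kernel: bytes, major: int, minor: int) -> List[int]:
    """Return every offset at which the default USERNUM and MSVERS bytes occur."""
    pattern = bytes([0x00, 0x00, 0x00, 0xFF, major, minor])
    hits = []
    pos = kernel.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = kernel.find(pattern, pos + 1)
    return hits


def patch_oem_number(kernel: bytes, usernum_offset: int, oem: int) -> bytes:
    """Return a copy of the image with the OEM number byte replaced."""
    patched = bytearray(kernel)
    patched[usernum_offset + 3] = oem  # OEM number follows the 3-byte serial
    return bytes(patched)


# Synthetic stand-in for a default build of version 2.11.
image = b"\x90" * 0x0BEF + bytes([0x00, 0x00, 0x00, 0xFF, 0x02, 0x0B]) + b"\x90" * 16
offsets = find_usernum(image, 2, 11)
if len(offsets) == 1:  # patch only when the match is unambiguous
    image = patch_oem_number(image, offsets[0], 0x23)
```

The search finds nothing in a kernel that has already been patched or was built with a non-zero IBM, since neither has FFh in the fourth byte.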
Microsoft is not known ever to have published a list of OEM numbers. They have instead been left to circulate as folklore. I don’t mean to add to folklore and neither can I hope for completeness. If nothing else, the Internet’s record of early OEM builds does not make it easy. The retro-computing hobbyists who keep them, circulate them, and even install them for fun, understandably prefer file formats that are well suited to creating disks that are special to the corresponding hardware, which need not be PC-compatible. These disk images, inevitably in a variety of file formats, are something of a chore when all that’s wanted is one file to inspect with a hex editor! Still, an occasional rainy afternoon spent collecting OEM builds of MS-DOS from an abandon-ware site turns up enough examples to be usefully illustrative. Unlike with folklore, the following list of kernels with OEM numbers other than 00h and FFh does at least tell where the numbers came from:
OEM Number | Package | DOS Version | USERNUM Offset | File Date | File Size |
---|---|---|---|---|---|
01h | 2.10 [Wang Professional Computer OEM r1. | 2.01 | 0801h | 22nd December 1983 | 17,521 |
01h | 2.10a [Wang Professional Computer OEM r2 | 2.01 | 0801h | 11th November 1984 | 17,521 |
02h | 2.00 [SCP OEM] [SCP Tarbell S-100] | 2.00 | 0583h | 22nd March 1983 | 20,480 |
05h | 3.10 [Zenith Z-100 PC OEM] | 3.10 | 0352h | 28th May 1985 | 27,808 |
05h | 3.20 [Zenith PC OEM] | 3.20 | 0352h | 30th July 1986 | 28,480 |
05h | 3.20 [Zenith Eazy PC OEM] | 3.20 | 0352h | 4th June 1987 | 28,480 |
05h | 3.21 [Zenith Z100 PC] | 3.21 | 0352h | 28th September 1987 | 28,480 |
05h | 3.30 Plus [Zenith Z100 PC] | 3.30 | 0352h | 26th August 1988 | 30,576 |
05h | 4.01 [Zenith OEM] | 4.00 | 03B2h | 11th January 1990 | 37,376 |
16h | 2.05 [DEC Rainbow OEM] | 2.05 | 05ABh | 28th September 1983 | 17,126 |
16h | 2.11 [DEC Rainbow] | 2.11 | 0C7Eh | 5th September 1984 | 17,339 |
1Eh | 2.11 [NCR Decision Mate V] | 2.11 | 0CD3h | 29th August 1984 | 17,404 |
1Fh | 2.11 [NEC APC r01] | 2.11 | 0B41h | 4th October 1983 | 17,152 |
23h | 2.11 [Olivetti OEM] | 2.11 | 0BEFh | 3rd May 1985 | 17,176 |
23h | 3.20 [Olivetti OEM] | 3.20 | 0352h | 22nd December 1986 | 28,480 |
23h | 3.30 [Olivetti OEM] | 3.30 | 0352h | 12th February 1988 | 30,144 |
25h | 2.11 [ITT XTRA OEM r2.00] | 2.11 | 10DEh | 29th April 1985 | 17,257 |
28h | 2.11 [TI Professional Computer OEM r2.11 | 2.11 | 0B4Bh | 17th November 1983 | 17,012 |
29h | 2.11 [Toshiba OEM R2A] | 2.11 | 0C99h | 17th March 1987 | 17,353 |
2Eh | 3.30 [GRiD OEM] | 3.30 | 0352h | 27th May 1988 | 30,128 |
3Bh | 2.11 [Corona Data Systems OEM] | 2.11 | 0BEFh | 29th May 1984 | 17,176 |
3Ch | 2.11 [DATAVUE OEM] | 2.11 | 0BEFh | | 17,176 |
4Dh | 2.11 [NCR OEM] | 2.11 | 0C59h | | 17,282 |
4Dh | 3.30 [HP OEM] | 3.30 | 0352h | 10th June 1988 | 30,272 |
4Dh | 4.01 [HP Vectra OEM] | 4.00 | 03B2h | 25th July 1989 | 37,552 |
The DOS Version shown for each kernel is from its implementation of int 21h function 30h. It is well known that some early OEM builds were badged as one version but are really another. It also looks as if all builds that are badged as DOS 4.01 report as 4.00. Possibly less well known is that the report from int 21h function 30h can be wrong. For instance, the Wang kernel that has OEM number 01h was built from source code for version 2.11—it implements int 21h function 58h, which is in no known version 2.10—but it reports as version 2.01, not just through the interface but in its Microsoft copyright notice.
The date for each kernel is from the file’s directory entry in the disk image. Where none is shown, it’s because none looks to have been set, perhaps but not certainly as an oversight in the OEM’s preparation, and the default of 1st January 1980 is better dismissed as meaningless.
The NEC build with OEM number 1Fh is remarkable for having a non-zero user serial number: 010000h.