SKETCH OF HOW RESEARCH MIGHT CONTINUE AND RESULTS BE PRESENTED

Character Types in JScript

Each script is received by JSCRIPT.DLL as an array of Unicode characters. Many of the ASCII-compatible characters, i.e., those with values < 0x0080, have a highly specific meaning in JScript. Some characters < 0x0080 and all characters ≥ 0x0080 are of interest to JScript only for whether they are line terminators or white space or can be used in naming things.

As far as JScript the language is concerned, characters ≥ 0x0080 are classified by reference to categories in the Unicode Standard. The JSCRIPT implementation mostly classifies according to the C1_ flags that the Windows API function GetStringType produces for each character when asked for CT_CTYPE1 information. If this function reports the character as defined but with no other property, i.e., with C1_DEFINED set but all other flags clear, then JSCRIPT resorts to its own tables.
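The fallback rule in the preceding paragraph can be sketched in script. This is an illustration only, not JSCRIPT's own code. The flag values are those of the C1_ constants in the Windows SDK headers. The helper names isDefinedOnly and describeFlags are hypothetical, and the flags word is assumed to have been obtained elsewhere (for instance, by a native program that calls GetStringTypeW with CT_CTYPE1), since script cannot call the API directly.

```javascript
// C1_ flag values for CT_CTYPE1, as defined in the Windows SDK headers.
var C1_FLAGS = [
    ["C1_UPPER",   0x0001],
    ["C1_LOWER",   0x0002],
    ["C1_DIGIT",   0x0004],
    ["C1_SPACE",   0x0008],
    ["C1_PUNCT",   0x0010],
    ["C1_CNTRL",   0x0020],
    ["C1_BLANK",   0x0040],
    ["C1_XDIGIT",  0x0080],
    ["C1_ALPHA",   0x0100],
    ["C1_DEFINED", 0x0200]
];
var C1_DEFINED = 0x0200;

// Hypothetical helper: true if the character is defined but has no other
// CT_CTYPE1 property. This is the case in which JSCRIPT is described as
// resorting to its own tables.
function isDefinedOnly(flags) {
    return flags === C1_DEFINED;
}

// Hypothetical helper: names the flags that are set in a CT_CTYPE1 word.
function describeFlags(flags) {
    var names = [];
    for (var i = 0; i < C1_FLAGS.length; i++) {
        if (flags & C1_FLAGS[i][1]) {
            names.push(C1_FLAGS[i][0]);
        }
    }
    return names.join(" | ");
}
```

For example, isDefinedOnly(0x0200) is true, so such a character would be looked up in JSCRIPT's own tables, whereas a flags word of 0x0302 is described as C1_LOWER | C1_ALPHA | C1_DEFINED and needs no fallback.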

Line Terminators

JSCRIPT recognises four line-terminating characters:

White Space

The white-space characters are:

JSCRIPT applies a slightly different notion of white space when scanning for the termination of an HTML comment, but this is not strictly a matter of JScript interpretation, and details are left for elsewhere.

Letters

JScript leaves many characters for naming things. Any character in any of several categories of letter may appear anywhere in an identifier, as may the dollar sign (0x0024) and underline (0x005F). To JSCRIPT, the letters are:

and any character listed below

for which C1_DEFINED is set but all other flags are clear.

Digits and Marks

Some further characters are left for naming things, except that they cannot begin an identifier:

and any of the numerous characters listed below

for which C1_DEFINED is set but all other flags are clear.
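The positional rule for identifiers (letters, the dollar sign and the underline anywhere, but digits and marks only after the first character) can be sketched as follows. The function isIdentifier and its two predicate parameters are hypothetical. The predicates stand in for the character classification, which is not reproduced here, and the ASCII-only versions are for demonstration only. Reserved words are not considered.

```javascript
// Hypothetical sketch of the positional rule for identifiers. The first
// character must be a letter, dollar sign (0x0024) or underline (0x005F);
// later characters may also be digits or marks.
// isLetter and isDigitOrMark are caller-supplied predicates that take a
// UTF-16 code unit and return true or false.
function isIdentifier(name, isLetter, isDigitOrMark) {
    if (name.length === 0) {
        return false;
    }
    for (var i = 0; i < name.length; i++) {
        var c = name.charCodeAt(i);
        var ok = c === 0x0024 || c === 0x005F || isLetter(c) ||
                 (i > 0 && isDigitOrMark(c));
        if (!ok) {
            return false;
        }
    }
    return true;
}

// ASCII-only stand-ins for the real classification, for demonstration only.
function isAsciiLetter(c) {
    return (c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A);
}
function isAsciiDigit(c) {
    return c >= 0x30 && c <= 0x39;
}
```

With these stand-ins, isIdentifier("$total_1", isAsciiLetter, isAsciiDigit) is true, but isIdentifier("1st", isAsciiLetter, isAsciiDigit) is false because a digit cannot begin an identifier.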

Digression on Unicode Support

Script writers who think to use characters ≥ 0x0080 should appreciate that JSCRIPT’s interpretation of these characters is suspect. The algorithms presented above are unchanged from JSCRIPT version 5.6.0.6626 (in the original Windows XP) through to version 5.7.0.6000 (in Windows Vista), but the classification of characters can vary because of the information that JSCRIPT gets from Windows.

It is ordinarily desirable, of course, that Windows software should use the Windows API to get system-wide information about the properties of Unicode characters, rather than depend on its own understanding. In practice, however, the flags reported by GetStringType vary from one Windows version to another. Indeed, since they come ultimately from tables in a file (LOCALE.NLS in the System directory), and it must be supposed that the file is updatable, the flags may vary even within a Windows version.

So, if you really must use some such character as 0x037B (Greek Small Reversed Lunate Sigma Symbol) when naming some variable, you take your chance on behalf of your script’s users. Windows Vista recognises this character as a lower-case letter (with the C1_DEFINED, C1_ALPHA and C1_LOWER flags), but Windows XP doesn’t recognise the character at all.
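A script that cannot avoid such a character might at least test, at run time, whether the engine it is running on accepts the character. The following is a minimal sketch, not a documented technique. The name canStartIdentifier is hypothetical, and the sketch assumes that the engine supports try and catch and raises a catchable error when the Function constructor is given source text that does not compile.

```javascript
// Hypothetical probe: reports whether the script engine that runs this code
// accepts the given UTF-16 code unit as the first character of an identifier.
// It compiles, but never calls, a tiny function body that declares a variable
// whose name is that one character.
function canStartIdentifier(codeUnit) {
    try {
        new Function("var " + String.fromCharCode(codeUnit) + ";");
        return true;
    } catch (e) {
        return false;
    }
}
```

A script could evaluate canStartIdentifier(0x037B) once at start-up and report a clear error if the result is false. Since every engine accepts ASCII letters, the dollar sign and the underline, names built only from those need no such test.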