January 26, 2000 Unicode Tips: January 2000
Yehuda Shiran, Ph.D.
The Unicode standard is a fixed-width, uniform encoding scheme. It is designed for the interchange and display of text in many different languages, as well as historic scripts, technical and mathematical symbols, and multilingual texts. The Unicode standard specifies each character's identity and its numeric value. The numeric value is 16 bits long and is written as four hexadecimal digits. The standard itself writes the value as U+0041, while JavaScript writes it as a Unicode escape sequence with the prefix \u (a backslash followed by a lowercase u). The Unicode value \u0041, for example, represents the character A. The unique Unicode name for this character is LATIN CAPITAL LETTER A.
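Any modern JavaScript engine, such as a browser console or Node.js, can show how an escape sequence, its numeric value, and its character relate. The sketch below uses only the standard String methods charCodeAt and fromCharCode. The variable names are our own.

```javascript
// A Unicode escape sequence inside a string literal is replaced by the
// character it names when the script is parsed.
const letterA = "\u0041";

console.log(letterA);          // prints A
console.log(letterA === "A");  // prints true

// charCodeAt returns the 16-bit numeric value of a character,
// and toString(16) shows that value in hexadecimal.
const code = letterA.charCodeAt(0);
console.log(code);               // prints 65
console.log(code.toString(16));  // prints 41

// String.fromCharCode goes the other way: from numeric value to character.
console.log(String.fromCharCode(0x0041)); // prints A
```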
Unicode is compatible with ASCII. The first 128 Unicode characters correspond to the ASCII characters and have the same numeric values. ASCII's
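The first 128 values are shared between ASCII and Unicode. A script can therefore leave ASCII text alone and escape only the remaining characters. The function below is a minimal sketch of that idea. toUnicodeEscapes is a hypothetical name, not a built-in. It works on one 16-bit value at a time, as JavaScript strings do.

```javascript
// Hypothetical helper (not part of any library): rewrites every character
// outside the 7-bit ASCII range as a \uXXXX escape sequence.
// ASCII characters (values 0 through 127) are copied unchanged, because
// their Unicode values are the same as their ASCII values.
function toUnicodeEscapes(text) {
  let result = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i); // one 16-bit value
    if (code < 128) {
      result += text.charAt(i);
    } else {
      // Pad the hexadecimal value to four digits, e.g. E7 becomes 00E7.
      result += "\\u" + ("0000" + code.toString(16).toUpperCase()).slice(-4);
    }
  }
  return result;
}

console.log("A".charCodeAt(0) === 0x41);    // prints true (ASCII 65)
console.log(toUnicodeEscapes("Français"));  // prints Fran\u00E7ais
```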
The calculator below accepts a Unicode value (just the four hexadecimal characters, no
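The core of such a calculator can be sketched in a few lines. This is not the page's actual code. unicodeToChar is a hypothetical function. It assumes its input is exactly four hexadecimal digits and rejects anything else.

```javascript
// Hypothetical sketch of a Unicode calculator's core logic: it accepts
// exactly four hexadecimal digits and returns the matching character.
function unicodeToChar(hexDigits) {
  if (!/^[0-9A-Fa-f]{4}$/.test(hexDigits)) {
    throw new Error("Expected exactly four hexadecimal digits, got: " + hexDigits);
  }
  // parseInt with radix 16 turns the digits into the 16-bit numeric value.
  return String.fromCharCode(parseInt(hexDigits, 16));
}

console.log(unicodeToChar("0041")); // prints A
console.log(unicodeToChar("00e9")); // prints é
```

A page would typically call a function like this from a form's event handler and write the result back into a text field.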
Here are some common special characters and their Unicode values:
You can experiment with our Unicode calculator above and find many Unicode values that yield unexpected characters. Although Unicode can represent more than 65,000 different characters, it is up to your browser to provide fonts that display them, and these fonts often cover only part of the Unicode character set. Beyond the browser's support, the client platform must support Unicode as well. Some platforms, such as Windows 95, provide only partial support for Unicode.
Another problem with Unicode is how to enter non-ASCII characters. Often, the only way to specify them is with Unicode escape sequences, as shown in the table above. The Unicode specification also allows a composite character to be written as a combining sequence: the base character followed by one or more combining marks. Many French characters, for example, are Latin letters with added accents such as the acute accent, the grave accent, the circumflex, and the cedilla. The letter é can be written as the Latin letter e followed by the combining acute accent (\u0301), or as the single precomposed character \u00E9. The JavaScript implementation, like others, does not interpret combining sequences. Instead, use the escape sequence of the precomposed character for each accented French letter.
Unicode support was introduced in JavaScript 1.3. Learn more about the features of JavaScript 1.3 in Column 25, JavaScript 1.3 Overview, Part I, and Column 26, JavaScript 1.3 Overview, Part II.
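A short script shows the difference between a combining sequence and a precomposed character. The sketch below uses the standard escapes \u00E9 (é) and \u0301 (COMBINING ACUTE ACCENT). The last line uses String.prototype.normalize, which arrived with ECMAScript 2015 and is not available in JavaScript 1.3.

```javascript
// Two ways to write "é": the precomposed character U+00E9, and the base
// letter "e" followed by U+0301 COMBINING ACUTE ACCENT.
const precomposed = "\u00E9";
const combining = "e\u0301";

// JavaScript strings are plain sequences of 16-bit values, so the two
// spellings are different strings even though they can look identical.
console.log(precomposed.length);         // prints 1
console.log(combining.length);           // prints 2
console.log(precomposed === combining);  // prints false

// Modern engines (ECMAScript 2015 and later, not JavaScript 1.3) provide
// normalize, whose "NFC" form composes the sequence into U+00E9.
console.log(combining.normalize("NFC") === precomposed); // prints true
```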