Relationship to ASCII

The basis of the IBM PC code pages is ASCII, a 7-bit code representing 128 characters and control codes. In the past, 8-bit implementations of ASCII either set the top bit to zero or used it as a parity bit in network data transmissions. When this bit was instead made available for representing character data, another 128 characters and control codes could be represented. IBM used this extended range to encode characters used by various languages. No formal standard existed for these ‘extended character sets’; IBM merely referred to the variants as code pages, as it had always done for variants of EBCDIC encodings.

IBM PC (OEM) code pages

These code pages are most often used under MS-DOS-like operating systems. They include many box-drawing characters. Since the original IBM PC code page (number 437) was not designed for international use, several incompatible variants emerged. Microsoft refers to these as the OEM code pages. Examples include:
In modern applications, operating systems and programming languages, the IBM code pages have been rendered obsolete by newer and better international standards, such as Unicode.

Other code pages of note

The following code page numbers are specific to Microsoft Windows. IBM uses different numbers for these code pages.
Windows (ANSI) code pages

Microsoft defined a number of code pages known as the ANSI code pages (the first one, 1252, was based on an apocryphal ANSI draft of what became ISO 8859-1). Code page 1252 is built on ISO 8859-1 but uses the range 0x80-0x9F for extra printable characters rather than the C1 control codes used in ISO 8859-1. Some of the others are based in part on other parts of ISO 8859 but are often rearranged to make them closer to 1252.
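The difference in the 0x80-0x9F range can be observed with a short sketch. It assumes Python's built-in `cp1252` and `latin-1` codecs as stand-ins for code page 1252 and ISO 8859-1; the helper names `decode_both` and `differing_byte_values` are hypothetical and chosen only for this illustration. Python's `cp1252` codec also leaves a few byte values in that range undefined, and the sketch counts those as differing.

```python
def decode_both(data: bytes):
    """Decode the same bytes as code page 1252 and as ISO 8859-1."""
    return data.decode("cp1252"), data.decode("latin-1")


def differing_byte_values():
    """Return every single-byte value that the two codecs treat differently."""
    differing = []
    for value in range(256):
        raw = bytes([value])
        try:
            same = raw.decode("cp1252") == raw.decode("latin-1")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves a few byte values undefined;
            # count them as differing, since latin-1 maps every byte.
            same = False
        if not same:
            differing.append(value)
    return differing


if __name__ == "__main__":
    # 0x93 and 0x94 are curly quotation marks in code page 1252,
    # but C1 control codes in ISO 8859-1.
    as_cp1252, as_latin1 = decode_both(bytes([0x93, 0x48, 0x69, 0x94]))
    print(repr(as_cp1252))
    print(repr(as_latin1))

    values = differing_byte_values()
    print(f"differing range: 0x{values[0]:02X}-0x{values[-1]:02X}")
```

Under these assumptions, every byte outside 0x80-0x9F decodes to the same character in both encodings, which is why text that avoids that range looks identical whichever label is applied.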
Criticism

Many code pages, unlike Unicode, suffer from several problems.
Due to Unicode's extensive documentation, vast repertoire of characters and character stability policy, these problems are rarely a concern for Unicode. Applications may also mislabel text in Windows-1252 as ISO-8859-1, the default character set for HTML. Fortunately, the only difference between these code pages is that Windows-1252 uses the range that ISO-8859-1 reserves for control characters for additional printable characters. Since control characters have no function in HTML, web browsers tend to use Windows-1252 rather than ISO-8859-1.

Private code pages

Early in the history of personal computers, when users did not find their character encoding requirements met, private or local code pages were created using Terminate and Stay Resident utilities or by re-programming BIOS EPROMs. In some cases, unofficial code page numbers were invented (e.g., cp895). When more diverse character set support became available, most of those code pages fell into disuse, with some exceptions such as the Kamenický or KEYBCS2 encoding for the Czech and Slovak alphabets. Another such character set is the Iran System encoding standard, which was created by the Iran System corporation for Persian language support. This standard was used in Iran in DOS-based programs and became obsolete after the introduction of Microsoft code page 1256. However, some Windows and DOS programs using this encoding are still in use, and some Windows fonts with this encoding exist.