DOS codepages (and their history)
DOS has supported numerous character sets, also called codepages. This article documents official MS-DOS codepages and also Windows "OEM" codepages and some rare Arabic codepages.
From their very beginning, PCs have run with an 8-bit character set of 256 characters. Compared to 7-bit ASCII, which only offered 95 visible characters, this was a great advantage. PC software could easily support many Latin-based languages. As PCs became widely popular, it turned out that 256 characters were not enough. More national characters were needed. Codepages were introduced to DOS in 1987 to meet this need.
This article is intended for computing experts who already know what character sets and codepages are. We look into MS-DOS and PC-DOS codepages, and also the codepages found in Windows command line mode ("DOS box"). We attempt to differentiate between "real" DOS codepages and "DOS-like" codepages. We compare codepages with one another. We point out differences between documented and actual behavior. We also document old Arabic codepages, for which no other online documentation existed as of 2014.
Standalone DOS codepages
The DOS operating system originally supported just one character set, or code page. That was the 437 codepage, also known as PC-ASCII. As DOS went into widespread international use, several alternatives were released. That happened with the release of DOS 3.3 in 1987.
The following table summarizes the code pages officially supported by standalone versions of PC-DOS (IBM) and MS-DOS (Microsoft). The information is primarily based on MS-DOS versions from 3.3 to 6.22.
Additional OEM codepages
More codepages do exist. The following Microsoft-documented "OEM" codepages do not appear in any of the standalone PC-DOS or MS-DOS versions reviewed (up to MS-DOS 6.22 from 1994). Most of them seem to be supported by the command prompt under the Windows operating system.
Euro codepages (IBM)
In 1998, the European Union's introduction of the euro currency symbol (€) had consequences for codepages. Based on the existing DOS codepages, several updated and new codepages were defined. Either the euro symbol was added to an unused slot in an existing page, or a new page was created in which an old symbol was replaced by €.
The following table is based on documentation by IBM. Microsoft documentation does not mention any of these changes, except for codepages 858 and 874.
Note that the IBM and Microsoft Thai codepages are different despite similar numbering.
It remains unclear which systems actually supported any of these euro updated codepages.
DOS codepage charts
The following codepage charts list all official Latin-based DOS codepages, and also Greek, Hebrew and Cyrillic. Arabic codepages appear in their own chapter. 874 Thai and 1258 Vietnamese are presented among other Windows codepages.
Asian double-byte codepages are missing for technical reasons. Unless otherwise mentioned, the codepage charts are screenshots captured in MS-DOS.
Common area (00-7F)
Characters 00-7F (hex) in the following chart are common to all DOS codepages listed here.
The chart is similar to ASCII except for control characters. Codepoints 00-1F and 7F (hex), marked in pink, have a dual nature. They can be used as invisible ASCII control characters, and they can also be displayed on the screen as visible symbols. Because of this, DOS codepages are backward compatible with ASCII.
Exception: Codepage 864 Arabic is different from all other DOS codepages. It supports different symbols in the control character area. We are not going any further into the differences in this article.
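The dual nature of codepoints 00-1F and 7F can be illustrated with a short Python sketch. The glyph table below is an assumption of this example: it follows the commonly published codepage 437 symbol set, not the screenshots in this article, and the function names are hypothetical.

```python
# Symbols conventionally drawn on screen for bytes 0x00-0x1F in codepage 437
# (0x00 is shown as a blank). Assumed from the commonly published glyph set.
CP437_LOW_GLYPHS = " ☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"
DEL_GLYPH = "⌂"  # displayed form of 0x7F


def as_control(data: bytes) -> str:
    """Interpret bytes 00-7F as plain ASCII, control codes included."""
    return data.decode("ascii")


def as_display(data: bytes) -> str:
    """Interpret the same bytes as the symbols drawn on the screen."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError("only the common area 00-7F is handled here")
        if b < 0x20:
            out.append(CP437_LOW_GLYPHS[b])
        elif b == 0x7F:
            out.append(DEL_GLYPH)
        else:
            out.append(chr(b))
    return "".join(out)


if __name__ == "__main__":
    sample = b"\x01 Hi \x0d\x7f"
    print(repr(as_control(sample)))  # control-code reading
    print(as_display(sample))        # on-screen reading
```

The same byte string yields two different results depending on which reading is chosen, which is why a text file and a screen dump of the same bytes can look different.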
Codepage 437 United States
Alternative names: Personal Computer, MS-DOS United States, MS-DOS Latin US, OEM United States, DOS Extended ASCII (United States), PC-ASCII
Codepage 437 is the original IBM "PC-ASCII" codepage. It's the basis for all other codepages. Differences exist in the 80-FF (hex) range.
In the charts that follow, differences from 437 are highlighted in green.
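The comparison against 437 can also be approximated programmatically. The following sketch relies on the single-byte codecs that ship with Python (`cp437`, `cp850`, and so on). Those tables come from published mapping files, so they are not guaranteed to match the MS-DOS screenshots in every detail. The function names are hypothetical.

```python
def decode_table(codec: str) -> list[str]:
    """Return the 256 characters a single-byte codec assigns to bytes 00-FF.

    Positions the codec leaves undefined are reported as U+FFFD.
    """
    return [bytes([b]).decode(codec, errors="replace") for b in range(256)]


def diff_from_437(codec: str) -> dict[int, tuple[str, str]]:
    """Map each differing byte to (character in cp437, character in codec)."""
    base = decode_table("cp437")
    other = decode_table(codec)
    return {b: (base[b], other[b]) for b in range(256) if base[b] != other[b]}


if __name__ == "__main__":
    for codec in ("cp850", "cp865", "cp866"):
        diff = diff_from_437(codec)
        print(f"{codec}: {len(diff)} positions differ, "
              f"lowest at hex {min(diff):02X}")
```

Note that Python's codecs treat 00-1F as control characters, so this comparison only says something about the 80-FF range.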
Codepage 737 Greek II
Alternative names: 437 G, MS-DOS Greek, OEM Greek
This codepage was formerly known as 437G.
Codepage 775 Baltic Rim
Alternative names: MS-DOS Baltic Rim, OEM Baltic
Codepage 775 is not a DOS codepage in the strictest sense. It never appeared in standalone MS-DOS. The page covers Estonian, Lithuanian and Latvian (and even Polish). It conforms to Lithuanian Standard LST 1590-1.
Codepage 850 Multilingual (Latin I)
Alternative names: Personal Computer - Multilingual Page, MS-DOS Multilingual (Latin 1), OEM Multilingual Latin 1, Western European
This codepage covered most of Western Europe, Latin America and also Canada.
Euro version: Codepage 858 was formed by changing dotless ı to € (hex D5).
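The single-slot change can be checked with a small sketch, assuming the `cp850` and `cp858` codecs bundled with Python reflect the tables described here. The helper name `differing_bytes` is hypothetical.

```python
def differing_bytes(codec_a: str, codec_b: str) -> list[int]:
    """List the byte values that two single-byte codecs decode differently."""
    return [
        b for b in range(256)
        if bytes([b]).decode(codec_a, errors="replace")
        != bytes([b]).decode(codec_b, errors="replace")
    ]


if __name__ == "__main__":
    changed = differing_bytes("cp850", "cp858")
    print([f"{b:02X}" for b in changed])
    for b in changed:
        old = bytes([b]).decode("cp850")
        new = bytes([b]).decode("cp858")
        print(f"hex {b:02X}: U+{ord(old):04X} -> U+{ord(new):04X}")
```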
Codepage 852 Slavic/Eastern European (Latin II)
Alternative names: Latin 2 - Personal Computer, MS-DOS Slavic (Latin 2), OEM Latin 2, Central European
According to MS-DOS 6.22, this codepage covered Albania, Bosnia/Herzegovina, Croatia, Czech Republic, Hungary, Poland, Romania, (Russia), Slovakia, Slovenia and Yugoslavia (Latin).
According to IBM, a euro version exists where € was added to unused position hex AA.
Codepage 855 Cyrillic I
Alternative names: Cyrillic - Personal Computer, IBM Cyrillic, MS-DOS Cyrillic, OEM Cyrillic (primarily Russian)
According to MS-DOS 6.22, this codepage covered Yugoslavia (Serbia/Montenegro, Macedonia), Bulgaria and Russia.
According to IBM, a euro version (codepage 872) exists where € appears in place of ¤ (hex CF).
Codepage 857 Turkish
Alternative names: Latin #5, Turkey - Personal Computer, IBM Turkish, MS-DOS Turkish, OEM Turkish
According to IBM, a euro version exists where € was added to unused position hex D5.
Codepage 858 Multilingual Latin I + euro
Alternative names: Personal Computer - Multilingual with euro, OEM Multilingual Latin 1 + Euro symbol, Multilingual Latin I + Euro
This is the euro version of codepage 850. The difference is that the dotless ı was changed to € (hex D5). In order to use codepage 858 in Windows 2000 and later, one needs to install it via Control Panel, type
Codepage 860 Portuguese
Alternative names: Portugal - Personal Computer, MS-DOS Portuguese, OEM Portuguese
According to MS-DOS 6.22, this codepage was for Portugal (but not Brazil).
Codepage 861 Icelandic
Alternative names: Iceland - Personal Computer, MS-DOS Icelandic, OEM Icelandic
Codepage 862 Hebrew
Alternative names: Israel - Personal Computer, MS-DOS Hebrew, OEM Hebrew
Codepage 863 Canadian-French
Alternative names: Canadian French - Personal Computer, MS-DOS Canadian French, MS-DOS French Canada, OEM French Canadian
Codepage 865 Nordic
Alternative names: Nordic - Personal Computer, MS-DOS Nordic, OEM Nordic
This codepage was for Denmark and Norway.
Codepage 866 Russian (Cyrillic II)
Alternative names: PC Data, Cyrillic, Russian; MS-DOS Cyrillic CIS 1; MS-DOS Russian; OEM Russian
This codepage was developed for the Russian language version of MS-DOS 4.01. It does not cover every language written in Cyrillic; Ukrainian, for example, is not fully covered.
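Whether a given text fits into a codepage can be tested by trying to encode it. The sketch below uses Python's bundled `cp866` codec as a stand-in for the DOS codepage, which is an assumption. The helper name `uncovered` is hypothetical.

```python
def uncovered(text: str, codec: str) -> list[str]:
    """Return the distinct characters of text that codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    russian = "Привет, мир"
    ukrainian = "Льв\u0456в"  # contains the Ukrainian letter U+0456
    for sample in (russian, ukrainian):
        gaps = [f"U+{ord(c):04X}" for c in uncovered(sample, "cp866")]
        print(sample, "->", gaps or "fully covered")
```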
According to IBM, a euro version (codepage 808) exists where € appears in place of ¤ (hex FD).
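A euro variant of this kind can be modeled as the base table with one slot replaced. Python does not ship a codec for codepage 808, so the sketch below derives a decoding table from the bundled `cp866` codec and patches position FD. This illustrates the principle only. It is not IBM's table, and the function names are hypothetical.

```python
def patched_table(base_codec: str, position: int, char: str) -> str:
    """Build a 256-character decoding table with one slot replaced."""
    table = [bytes([b]).decode(base_codec, errors="replace")
             for b in range(256)]
    table[position] = char
    return "".join(table)


def decode_with(table: str, data: bytes) -> str:
    """Decode bytes by direct lookup in a 256-character table."""
    return "".join(table[b] for b in data)


if __name__ == "__main__":
    # Assumed model of codepage 808: codepage 866 with the euro sign at hex FD.
    euro_866 = patched_table("cp866", 0xFD, "\u20ac")
    data = bytes([0x31, 0x30, 0x30, 0x20, 0xFD])
    print(data.decode("cp866"))         # reading in the base codepage
    print(decode_with(euro_866, data))  # reading in the patched table
```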
Codepage 869 Greek
Alternative names: Greece - Personal Computer, IBM Modern Greek, MS-DOS Greek 2, OEM Modern Greek
According to IBM, a euro version exists where € was added to unused position hex 87.
Arabic codepages
Arabic codepages are inadequately documented in online sources. The codepages are not well supported by English versions of either MS-DOS or Windows. Documentation is lacking or inaccurate. To our knowledge, codepages 709, 710 and 711 had not been documented online prior to this article in 2014.
The following Arabic codepages have been captured in Arabic Windows 98 Second Edition (Arabic command line). Arabic command line means there is a special built-in utility in Windows that adds Arabic script support, such as right-to-left writing and joining of Arabic letters.
Online documentation, published by Microsoft, exists for codepages 708, 720 and 864. However, a multitude of characters appear to differ in Windows 98. Differences between Windows 98 and the Microsoft documentation are highlighted in pink. These differences may be specific to the Windows 98 implementation.
Codepage 708 Arabic (ASMO 708)
Codepage 708 in Arabic Windows 98 SE differs from Microsoft documentation of 708 from 1995. The differences are in pink. Line drawing characters with double lines appear as single lines. Several other characters look different as well.
Codepage 708 is backward compatible with the standards ASMO 708 (1988) and ISO 8859-6 (Arabic). Codepage 708 adds characters to positions unused in the standards (for comparison see the ASMO 708 set in ISO-IR 127 and ECMA-114). A reference to codepage 708 appears in the RTF file format specification, where it was added during 1989–1993.
Codepage 709 Arabic (ASMO 449+, BCON V4)
Codepage 709 is inadequately documented in online sources. A reference to it appears in the RTF file format specification, where it was added during 1989–1993.
Codepage 709 appears to have been built on the ASMO 449 standard. ASMO 449 is a 7-bit ASCII-like encoding (see ISO-IR 089) that has Arabic letters in place of letters A-Z, and also some symbols. Codepage 709 has lifted ASMO 449 characters to the area 80-FF (hex) and added some extra characters to unused positions. The tilde (~) at FE (hex) is incompatible with ASMO 449, though. See also: ASMO 449+.
Codepage 709 is quite similar to 708 in its Arabic letters and Arabic symbols, but not in its Latin letters, ASCII symbols and digits.
Codepage 710 Transparent Arabic
Codepage 710 was introduced in Arabic MS-DOS 3.3. It is inadequately documented in online sources.
Codepage 711 Arabic (Nafitha Enhanced)
Codepage 711 is inadequately documented in online sources. A reference to it appears in the RTF file format specification, where it was added during 1989–1993. Nafitha was a program that added Arabic support to DOS.
Codepages 710 and 711 are somewhat similar but not compatible with each other.
Codepage 720 Arabic (Transparent ASMO)
Alternative name: MS-DOS Arabic (Transparent ASMO)
Codepage 720 in Arabic Windows 98 SE differs from Microsoft documentation of 720. The differences are in pink. Line drawing characters with double lines appear as single lines. Several other characters look different as well.
Codepage 720 was added to MS-DOS 6.22 (1994). A reference to it appears in the RTF file format specification, where it was added during 1989–1993.
Codepage 864 Arabic (MS-DOS)
Alternative names: Arabic - Personal Computer, OEM Arabic
Codepage 864 in Arabic Windows 98 SE differs from Microsoft documentation of 864 from 1996. The differences are in pink. The 1996 documented version, which is a conversion table, supports more characters than the ones actually implemented in Windows 98.
According to MS-DOS 6.22, this was the only Arabic codepage available.