DOS codepages (and their history)

DOS has supported numerous character sets, also called codepages. This article documents official MS-DOS and PC DOS codepages, Windows "OEM" codepages and some rare Arabic codepages.

From their very beginning, PCs have run with an 8-bit character set with 256 characters. Compared to 7-bit ASCII, which offered only 95 visible characters, this was a great advantage: PC software could easily support many Latin-based languages. As PCs became widely popular, it turned out that 256 characters were not enough, because more national characters were needed. Codepages were introduced to DOS in 1987 to meet this need.

This article is intended for computing experts who already know what character sets and codepages are. We look into MS-DOS and PC DOS codepages, and also the codepages found in Windows command line mode ("DOS box"). We attempt to differentiate between "real" DOS codepages and "DOS-like" codepages. We compare the codepages with one another and point out differences between documented and actual behavior. We also document old Arabic codepages, for which no other online documentation existed as of 2014.

See also: Character sets; Windows codepages (and their history)

Standalone DOS codepages

The DOS operating system originally supported just one character set, or codepage: codepage 437, also known as PC-ASCII. Several alternatives were released as DOS went into widespread international use, starting with the release of DOS 3.3 in 1987.

The following table summarizes the codepages officially supported by standalone versions of PC DOS (IBM) and MS-DOS (Microsoft). The information is primarily based on MS-DOS versions from 3.3 to 6.22 and PC DOS versions 7 and 2000.

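Standalone DOS codepages

| Page | Name | DOS version | DOS year | IBM year | Notes |
| 437 | United States ("PC-ASCII") | 1.0 | 1981 | 1984 | |
| 710 | Arabic (Transparent Arabic) | 3.3 | 1988 | | * Arabic MS-DOS 3.3 |
| 720 | Arabic (Transparent ASMO) | 6.22 | 1994 | 1997 | * |
| 737 | Greek II | 6.2 | 1993 | 1996 | First a hardware solution (MDA & Hercules graphics cards) |
| 850 | Multilingual (Latin I) | 3.3 | 1987 | 1986 | Original version |
| 850 (€) | Multilingual (Latin I), euro version | 2000 | 1998 | | PC DOS 2000 version with euro symbol |
| 852 | Slavic/Eastern European (Latin II) | 5 | 1991 | 1993 | |
| 855 | Cyrillic I | 6.22 | 1994 | 1988 | |
| 857 | Turkish | 6.2 | 1993 | 1989 | |
| 860 | Portuguese | 3.3 | 1987 | 1986 | |
| 861 | Icelandic | 6.2 | 1993 | 1986 | |
| 862 | Hebrew | 4 | 1988 | 1986 | * |
| 863 | Canadian-French | 3.3 | 1987 | 1986 | |
| 864 | Arabic | 4 | 1988 | 1986 | * |
| 865 | Nordic | 3.3 | 1987 | 1986 | |
| 866 | Russian (Cyrillic II) | 4.01 | 1990 | 1991 | Russian MS-DOS 4.01. General support in MS-DOS 6.22 (1994). |
| 869 | Greek | 6.2 | 1993 | 1987 | |
| 912 | ISO 8859-2 (Latin) | 7 | 1995 | 1987 | PC DOS 7 |
| 915 | ISO 8859-5 (Cyrillic) | 7 | 1995 | 1988 | PC DOS 7 |
| 874 | Thai | 6.22 | 1994 | 1992 | † probably in Windows 3.x Thai edition |
| 932 | Japanese | 4 | 1988 | | *† |
| 934 | Korean | 4 | 1988 | | * |
| 936 | Chinese (Simplified) | 4 | 1988 | | *† |
| 938 | Taiwan | 4 | 1988 | | *† |
| 949 | Korean | 5 | 1991 | | *† |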

Additional OEM codepages

Codepages supported by the standalone versions of DOS are not the only DOS-like codepages. The following Microsoft-documented "OEM" codepages do not appear in any of the standalone PC DOS or MS-DOS versions reviewed (up to PC DOS 2000 from 1998). Most of them seem to be supported by the command prompt under the Windows operating system.

Additional OEM codepages

| Page | Name | First documented | Notes |
| 851 | Greek 1 | 1986 IBM | listed as "Obsolete" by IBM, "MS-DOS" by Microsoft |
| Windows OEM | | | |
| 708 | Arabic (ASMO 708) | 1989–1993 Microsoft | * "MS-DOS Arabic ASMO", supported in Arabic Windows |
| 709 | Arabic (ASMO 449+, BCON V4) | 1989–1993 Microsoft | * supported in Arabic Windows |
| 711 | Arabic (Nafitha Enhanced) | 1989–1993 Microsoft | * supported in Arabic Windows |
| 775 | Baltic Rim | 1995 Microsoft | * apparently in Windows 95 (Pan European) and later |
| 858 | Multilingual Latin I + euro | 1998 Microsoft | * |
| Windows ANSI and OEM | | | |
| 950 | Traditional Chinese Big5 | | † Supported by Windows 3.1 and later |
| 1258 | Vietnam | 1996 Microsoft | † Supported by Windows 95 and later |

* "Windows OEM" codepages are apparently supported by the Windows command prompt.
† "Windows ANSI and OEM" codepages are supported by Windows. The same page is used in both the Windows GUI and the command prompt.
"First documented" refers to the year of the earliest reference to the codepage that was found while writing this article.

Euro codepages (IBM)

The European Union introduced the euro currency symbol (€), which had consequences for codepages in 1998. Based on the existing DOS codepages, several updated and new codepages were defined. Either the new euro symbol was added to an unused slot in an existing page, or a new page was created in which an old symbol was replaced by €.

The following table is based on documentation by IBM. Microsoft documentation doesn't mention any of these changes except for codepages 858 and 874.

Euro codepages (IBM)

| Original codepage | Euro codepage | Notes |
| 437 United States | | |
| 737 Greek II | | |
| 850 Multilingual (Latin I) | 850 euro version | dotless ı ⇒ € |
| | 858 Multilingual Latin I + euro | |
| 852 Slavic/Eastern European | ⇢ 852 | unused position AA ⇒ € |
| 855 Cyrillic I | ⇒ 872 Cyrillic with euro | ¤ ⇒ € |
| 857 Turkish | ⇢ 857 | unused position D5 ⇒ € |
| 860 Portuguese | | |
| 861 Icelandic | | |
| 862 Hebrew | ⇒ 867 Israel | € + several other changes |
| 863 Canadian-French | | |
| 864 Arabic | ⇢ 864 | unused position A7 ⇒ € |
| 865 Nordic | | |
| 866 Russian (Cyrillic II) | ⇒ 808 Cyrillic, Russian with euro | ¤ ⇒ € |
| 869 Greek | ⇢ 869 | unused position 87 ⇒ € |
| 874 Thai (Microsoft) | ⇢ 874 Thai (Microsoft) | unused position 80 ⇒ € |
| 874 Thai (IBM) | ⇒ 1161 Thai (IBM) | position DE ⇒ € |
| 912 ISO 8859-2 (Latin) | | |
| 915 ISO 8859-5 (Cyrillic) | | |

Note: The IBM and Microsoft Thai codepages 874 are different from each other despite sharing the same number.

It remains unclear which systems actually supported any of these euro updated codepages.
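The "¤ ⇒ €" rows of the table can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: Python's cp855 and cp866 codecs stand in for the original pages, the byte positions (hex CF and FD) are the ones given in the codepage sections of this article, and the helper names are hypothetical. Python itself has no codecs named after 872 or 808, so the euro variants are built by patching a decoding table.

```python
# Assumption: Python's cp855/cp866 codecs stand in for the original DOS pages.
# The positions come from this article: ¤ sits at hex CF (855) and hex FD (866).
EURO_PATCHES = {
    "872": ("cp855", 0xCF),  # 855 Cyrillic I -> 872 Cyrillic with euro
    "808": ("cp866", 0xFD),  # 866 Russian    -> 808 Cyrillic, Russian with euro
}

def euro_decode_table(base_codec: str, position: int) -> str:
    """Build a 256-character decoding table with one position replaced by the euro sign."""
    chars = [bytes([b]).decode(base_codec, errors="replace") for b in range(256)]
    chars[position] = "€"
    return "".join(chars)

def decode_with_table(data: bytes, table: str) -> str:
    """Decode bytes by looking each byte value up in the table (hypothetical helper)."""
    return "".join(table[b] for b in data)

table_872 = euro_decode_table(*EURO_PATCHES["872"])
table_808 = euro_decode_table(*EURO_PATCHES["808"])

# The same byte decodes to the currency sign in the base page and to the euro in the variant.
print(bytes([0xCF]).decode("cp855"), decode_with_table(bytes([0xCF]), table_872))
print(bytes([0xFD]).decode("cp866"), decode_with_table(bytes([0xFD]), table_808))
```

The rows marked "unused position" are left out on purpose, because the mapping tables shipped with Python do not necessarily treat those positions as unused.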

DOS codepage charts

The following codepage charts list all official Latin-based DOS codepages, and also Greek, Hebrew and Cyrillic. Arabic codepages appear in their own chapter. 874 Thai and 1258 Vietnamese are presented among other Windows codepages.

Asian double-byte codepages are missing for technical reasons. Unless otherwise mentioned, the codepage charts are screenshots captured in MS-DOS.

Common area (00-7F)

Characters 00-7F (hex) in the following chart are common to all DOS codepages listed here.

[Chart: DOS codepage common area]

The chart is similar to ASCII except for the control characters. Codepoints 00-1F and 7F (hex), marked in pink, have a dual nature: they can be used as invisible ASCII control characters and can also be displayed on the screen as symbols. Because of this, DOS codepages are backward compatible with ASCII.
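The ASCII compatibility of the common area can be checked with Python's bundled codecs. This is a sketch under assumptions: the codec names below are Python's, not DOS's, and Python maps bytes 00-1F and 7F to control characters only, so the on-screen symbols of the dual-natured codepoints are not represented. Codepage 864 is deliberately left out.

```python
# Python codec names assumed to correspond to the codepages charted in this article.
# cp864 is omitted on purpose; the article treats it as an exception.
DOS_CODECS = [
    "cp437", "cp737", "cp775", "cp850", "cp852", "cp855", "cp857", "cp858",
    "cp860", "cp861", "cp862", "cp863", "cp865", "cp866", "cp869",
]

LOW_HALF = bytes(range(0x80))  # byte values 00-7F (hex)

def low_half_matches_ascii(codec: str) -> bool:
    """Return True if bytes 00-7F decode to the same text as they do in ASCII."""
    return LOW_HALF.decode(codec) == LOW_HALF.decode("ascii")

for codec in DOS_CODECS:
    print(codec, low_half_matches_ascii(codec))
```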

Exception: Codepage 864 Arabic is different from all other DOS codepages. It supports different symbols in the control character area. This article does not go further into those differences.

Codepage 437 United States

Alternative names: Personal Computer, MS-DOS United States, MS-DOS Latin US, OEM United States, DOS Extended ASCII (United States), PC-ASCII

[Chart: codepage 437]

Codepage 437 is the original IBM "PC-ASCII" codepage. It's the basis for all other codepages, which differ from it in the 80-FF (hex) range.

In the charts that follow, differences from 437 are highlighted in green.
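The same kind of comparison can be sketched in Python by decoding every byte value with two codecs and listing the positions that disagree. The function names are hypothetical, and Python's mapping tables are assumed to be a reasonable stand-in for the DOS pages; they are not guaranteed to match the screenshots in every position.

```python
def decode_table(codec: str) -> list[str]:
    """Decode each byte value 00-FF on its own; undefined bytes become U+FFFD."""
    return [bytes([b]).decode(codec, errors="replace") for b in range(256)]

def differing_positions(codec_a: str, codec_b: str) -> list[int]:
    """Return the byte positions at which the two codecs decode to different characters."""
    table_a, table_b = decode_table(codec_a), decode_table(codec_b)
    return [b for b in range(256) if table_a[b] != table_b[b]]

# Example: compare 437 United States with 865 Nordic.
table_437, table_865 = decode_table("cp437"), decode_table("cp865")
for pos in differing_positions("cp437", "cp865"):
    print(f"{pos:02X}: {table_437[pos]} -> {table_865[pos]}")
```

Any other pair of codec names known to Python can be passed to differing_positions in the same way.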

Codepage 737 Greek II

Alternative names: 437 G, MS-DOS Greek, OEM Greek

[Chart: codepage 737]

This codepage was formerly known as 437G.

Codepage 775 Baltic Rim

Alternative names: MS-DOS Baltic Rim, OEM Baltic

[Chart: codepage 775]

(captured in Windows 2000 SP4)

Codepage 775 is not a DOS codepage in the strictest sense. It never appeared in standalone MS-DOS. The page covers Estonian, Lithuanian and Latvian (and even Polish). It conforms to Lithuanian Standard LST 1590-1.

Codepage 850 Multilingual (Latin I)

Alternative names: Personal Computer - Multilingual Page, MS-DOS Multilingual (Latin 1), OEM Multilingual Latin 1, Western European

[Chart: codepage 850]

This codepage covered most of Western Europe, Latin America and also Canada.

A euro version of 850 exists. It was formed by replacing dotless ı with € (hex D5). Confusingly, it is known under two different codepage numbers, namely 850 and 858.

Codepage 852 Slavic/Eastern European (Latin II)

Alternative names: Latin 2 - Personal Computer, MS-DOS Slavic (Latin 2), OEM Latin 2, Central European

[Chart: codepage 852]

According to MS-DOS 6.22, this codepage covered Albania, Bosnia/Herzegovina, Croatia, Czech Republic, Hungary, Poland, Romania, (Russia), Slovakia, Slovenia and Yugoslavia (Latin).

According to IBM, a euro version exists where € was added to unused position hex AA.

Codepage 855 Cyrillic I

Alternative names: Cyrillic - Personal Computer, IBM Cyrillic, MS-DOS Cyrillic, OEM Cyrillic (primarily Russian)

[Chart: codepage 855]

According to MS-DOS 6.22, this codepage covered Yugoslavia (Serbia/Montenegro, Macedonia), Bulgaria and Russia.

According to IBM, a euro version (codepage 872) exists where € appears in place of ¤ (hex CF).

Codepage 857 Turkish

Alternative names: Latin #5, Turkey - Personal Computer, IBM Turkish, MS-DOS Turkish, OEM Turkish

[Chart: codepage 857]

According to IBM, a euro version exists where € was added to unused position hex D5.

Codepage 860 Portuguese

Alternative names: Portugal - Personal Computer, MS-DOS Portuguese, OEM Portuguese

[Chart: codepage 860]

According to MS-DOS 6.22, this codepage was for Portugal (but not Brazil).

Codepage 861 Icelandic

Alternative names: Iceland - Personal Computer, MS-DOS Icelandic, OEM Icelandic

[Chart: codepage 861]

Codepage 862 Hebrew

Alternative names: Israel - Personal Computer, MS-DOS Hebrew, OEM Hebrew

[Chart: codepage 862]

Codepage 863 Canadian-French

Alternative names: Canadian French - Personal Computer, MS-DOS Canadian French, MS-DOS French Canada, OEM French Canadian

[Chart: codepage 863]

Codepage 865 Nordic

Alternative names: Nordic - Personal Computer, MS-DOS Nordic, OEM Nordic

[Chart: codepage 865]

This codepage was for Denmark and Norway.

Codepage 866 Russian (Cyrillic II)

Alternative names: PC Data, Cyrillic, Russian; MS-DOS Cyrillic CIS 1; MS-DOS Russian; OEM Russian

[Chart: codepage 866]

This codepage was developed for the Russian language version of MS-DOS 4.01. It doesn't cover all languages written in Cyrillic; Ukrainian, for example, is not covered.

According to IBM, a euro version (codepage 808) exists where € appears in place of ¤ (hex FD).

Codepage 869 Greek

Alternative names: Greece - Personal Computer, IBM Modern Greek, MS-DOS Greek 2, OEM Modern Greek

[Chart: codepage 869]

According to IBM, a euro version exists where € was added to unused position hex 87.

Codepage 912 ISO 8859-2 (Latin)

[Chart: codepage 912]

(captured in PC DOS 2000)

Introduced by IBM in PC DOS 7 for the countries of Eastern Europe. It is not a codepage in MS-DOS or Windows. Positions hex 80–9F are empty because the ISO 8859-2 standard reserves them for invisible control characters, although DOS does not appear to have actually supported these control characters.

Codepage 915 ISO 8859-5 (Cyrillic)

[Chart: codepage 915]

(captured in PC DOS 2000)

Introduced by IBM in PC DOS 7 for the Cyrillic alphabets of (the former) Yugoslavia. It is not a codepage in MS-DOS or Windows. Positions hex 80–9F are empty because the ISO 8859-5 standard reserves them for invisible control characters, although DOS does not appear to have actually supported these control characters.
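The reserved range in codepages 912 and 915 can be illustrated in Python. This sketch assumes that Python's iso8859_2 and iso8859_5 codecs represent the underlying ISO standards; it says nothing about how PC DOS itself treated these bytes. The function name is hypothetical.

```python
import unicodedata

C1_RANGE = bytes(range(0x80, 0xA0))  # positions hex 80-9F

def c1_categories(codec: str) -> set[str]:
    """Return the Unicode general categories that bytes 80-9F decode to."""
    return {unicodedata.category(ch) for ch in C1_RANGE.decode(codec)}

for codec in ("iso8859_2", "iso8859_5"):
    # "Cc" is the Unicode category for control characters.
    print(codec, c1_categories(codec))
```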

Euro version of codepage 850

Codepage 850 also exists as a euro version. It has the dotless ı changed to € in position hex D5. The same codepage appears with two different codepage numbers. The usual number is 858, but PC DOS 2000 calls it 850 instead.

Codepage 850 Multilingual (Latin I), euro version

[Chart: codepage 850, euro version]

(captured in PC DOS 2000)

This is how codepage 850 looks in PC DOS 2000, which was released in 1998. Confusingly, the codepage number is still 850 even though it's different from the original version of 850.

Codepage 858 Multilingual Latin I + euro

Alternative names: Personal Computer - Multilingual with euro, OEM Multilingual Latin 1 + Euro symbol, Multilingual Latin I + Euro

[Chart: codepage 858]

(captured in Windows 2000 SP4)

Codepage 858 is exactly the same as the euro version of 850. It is supported by Windows command line mode.
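The relationship between the original 850 and 858 can be checked with Python's cp850 and cp858 codecs. The sketch assumes that these codecs match the original 850 and the euro version respectively. Python has no separate codec for the PC DOS 2000 flavor of 850.

```python
POSITION = 0xD5  # the position where the euro version differs, per this article

original = bytes([POSITION]).decode("cp850")  # dotless i in the original page
euro = bytes([POSITION]).decode("cp858")      # euro sign in the euro version

# Check that every other byte value decodes identically in both codecs.
others_equal = all(
    bytes([b]).decode("cp850") == bytes([b]).decode("cp858")
    for b in range(256)
    if b != POSITION
)

print(f"cp850 D5: U+{ord(original):04X}")
print(f"cp858 D5: U+{ord(euro):04X}")
print("all other positions identical:", others_equal)
```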

To use codepage 858 in Windows 10, select a TrueType font for the command prompt window and issue the command chcp 858. In Windows 2000, first install 858 support via the Control Panel, then follow the same steps. Raster fonts don't appear to have the € symbol.

Arabic codepages

Arabic codepages are inadequately documented in online sources. The codepages are not well supported by English versions of either MS-DOS or Windows, and the documentation that exists is incomplete or inaccurate. To our knowledge, codepages 709, 710 and 711 had not been documented online prior to this article in 2014.

The following Arabic codepages have been captured in Arabic Windows 98 Second Edition (Arabic command line). Arabic command line means there is a special built-in utility in Windows that adds Arabic script support, such as right-to-left writing and joining of Arabic letters.

Online documentation, published by Microsoft, exists for codepages 708, 720 and 864. A multitude of characters appear to differ in Windows 98, however. Differences between Windows 98 and Microsoft documentation have been highlighted in pink. These differences may be due to Windows 98.

Codepage 708 Arabic (ASMO 708)

[Chart: codepage 708]

(captured in Arabic Windows 98 SE)

Codepage 708 in Arabic Windows 98 SE differs from Microsoft documentation of 708 from 1995. The differences are in pink. Line drawing characters with double lines appear as single lines. Several other characters look different as well.

Codepage 708 is backward compatible with the standards ASMO 708 (1988) and ISO 8859-6 (Arabic). Codepage 708 adds characters to positions that are unused in the standards (for comparison, see the ASMO 708 set in ISO-IR 127 and ECMA-114). A reference to codepage 708 appears in the RTF file format specification, where it was added during 1989–1993.
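The positions left unused by ISO 8859-6 can be listed with a short Python sketch. It relies only on Python's iso8859_6 codec as a stand-in for the standard and does not model codepage 708 itself. The function name is hypothetical.

```python
def undefined_positions(codec: str) -> list[int]:
    """Return the byte values in 80-FF (hex) that the codec refuses to decode."""
    missing = []
    for b in range(0x80, 0x100):
        try:
            bytes([b]).decode(codec)
        except UnicodeDecodeError:
            missing.append(b)
    return missing

gaps = undefined_positions("iso8859_6")
print(len(gaps), "unused positions:", " ".join(f"{b:02X}" for b in gaps))
```

According to the text above, these gaps are the positions where codepage 708 places its additional characters.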

Codepage 709 Arabic (ASMO 449+, BCON V4)

[Chart: codepage 709]

(captured in Arabic Windows 98 SE)

Codepage 709 is inadequately documented in online sources. A reference to it appears in the RTF file format specification, where it was added during 1989–1993.

Codepage 709 appears to have been built on the ASMO 449 standard. ASMO 449 is a 7-bit ASCII-like encoding (see ISO-IR 089) that has Arabic letters in place of letters A-Z, and also some symbols. Codepage 709 has lifted ASMO 449 characters to the area 80-FF (hex) and added some extra characters to unused positions. The tilde (~) at FE (hex) is incompatible with ASMO 449, though. See also: ASMO 449+.

Codepage 709 is quite similar to 708 when it comes to Arabic letters and Arabic symbols, but not when it comes to Latin letters, ASCII symbols and digits.

Codepage 710 Transparent Arabic

[Chart: codepage 710]

(captured in Arabic Windows 98 SE)

Codepage 710 was introduced in Arabic MS-DOS 3.3. It is inadequately documented in online sources.

Codepage 711 Arabic (Nafitha Enhanced)

[Chart: codepage 711]

(captured in Arabic Windows 98 SE)

Codepage 711 is inadequately documented in online sources. A reference to it appears in the RTF file format specification, where it was added during 1989–1993. Nafitha was a program that added Arabic support to DOS.

Codepages 710 and 711 are somewhat similar but not compatible with each other.

Codepage 720 Arabic (Transparent ASMO)

Alternative name: MS-DOS Arabic (Transparent ASMO)

[Chart: codepage 720]

(captured in Arabic Windows 98 SE)

Codepage 720 in Arabic Windows 98 SE differs from Microsoft documentation of 720. The differences are in pink. Line drawing characters with double lines appear as single lines. Several other characters look different as well.

Codepage 720 was added to MS-DOS 6.22 (1994). A reference to it appears in the RTF file format specification, where it was added during 1989–1993.

Codepage 864 Arabic (MS-DOS)

Alternative names: Arabic - Personal Computer, OEM Arabic

[Chart: codepage 864]

(captured in Arabic Windows 98 SE)

Codepage 864 in Arabic Windows 98 SE differs from Microsoft documentation of 864 from 1996. The differences are in pink. The 1996 documented version, which is a conversion table, supports more characters than the ones actually implemented in Windows 98.

According to MS-DOS 6.22, this was the only Arabic codepage available.


Article updated in August 2021: Added codepages 912, 915 and euro version of 850.

DOS codepages (and their history)
URN:NBN:fi-fe201401011002