DOS codepages (and their history)

DOS has supported numerous character sets, also called codepages. This article documents the official MS-DOS codepages, the Windows "OEM" codepages and some rare Arabic codepages.

From their very beginning, PCs have run with an 8-bit character set with 256 characters. Compared to 7-bit ASCII, which only offered 95 visible characters, this was a great advantage. PC software could easily support many Latin-based languages. As PCs became widely popular, it turned out that 256 characters were not enough. More national characters were needed. Codepages were introduced to DOS in 1987 to meet this need.

This article is intended for computing experts who already know what character sets and codepages are. We look into MS-DOS and PC-DOS codepages, and also the codepages found in Windows command line mode ("DOS box"). We attempt to differentiate between "real" DOS codepages and "DOS-like" codepages. We compare codepages to other codepages. We point out differences in documented and actual behavior. We also document old Arabic codepages, for which no other online documentation existed as of 2014.

See also: Character sets; Windows codepages (and their history)


Standalone DOS codepages

The DOS operating system originally supported just one character set, or codepage. That was codepage 437, also known as PC-ASCII. Several alternatives were released later, as DOS went into widespread international use. This happened with the release of DOS 3.3 in 1987.

The following table summarizes the code pages officially supported by standalone versions of PC-DOS (IBM) and MS-DOS (Microsoft). The information is primarily based on MS-DOS versions from 3.3 to 6.22.

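| Page | Name | DOS | IBM | Notes |
|---|---|---|---|---|
| 437 | United States ("PC-ASCII") | 1.0 (1981) | 1984 | |
| 710 | Arabic (Transparent Arabic) | 3.3 (1988) | | * Arabic MS-DOS 3.3 |
| 720 | Arabic (Transparent ASMO) | 6.22 (1994) | 1997 | * |
| 737 | Greek II | 6.2 (1993) | 1996 | First a hardware solution (MDA & Hercules graphics cards) |
| 850 | Multilingual (Latin I) | 3.3 (1987) | 1986 | |
| 852 | Slavic/Eastern European (Latin II) | 5 (1991) | 1993 | |
| 855 | Cyrillic I | 6.22 (1994) | 1988 | |
| 857 | Turkish | 6.2 (1993) | 1989 | |
| 860 | Portuguese | 3.3 (1987) | 1986 | |
| 861 | Icelandic | 6.2 (1993) | 1986 | |
| 862 | Hebrew | 4 (1988) | 1986 | * |
| 863 | Canadian-French | 3.3 (1987) | 1986 | |
| 864 | Arabic | 4 (1988) | 1986 | * |
| 865 | Nordic | 3.3 (1987) | 1986 | |
| 866 | Russian (Cyrillic II) | 4.01 (1990) | 1991 | Russian MS-DOS 4.01. General support in MS-DOS 6.22 (1994). |
| 869 | Greek | 6.2 (1993) | 1987 | |
| 874 | Thai | 6.22 (1994) | 1992 | † probably in Windows 3.x Thai edition |
| 932 | Japanese | 4 (1988) | | *† |
| 934 | Korean | 4 (1988) | | * |
| 936 | Chinese (Simplified) | 4 (1988) | | *† |
| 938 | Taiwan | 4 (1988) | | *† |
| 949 | Korean | 5 (1991) | | *† |
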
  • DOS: First DOS version supporting this codepage with year released.
  • IBM: Year of first appearance in IBM registry (Graphic Character Sets and Code Pages).
  • * Support required a special language version of MS-DOS.
  • † "Windows ANSI and OEM" codepage (used both in DOS and Windows).

Additional OEM codepages

More codepages do exist. The following Microsoft-documented "OEM" codepages do not appear in any of the standalone PC-DOS or MS-DOS versions reviewed (up to MS-DOS 6.22 from 1994). Most of them seem to be supported by the command prompt under the Windows operating system.

| Page | Name | Group | First documented | Notes |
|---|---|---|---|---|
| 851 | Greek 1 | | 1986 IBM | listed as "Obsolete" by IBM, "MS-DOS" by Microsoft |
| 708 | Arabic (ASMO 708) | Windows OEM | 1989–1993 Microsoft | * "MS-DOS Arabic ASMO", supported in Arabic Windows |
| 709 | Arabic (ASMO 449+, BCON V4) | Windows OEM | 1989–1993 Microsoft | * supported in Arabic Windows |
| 711 | Arabic (Nafitha Enhanced) | Windows OEM | 1989–1993 Microsoft | * supported in Arabic Windows |
| 775 | Baltic Rim | Windows OEM | 1995 Microsoft | * apparently in Windows 95 (Pan European) and later |
| 858 | Multilingual Latin I + euro | Windows OEM | 1998 Microsoft | * |
| 950 | Traditional Chinese Big5 | Windows ANSI and OEM | | ** Supported by Windows 3.1 and later |
| 1258 | Vietnam | Windows ANSI and OEM | 1996 Microsoft | ** Supported by Windows 95 and later |

* "Windows OEM" codepages are apparently supported by Windows command prompt.
** "Windows ANSI and OEM" codepages are supported by Windows. The same page is used in both Windows GUI and command prompt.
"First documented" refers to the year of the earliest reference to the codepage that was found while writing this article.

Euro codepages (IBM)

The European Union introduced the euro currency symbol (€), which had consequences for codepages in 1998. Based on the existing DOS codepages, several updated and new codepages were defined. Either the new euro symbol was added to an unused slot in an existing page, or a new page was created where an old symbol was replaced by €.

The following table is based on documentation by IBM. Microsoft documentation does not mention any of these changes, except for codepages 858 and 874.

| Original codepage | Euro codepage | Notes |
|---|---|---|
| 437 United States | | |
| 737 Greek II | | |
| 850 Multilingual (Latin I) | 858 Multilingual Latin I + euro | dotless ı ⇒ € |
| 852 Slavic/Eastern European | ⇢ 852 | € added (hex AA) |
| 855 Cyrillic I | ⇒ 872 Cyrillic with euro | ¤ ⇒ € |
| 857 Turkish | ⇢ 857 | € added (hex D5) |
| 860 Portuguese | | |
| 861 Icelandic | | |
| 862 Hebrew | ⇒ 867 Israel | € + several other changes |
| 863 Canadian-French | | |
| 864 Arabic | ⇢ 864 | € added (hex A7) |
| 865 Nordic | | |
| 866 Russian (Cyrillic II) | ⇒ 808 Cyrillic, Russian with euro | ¤ ⇒ € |
| 869 Greek | ⇢ 869 | € added (hex 87) |
| 874 Thai (Microsoft) | ⇢ 874 Thai (Microsoft) | € added (hex 80) |
| 874 Thai (IBM) | ⇒ 1161 Thai (IBM) | hex DE ⇒ € |

Note that the IBM and Microsoft Thai codepages are different despite similar numbering.

It remains unclear which systems actually supported any of these euro updated codepages.
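The "old symbol replaced by €" case from the table above can be sketched in Python. This is only an illustration under stated assumptions: it builds stand-in decoding tables for 808 and 872 from Python's built-in `cp866` and `cp855` codecs, using the hex positions FD and CF given in the chart sections of this article. The helper names `build_table` and `decode_with` are hypothetical, and Python's codec tables may differ in detail from the IBM definitions.

```python
EURO = "\u20ac"

def build_table(base_codec, overrides):
    """Return a 256-entry decoding table: the base codepage plus overrides."""
    # errors="replace" yields U+FFFD for positions the base codec leaves undefined
    table = [bytes([b]).decode(base_codec, errors="replace") for b in range(256)]
    for position, char in overrides.items():
        table[position] = char
    return table

def decode_with(table, data):
    """Decode bytes one at a time through a 256-entry table."""
    return "".join(table[b] for b in data)

# Hypothetical stand-ins for the IBM euro codepages 808 and 872:
# the currency sign is replaced by the euro sign, everything else is kept.
cp808_table = build_table("cp866", {0xFD: EURO})
cp872_table = build_table("cp855", {0xCF: EURO})

print(decode_with(cp808_table, b"\xfd 100"))
```

The "€ added to an unused slot" case works the same way, except that the overridden position decodes to nothing meaningful in the base page.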

DOS codepage charts

The following codepage charts list all official Latin-based DOS codepages, and also Greek, Hebrew and Cyrillic. Arabic codepages appear in their own chapter. 874 Thai and 1258 Vietnamese are presented among other Windows codepages.

Asian double-byte codepages are missing for technical reasons. Unless otherwise mentioned, the charts are screenshots that have been captured in MS-DOS.

Common area (00-7F)

Characters 00-7F (hex) in the following chart are common to all DOS codepages listed here.

The chart is similar to ASCII except for control characters. Codepoints 00-1F and 7F (hex), marked in pink, have a dual nature. They can be used both as invisible ASCII control characters and as characters displayed on the screen. Because of this, DOS codepages are backward compatible with ASCII.
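The dual nature of these codepoints can be illustrated by decoding the same bytes in two ways. Python's built-in `cp437` codec takes the control-character reading, so the on-screen reading needs a separate lookup. The sketch below makes some assumptions: the glyph string lists the Unicode equivalents commonly cited for the codepage 437 screen glyphs and is not taken from the charts in this article, and the helper name `decode_as_glyphs` is hypothetical.

```python
# Unicode equivalents commonly cited for the glyphs shown at 00-1F (hex).
# Assumption: this list does not come from the article's charts.
# Position 00 is displayed as a blank and is represented here by a space.
GLYPHS_00_1F = " ☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"
GLYPH_7F = "⌂"

def decode_as_glyphs(data, codec="cp437"):
    """Decode bytes the way a text-mode screen would display them."""
    out = []
    for b in data:
        if b < 0x20:
            out.append(GLYPHS_00_1F[b])
        elif b == 0x7F:
            out.append(GLYPH_7F)
        else:
            out.append(bytes([b]).decode(codec, errors="replace"))
    return "".join(out)

data = b"\x03 A\x0d\x0a"
as_controls = data.decode("cp437")   # control-character reading (ETX, CR, LF)
as_glyphs = decode_as_glyphs(data)   # on-screen reading

print(repr(as_controls))
print(as_glyphs)
```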

Exception: Codepage 864 Arabic is different from all other DOS codepages. It supports different symbols in the control character area. We do not go any further into that in this article.

Codepage 437 United States

Alternative names: Personal Computer, MS-DOS United States, MS-DOS Latin US, OEM United States, DOS Extended ASCII (United States), PC-ASCII

Codepage 437 is the original IBM "PC-ASCII" codepage. Other codepages differ in the 80-FF (hex) range.
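A rough way to check this claim is to compare codec tables position by position. The sketch below uses Python's built-in codecs, which is an assumption in itself: their mapping tables may not match the MS-DOS screenshots in every position. The function names are hypothetical.

```python
def upper_half_differences(codec, reference="cp437"):
    """Return the byte values in 80-FF (hex) that decode differently."""
    diffs = []
    for b in range(0x80, 0x100):
        raw = bytes([b])
        # errors="replace" maps positions a codec leaves undefined to U+FFFD
        if raw.decode(codec, errors="replace") != raw.decode(reference, errors="replace"):
            diffs.append(b)
    return diffs

def lower_half_matches(codec, reference="cp437"):
    """True if bytes 00-7F (hex) decode identically in both codecs."""
    lower = bytes(range(0x80))
    return lower.decode(codec, errors="replace") == lower.decode(reference, errors="replace")

for codec in ("cp850", "cp852", "cp866"):
    count = len(upper_half_differences(codec))
    print(codec, "differs from cp437 in", count, "upper-half positions;",
          "lower half identical:", lower_half_matches(codec))
```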

437

In the following charts, differences from 437 are highlighted in green.

Codepage 737 Greek II

Alternative names: 437 G, MS-DOS Greek, OEM Greek

Codepage 737


This codepage was formerly known as 437G.

Codepage 775 Baltic Rim

Alternative names: MS-DOS Baltic Rim, OEM Baltic

Codepage 775


(captured in Windows 2000 SP4)

Codepage 775 is not a DOS codepage in the strictest sense. It never appeared in standalone MS-DOS. The page covers Estonian, Lithuanian and Latvian (and even Polish). It conforms to Lithuanian Standard LST 1590-1.

Codepage 850 Multilingual (Latin I)

Alternative names: Personal Computer - Multilingual Page, MS-DOS Multilingual (Latin 1), OEM Multilingual Latin 1, Western European

Codepage 850


This codepage covered most of Western Europe, Latin America and also Canada.

Euro version: Codepage 858 was formed by changing dotless ı to € (hex D5).

Codepage 852 Slavic/Eastern European (Latin II)

Alternative names: Latin 2 - Personal Computer, MS-DOS Slavic (Latin 2), OEM Latin 2, Central European

Codepage 852


According to MS-DOS 6.22, this codepage covered Albania, Bosnia/Herzegovina, Croatia, Czech Republic, Hungary, Poland, Romania, (Russia), Slovakia, Slovenia and Yugoslavia (Latin).

According to IBM, a euro version exists where € was added to unused position hex AA.

Codepage 855 Cyrillic I

Alternative names: Cyrillic - Personal Computer, IBM Cyrillic, MS-DOS Cyrillic, OEM Cyrillic (primarily Russian)

Codepage 855


According to MS-DOS 6.22, this codepage covered Yugoslavia (Serbia/Montenegro, Macedonia), Bulgaria and Russia.

According to IBM, a euro version (codepage 872) exists where € appears in place of ¤ (hex CF).

Codepage 857 Turkish

Alternative names: Latin #5, Turkey - Personal Computer, IBM Turkish, MS-DOS Turkish, OEM Turkish

Codepage 857


According to IBM, a euro version exists where € was added to unused position hex D5.

Codepage 858 Multilingual Latin I + euro

Alternative names: Personal Computer - Multilingual with euro, OEM Multilingual Latin 1 + Euro symbol, Multilingual Latin I + Euro

Codepage 858

858

(captured in Windows 2000 SP4)

This is the euro version of codepage 850. In order to use it in Windows 2000 and later, one needs to install it via Control Panel, type chcp 858 and select a TrueType font for the command line box.
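The relationship between 850 and 858 can be checked with a short comparison. This is a sketch that relies on Python's built-in `cp850` and `cp858` codecs, which are assumed here to reflect the codepages as described; the function name `differing_positions` is hypothetical.

```python
def differing_positions(codec_a, codec_b):
    """Map each byte value that decodes differently to its two characters."""
    diffs = {}
    for b in range(256):
        raw = bytes([b])
        char_a = raw.decode(codec_a, errors="replace")
        char_b = raw.decode(codec_b, errors="replace")
        if char_a != char_b:
            diffs[b] = (char_a, char_b)
    return diffs

diffs = differing_positions("cp850", "cp858")
for position, (old, new) in diffs.items():
    print(f"hex {position:02X}: U+{ord(old):04X} -> U+{ord(new):04X}")
```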

Codepage 860 Portuguese

Alternative names: Portugal - Personal Computer, MS-DOS Portuguese, OEM Portuguese

Codepage 860


According to MS-DOS 6.22, this codepage was for Portugal (but not Brazil).

Codepage 861 Icelandic

Alternative names: Iceland - Personal Computer, MS-DOS Icelandic, OEM Icelandic

Codepage 861


Codepage 862 Hebrew

Alternative names: Israel - Personal Computer, MS-DOS Hebrew, OEM Hebrew

Codepage 862


Codepage 863 Canadian-French

Alternative names: Canadian French - Personal Computer, MS-DOS Canadian French, MS-DOS French Canada, OEM French Canadian

Codepage 863


Codepage 865 Nordic

Alternative names: Nordic - Personal Computer, MS-DOS Nordic, OEM Nordic

Codepage 865


This codepage was for Denmark and Norway.

Codepage 866 Russian (Cyrillic II)

Alternative names: PC Data, Cyrillic, Russian; MS-DOS Cyrillic CIS 1; MS-DOS Russian; OEM Russian

Codepage 866


This codepage was developed for the Russian language version of MS-DOS 4.01. It does not cover all languages written in Cyrillic (Ukrainian, for example).
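Coverage gaps of this kind can be found by trying to encode sample text. The following sketch uses Python's built-in `cp866` codec as a stand-in for the MS-DOS page (an assumption) and a hypothetical helper named `unencodable`. The sample strings are made up for illustration.

```python
def unencodable(text, codec):
    """Return the distinct characters of text that the codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

russian = "Привет, мир"
# Ukrainian sample; \u0457, \u0456 and \u0491 are the letters yi, i and ghe with upturn
ukrainian = "Ки\u0457в \u0456 \u0491анок"

print(unencodable(russian, "cp866"))
print(unencodable(ukrainian, "cp866"))
```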

According to IBM, a euro version (codepage 808) exists where € appears in place of ¤ (hex FD).

Codepage 869 Greek

Alternative names: Greece - Personal Computer, IBM Modern Greek, MS-DOS Greek 2, OEM Modern Greek

Codepage 869


According to IBM, a euro version exists where € was added to unused position hex 87.

Arabic codepages

Arabic codepages are inadequately documented in online sources. The codepages are not well supported by English versions of either MS-DOS or Windows. Documentation is lacking or inaccurate. To our knowledge, codepages 709, 710 and 711 had not been documented online prior to this article in 2014.

The following Arabic codepages have been captured in Arabic Windows 98 Second Edition (Arabic command line). Arabic command line means there is a special built-in utility in Windows that adds Arabic script support, such as right-to-left writing and joining of Arabic letters.

Online documentation, published by Microsoft, exists for codepages 708, 720 and 864. A multitude of characters appear to differ in Windows 98, however. Differences between Windows 98 and Microsoft documentation have been highlighted in pink. These differences may be specific to Windows 98.

Codepage 708 Arabic (ASMO 708)

Codepage 708


(captured in Arabic Windows 98 SE)

Codepage 708 in Arabic Windows 98 SE differs from Microsoft documentation of 708 from 1995. The differences are in pink. Line drawing characters with double lines appear as single lines. Several other characters look different as well.

Codepage 708 is backward compatible with the standards ASMO 708 (1988) and ISO 8859-6 (Arabic). Codepage 708 adds characters to positions unused in the standards (for comparison see the ASMO 708 set in ISO-IR 127 and ECMA-114). A reference to codepage 708 appears in the RTF file format specification, where it was added during 1989–1993.

Codepage 709 Arabic (ASMO 449+, BCON V4)

Codepage 709


(captured in Arabic Windows 98 SE)

Codepage 709 is inadequately documented in online sources. A reference to it appears in the RTF file format specification, where it was added during 1989–1993.

Codepage 709 appears to have been built on the ASMO 449 standard. ASMO 449 is a 7-bit ASCII-like encoding (see ISO-IR 089) that has Arabic letters in place of letters A-Z, and also some symbols. Codepage 709 has lifted ASMO 449 characters to the area 80-FF (hex) and added some extra characters to unused positions. The tilde (~) at FE (hex) is incompatible with ASMO 449, though. See also: ASMO 449+.
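The idea of lifting a 7-bit set into the upper half can be sketched as setting the high bit of each code. This is an assumption, not something the sources confirm for codepage 709: the exact offset used by 709 is not stated here, and Python has no codec for 709. ISO 8859-6 is used below purely as a stand-in, on the further assumption that its Arabic letters sit at the ASMO 449 positions plus 80 (hex). The function name `lift` is hypothetical.

```python
def lift(code7):
    """Move a 7-bit code into the 80-FF (hex) area by setting the high bit."""
    if not 0 <= code7 <= 0x7F:
        raise ValueError("expected a 7-bit code")
    return code7 | 0x80

# Assumption: the 7-bit position 41 (hex), where ASCII has "A", holds an
# Arabic letter in the 7-bit set. ISO 8859-6 serves as a stand-in codec.
position = lift(0x41)
letter = bytes([position]).decode("iso8859_6")
print(f"hex {position:02X}: U+{ord(letter):04X}")
```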

Codepage 709 is quite similar to 708 when it comes to Arabic letters and Arabic symbols, but not when it comes to Latin letters, ASCII symbols and digits.

Codepage 710 Transparent Arabic

Codepage 710


(captured in Arabic Windows 98 SE)

Codepage 710 was introduced in Arabic MS-DOS 3.3. It is inadequately documented in online sources.

Codepage 711 Arabic (Nafitha Enhanced)

Codepage 711


(captured in Arabic Windows 98 SE)

Codepage 711 is inadequately documented in online sources. A reference to it appears in the RTF file format specification, where it was added during 1989–1993. Nafitha was a program that added Arabic support to DOS.

Codepages 710 and 711 are somewhat similar, but not compatible with each other.

Codepage 720 Arabic (Transparent ASMO)

Alternative name: MS-DOS Arabic (Transparent ASMO)

Codepage 720


(captured in Arabic Windows 98 SE)

Codepage 720 in Arabic Windows 98 SE differs from Microsoft documentation of 720. The differences are in pink. Line drawing characters with double lines appear as single lines. Several other characters look different as well.

Codepage 720 was added to MS-DOS 6.22 (1994). A reference to it appears in the RTF file format specification, where it was added during 1989–1993.

Codepage 864 Arabic (MS-DOS)

Alternative names: Arabic - Personal Computer, OEM Arabic

Codepage 864


(captured in Arabic Windows 98 SE)

Codepage 864 in Arabic Windows 98 SE differs from Microsoft documentation of 864 from 1996. The differences are in pink. The 1996 documented version, which is a conversion table, supports more characters than the ones actually implemented in Windows 98.

According to MS-DOS 6.22, this was the only Arabic codepage available.

Article updated in September 2014: Additional OEM codepages, OEM euro codepages, Arabic codepages. Windows codepages moved to a separate article.

URN:NBN:fi-fe201401011002

© Aivosto Oy