
C0 and C1 control codes

The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received.

C0 codes are the range 0x00–0x1F, and the default C0 set was originally defined in ISO 646 (ASCII). C1 codes are the range 0x80–0x9F, and the default C1 set was originally defined in ECMA-48 (harmonized later with ISO 6429). The ISO/IEC 2022 system of specifying control and graphic characters allows other C0 and C1 sets to be available for specialized applications, but they are rarely used.
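
The two ranges can be expressed as a small classifier. This is only an illustrative sketch; the function name `classify_control` is hypothetical and does not come from any standard.

```python
def classify_control(code_point: int) -> str:
    """Return 'C0', 'C1', or 'other' for an integer code point."""
    if 0x00 <= code_point <= 0x1F:
        return "C0"   # default set originally defined in ISO 646 (ASCII)
    if 0x80 <= code_point <= 0x9F:
        return "C1"   # default set originally defined in ECMA-48
    return "other"    # graphic characters, SP, DEL, and everything else


print(classify_control(0x1B))  # ESC
print(classify_control(0x9B))  # CSI
print(classify_control(0x41))  # the letter 'A'
```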

C0 controls

ASCII defined 32 control characters, plus the DEL character, 0x7F or 01111111 in binary. DEL has all seven bits set because it was needed to punch out all the holes on a paper tape and so erase whatever character had been there.
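
The reason DEL has every bit set can be shown with a short model of a paper tape row. Holes can be added but never removed, so overpunching behaves like a bitwise OR. The helper name `overpunch` is hypothetical and is used here only for illustration.

```python
DEL = 0x7F  # 0b1111111: all seven holes punched


def overpunch(existing_row: int) -> int:
    """Model punching every hole over a row that already holds a 7-bit code."""
    # A punch can only add holes, so the result is the OR of old and new.
    return existing_row | DEL


# Whatever 7-bit code was on the tape, the row now reads as DEL.
erased = {overpunch(code) for code in range(128)}
print(erased == {DEL})
```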

This large number of codes was desirable at the time, as multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals.

Only a few codes have maintained their use: BEL, ESC, and the "Format Effector" (FEn) characters BS, TAB, LF, VT, FF, and CR. Others are unused or have acquired different meanings, such as NUL being the C string terminator. Some data transfer protocols such as ANPA-1312, Kermit, and XMODEM do make extensive use of SOH, STX, ETX, EOT, ACK, NAK and SYN for purposes approximating their original definitions. The "Information Separators" (ISn) are used by some file formats, such as the Unix info format,[1] and several of them are treated as line boundaries by Python's splitlines string method.[2]
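
The following sketch shows both uses of the information separators mentioned above. The first part relies on the documented behaviour of `str.splitlines`, which treats FS, GS and RS (but not US) as line boundaries. The second part uses RS and US as record and field delimiters; that layout and the helper names `encode_records` and `decode_records` are assumptions made for illustration, not a description of any particular file format.

```python
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"

# str.splitlines breaks on FS, GS and RS, and leaves US in place.
text = f"alpha{FS}beta{GS}gamma{RS}delta{US}epsilon"
lines = text.splitlines()


def encode_records(records):
    """Join fields with US and records with RS (hypothetical layout)."""
    return RS.join(US.join(fields) for fields in records)


def decode_records(blob):
    """Invert encode_records for data that contains no RS or US itself."""
    return [record.split(US) for record in blob.split(RS)]


table = [["id", "name"], ["1", "Ada"], ["2", "Grace, B."]]
round_trip = decode_records(encode_records(table))
print(lines)
print(round_trip == table)
```

Unlike comma-separated text, the field containing a comma needs no quoting, because the delimiters never occur in ordinary text.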

The names of some codes were changed in ISO 6429:1992 (or ECMA-48:1991) to be neutral with respect to writing direction. The abbreviations used were not changed, as the standard had already specified that those would remain unchanged when the standard is translated to other languages. In this table both new and old names are shown for the renamed controls (the old name is the one matching the abbreviation).

ASCII control codes, originally defined in ANSI X3.4.[3]
Columns: caret notation, decimal, hexadecimal, abbreviations, name, C escape (where one exists), description.
^@ 0 00 NUL Null \0 Does nothing. The code of blank paper tape, and also used for padding to slow transmission.
^A 1 01 TC1, SOH Start of Heading First character of the heading of a message.[4]
^B 2 02 TC2, STX Start of Text Terminates the header and starts the message text.
^C 3 03 TC3, ETX End of Text Ends the message text, starts a footer (up to the next TC character).[4][5]
^D 4 04 TC4, EOT End of Transmission Ends the transmission of one or more messages.[4][5] May place terminals on standby.[5]
^E 5 05 TC5, ENQ, WRU[a] Enquiry Trigger a response at the receiving end, to see if it is still present.
^F 6 06 TC6, ACK Acknowledge Indication of successful receipt of a message.
^G 7 07 BEL[b] Bell, Alert \a Call for attention from an operator.
^H 8 08 FE0, BS Backspace \b Move one position leftwards. Next character may overprint or replace the character that was there.
^I 9 09 FE1, HT Character Tabulation, Horizontal Tabulation \t Move right to the next tab stop.
^J 10 0A FE2, LF Line Feed \n Move down to the same position on the next line (some devices also moved to the left column).
^K 11 0B FE3, VT Line Tabulation, Vertical Tabulation \v Move down to the next vertical tab stop.
^L 12 0C FE4, FF Form Feed \f Move down to the top of the next page.
^M 13 0D FE5, CR Carriage Return \r Move to column zero while staying on the same line.
^N 14 0E SO, LS0[c] Shift Out Switch to an alternative character set.
^O 15 0F SI, LS1[c] Shift In Return to regular character set after SO.
^P 16 10 TC7, DC0,[d] DLE Data Link Escape Cause a limited number of contiguously following characters to be interpreted in some different way.[14][15]
^Q 17 11 DC1, XON Device Control One Turn on (DC1 and DC2) or off (DC3 and DC4) devices. Teletype[6] used these for the paper tape reader and the paper tape punch. The first of these uses, starting and stopping the reader with DC1 (XON) and DC3 (XOFF), became the de facto standard for software flow control.[16]
^R 18 12 DC2, TAPE Device Control Two
^S 19 13 DC3, XOFF Device Control Three
^T 20 14 DC4, TAPE Device Control Four
^U 21 15 TC8, NAK Negative Acknowledge Negative response to a sender, such as a detected error.
^V 22 16 TC9, SYN Synchronous Idle Sent in synchronous transmission systems when no other character is being transmitted.
^W 23 17 TC10, ETB End of Transmission Block End of a transmission block of data when data are divided into such blocks for transmission purposes.
^X 24 18 CAN Cancel Indicates that the data preceding it are in error or are to be disregarded.
^Y 25 19 EM End of medium Indicates on paper or magnetic tapes that the end of the usable portion of the tape had been reached.[3]
^Z 26 1A SUB Substitute Replaces a character that was found to be invalid or in error. Should be ignored.
^[ 27 1B ESC Escape \e[e] Alters the meaning of a limited number of following bytes. Nowadays this is almost always used to introduce an ANSI escape sequence.
^\ 28 1C IS4, FS File Separator Can be used as delimiters to mark fields of data structures. US is the lowest level, while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it. SP (space) could be considered an even lower level.
^] 29 1D IS3, GS Group Separator
^^ 30 1E IS2, RS Record Separator
^_ 31 1F IS1, US Unit Separator
While not technically part of the C0 control character range, the following two characters can be thought of as having some characteristics of control characters.
  32 20 SP Space Move right one character position.
^? 127 7F DEL Delete Should be ignored. Used to delete characters on punched tape by punching out all the holes.
  a. Teletype labelled the key WRU for 'who are you?'[6]
  b. The name BELL is assigned by Unicode to the unrelated emoji character 🔔 (U+1F514). While C0 and C1 control characters were not formally named by the Unicode standard itself at the time, this collided with existing use of BELL as the name of this control character in software following the previous versions of UTS#18 (the Unicode Regular Expressions standard),[7] e.g. in Perl.[8] Unicode now accepts ALERT and BEL (but not BELL) as formal aliases for the control character,[9] although the code chart still lists BELL as the ISO 6429 alias,[10] and the corresponding control picture code point is called SYMBOL FOR BELL. Perl subsequently switched to using BELL for the emoji in version 5.18.[11]
  c. ISO/IEC 2022 (ECMA-35) refers to these as LS0 and LS1 in 8-bit environments, and as SI and SO in 7-bit environments.[12]
  d. The first, 1963 edition of ASCII classified DLE as a device control, rather than a transmission control, and gave it the abbreviation DC0 ("device control reserved for data link escape").[13]
  e. The '\e' escape sequence is not part of ISO C and many other language specifications. However, it is understood by several compilers, including GCC.
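
The caret notation in the first column of the table above (^@ through ^_, and ^? for DEL) can be derived arithmetically: the character shown after the caret is the control code with bit 0x40 inverted. The sketch below illustrates that rule; the function names are hypothetical.

```python
def caret_notation(code: int) -> str:
    """Return the caret form of a C0 control code or DEL, e.g. 0x1B -> '^['."""
    if not (0x00 <= code <= 0x1F or code == 0x7F):
        raise ValueError(f"{code:#04x} is not a C0 control code or DEL")
    return "^" + chr(code ^ 0x40)  # flip bit 6 to reach '@'..'_' or '?'


def from_caret(notation: str) -> int:
    """Inverse of caret_notation, e.g. '^G' -> 0x07."""
    if len(notation) != 2 or notation[0] != "^":
        raise ValueError("expected a caret followed by one character")
    return ord(notation[1]) ^ 0x40


print(caret_notation(0x00), caret_notation(0x07), caret_notation(0x7F))
```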

C1 controls

In 1973, ECMA-35 and ISO 2022[17] attempted to define a method so an 8-bit "extended ASCII" code could be converted to a corresponding 7-bit code, and vice versa.[18] In a 7-bit environment, the Shift Out (SO) would change the meaning of the 96 bytes 0x20 through 0x7F[a][20] (i.e. all but the C0 control codes), to be the characters that an 8-bit environment would print if it used the same code with the high bit set. This meant that the range 0x80 through 0x9F could not be printed in a 7-bit environment,[18] so it was decided that no alternative character set could use them, and that these codes should be additional control codes, which became known as the C1 control codes. To allow a 7-bit environment to use these new controls, the sequences ESC @ through ESC _ were to be considered equivalent.[18] The later ISO 8859 standards abandoned support for 7-bit codes, but preserved this range of control characters.
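
The equivalence between the 8-bit C1 codes and the 7-bit sequences ESC @ through ESC _ is a fixed offset of 0x40 on the second byte. The following sketch shows the conversion in both directions; the function names are hypothetical.

```python
ESC = 0x1B


def c1_to_7bit(c1: int) -> bytes:
    """Convert an 8-bit C1 code (0x80-0x9F) to its two-byte ESC form."""
    if not 0x80 <= c1 <= 0x9F:
        raise ValueError(f"{c1:#04x} is not a C1 control code")
    return bytes([ESC, c1 - 0x40])  # 0x80 -> ESC '@', 0x9F -> ESC '_'


def seven_bit_to_c1(sequence: bytes) -> int:
    """Convert a two-byte ESC @ .. ESC _ sequence back to the C1 code."""
    if len(sequence) != 2 or sequence[0] != ESC or not 0x40 <= sequence[1] <= 0x5F:
        raise ValueError("not an ESC @ through ESC _ sequence")
    return sequence[1] + 0x40


print(c1_to_7bit(0x9B))               # CSI as ESC [
print(hex(seven_bit_to_c1(b"\x1b]")))  # ESC ] as OSC
```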

The first C1 control code set to be registered for use with ISO 2022 was DIN 31626,[21] a specialised set for bibliographic use which was registered in 1979.[22]

The more common general-use ISO/IEC 6429 set was registered in 1983,[23] although the ECMA-48 specification upon which it was based had been first published in 1976.[24] The set is also standardized as JIS X 0211 (formerly JIS C 6323).[25] Symbolic names defined by RFC 1345 and early drafts of ISO 10646, but not in ISO/IEC 6429 (PAD, HOP and SGC), are also used.[8][26]

Except for SS2 and SS3 in EUC-JP text, and NEL in text transcoded from EBCDIC, the 8-bit forms of these codes were almost never used. CSI, DCS and OSC are used to control text terminals and terminal emulators, but almost always by using their 7-bit escape code representations. Nowadays, if bytes in this range are encountered, it is far more likely that they are intended to be the printing characters at those positions in Windows-1252 or Mac OS Roman.
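
The ambiguity described above can be seen by decoding the same bytes two ways with Python's built-in codecs. This is a small illustration only; the sample bytes are made up.

```python
import unicodedata

raw = b"\x93quoted\x94"  # hypothetical input containing bytes 0x93 and 0x94

# Decoded as ISO 8859-1 (Latin-1), the bytes become C1 control characters.
as_latin1 = raw.decode("latin-1")
first_category = unicodedata.category(as_latin1[0])  # general category Cc

# Decoded as Windows-1252, the same bytes are curly quotation marks.
as_cp1252 = raw.decode("cp1252")

print(first_category)
print(as_cp1252)
```

A decoder that has to guess between the two is usually safer assuming Windows-1252, for the reason given in the paragraph above.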

ISO/IEC 6429 and RFC 1345 C1 control codes
Columns: ESC+ (second character of the 7-bit escape sequence), decimal, hex, abbreviation, name, description.[27]
@ 128 80 PAD[9] Padding Character[b] Proposed as a "padding" or "high byte" for single-byte characters to make them two bytes long for easier interoperability with multiple byte characters. Extended Unix Code (EUC) occasionally uses this.[28]
A 129 81 HOP[9] High Octet Preset[b] Proposed to set the high byte of a sequence of multiple byte characters so they only need one byte each, as a simple form of data compression.
B 130 82 BPH Break Permitted Here[c] Follows a graphic character where a line break is permitted. Roughly equivalent to a soft hyphen or zero-width space except it does not define what is printed at the line break.
C 131 83 NBH No Break Here[c] Follows the graphic character that is not to be broken. See also word joiner.
D 132 84 IND Index[d] Move down one line without moving horizontally, to eliminate ambiguity about the meaning of LF.
E 133 85 NEL Next Line Equivalent to CR+LF, to match the EBCDIC control character.
F 134 86 SSA Start of Selected Area Used by block-oriented terminals. In xterm ESC F moves to the lower-left corner of the screen, since certain software assumes this behaviour.[29]
G 135 87 ESA End of Selected Area
H 136 88 HTS Character Tabulation Set, Horizontal Tabulation Set Set a tab stop at the current position.
I 137 89 HTJ Character Tabulation With Justification, Horizontal Tabulation With Justification Right-justify the text since the last tab against the next tab stop.
J 138 8A VTS Line Tabulation Set, Vertical Tabulation Set Set a vertical tab stop.
K 139 8B PLD Partial Line Forward, Partial Line Down To produce subscripts and superscripts in ISO/IEC 6429. Subscripts use PLD text PLU while superscripts use PLU text PLD.
L 140 8C PLU Partial Line Backward, Partial Line Up
M 141 8D RI Reverse Line Feed, Reverse Index Move up one line.
N 142 8E SS2 Single-Shift 2 Next character is from the G2 or G3 sets, respectively.
O 143 8F SS3 Single-Shift 3
P 144 90 DCS Device Control String Followed by a string of printable characters (0x20 through 0x7E) and format effectors (0x08 through 0x0D), terminated by ST (0x9C). Xterm defined a number of these.[30]
Q 145 91 PU1 Private Use 1 Reserved for private function agreed on between the sender and the recipient of the data.
R 146 92 PU2 Private Use 2
S 147 93 STS Set Transmit State
T 148 94 CCH Cancel character Destructive backspace, to eliminate ambiguity about meaning of BS.
U 149 95 MW Message Waiting
V 150 96 SPA Start of Protected Area Used by block-oriented terminals.
W 151 97 EPA End of Protected Area
X 152 98 SOS Start of String[c] Followed by a control string terminated by ST (0x9C) which (unlike DCS, OSC, PM or APC) may contain any character except SOS or ST.
Y 153 99 SGC,[9] SGCI[31] Single Graphic Character Introducer[b] Intended to allow an arbitrary Unicode character to be printed; it would be followed by that character, most likely encoded in UTF-1.[31]
Z 154 9A SCI Single Character Introducer[c] To be followed by a single printable character (0x20 through 0x7E) or format effector (0x08 through 0x0D), and to print it as ASCII no matter what graphic or control sets were in use.
[ 155 9B CSI Control Sequence Introducer Used to introduce control sequences that take parameters. Used for ANSI escape sequences.
\ 156 9C ST String Terminator Terminates a string started by DCS, SOS, OSC, PM or APC.
] 157 9D OSC Operating System Command Followed by a string of printable characters (0x20 through 0x7E) and format effectors (0x08 through 0x0D), terminated by ST (0x9C), intended for use to allow in-band signaling of protocol information, but rarely used for that purpose. Some terminal emulators, including xterm, use OSC sequences for setting the window title and changing the colour palette. They may also support terminating an OSC sequence with BEL instead of ST.[32] Kermit used APC to transmit commands.[33]
^ 158 9E PM Privacy Message
_ 159 9F APC Application Program Command
  a. In early versions the range excluded SP and DEL.[19]
  b. Not part of ISO/IEC 6429 (ECMA-48).[8][26]
  c. Not part of the first edition of ISO/IEC 6429.[23]
  d. Deprecated in 1988 and withdrawn in 1992 from ISO/IEC 6429 (1986 and 1991 respectively for ECMA-48).[citation needed]
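
As a concrete case of the OSC row in the table above, the sketch below builds a window-title sequence in its 7-bit form (ESC ] for OSC, ESC \ for ST), with the option of the BEL terminator that some emulators accept. The `0;` parameter is xterm's convention for setting the icon name and window title; it is an assumption brought in for this example rather than something stated in the table, and the function name is hypothetical.

```python
ESC = "\x1b"
BEL = "\x07"
OSC_7BIT = ESC + "]"   # 7-bit form of OSC (0x9D)
ST_7BIT = ESC + "\\"   # 7-bit form of ST (0x9C)


def osc_set_title(title: str, use_bel: bool = False) -> str:
    """Build an xterm-style 'set window title' OSC sequence (a sketch)."""
    # Keep to the printable range given in the table, 0x20 through 0x7E.
    if any(not 0x20 <= ord(ch) <= 0x7E for ch in title):
        raise ValueError("title must contain only printable ASCII")
    terminator = BEL if use_bel else ST_7BIT
    return f"{OSC_7BIT}0;{title}{terminator}"


print(repr(osc_set_title("build log")))
print(repr(osc_set_title("build log", use_bel=True)))
```

Rejecting control characters in the title also prevents an embedded ESC or BEL from ending the string early.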

Other control code sets

The ISO/IEC 2022 (ECMA-35) extension mechanism allowed escape sequences to change the C0 and C1 sets. The standard C0 control character set shown above is chosen with the sequence ESC ! @, and the C1 set shown above is chosen with the sequence ESC " C.[23]
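
Written out as bytes, the two designation sequences named above look as follows. The helper names `designate_c0` and `designate_c1` are hypothetical, and only the two final bytes given in the text (@ and C) are used; final bytes for other sets are assigned by the registry and are not guessed at here.

```python
ESC = b"\x1b"


def designate_c0(final: bytes) -> bytes:
    """ESC ! F: choose the C0 set identified by final byte F."""
    return ESC + b"!" + final


def designate_c1(final: bytes) -> bytes:
    """ESC " F: choose the C1 set identified by final byte F."""
    return ESC + b'"' + final


ASCII_C0 = designate_c0(b"@")     # the standard C0 set
ISO_6429_C1 = designate_c1(b"C")  # the ISO/IEC 6429 C1 set

print(ASCII_C0.hex(" "))
print(ISO_6429_C1.hex(" "))
```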

Several official and unofficial alternatives have been defined, but they are now largely obsolete. Most were forced to retain a good deal of compatibility with the ASCII controls for interoperability. The standard makes ESC,[34][35] SP and DEL[a] "fixed" coded characters, which are available in their ASCII locations in all encodings that conform to the standard.[37] It also specifies that if a C0 set includes transmission control (TCn) codes, they must be encoded at their ASCII locations[34] and cannot be put in a C1 set,[38] and that any new transmission controls must be in a C1 set.[34]

Other C0 control code sets

  • ANPA-1312, a text markup language used for news transmission, replaces several C0 control characters.
  • IPTC 7901, the newer international version of the above, has its own variations.
  • Videotex has a completely different set.
  • Teletext also defines a set similar to Videotex.
  • T.61/T.51,[39] and others[40] replaced EM and GS with SS2 and SS3 so these functions could be used in a 7-bit environment.
  • Some sets replaced FS with SS2[41] (as ANPA-1312 does).
  • The now-withdrawn JIS C 6225, designated JIS X 0207 in later sources,[42] replaced FS with CEX or "Control Extension",[43] which introduces control sequences for vertical text behaviour, superscripts and subscripts[44] and for transmitting custom character graphics.[42]

Replacement C1 character sets

  • A specialized C1 control code set is registered for bibliographic use (including string collation), such as by MARC-8.[22][45][46]
  • Various specialised C1 control code sets are registered for use by Videotex formats.[21]
  • EBCDIC defines up to 29 additional control codes besides those present in ASCII. When translating EBCDIC to Unicode (or to ISO 8859), these codes are mapped to C1 control characters in a manner specified by IBM's Character Data Representation Architecture (CDRA).[47][48] The New Line (NL) control does translate to the ISO/IEC 6429 NEL (although it is often swapped with LF, following the UNIX line ending convention),[47] but the remainder of the control codes do not correspond. For example, the EBCDIC control SPS and the ECMA-48 control PLU are both used to begin a superscript or end a subscript, but are not mapped to one another. Extended-ASCII-mapped EBCDIC can therefore be regarded as having its own C1 set, although it is not registered with the ISO-IR registry for ISO/IEC 2022.[21]
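
The NL-to-NEL mapping in the last item above can be observed with Python's built-in `cp037` codec (IBM EBCDIC US/Canada). The byte values 0x15 for EBCDIC NL and 0x25 for EBCDIC LF are taken from that code page and are not given in the text; the sketch assumes the codec follows the default pairing rather than the swapped one.

```python
EBCDIC_NL = b"\x15"  # New Line in code page 037
EBCDIC_LF = b"\x25"  # Line Feed in code page 037

nl = EBCDIC_NL.decode("cp037")
lf = EBCDIC_LF.decode("cp037")

print(f"NL -> U+{ord(nl):04X}")
print(f"LF -> U+{ord(lf):04X}")

# str.splitlines also recognizes NEL (U+0085) as a line boundary.
lines = ("first" + nl + "second").splitlines()
print(lines)
```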

Unicode

Unicode inherits its first 256 code points from ISO 8859-1, and hence also the 65 control code points described above (U+0000–U+001F, U+007F, and U+0080–U+009F), giving them the general category Cc (control).

Unicode only specifies semantics for the C0 format controls HT, LF, VT, FF, and CR (note that BS is missing); the C0 information separators FS, GS, RS, US (and SP); and the C1 control NEL.[49] The rest of the codes are transparent to Unicode and their meanings are left to higher-level protocols, with ISO/IEC 6429 suggested as a default.[49]

Unicode includes many additional format effector characters besides these, such as marks, embeds, isolates and pops for explicit bidirectional formatting, and the zero-width joiner and non-joiner for controlling ligature use. However, these are given the general category Cf (format) rather than Cc.
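
The category assignments described in this section can be checked with Python's standard `unicodedata` module. This is a quick verification sketch; it scans every code point, so it takes a moment to run.

```python
import sys
import unicodedata

# Collect every code point whose general category is Cc (control).
cc_points = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Cc"
]

# C0 (0x00-0x1F), DEL (0x7F) and C1 (0x80-0x9F) together.
expected = list(range(0x00, 0x20)) + list(range(0x7F, 0xA0))

print(len(cc_points), cc_points == expected)

# Format characters such as the zero-width joiner are Cf instead.
zwj_category = unicodedata.category("\u200d")   # ZERO WIDTH JOINER
zwnj_category = unicodedata.category("\u200c")  # ZERO WIDTH NON-JOINER
print(zwj_category, zwnj_category)
```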

Footnotes

  a. ISO/IEC 4873 extends this requirement to the C1 SS2 and SS3,[36] although ISO/IEC 2022 itself does not.

References

  1. Fox, Brian. "Adding a new node to Info". Info: The online, menu-driven GNU documentation system. GNU Project.
  2. "Built-in Types § str.splitlines". The Python Standard Library. Python Software Foundation.
  3. ISO/TC 97/SC 2 (1975). The set of control characters of the ISO 646 (PDF). ITSCJ/IPSJ. ISO-IR-1.
  4. IPTC (1995). The IPTC Recommended Message Format (PDF) (5th ed.). IPTC TEC 7901.
  5. Federal Standard 1037C. 1996. Archived from the original on 2016-03-09.
  6. Robert McConnell; James Haynes; Richard Warren (December 2002). "Understanding ASCII Codes". NADCOMM.
  7. Williamson, Karl. "Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0".
  8. Ken Whistler (July 20, 2011). "Formal Name Aliases for Control Characters, L2/11-281". Unicode Consortium.
  9. "Name Aliases". Unicode Character Database. Unicode Consortium.
  10. "C0 Controls and Basic Latin" (PDF). Unicode Consortium.
  11. "charnames". Perl Programming Documentation.
  12. ECMA (1994). "7.3: Invocation of character-set code elements". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 14. ECMA-35.
  13. American Standards Association (1963). American Standard Code for Information Interchange: 4. Legend. p. 6. ASA X3.4-1963.
  14. Federal Standard 1037C. 1996. Archived from the original on 2016-08-01.
  15. "Supplementary transmission control functions (an extension of the basic mode control procedures for data communication systems)". European Computer Manufacturers Association. 1972. ECMA-37.
  16. "What is the point of Ctrl-S?". Unix and Linux Stack exchange. Retrieved 14 February 2019.
  17. ECMA/TC 1 (1973). "Brief History". 7-bit Input/Output Coded Character Set (PDF) (4th ed.). ECMA. ECMA-6:1973.
  18. ECMA/TC 1 (1971). "8.2: Correspondence between the 7-bit Code and an 8-bit Code". Extension of the 7-bit Coded Character Set (PDF) (1st ed.). ECMA. pp. 21–24. ECMA-35:1971.
  19. ECMA/TC 1 (1973). "4.2: Specific Control Characters". 7-bit Input/Output Coded Character Set (PDF) (4th ed.). ECMA. p. 16. ECMA-6:1973.
  20. ECMA/TC 1 (1985). "5.3.8: Sets of 96 graphic characters". Code Extension Techniques (PDF) (4th ed.). ECMA. pp. 17–18. ECMA-35:1985.
  21. ISO/IEC International Register of Coded Character Sets To Be Used With Escape Sequences (PDF), ITSCJ/IPSJ, ISO-IR
  22. DIN (1979-07-15). Additional Control Codes for Bibliographic Use according to German Standard DIN 31626 (PDF). ITSCJ/IPSJ. ISO-IR-40.
  23. ISO/TC97/SC2 (1983-10-01). C1 Control Set of ISO 6429:1983 (PDF). ITSCJ/IPSJ. ISO-IR-77.
  24. ECMA/TC 1 (1979). "Brief History". Additional Control Functions for Character-Imaging I/O Devices (PDF) (2nd ed.). ECMA. ECMA-48:1979.
  25. "JIS X 02xx 符号" (in Japanese).
  26. Ken Whistler (2015-10-05). "Why Nothing Ever Goes Away". Unicode Mailing List.
  27. ECMA (1991). Control Functions for Coded Character Sets. Standard ECMA-48.
  28. Lunde, Ken (2008). CJKV Information Processing: Chinese, Japanese, Korean, and Vietnamese Computing. O'Reilly. p. 244. ISBN 9780596800925.
  29. "VT100 Widget Resources (§ hpLowerleftBugCompat)". xterm - terminal emulator for X.
  30. Moy, Edward; Gildea, Stephen; Dickey, Thomas. "Device-Control functions". XTerm Control Sequences.
  31. Brender, Ronald F. (1989). "Ada 9x Project Report: Character Set Issues for Ada 9x". Carnegie Mellon University.
  32. Moy, Edward; Gildea, Stephen; Dickey, Thomas. "Operating System Commands". XTerm Control Sequences.
  33. Frank da Cruz; Christine Gianone (1997). Using C-Kermit. Digital Press. p. 278. ISBN 978-1-55558-164-0.
  34. ECMA (1994). "6.4.2: Primary sets of coded control functions". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 11. ECMA-35.
  35. ISO/TC97/SC2/WG-7; ECMA (1985-08-01). Minimum C0 set for ISO 4873 (PDF). ITSCJ/IPSJ. ISO-IR-104.
  36. ISO/TC97/SC2/WG-7; ECMA (1985-08-01). Minimum C1 Set for ISO 4873 (PDF). ITSCJ/IPSJ. ISO-IR-105.
  37. ECMA (1994). "6.2: Fixed coded characters". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 7. ECMA-35.
  38. ECMA (1994). "6.4.3: Supplementary sets of coded control functions". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 11. ECMA-35.
  39. ITU (1985). Teletex Primary Set of Control Functions (PDF). ITSCJ/IPSJ. ISO-IR-106.
  40. Úřad pro normalizaci a měřeni (1987). The set of control characters of ISO 646, with EM replaced by SS2 (PDF). ITSCJ/IPSJ. ISO-IR-140.
  41. ISO/TC 97/SC 2 (1977). The set of control characters of ISO 646, with IS4 replaced by Single Shift for G2 (SS2) (PDF). ITSCJ/IPSJ. ISO-IR-36.
  42. ISO/TC97/SC2/WG6. (PDF). ISO/TC97/SC2/WG6 N317.rev. Archived from the original (PDF) on 2020-10-26.
  43. ISO/TC 97/SC 2 (1982). The C0 set of Control Characters of Japanese Standard JIS C 6225-1979 (PDF). ITSCJ/IPSJ. ISO-IR-74.
  44. Printronix (2012). OKI® Programmer's Reference Manual (PDF). p. 26.
  45. ISO/TC 46 (1983-06-01). Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 (PDF). ITSCJ/IPSJ. ISO-IR-67.
  46. ISO/TC 46 (1986-02-01). Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 (PDF). ITSCJ/IPSJ. ISO-IR-124.
  47. Umamaheswaran, V.S. (1999-11-08). "3.3 Step 2: Byte Conversion". UTF-EBCDIC. Unicode Consortium. Unicode Technical Report #16. "The 64 control characters […], the ASCII DELETE character (U+007F)[…] are mapped respecting EBCDIC conventions, as defined in IBM Character Data Representation Architecture, CDRA, with one exception -- the pairing of EBCDIC Line Feed and New Line control characters are swapped from their CDRA default pairings to ISO/IEC 6429 Line Feed (U+000A) and Next Line (U+0085) control characters"
  48. Steele, Shawn (1996-04-24). cp037_IBMUSCanada to Unicode table. Microsoft/Unicode Consortium.
  49. "23.1: Control Codes" (PDF). The Unicode Standard (12.0.0 ed.). Unicode Consortium. 2019. pp. 868–870. ISBN 978-1-936213-22-1.
  • The Unicode Standard
    • C0 Controls and Basic Latin
    • C1 Controls and Latin-1 Supplement
    • Control Pictures
    • The Unicode Standard, Version 6.1.0, Chapter 16: Special Areas and Format Characters
  • ATIS Telecom Glossary 2007
  • De litteris regentibus C1 quaestiones septem or Are C1 characters legal in XHTML 1.0?
  • W3C I18N FAQ: HTML, XHTML, XML and Control Codes
  • International register of coded character sets to be used with escape sequences

characters that an 8 bit environment would print if it used the same code with the high bit set This meant that the range 0x80 through 0x9F could not be printed in a 7 bit environment 18 thus it was decided that no alternative character set could use them and that these codes should be additional control codes which become known as the C1 control codes To allow a 7 bit environment to use these new controls the sequences ESC through ESC were to be considered equivalent 18 The later ISO 8859 standards abandoned support for 7 bit codes but preserved this range of control characters The first C1 control code set to be registered for use with ISO 2022 was DIN 31626 21 a specialised set for bibliographic use which was registered in 1979 22 The more common general use ISO IEC 6429 set was registered in 1983 23 although the ECMA 48 specification upon which it was based had been first published in 1976 24 and JIS X 0211 formerly JIS C 6323 25 Symbolic names defined by RFC 1345 and early drafts of ISO 10646 but not in ISO IEC 6429 PAD HOP and SGC are also used 8 26 Except for SS2 and SS3 in EUC JP text and NEL in text transcoded from EBCDIC the 8 bit forms of these codes were almost never used CSI DCS and OSC are used to control text terminals and terminal emulators but almost always by using their 7 bit escape code representations Nowadays if these codes are encountered it is far more likely they are intended to be printing characters from that position of Windows 1252 or Mac OS Roman ISO IEC 6429 and RFC 1345 C1 control codes ESC Decimal Hex Abbr Name Description 27 128 80 PAD 9 Padding Character b Proposed as a padding or high byte for single byte characters to make them two bytes long for easier interoperability with multiple byte characters Extended Unix Code EUC occasionally uses this 28 A 129 81 HOP 9 High Octet Preset b Proposed to set the high byte of a sequence of multiple byte characters so they only need one byte each as a simple form of data compression B 130 82 
BPH Break Permitted Here c Follows a graphic character where a line break is permitted Roughly equivalent to a soft hyphen or zero width space except it does not define what is printed at the line break C 131 83 NBH No Break Here c Follows the graphic character that is not to be broken See also word joiner D 132 84 IND Index d Move down one line without moving horzontally to eliminate ambiguity about the meaning of LF E 133 85 NEL Next Line Equivalent to CR LF to match the EBCDIC control character F 134 86 SSA Start of Selected Area Used by block oriented terminals In xterm ESC F moves to the lower left corner of the screen since certain software assumes this behaviour 29 G 135 87 ESA End of Selected AreaH 136 88 HTS Character Tabulation SetHorizontal Tabulation Set Set a tab stop at the current position I 137 89 HTJ Character Tabulation With JustificationHorizontal Tabulation With Justification Right justify the text since the last tab against the next tab stop J 138 8A VTS Line Tabulation SetVertical Tabulation Set Set a vertical tab stop K 139 8B PLD Partial Line ForwardPartial Line Down To produce subscripts and superscripts in ISO IEC 6429 Subscripts use PLD i text i PLU while superscripts use PLU i text i PLD L 140 8C PLU Partial Line BackwardPartial Line UpM 141 8D RI Reverse Line FeedReverse Index Move up one line N 142 8E SS2 Single Shift 2 Next character is from the G2 or G3 sets respectively O 143 8F SS3 Single Shift 3P 144 90 DCS Device Control String Followed by a string of printable characters 0x20 through 0x7E and format effectors 0x08 through 0x0D terminated by ST 0x9C Xterm defined a number of these 30 Q 145 91 PU1 Private Use 1 Reserved for private function agreed on between the sender and the recipient of the data R 146 92 PU2 Private Use 2S 147 93 STS Set Transmit StateT 148 94 CCH Cancel character Destructive backspace to eliminate ambiguity about meaning of BS U 149 95 MW Message WaitingV 150 96 SPA Start of Protected Area Used by block 
oriented terminals W 151 97 EPA End of Protected AreaX 152 98 SOS Start of String c Followed by a control string terminated by ST 0x9C which unlike DCS OSC PM or APC may contain any character except SOS or ST Y 153 99 SGC 9 SGCI 31 Single Graphic Character Introducer b Intended to allow an arbitrary Unicode character to be printed it would be followed by that character most likely encoded in UTF 1 31 Z 154 9A SCI Single Character Introducer c To be followed by a single printable character 0x20 through 0x7E or format effector 0x08 through 0x0D and to print it as ASCII no matter what graphic or control sets were in use 155 9B CSI Control Sequence Introducer Used to introduce control sequences that take parameters Used for ANSI escape sequences 156 9C ST String Terminator Terminates a string started by DCS SOS OSC PM or APC 157 9D OSC Operating System Command Followed by a string of printable characters 0x20 through 0x7E and format effectors 0x08 through 0x0D terminated by ST 0x9C intended for use to allow in band signaling of protocol information but rarely used for that purpose Some terminal emulators including xterm use OSC sequences for setting the window title and changing the colour palette They may also support terminating an OSC sequence with BEL instead of ST 32 Kermit used APC to transmit commands 33 158 9E PM Privacy Message 159 9F APC Application Program Command In early versions the range excluded SP and DEL 19 a b c Not part of ISO IEC 6429 ECMA 48 8 26 a b c d Not part of the first edition of ISO IEC 6429 23 Deprecated in 1988 and withdrawn in 1992 from ISO IEC 6429 1986 and 1991 respectively for ECMA 48 citation needed Other control code sets editThe ISO IEC 2022 ECMA 35 extension mechanism allowed escape sequences to change the C0 and C1 sets The standard C0 control character set shown above is chosen with the sequence ESC and the above C1 set chosen with the sequence ESC C 23 Several official and unofficial alternatives have been defined but this is 
pretty much obsolete Most were forced to retain a good deal of compatibility with the ASCII controls for interoperability The standard makes ESC 34 35 SP and DEL a fixed coded characters which are available in their ASCII locations in all encodings that conform to the standard 37 It also specifies that if a C0 set included transmission control TCn codes they must be encoded at their ASCII locations 34 and could not be put in a C1 set 38 and any new transmission controls must be in a C1 set 34 Other C0 control code sets edit ANPA 1312 a text markup language used for news transmission replaces several C0 control characters IPTC 7901 the newer international version of the above has its own variations Videotex has a completely different set Teletext also defines a set similar to Videotex T 61 T 51 39 and others 40 replaced EM and GS with SS2 and SS3 so these functions could be used in a 7 bit environment Some sets replaced FS with SS2 41 same as ANPA 1312 The now withdrawn JIS C 6225 designated JIS X 0207 in later sources 42 replaced FS with CEX or Control Extension 43 which introduces control sequences for vertical text behaviour superscripts and subscripts 44 and for transmitting custom character graphics 42 Replacement C1 character sets edit A specialized C1 control code set is registered for bibliographic use including string collation such as by MARC 8 22 45 46 Various specialised C1 control code sets are registered for use by Videotex formats 21 EBCDIC defines up to 29 additional control codes besides those present in ASCII When translating EBCDIC to Unicode or to ISO 8859 these codes are mapped to C1 control characters in a manner specified by IBM s Character Data Representation Architecture CDRA 47 48 Although the New Line NL does translate to the ISO IEC 6429 NEL although it is often swapped with LF following UNIX line ending convention 47 the remainder of the control codes do not correspond For example the EBCDIC control SPS and the ECMA 48 control PLU are 
both used to begin a superscript or end a subscript but are not mapped to one another Extended ASCII mapped EBCDIC can therefore be regarded as having its own C1 set although it is not registered with the ISO IR registry for ISO IEC 2022 21 Unicode editMain article Unicode control characters Unicode inherits its first 256 code points from ISO 8859 1 hence also the 65 code points described above giving them the general category Cc control These are U 0000 U 001F C0 controls and U 007F DEL assigned to the Basic Latin block and U 0080 U 009F C1 controls assigned to the Latin 1 Supplement block Unicode only specifies semantics for the C0 format controls HT LF VT FF and CR note BS is missing the C0 information separators FS GS RS US and SP and the C1 control NEL 49 The rest of the codes are transparent to Unicode and their meanings are left to higher level protocols with ISO IEC 6429 suggested as a default 49 Unicode includes many additional format effector characters besides these such as marks embeds isolates and pops for explicit bidirectional formatting and the zero width joiner and non joiner for controlling ligature use However these are given the general category Cf format rather than Cc See also editControl Pictures ANSI escape codeFootnotes edit ISO IEC 4873 extends this requirement to the C1 SS2 and SS3 36 although ISO IEC 2022 itself does not References edit Fox Brian Adding a new node to Info Info The online menu driven GNU documentation system GNU Project Built in Types str splitlines The Python Standard Library Python Software Foundation a b ISO TC 97 SC 2 1975 The set of control characters of the ISO 646 PDF ITSCJ IPSJ ISO IR 1 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link a b c IPTC 1995 The IPTC Recommended Message Format PDF 5th ed IPTC TEC 7901 a b c end of transmission character EOT Federal Standard 1037C 1996 Archived from the original on 2016 03 09 a b Robert McConnell James Haynes 
Richard Warren December 2002 Understanding ASCII Codes NADCOMM Williamson Karl Re PRI 202 Extensions to NameAliases txt for Unicode 6 1 0 a b c Ken Whistler July 20 2011 Formal Name Aliases for Control Characters L2 11 281 Unicode Consortium a b c d Name Aliases Unicode Character Database Unicode Consortium C0 Controls and Basic Latin PDF Unicode Consortium charnames Perl Programming Documentation ECMA 1994 7 3 Invocation of character set code elements Character Code Structure and Extension Techniques PDF ECMA Standard 6th ed p 14 ECMA 35 American Standards Association 1963 American Standard Code for Information Interchange 4 Legend p 6 ASA X3 4 1963 data link escape character DLE Federal Standard 1037C 1996 Archived from the original on 2016 08 01 Supplementary transmission control functions an extension of the basic mode control procedures for data communication systems European Computer Manufacturers Association 1972 ECMA 37 What is the point of Ctrl S Unix and Linux Stack exchange Retrieved 14 February 2019 ECMA TC 1 1973 Brief History 7 bit Input Output Coded Character Set PDF 4th ed ECMA ECMA 6 1973 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link a b c ECMA TC 1 1971 8 2 Correspondence between the 7 bit Code and an 8 bit Code Extension of the 7 bit Coded Character Set PDF 1st ed ECMA pp 21 24 ECMA 35 1971 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link ECMA TC 1 1973 4 2 Specific Control Characters 7 bit Input Output Coded Character Set PDF 4th ed ECMA p 16 ECMA 6 1973 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link ECMA TC 1 1985 5 3 8 Sets of 96 graphic characters Code Extension Techniques PDF 4th ed ECMA pp 17 18 ECMA 35 1985 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link a b c ISO IEC International Register of Coded 
Character Sets To Be Used With Escape Sequences PDF ITSCJ IPSJ ISO IR a b DIN 1979 07 15 Additional Control Codes for Bibliographic Use according to German Standard DIN 31626 PDF ITSCJ IPSJ ISO IR 40 a b c ISO TC97 SC2 1983 10 01 C1 Control Set of ISO 6429 1983 PDF ITSCJ IPSJ ISO IR 77 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link ECMA TC 1 1979 Brief History Additional Control Functions for Character Imaging I O Devices PDF 2nd ed ECMA ECMA 48 1979 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link JIS X 02xx 符号 in Japanese a b Ken Whistler 2015 10 05 Why Nothing Ever Goes Away Unicode Mailing List ECMA 1991 Control Functions for Coded Character Sets Standard ECMA 48 Lunde Ken 2008 CJKV Information Processing Chinese Japanese Korean and Vietnamese Computing O Reilly p 244 ISBN 9780596800925 VT100 Widget Resources hpLowerleftBugCompat xterm terminal emulator for X Moy Edward Gildea Stephen Dickey Thomas Device Control functions XTerm Control Sequences a b Brender Ronald F 1989 Ada 9x Project Report Character Set Issues for Ada 9x Carnegie Mellon University Moy Edward Gildea Stephen Dickey Thomas Operating System Commands XTerm Control Sequences Frank da Cruz Christine Gianone 1997 Using C Kermit Digital Press p 278 ISBN 978 1 55558 164 0 a b c ECMA 1994 6 4 2 Primary sets of coded control functions Character Code Structure and Extension Techniques PDF ECMA Standard 6th ed p 11 ECMA 35 ISO TC97 SC2 WG 7 ECMA 1985 08 01 Minimum C0 set for ISO 4873 PDF ITSCJ IPSJ ISO IR 104 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link ISO TC97 SC2 WG 7 ECMA 1985 08 01 Minimum C1 Set for ISO 4873 PDF ITSCJ IPSJ ISO IR 105 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link ECMA 1994 6 2 Fixed coded characters Character Code Structure and Extension 
Techniques PDF ECMA Standard 6th ed p 7 ECMA 35 ECMA 1994 6 4 3 Supplementary sets of coded control functions Character Code Structure and Extension Techniques PDF ECMA Standard 6th ed p 11 ECMA 35 ITU 1985 Teletex Primary Set of Control Functions PDF ITSCJ IPSJ ISO IR 106 Urad pro normalizaci a mereni 1987 The set of control characters of ISO 646 with EM replaced by SS2 PDF ITSCJ IPSJ ISO IR 140 ISO TC 97 SC 2 1977 The set of control characters of ISO 646 with IS4 replaced by Single Shift for G2 SS2 PDF ITSCJ IPSJ ISO IR 36 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link a b ISO TC97 SC2 WG6 Liaison statement to ISO TC97 SC2 WG8 and ISO TC97 SC18 WG8 PDF ISO TC97 SC2 WG6 N317 rev Archived from the original PDF on 2020 10 26 a href Template Cite web html title Template Cite web cite web a CS1 maint numeric names authors list link ISO TC 97 SC 2 1982 The C0 set of Control Characters of Japanese Standard JIS C 6225 1979 PDF ITSCJ IPSJ ISO IR 74 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link Printronix 2012 OKI Programmer s Reference Manual PDF p 26 ISO TC 46 1983 06 01 Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 PDF ITSCJ IPSJ ISO IR 67 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link ISO TC 46 1986 02 01 Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 PDF ITSCJ IPSJ ISO IR 124 a href Template Citation html title Template Citation citation a CS1 maint numeric names authors list link a b Umamaheswaran V S 1999 11 08 3 3 Step 2 Byte Conversion UTF EBCDIC Unicode Consortium Unicode Technical Report 16 The 64 control characters the ASCII DELETE character U 007F are mapped respecting EBCDIC conventions as defined in IBM Character Data Representation Architecture CDRA with one exception the pairing of EBCDIC 
Line Feed and New Line control characters are swapped from their CDRA default pairings to ISO IEC 6429 Line Feed U 000A and Next Line U 0085 control characters Steele Shawn 1996 04 24 cp037 IBMUSCanada to Unicode table Microsoft Unicode Consortium a b 23 1 Control Codes PDF The Unicode Standard 12 0 0 ed Unicode Consortium 2019 pp 868 870 ISBN 978 1 936213 22 1 The Unicode Standard C0 Controls and Basic Latin C1 Controls and Latin 1 Supplement Control Pictures The Unicode Standard Version 6 1 0 Chapter 16 Special Areas and Format Characters ATIS Telecom Glossary 2007 De litteris regentibus C1 quaestiones septem or Are C1 characters legal in XHTML 1 0 W3C I18N FAQ HTML XHTML XML and Control Codes International register of coded character sets to be used with escape sequences Retrieved from https en wikipedia org w index php title C0 and C1 control codes amp oldid 1205083220 Device control, wikipedia, wiki, book, books, library,
