Wikipedia

Substitute character

In computer data, a substitute character (␚) is a control character that is used to pad transmitted data in order to send it in blocks of fixed size, or to stand in place of a character that is recognized to be invalid, erroneous or unrepresentable on a given device. It is also used as an escape sequence in some programming languages.

In the ASCII character set, this character is encoded by the number 26 (1A hex). Standard keyboards transmit this code when the Ctrl and Z keys are pressed simultaneously (Ctrl+Z, often documented by convention as ^Z).[1] Unicode inherits this character from ASCII, but recommends that the replacement character (�, U+FFFD) be used instead to represent un-decodable inputs, when the output encoding is compatible with it.
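Python's standard codecs follow this recommendation through the built-in "replace" error handler, which substitutes U+FFFD for each malformed sequence when decoding. A minimal sketch, assuming UTF-8 input; the function name decode_lossy is hypothetical:

```python
def decode_lossy(data: bytes) -> str:
    """Decode UTF-8, replacing each undecodable sequence with U+FFFD."""
    # "replace" is a built-in codec error handler; when decoding it inserts
    # U+FFFD REPLACEMENT CHARACTER rather than the SUB control character.
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # 0xE9 (Latin-1 "é") starts a multi-byte UTF-8 sequence, but the next byte
    # is a space, so the lone 0xE9 is malformed and becomes U+FFFD.
    print(repr(decode_lossy(b"caf\xe9 ok")))
```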

Uses

End of file

Main article: End of file

Historically, under the PDP-6 monitor,[2] RT-11, VMS, and TOPS-10,[3] and in early PC CP/M 1 and 2 operating systems (and derivatives like MP/M), it was necessary to explicitly mark the end of a file (EOF) because the native filesystem could not record the exact file size by itself; files were allocated in extents (records) of a fixed size, typically leaving some allocated but unused space at the end of each file.[4][5][6][7] Under CP/M, this extra space was filled with 1A (hex) characters. The extended CP/M filesystems used by CP/M 3 and higher (and derivatives like Concurrent CP/M, Concurrent DOS, and DOS Plus) did support byte-granular files,[8][9] so this was no longer a requirement, but it remained as a convention (especially for text files) to ensure backward compatibility.
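The record-padding scheme can be sketched in Python. This is a simplified model, not CP/M's actual code: it assumes 128-byte records (the size given in the Osborne guide quoted in [7]), fills the whole unused tail with SUB bytes as described above, and, following [7], adds nothing when the text already ends on a record boundary. The names pad_to_records and strip_padding are hypothetical.

```python
RECORD_SIZE = 128  # assumed CP/M record size, as stated in reference [7]
SUB = 0x1A         # the substitute character used as the "soft" EOF marker


def pad_to_records(text: bytes, record_size: int = RECORD_SIZE) -> bytes:
    """Pad text with SUB bytes up to the next whole record.

    No SUB is appended when the text already fills an exact number of records.
    """
    remainder = len(text) % record_size
    if remainder == 0:
        return text
    return text + bytes([SUB]) * (record_size - remainder)


def strip_padding(stored: bytes) -> bytes:
    """Recover the text: everything before the first SUB, or all of it."""
    end = stored.find(bytes([SUB]))
    return stored if end == -1 else stored[:end]


if __name__ == "__main__":
    stored = pad_to_records(b"HELLO\r\n")
    print(len(stored))             # one full 128-byte record
    print(strip_padding(stored))   # the original seven bytes
```

A reader recovers the logical text by stopping at the first SUB. For binary files such as .COM programs, where 1A can occur as ordinary data, CP/M instead relied on the number of allocated records, as [6] and [7] explain.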

In CP/M, 86-DOS, MS-DOS, PC DOS, DR-DOS, and their various derivatives, the SUB character was also used to indicate the end of a character stream, and thereby to terminate user input in an interactive command-line window (and, as such, often to finish console input redirection, e.g. as started by the command COPY CON: TYPEDTXT.TXT).
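The console behaviour amounts to "read until SUB". A minimal Python sketch, using io.StringIO to stand in for the console; read_until_sub is a hypothetical name, and this does not reproduce how COMMAND.COM itself handled input:

```python
import io
from typing import TextIO

SUB = "\x1a"  # Ctrl+Z


def read_until_sub(stream: TextIO) -> str:
    """Collect characters until SUB (Ctrl+Z) or the real end of the stream."""
    chunks = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch == SUB:  # real end of stream, or the ^Z marker
            break
        chunks.append(ch)
    return "".join(chunks)


if __name__ == "__main__":
    # Simulated console session: two typed lines, then Ctrl+Z and Enter.
    console = io.StringIO("first line\nsecond line\n\x1a\n")
    print(repr(read_until_sub(console)))
```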

While no longer technically required to indicate the end of a file, as of 2017 many text editors and programming languages still support this convention, can be configured to insert this character at the end of a file when editing, or at least cope with it properly in text files.[citation needed] In such cases, it is often termed a "soft" EOF, as it does not necessarily represent the physical end of the file, but is more a marker indicating that "there is no useful data beyond this point". In reality, more data may exist beyond this character up to the actual end of the data in the file system, so it can be used to hide file content when the file is displayed at the console or opened in editors. Some file formats (e.g. PNG) include the SUB character in their file signatures to perform precisely this function. Some modern text file formats (e.g. CSV-1203[10]) still recommend that an EOF character be appended as the last character in the file. However, typing Control+Z does not embed an EOF character into a file in either DOS or Windows, nor do the APIs of those systems use the character to denote the actual end of a file.
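A soft EOF is easy to model: a SUB-aware reader stops at the first 1A byte, while reading the same bytes in binary mode, which is also the usual way to get past a soft EOF, exposes everything after it. The sketch also uses the fixed 8-byte PNG signature, whose seventh byte is 1A, so that a SUB-honouring text display stops before any binary image data. split_soft_eof is a hypothetical helper name.

```python
from typing import Tuple

SUB = b"\x1a"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte PNG file signature


def split_soft_eof(data: bytes) -> Tuple[bytes, bytes]:
    """Split raw file bytes at the first SUB.

    Returns (visible, hidden): what a reader honouring the soft EOF would show,
    and whatever follows the marker. The marker itself is in neither part.
    """
    visible, _sep, hidden = data.partition(SUB)
    return visible, hidden


if __name__ == "__main__":
    doc = b"Shown on the console.\r\n" + SUB + b"Hidden until read in binary."
    shown, rest = split_soft_eof(doc)
    print(shown)
    print(rest)
    # A SUB-honouring display of a PNG file stops right after "\x89PNG\r\n".
    print(split_soft_eof(PNG_SIGNATURE)[0])
```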

Some programming languages (e.g. Visual Basic) will not read past a "soft" EOF when using the built-in text file reading primitives (INPUT, LINE INPUT, etc.), and alternative methods must be adopted, e.g. opening the file in binary mode or using the FileSystemObject to progress beyond it.

Character 26 was used to mark "End of file" even though ASCII names this character Substitute and defines other control characters closer in meaning, such as End of Transmission (EOT, 4). Character 28, File Separator (FS), has also been used for similar purposes.

Other uses

In Unix-like operating systems, this character is by default the terminal's suspend character, which job-control shells use to let the user suspend the currently executing interactive process.[11] The suspended process can then be resumed in foreground (interactive) mode, made to resume execution in background mode, or terminated. When the user types it at their computer terminal, the currently running foreground process is sent a "terminal stop" (SIGTSTP) signal, which generally causes the process to suspend its execution. The user can later continue the process by using the "foreground" command (fg) or the "background" command (bg).
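Programs that must tidy up before being stopped (for example, to restore terminal settings) can install a SIGTSTP handler. A minimal, Unix-only Python sketch that merely records the signal, so the process does not actually stop; signal.raise_signal stands in for the user pressing Ctrl+Z, and received and on_tstp are hypothetical names:

```python
import signal

received = []  # signals observed by the handler


def on_tstp(signum, frame):
    """Record SIGTSTP instead of letting the default action stop the process."""
    received.append(signum)


# Replace the default disposition (stop) with our handler. Unix only.
signal.signal(signal.SIGTSTP, on_tstp)

if __name__ == "__main__":
    # Simulate the user typing Ctrl+Z by sending SIGTSTP to this process.
    signal.raise_signal(signal.SIGTSTP)
    print(received == [signal.SIGTSTP])
```

Because a caught SIGTSTP no longer suspends the process, a real handler would normally finish its cleanup, reset the disposition with signal.signal(signal.SIGTSTP, signal.SIG_DFL), and raise the signal again so that the shell sees the job stop.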

The Unicode Security Considerations report[12] recommends this character as a safe replacement for unmappable characters during character set conversion.
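In Python, for instance, the built-in "replace" error handler substitutes "?" when encoding, a character that is significant in many syntaxes. The sketch below registers a custom handler that substitutes SUB instead; the handler name "sub" and the function substitute_with_sub are hypothetical, while codecs.register_error is the standard mechanism for adding error handlers.

```python
import codecs


def substitute_with_sub(exc):
    """Codec error handler: replace each unmappable character with SUB (U+001A)."""
    if isinstance(exc, UnicodeEncodeError):
        # One SUB per unencodable character; resume encoding after the run.
        return "\x1a" * (exc.end - exc.start), exc.end
    raise exc  # only encoding errors are handled in this sketch


# "sub" is an illustrative name, not a built-in Python error handler.
codecs.register_error("sub", substitute_with_sub)

if __name__ == "__main__":
    print("naïve café".encode("ascii", errors="sub"))
    print("naïve café".encode("ascii", errors="replace"))  # built-in: uses "?"
```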

In many GUIs and applications, Control+Z (⌘ Command+Z on macOS) can be used to undo the last action. In many applications, actions before the last one can also be undone by pressing Control+Z multiple times. Control+Z was one of a handful of keyboard sequences chosen by the program designers at Xerox PARC to control text editing.

Representation

ASCII and Unicode representation of "substitute":

  • Octal code: 32
  • Decimal code: 26
  • Hexadecimal code: 1A, U+001A
  • Mnemonic symbol: SUB
  • Binary value: 11010
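The values above, and the Ctrl+Z and ^Z conventions from the introduction, can be checked with a few lines of Python. The Ctrl mapping shown (keep only the low five bits of the letter's code) is the conventional terminal behaviour, assumed here for illustration; describe, ctrl, and caret_notation are hypothetical helper names.

```python
def describe(code: int) -> dict:
    """Render a character code in the notations listed above."""
    return {
        "octal": f"{code:o}",
        "decimal": str(code),
        "hex": f"{code:X}",
        "unicode": f"U+{code:04X}",
        "binary": f"{code:b}",
    }


def ctrl(letter: str) -> int:
    """Conventional Ctrl+letter mapping: keep only the low five bits."""
    return ord(letter) & 0x1F  # "Z" is 0x5A, and 0x5A & 0x1F == 0x1A


def caret_notation(code: int) -> str:
    """Caret notation for C0 control codes 0 to 31, e.g. 26 -> "^Z"."""
    return "^" + chr(code + 0x40)


if __name__ == "__main__":
    print(describe(26))
    print(ctrl("Z"), caret_notation(ctrl("Z")))
```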

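See also

  • C0 and C1 control codes
  • ISO 646
  • U+FFFD � Unicode replacement character
  • Access key
  • Control-C
  • Control-G
  • Control-V
  • Control-X
  • Keyboard shortcut
  • List of file signatures
  • .notdef, a symbol sometimes called by the slang term "tofu", used to represent a missing character
  • Noto fonts, a Google project to eliminate missing characters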

References

  1. ^ "Keyboard shortcuts for Windows". Microsoft Support. Microsoft. Retrieved 2012-06-02.
  2. ^ "Table of IO Device Characteristics - Console or Teletypewriters". PDP-6 Multiprogramming System Manual (PDF). Maynard, Massachusetts, USA: Digital Equipment Corporation (DEC). 1965. p. 43. DEC-6-0-EX-SYS-UM-IP-PRE00. Archived (PDF) from the original on 2014-07-14. Retrieved 2014-07-10. (1+84+10 pages)
  3. ^ "5.1.1.1. Device Dependent Functions - Data Modes - Full-Duplex Software A(ASCII) and AL(ASCII Line)". PDP-10 Reference Handbook: Communicating with the Monitor - Time-Sharing Monitors (PDF). Vol. 3. Digital Equipment Corporation (DEC). 1969. pp. 5-3 – 5-6 [5-5 (431)]. Archived (PDF) from the original on 2011-11-15. Retrieved 2014-07-10. (207 pages)
  4. ^ Elliott, John C. (1998). "CP/M 1.4 disc formats". Archived from the original on 2020-11-14. Retrieved 2021-11-18.
  5. ^ Elliott, John C. (1998). "CP/M 2.2 disc formats". Archived from the original on 2020-11-05. Retrieved 2021-11-18.
  6. ^ "2. Operating System Call Conventions". CP/M 2.0 Interface Guide (PDF) (1 ed.). Pacific Grove, California, USA: Digital Research. 1979. p. 5. Archived (PDF) from the original on 2020-02-28. Retrieved 2020-02-28. [...] The end of an ASCII file is denoted by a control-Z character (1AH) or a real end of file, returned by the CP/M read operation. Control-Z characters embedded within machine code files (e.g., COM files) are ignored, however, and the end of file condition returned by CP/M is used to terminate read operations. [...] (56 pages)
  7. ^ Hogan, Thom (1982). "3. CP/M Transient Commands". Osborne CP/M User Guide - For All CP/M Users (2 ed.). Berkeley, California, USA: A. Osborne/McGraw-Hill. p. 74. ISBN 0-931988-82-9. Retrieved 2020-02-28. [...] CP/M marks the end of an ASCII file by placing a CONTROL-z character in the file after the last data character. If the file contains an exact multiple of 128 characters, in which case adding the CONTROL-Z would waste 127 characters, CP/M does not do so. Use of the CONTROL-Z character as the end-of-file marker is possible because CONTROL-z is seldom used as data in ASCII files. In a non-ASCII file, however, CONTROL-Z is just as likely to occur as any other character. Therefore, it cannot be used as the end-of-file marker. CP/M uses a different method to mark the end of a non-ASCII file. CP/M assumes it has reached the end of the file when it has read the last record (basic unit of disk space) allocated to the file. The disk directory entry for each file contains a list of the disk records allocated to that file. This method relies on the size of the file, rather than its content, to locate the end of the file. [...]
  8. ^ Elliott, John C. (1998). "CP/M 3.1 disc formats". Archived from the original on 2021-10-26. Retrieved 2021-11-18.
  9. ^ Elliott, John C. (1998). "CP/M 4.1 disc formats". Archived from the original on 2020-11-05. Retrieved 2021-11-18.
  10. ^ CSV-1203 format specification Archived 2016-05-16 at the Portuguese Web Archive
  11. ^ "Quick Reference: Unix Commands". IT Connect. University of Washington. Retrieved 2012-06-02.
  12. ^ Unicode Security Considerations report

Further reading

  • Federal Standard 1037C

Retrieved from https://en.wikipedia.org/w/index.php?title=Substitute_character&oldid=1163101480