Wikipedia

scanf format string

A scanf format string (scan formatted) is a control parameter used in various functions to specify the layout of an input string. The functions can then divide the string and translate its parts into values of appropriate data types. String scanning functions are often supplied in standard libraries. scanf itself is a function that reads formatted data from the standard input stream, which is usually the keyboard, and stores the results through the arguments supplied in the call.

The term "scanf" comes from the C library, which popularized this type of function, but such functions predate C, and other names are used, such as readf in ALGOL 68. scanf format strings, which provide formatted input (parsing), are complementary to printf format strings, which provide formatted output (templating). These provide simple functionality and fixed format compared to more sophisticated and flexible parsers or template engines, but are sufficient for many purposes.

History

Mike Lesk's portable input/output library, including scanf, officially became part of Unix in Version 7.[1]

Usage

The scanf function, which is found in C, reads input for numbers and other datatypes from standard input (often a command line interface or similar kind of a text user interface).

The following C code reads a variable number of unformatted decimal integers from the standard input stream and prints each of them out on separate lines:

    #include <stdio.h>

    int main(void)
    {
        int n;

        while (scanf("%d", &n) == 1)
            printf("%d\n", n);
        return 0;
    }

After being processed by the program above, an irregularly spaced list of integers such as

    456 123 789     456 12
    456 1
          2378

will appear consistently spaced as:

    456
    123
    789
    456
    12
    456
    1
    2378

To print out a word:

    #include <stdio.h>

    int main(void)
    {
        char word[20];

        if (scanf("%19s", word) == 1)
            puts(word);
        return 0;
    }

Whatever data type the program reads, the arguments (such as &n above) must be pointers to the objects that will receive the values. If a value is passed instead of its address, scanf treats that value as an address and writes the converted input to the wrong memory location, which is undefined behavior.

In the last example the address-of operator (&) is not used for the argument: word is the name of an array of char, and in most expressions, including a function argument, an array name is converted to a pointer to its first element. The expression &word would evaluate to the same address numerically, but it has a different type: it is a pointer to the whole array (char (*)[20]) rather than to one element (char *), which is not the type that %s expects. This needs to be kept in mind when reading strings with scanf.
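
The difference between word and &word can be made visible with sizeof. The following is a minimal sketch; the three function names are hypothetical and exist only for this illustration:

```c
#include <stddef.h>

/* Size of the object that `word` points to after array-to-pointer
   conversion: a single char. */
size_t pointee_size_of_word(void)
{
    char word[20];
    return sizeof *word;
}

/* Size of the object that `&word` points to: the whole array. */
size_t pointee_size_of_address_of_word(void)
{
    char word[20];
    return sizeof *&word;
}

/* Both expressions nevertheless refer to the same address. */
int same_address(void)
{
    char word[20];
    return (void *)word == (void *)&word;
}
```

The first function reports the size of one element and the second the size of the whole 20-element array, even though both pointers compare equal.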

Because scanf reads only from standard input, many programming languages that offer similar interfaces, such as PHP, provide the derivatives sscanf and fscanf but not scanf itself.

Format string specifications

The formatting placeholders in scanf are more or less the same as those in printf, its reverse function. As in printf, the POSIX extension n$ is defined.[2]

A format string rarely contains constants (i.e., characters that are not formatting placeholders), mainly because a program is usually not designed to read known data, although scanf accepts them if they are specified: each such character must match the next input character exactly. The exception is whitespace: one or more whitespace characters in the format string discard any amount of whitespace in the input, including none.[2]
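
As a hedged illustration of constants and whitespace in a format string, the sketch below uses sscanf so that the input is a fixed string. The function names parse_date and parse_pair are hypothetical, not part of any library:

```c
#include <stdio.h>

/* Parse "YYYY-MM-DD". The '-' characters in the format are constants
   and must match the input exactly. Returns 1 on success, 0 otherwise. */
int parse_date(const char *text, int *year, int *month, int *day)
{
    return sscanf(text, "%d-%d-%d", year, month, day) == 3;
}

/* Parse "x,y". The space in the format discards any whitespace
   (possibly none) before the comma; %d itself skips whitespace
   after the comma. Returns 1 on success, 0 otherwise. */
int parse_pair(const char *text, int *x, int *y)
{
    return sscanf(text, "%d ,%d", x, y) == 2;
}
```

A call such as parse_date("2024/03/09", ...) fails after the first conversion because '/' does not match the constant '-'.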

Some of the most commonly used placeholders follow:

  • %a : Scan a floating-point number in its hexadecimal notation.
  • %d : Scan an integer as a signed decimal number.
  • %i : Scan an integer as a signed number. Similar to %d, but interprets the number as hexadecimal when preceded by 0x and octal when preceded by 0. For example, the string 031 would be read as 31 using %d, and 25 using %i. The length modifier h in %hi indicates conversion to a short and hh conversion to a char.
  • %u : Scan an integer as an unsigned decimal number (unsigned int). In the C99 standard the input may carry an optional minus sign, as accepted by strtoul(); if a minus sign is read, no error is reported and the value is negated in the unsigned type, which usually yields a very large number. Correspondingly, %hu scans for an unsigned short and %hhu for an unsigned char.
  • %f : Scan a floating-point number in normal (fixed-point) notation.
  • %g, %G : Scan a floating-point number in either normal or exponential notation. Unlike in printf, the case of the conversion letter does not restrict the case of the letters accepted in the input.
  • %x, %X : Scan an integer as an unsigned hexadecimal number.
  • %o : Scan an integer as an octal number.
  • %s : Scan a character string. The scan terminates at whitespace. A null character is stored at the end of the string, which means that the buffer supplied must be at least one character longer than the specified input length.
  • %c : Scan a character (char). No null character is added.
  • whitespace: Any whitespace characters trigger a scan for zero or more whitespace characters. The number and type of whitespace characters do not need to match in either direction.
  • %lf : Scan as a double floating-point number: the "float" format with the l ("long") length modifier.
  • %Lf : Scan as a long double floating-point number: the "float" format with the L ("long double") length modifier.
  • %n : Nothing is expected. The number of characters consumed thus far from the input is stored through the next pointer, which must be a pointer to int. This is not a conversion and does not increase the count returned by the function.
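
A few of the placeholders listed above can be compared side by side. This is a minimal sketch using sscanf on fixed strings; the helper names are hypothetical:

```c
#include <stdio.h>

/* %d always reads decimal. Returns 0 if nothing could be converted. */
int scan_decimal(const char *text)
{
    int value;
    if (sscanf(text, "%d", &value) != 1)
        return 0;
    return value;
}

/* %i chooses the base from the prefix: 0x for hexadecimal, 0 for octal. */
int scan_auto_base(const char *text)
{
    int value;
    if (sscanf(text, "%i", &value) != 1)
        return 0;
    return value;
}

/* %n stores how many input characters have been consumed so far,
   including skipped leading whitespace. Returns -1 if no integer
   was found. */
int chars_consumed_by_int(const char *text)
{
    int value, consumed;
    if (sscanf(text, "%d%n", &value, &consumed) != 1)
        return -1;
    return consumed;
}
```

The comparison against 1 in the last function reflects that %n does not add to the count returned by sscanf.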


The above can be combined with numeric modifiers and with the length modifiers l and L, placed between the percent symbol and the conversion letter. With integer conversions, l stands for "long" and ll for "long long"; with floating-point conversions, l selects double and L selects long double. A numeric value between the percent symbol and the letter, preceding any length modifier, specifies the maximum number of characters to be scanned. An optional asterisk (*) right after the percent symbol denotes that the datum read by this format specifier is not to be stored in a variable. No argument should be supplied after the format string for this dropped value.
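
The modifiers described above might be exercised as follows. This is a sketch under the assumption that fixed strings parsed with sscanf are representative; all function names are hypothetical:

```c
#include <stdio.h>

/* Field widths: the first %3d takes at most three characters,
   the following %2d at most two. */
int split_digits(const char *text, int *head, int *tail)
{
    return sscanf(text, "%3d%2d", head, tail) == 2;
}

/* Assignment suppression: %*s reads a word and discards it, so it
   needs no argument and is not counted in the return value. */
int second_field(const char *text, int *value)
{
    return sscanf(text, "%*s %d", value) == 1;
}

/* Length modifiers: %ld stores into a long, %lf into a double. */
int scan_long_and_double(const char *text, long *count, double *ratio)
{
    return sscanf(text, "%ld %lf", count, ratio) == 2;
}
```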

The ff modifier in printf is not present in scanf, causing differences between modes of input and output. The ll and hh modifiers are not present in the C90 standard, but are present in the C99 standard.[3]

An example of a format string is

"%7d%s %c%lf"

The above format string scans up to seven characters as a decimal integer, then reads the following characters as a string until a space, newline, or tab is found, then consumes whitespace until the first non-whitespace character is found, then stores that character, and finally scans the remaining characters as a double. Input that does not match this layout stops the scan early, so a robust program must check whether the scanf call succeeded and take appropriate action. If the input was not in the correct format, the erroneous data will still be on the input stream and must be discarded before new input can be read. An alternative method, which avoids this, is to read a whole line with fgets and then examine the string that was read, for example with sscanf.
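
The fgets-then-sscanf approach could look like the sketch below. It reuses the example format with a width added to %s, and it assumes that one record fits in a 128-byte line buffer; parse_record and read_record are hypothetical names:

```c
#include <stdio.h>

/* Parse one record with the example format. The %s conversion is
   bounded to 19 characters so that it fits a 20-byte buffer.
   Returns 1 only if all four fields were converted. */
int parse_record(const char *line, int *number, char word[20],
                 char *letter, double *value)
{
    return sscanf(line, "%7d%19s %c%lf", number, word, letter, value) == 4;
}

/* Read one line and parse it. A malformed line is consumed by fgets,
   so no erroneous data stays on the stream.
   Returns 1 on success, 0 on a malformed line, EOF at end of input.
   Assumption: lines longer than 127 characters are not expected;
   they would be split across calls. */
int read_record(FILE *in, int *number, char word[20],
                char *letter, double *value)
{
    char line[128];

    if (fgets(line, sizeof line, in) == NULL)
        return EOF;
    return parse_record(line, number, word, letter, value);
}
```

Because a failed line has already been removed from the stream, the caller can report the error and simply call read_record again.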

In the case of the many float type characters a, e, f, g, many implementations choose to collapse most into the same parser. Microsoft MSVCRT does it with e, f, g,[4] while glibc does so with all four.[2]

Vulnerabilities

scanf is vulnerable to format string attacks. Great care should be taken to ensure that the format string limits the size of every string and array it fills. In most cases the length of a user's input is arbitrary and cannot be determined before scanf is executed, so %s placeholders without a length specifier are inherently insecure and exploitable for buffer overflows. Another potential problem is allowing dynamic format strings, for example format strings stored in configuration files or other user-controlled files. In this case the permitted input lengths cannot be guaranteed unless the format string is checked beforehand and limits are enforced. A related problem is additional or mismatched placeholders that do not match the actual vararg list: scanf may then take pointers from whatever happens to be on the stack and write through undesirable or even insecure addresses, depending on the particular implementation of varargs.
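
A bounded %s conversion might be written as in the following sketch. The 16-byte buffer size and the name first_word are assumptions chosen for the illustration; the width in the format must always be one less than the buffer size to leave room for the terminating null character:

```c
#include <stdio.h>
#include <string.h>

/* Copy the first whitespace-delimited word of `text` into `word`,
   storing at most 15 characters plus the terminating null.
   Returns the stored length, or -1 if no word was found. */
int first_word(const char *text, char word[16])
{
    if (sscanf(text, "%15s", word) != 1)
        return -1;
    return (int)strlen(word);
}
```

With an unbounded "%s" the same call would write past the end of the buffer for any word longer than 15 characters; with the width, the excess input is left unread instead.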

See also

  • C programming language
  • Format string attack
  • Printf format string
  • String interpolation

References

  1. McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139.
  2. scanf(3) – Linux Programmer's Manual – Library Functions
  3. C99 standard, §7.19.6.2 "The fscanf function", paragraph 11.
  4. "scanf Type Field Characters". docs.microsoft.com.

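External links

  • scanf: System Interfaces Reference, The Single UNIX Specification, Version 4 from The Open Group
  • C++ reference for std::scanf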
