Wikipedia

C character classification

C character classification is an operation provided by a group of functions in the ANSI C Standard Library for the C programming language. These functions test characters for membership in a particular class of characters, such as alphabetic characters or control characters. Both single-byte and wide characters are supported.[1]

History

Early C-language programmers working on the Unix operating system developed programming idioms for classifying characters into different types. For example, for the ASCII character set, the following expression is true exactly when c is a letter:

('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z') 

Because such a test can be written in many equivalent ways, it became desirable to introduce short, standardized forms of these tests, which were placed in the system-wide header file ctype.h.
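To illustrate how the hand-written idiom relates to the standardized test, the following minimal sketch wraps the comparison expression in a function and checks it against isalpha() for every 7-bit value in the "C" locale. It assumes an ASCII-compatible execution character set (the range test is not valid on character sets such as EBCDIC, where the letters are not contiguous). The helper names is_ascii_letter and idiom_matches_isalpha are hypothetical.

```c
#include <ctype.h>
#include <locale.h>

/* The comparison idiom from the text. It assumes an ASCII-compatible
   character set, in which 'A'..'Z' and 'a'..'z' are contiguous ranges. */
int is_ascii_letter(int c)
{
    return ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z');
}

/* Hypothetical check: returns 1 if the idiom and isalpha() agree on
   every value 0..127 in the "C" locale, and 0 otherwise. */
int idiom_matches_isalpha(void)
{
    /* The "C" locale is provided by every conforming implementation. */
    setlocale(LC_CTYPE, "C");
    for (int c = 0; c < 128; c++) {
        /* isalpha() may return any nonzero value for "true",
           so normalize it to 0 or 1 before comparing. */
        if (is_ascii_letter(c) != (isalpha(c) != 0))
            return 0;
    }
    return 1;
}
```

In the "C" locale, isalpha() is true only for the 26 uppercase and 26 lowercase basic letters, so on an ASCII system the two tests agree; in other locales isalpha() may accept additional characters that the range test rejects.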

Implementation

Unlike the example above, the character classification routines in most C libraries are not written as comparison tests; instead, they are implemented as lookups into a static table.

For example, a library might create an array of 256 eight-bit integers used as bit fields, where each bit corresponds to a particular property of the character, such as isdigit or isalpha. If the lowest-order bit of each entry corresponds to the isdigit property, the code could be written as

#define isdigit(x) (TABLE[x] & 1) /* bit 0 of each TABLE entry marks a digit */
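A self-contained version of this table-based scheme might look like the sketch below. Everything in it is hypothetical: the names prop_table, PROP_DIGIT, PROP_UPPER, PROP_LOWER, init_prop_table, my_isdigit and my_isalpha are invented for illustration, the table is filled for an ASCII-compatible character set with 8-bit bytes, and the macros simply cast their argument to unsigned char rather than handling EOF the way the real ctype.h functions must. Real libraries choose their own bit layout and table size.

```c
#include <limits.h>

/* Hypothetical property bits; bit 0 is the isdigit property. */
enum {
    PROP_DIGIT = 1 << 0,
    PROP_UPPER = 1 << 1,
    PROP_LOWER = 1 << 2
};

/* One entry per unsigned char value (256 entries when CHAR_BIT is 8). */
static unsigned char prop_table[UCHAR_MAX + 1];

/* Fill the table once, assuming an ASCII-compatible character set. */
void init_prop_table(void)
{
    for (int c = '0'; c <= '9'; c++) prop_table[c] |= PROP_DIGIT;
    for (int c = 'A'; c <= 'Z'; c++) prop_table[c] |= PROP_UPPER;
    for (int c = 'a'; c <= 'z'; c++) prop_table[c] |= PROP_LOWER;
}

/* Each macro mentions its argument exactly once, so side effects
   in the argument happen exactly once: a single array index. */
#define my_isdigit(x) (prop_table[(unsigned char)(x)] & PROP_DIGIT)
#define my_isalpha(x) (prop_table[(unsigned char)(x)] & (PROP_UPPER | PROP_LOWER))
```

After init_prop_table() has run, a call such as my_isdigit(*p++) advances p by exactly one character, because the argument appears only once in the expansion.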

Early versions of Linux used a potentially faulty method similar to the first code sample:

#define isdigit(x) ((x) >= '0' && (x) <= '9') 

This can cause problems if the expression substituted for x has a side effect, because the expanded macro may evaluate x twice. For example, in a call such as isdigit(x++) or isdigit(run_some_program()), x may be incremented twice, or run_some_program() may be called twice, and nothing at the call site makes this evident. For this reason, the table-based approach, which evaluates its argument only once, is generally used.
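The double evaluation can be observed directly. The sketch below renames the faulty comparison macro to CMP_ISDIGIT so it does not clash with the real isdigit, and uses a hypothetical counting function, fetch_digit_char, as a stand-in for run_some_program(). For contrast it also calls the standard isdigit, which the C standard (C99 §7.1.4) requires to evaluate its argument exactly once even when the library implements it as a macro.

```c
#include <ctype.h>

/* The faulty comparison macro from the text, renamed; it expands its
   argument twice (the second use runs only if the first test is true). */
#define CMP_ISDIGIT(x) ((x) >= '0' && (x) <= '9')

static int call_count = 0;

/* Hypothetical stand-in for run_some_program(): always yields '4'
   and records how many times it was called. */
int fetch_digit_char(void)
{
    call_count++;
    return '4';
}

/* Number of calls made when testing through the comparison macro. */
int evals_with_comparison_macro(void)
{
    call_count = 0;
    (void)CMP_ISDIGIT(fetch_digit_char());
    return call_count;   /* 2: both comparisons called the function */
}

/* Number of calls made when testing through the standard isdigit. */
int evals_with_standard_isdigit(void)
{
    call_count = 0;
    (void)isdigit(fetch_digit_char());
    return call_count;   /* 1: library macros evaluate arguments once */
}
```

Likewise, with int x = '5', evaluating CMP_ISDIGIT(x++) leaves x equal to '7', because both comparisons succeed and each one increments x.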

Overview of functions

The functions that operate on single-byte characters are declared in the ctype.h header file (cctype in C++). The functions that operate on wide characters are declared in the wctype.h header file (cwctype in C++).

The result of each classification depends on the current locale, specifically its LC_CTYPE category.
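As a minimal sketch of using the byte-character functions, the function below tallies a few character classes in a string. The names class_counts and count_classes are hypothetical, and the expected counts in the note assume the "C" locale. The cast to unsigned char reflects a requirement of the standard: the ctype.h functions accept only values representable as unsigned char or equal to EOF, and a plain char may be negative.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical tally of character classes in a byte string. */
struct class_counts {
    size_t alpha, digit, space, punct;
};

struct class_counts count_classes(const char *s)
{
    struct class_counts n = {0, 0, 0, 0};
    for (; *s != '\0'; s++) {
        /* Convert to unsigned char first: passing a negative char
           value (other than EOF) to these functions is undefined. */
        unsigned char c = (unsigned char)*s;
        if (isalpha(c))
            n.alpha++;
        else if (isdigit(c))
            n.digit++;
        else if (isspace(c))
            n.space++;
        else if (ispunct(c))
            n.punct++;
    }
    return n;
}
```

After setlocale(LC_CTYPE, "C"), count_classes("Hi, 42!") reports 2 letters, 2 digits, 1 space and 2 punctuation characters; under another locale, bytes outside the basic character set may be classified differently.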

Byte character   Wide character   Description
isalnum          iswalnum         checks whether the operand is alphanumeric
isalpha          iswalpha         checks whether the operand is alphabetic
islower          iswlower         checks whether the operand is lowercase
isupper          iswupper         checks whether the operand is uppercase
isdigit          iswdigit         checks whether the operand is a decimal digit
isxdigit         iswxdigit        checks whether the operand is a hexadecimal digit
iscntrl          iswcntrl         checks whether the operand is a control character
isgraph          iswgraph         checks whether the operand is a graphical character
isspace          iswspace         checks whether the operand is a whitespace character
isblank          iswblank         checks whether the operand is a blank character
isprint          iswprint         checks whether the operand is a printable character
ispunct          iswpunct         checks whether the operand is a punctuation character
tolower          towlower         converts the operand to lowercase
toupper          towupper         converts the operand to uppercase
-                iswctype         checks whether the operand falls into a specific class
-                towctrans        converts the operand using a specific mapping
-                wctype           returns a wide character class to be used with iswctype
-                wctrans          returns a transformation mapping to be used with towctrans
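The last four wide-character functions work as a pair of lookups: wctype and wctrans translate a name into a class or mapping descriptor, and iswctype and towctrans apply it. The sketch below uses them together; the helper name map_class is hypothetical. The names "alpha" and "toupper" are among those the standard guarantees to be valid in every locale, and wctype and wctrans return zero for a name the current locale does not recognize.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: applies a named mapping (e.g. "toupper") to every
   wide character of s that belongs to a named class (e.g. "alpha").
   Returns the number of characters changed, or -1 if either name is
   not recognized in the current locale. */
int map_class(wchar_t *s, const char *class_name, const char *map_name)
{
    wctype_t cls = wctype(class_name);   /* zero if the name is unknown */
    wctrans_t map = wctrans(map_name);   /* zero if the name is unknown */
    if (cls == 0 || map == 0)
        return -1;

    int changed = 0;
    for (; *s != L'\0'; s++) {
        if (iswctype((wint_t)*s, cls)) {
            wint_t m = towctrans((wint_t)*s, map);
            if (m != (wint_t)*s)
                changed++;
            *s = (wchar_t)m;
        }
    }
    return changed;
}
```

In the default "C" locale, calling map_class on L"ab1c" with "alpha" and "toupper" turns the string into L"AB1C" and returns 3. The standard specifies that towctrans(wc, wctrans("toupper")) behaves the same as towupper(wc), so the named form is mainly useful when the class or mapping is chosen at run time.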

References

  1. ISO/IEC 9899:1999 specification (PDF). p. 193, § 7.4.

External links

The Wikibook A Little C Primer has a page on the topic of: C Character Class Test Library
The Wikibook C Programming has a page on the topic of: C Programming/C Reference

Retrieved from https://en.wikipedia.org/w/index.php?title=C_character_classification&oldid=1115151620