Wikipedia

Checksum

A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data integrity but are not relied upon to verify data authenticity.[1]

Effect of a typical checksum function (the Unix cksum utility)

The procedure which generates this checksum is called a checksum function or checksum algorithm. Depending on its design goals, a good checksum algorithm usually outputs a significantly different value, even for small changes made to the input.[2] This is especially true of cryptographic hash functions, which may be used to detect many data corruption errors and verify overall data integrity; if the computed checksum for the current data input matches the stored value of a previously computed checksum, there is a very high probability the data has not been accidentally altered or corrupted.
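The comparison of a freshly computed checksum against a stored one can be sketched as follows. This is a minimal illustration, assuming Python's standard zlib.crc32 and hashlib.sha256 as the checksum and cryptographic hash; the sample strings and the helper name is_intact are hypothetical.

```python
import hashlib
import zlib

original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"  # one character changed

# CRC-32 (a checksum) and SHA-256 (a cryptographic hash) of both inputs.
crc_original = zlib.crc32(original)
crc_altered = zlib.crc32(altered)
sha_original = hashlib.sha256(original).hexdigest()
sha_altered = hashlib.sha256(altered).hexdigest()


def is_intact(data: bytes, stored_crc: int) -> bool:
    """Recompute the checksum and compare it with the stored value."""
    return zlib.crc32(data) == stored_crc


print(f"CRC-32:  {crc_original:08x} vs {crc_altered:08x}")
print(f"SHA-256: {sha_original[:16]}... vs {sha_altered[:16]}...")
print("original intact:", is_intact(original, crc_original))
print("altered intact: ", is_intact(altered, crc_original))
```

A match only indicates that accidental corruption is very unlikely; as noted above, it says nothing about authenticity.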

Checksum functions are related to hash functions, fingerprints, randomization functions, and cryptographic hash functions. However, each of those concepts has different applications and therefore different design goals. For instance, a function returning the start of a string can provide a hash appropriate for some applications but will never be a suitable checksum. Checksums are used as cryptographic primitives in larger authentication algorithms. For cryptographic systems designed to verify both data integrity and data authenticity, see HMAC.

Check digits and parity bits are special cases of checksums, appropriate for small blocks of data (such as Social Security numbers, bank account numbers, computer words, single bytes, etc.). Some error-correcting codes are based on special checksums which not only detect common errors but also allow the original data to be recovered in certain cases.

Algorithms

Parity byte or parity word

The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the exclusive or (XOR) of all those words. The result is appended to the message as an extra word. In the simplest case, n = 1, this means appending a single bit to the data so that the total number of '1's is even. To check the integrity of a message, the receiver computes the exclusive or of all its words, including the checksum; if the result is not a word consisting of n zeros, the receiver knows a transmission error occurred.[3]

With this checksum, any transmission error which flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum. However, an error that affects two bits will not be detected if those bits lie at the same position in two distinct words. Swapping two or more words will also go undetected. If the affected bits are independently chosen at random, the probability of a two-bit error being undetected is 1/n.
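The longitudinal parity check and its blind spots can be sketched in a few lines. This is an illustrative example only, assuming 8-bit words held as Python integers; the function names and the sample message are hypothetical.

```python
from functools import reduce
from operator import xor


def parity_word(words):
    """XOR of all words (each word is an integer of n bits)."""
    return reduce(xor, words, 0)


def append_parity(words):
    """Return the message with its parity word appended."""
    return list(words) + [parity_word(words)]


def is_valid(received):
    """A received message is valid if the XOR of all its words is zero."""
    return parity_word(received) == 0


message = [0x12, 0x34, 0x56]
sent = append_parity(message)          # parity word is 0x70

# A single flipped bit is detected.
single = list(sent)
single[0] ^= 0x01

# Two flipped bits at the same position in two words cancel out.
double = list(sent)
double[0] ^= 0x01
double[1] ^= 0x01

# Swapping two words is not detected either, since XOR ignores order.
swapped = [sent[1], sent[0]] + sent[2:]

print(is_valid(sent), is_valid(single), is_valid(double), is_valid(swapped))
```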

Sum complement

A variant of the previous algorithm is to add all the "words" as unsigned binary numbers, discarding any overflow bits, and append the two's complement of the total as the checksum. To validate a message, the receiver adds all the words in the same manner, including the checksum; if the result is not a word full of zeros, an error must have occurred. This variant, too, detects any single-bit error. The modular sum is used in SAE J1708.[4]
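A minimal sketch of the sum complement, assuming 8-bit words so that "discarding overflow bits" is a mask with 0xFF; the function names and sample data are hypothetical and not taken from SAE J1708.

```python
def sum_complement(words, n_bits=8):
    """Two's complement of the modular sum of the words."""
    mask = (1 << n_bits) - 1
    total = sum(words) & mask          # discard overflow bits
    return (-total) & mask             # two's complement within n_bits


def is_valid_sum(received, n_bits=8):
    """Valid if all words, including the checksum, sum to zero modulo 2**n_bits."""
    mask = (1 << n_bits) - 1
    return (sum(received) & mask) == 0


message = [0x12, 0x34, 0x56, 0xF0]
sent = message + [sum_complement(message)]   # checksum word is 0x74

# A single flipped bit changes the sum by a power of two and is detected.
corrupted = list(sent)
corrupted[2] ^= 0x08

# Reordering the words leaves the sum unchanged and is not detected.
reordered = [sent[3], sent[0], sent[1], sent[2], sent[4]]

print(is_valid_sum(sent), is_valid_sum(corrupted), is_valid_sum(reordered))
```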

Position-dependent

The simple checksums described above fail to detect some common errors which affect many bits at once, such as changing the order of data words, or inserting or deleting words with all bits set to zero. The checksum algorithms most used in practice, such as Fletcher's checksum, Adler-32, and cyclic redundancy checks (CRCs), address these weaknesses by considering not only the value of each word but also its position in the sequence. This feature generally increases the cost of computing the checksum.
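As an illustration of position dependence, the sketch below compares a plain modular sum with Fletcher-16 in its common modulo-255 form. The helper simple_sum and the test strings are hypothetical, chosen only to show reordering and zero-byte insertion.

```python
def simple_sum(data: bytes) -> int:
    """Position-independent checksum: sum of bytes modulo 256."""
    return sum(data) % 256


def fletcher16(data: bytes) -> int:
    """Fletcher-16: sum2 accumulates the running sum1, so position matters."""
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


original = b"abcde"
reordered = b"abced"          # last two bytes swapped
padded = b"abcde\x00"         # an all-zero byte inserted at the end

for label, data in [("original", original), ("reordered", reordered), ("padded", padded)]:
    print(f"{label:9s} sum={simple_sum(data):02x} fletcher16={fletcher16(data):04x}")
```

The plain sum is identical for all three inputs, while the Fletcher value differs for each. The second accumulator is the extra computation that position dependence costs.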

Fuzzy checksum

The idea of fuzzy checksum was developed for detection of email spam by building up cooperative databases from multiple ISPs of email suspected to be spam. The content of such spam may often vary in its details, which would render normal checksumming ineffective. By contrast, a "fuzzy checksum" reduces the body text to its characteristic minimum, then generates a checksum in the usual manner. This greatly increases the chances of slightly different spam emails producing the same checksum. The spam detection software of co-operating ISPs, such as SpamAssassin, submits checksums of all emails to a centralised service such as DCC. If the count of a submitted fuzzy checksum exceeds a certain threshold, the database notes that this probably indicates spam. ISP service users similarly generate a fuzzy checksum on each of their emails and query the service for a spam likelihood.[5]
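The general idea can be sketched as follows. This is a toy illustration, not the algorithm used by DCC, iXhash, or SpamAssassin: the normalisation rules, the in-memory counter standing in for the central database, the threshold value, and all names are assumptions made for the example.

```python
import hashlib
import re
from collections import Counter

THRESHOLD = 2          # hypothetical report count above which a body is flagged
reports = Counter()    # stands in for the centralised database


def fuzzy_checksum(body: str) -> str:
    """Reduce the body to a normalised core, then hash it in the usual way."""
    text = body.lower()
    text = re.sub(r"[^a-z\s]", "", text)   # drop digits and punctuation
    text = " ".join(text.split())          # collapse runs of whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def report(body: str) -> int:
    """Submit a checksum and return how often it has been seen."""
    digest = fuzzy_checksum(body)
    reports[digest] += 1
    return reports[digest]


def is_probably_spam(body: str) -> bool:
    """Query the database: has this fuzzy checksum exceeded the threshold?"""
    return reports[fuzzy_checksum(body)] > THRESHOLD


variants = [
    "Buy CHEAP pills now!!! Offer #4821",
    "buy cheap pills now... offer #9377",
    "Buy cheap   pills now! Offer #0001",
]
for body in variants:
    report(body)

print(is_probably_spam("BUY cheap pills NOW. Offer #5555"))
print(is_probably_spam("Minutes from Tuesday's meeting attached"))
```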

General considerations

A message that is m bits long can be viewed as a corner of the m-dimensional hypercube. The effect of a checksum algorithm that yields an n-bit checksum is to map each m-bit message to a corner of a larger hypercube, with dimension m + n. The 2^(m + n) corners of this hypercube represent all possible received messages. The valid received messages (those that have the correct checksum) comprise a smaller set, with only 2^m corners.

A single-bit transmission error then corresponds to a displacement from a valid corner (the correct message and checksum) to one of the m + n adjacent corners. An error which affects k bits moves the message to a corner which is k steps removed from its correct corner. The goal of a good checksum algorithm is to spread the valid corners as far from each other as possible, to increase the likelihood that "typical" transmission errors will end up in an invalid corner.
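The hypercube picture can be checked by brute force for a very small code. The sketch below assumes m = 4 message bits and a single even-parity bit (n = 1) as the checksum; the function names are hypothetical. It counts the valid corners and measures how many steps separate the closest pair.

```python
from itertools import combinations

M_BITS = 4   # message length m
N_BITS = 1   # checksum length n (a single even-parity bit)


def parity_bit(message: int) -> int:
    """Even-parity bit: 1 if the message has an odd number of 1 bits."""
    return bin(message).count("1") % 2


def codeword(message: int) -> int:
    """Message with its checksum appended: a corner of the (m + n)-cube."""
    return (message << N_BITS) | parity_bit(message)


def hamming_distance(a: int, b: int) -> int:
    """Number of steps along hypercube edges between two corners."""
    return bin(a ^ b).count("1")


valid = [codeword(msg) for msg in range(2 ** M_BITS)]
total_corners = 2 ** (M_BITS + N_BITS)
min_distance = min(hamming_distance(a, b) for a, b in combinations(valid, 2))

print(len(valid), "valid corners out of", total_corners)
print("minimum distance between valid corners:", min_distance)
```

With a minimum distance of 2, every single-bit error lands on an invalid corner, while some two-bit errors land on another valid corner and go unnoticed.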

See also

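General topic

  • Algorithm
  • Check digit
  • Damm algorithm
  • Data rot
  • File verification
  • Fletcher's checksum
  • Frame check sequence
  • cksum
  • md5sum
  • sha1sum
  • Parchive
  • Sum (Unix)
  • SYSV checksum
  • BSD checksum
  • xxHash

Error correction

  • Hamming code
  • Reed–Solomon error correction
  • IPv4 header checksum

Hash functions

  • List of hash functions
  • Luhn algorithm
  • Parity bit
  • Rolling checksum
  • Verhoeff algorithm

File systems

  • ZFS – a file system that performs automatic file integrity checking using checksums

Related concepts

  • Isopsephy
  • Gematria
  • File fixity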

References

  1. ^ "Definition of CHECKSUM". www.merriam-webster.com. Archived from the original on 2022-03-10. Retrieved 2022-03-10.
  2. ^ Hoffman, Chris. "What Is a Checksum (and Why Should You Care)?". How-To Geek. Archived from the original on 2022-03-09. Retrieved 2022-03-10.
  3. ^ Fairhurst, Gorry (2014). "Checksums & Integrity Checks". Archived from the original on April 8, 2022. Retrieved March 11, 2022.
  4. ^ "SAE J1708". Kvaser.com. Archived from the original on 11 December 2013.
  5. ^ "IXhash". Apache. Archived from the original on 31 August 2020. Retrieved 7 January 2020.

External links

  • Additive Checksums (C) theory from Barr Group
  • Practical Application of Cryptographic Checksums
  • Checksum Calculator
  • Open-source Python-based application with a GUI, used to verify downloads.
