
Data integrity

Data integrity is the maintenance of, and the assurance of, data accuracy and consistency over its entire life-cycle.[1] It is a critical aspect of the design, implementation, and usage of any system that stores, processes, or retrieves data. The term is broad in scope and may have widely different meanings depending on the specific context – even under the same general umbrella of computing. It is at times used as a proxy term for data quality,[2] while data validation is a prerequisite for data integrity.[3] Data integrity is the opposite of data corruption.[4] The overall intent of any data integrity technique is the same: ensure data is recorded exactly as intended (such as a database correctly rejecting mutually exclusive possibilities) and, upon later retrieval, that the data is the same as when it was originally recorded. In short, data integrity aims to prevent unintentional changes to information. Data integrity is not to be confused with data security, the discipline of protecting data from unauthorized parties.

Any unintended change to data as the result of a storage, retrieval or processing operation, including malicious intent, unexpected hardware failure, and human error, is a failure of data integrity. If the change is the result of unauthorized access, it may also be a failure of data security. Depending on the data involved, the consequences could range from something as benign as a single pixel in an image appearing a different color than was originally recorded, to the loss of vacation pictures or a business-critical database, to even catastrophic loss of human life in a life-critical system.

Integrity types

Physical integrity

Physical integrity deals with challenges associated with correctly storing and fetching the data itself. Challenges with physical integrity may include electromechanical faults, design flaws, material fatigue, corrosion, power outages, natural disasters, and other environmental hazards such as ionizing radiation, extreme temperatures, pressures and g-forces. Methods of ensuring physical integrity include redundant hardware, an uninterruptible power supply, certain types of RAID arrays, radiation-hardened chips, error-correcting memory, use of a clustered file system, using file systems that employ block-level checksums such as ZFS, storage arrays that compute parity calculations such as exclusive or (XOR) or that use a cryptographic hash function, and even having a watchdog timer on critical subsystems.
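The parity calculation mentioned above can be sketched in a few lines. This is an illustrative toy, not a RAID implementation: a parity block is the byte-wise XOR of the data blocks, and XOR-ing the surviving blocks with the parity recovers a lost block.

```python
def xor_parity(blocks):
    """RAID-style parity: byte-wise XOR of equal-length data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# If one block is lost, XOR of the survivors and the parity recovers it,
# because x ^ x = 0 cancels every surviving block out of the sum.
recovered = xor_parity([data[0], data[2], parity])
print(recovered == data[1])  # True
```

The same cancellation property is why a single-parity array can tolerate the loss of exactly one block per stripe but not two.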

Physical integrity often makes extensive use of error detecting algorithms known as error-correcting codes. Human-induced data integrity errors are often detected through the use of simpler checks and algorithms, such as the Damm algorithm or Luhn algorithm. These are used to maintain data integrity after manual transcription from one computer system to another by a human intermediary (e.g. credit card or bank routing numbers). Computer-induced transcription errors can be detected through hash functions.
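As a concrete illustration of such a transcription check, here is a minimal sketch of the Luhn algorithm, which detects any single-digit typo in a number such as a credit card number:

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9      # equivalent to summing the two resulting digits
        total += d
    return total % 10 == 0

print(luhn_valid("79927398713"))  # True: a commonly cited valid test number
print(luhn_valid("79927398710"))  # False: a single-digit change is detected
```

The check catches all single-digit errors and most adjacent-digit transpositions, which is why it suits human transcription rather than protection against deliberate tampering.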

In production systems, these techniques are used together to ensure various degrees of data integrity. For example, a computer file system may be configured on a fault-tolerant RAID array, but might not provide block-level checksums to detect and prevent silent data corruption. As another example, a database management system might be compliant with the ACID properties, but the RAID controller or hard disk drive's internal write cache might not be.

Logical integrity

See also: Mutex and Copy-on-write

This type of integrity is concerned with the correctness or rationality of a piece of data, given a particular context. This includes topics such as referential integrity and entity integrity in a relational database or correctly ignoring impossible sensor data in robotic systems. These concerns involve ensuring that the data "makes sense" given its environment. Challenges include software bugs, design flaws, and human errors. Common methods of ensuring logical integrity include things such as check constraints, foreign key constraints, program assertions, and other run-time sanity checks.
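A run-time sanity check of the kind described above can be sketched as follows. The thresholds and function name are illustrative, not from any particular system: a robotic controller might reject a temperature reading that is physically impossible or implausibly discontinuous with the previous one.

```python
def accept_reading(temp_c, prev=None):
    """Run-time sanity check: reject physically impossible or wildly
    discontinuous temperature readings (thresholds are illustrative)."""
    if not (-273.15 <= temp_c <= 150.0):   # outside the plausible range
        return False
    if prev is not None and abs(temp_c - prev) > 50.0:  # implausible jump
        return False
    return True

print(accept_reading(21.5))              # True: plausible
print(accept_reading(-400.0))            # False: below absolute zero
print(accept_reading(90.0, prev=20.0))   # False: implausible jump
```

The point is that the data is rejected not because it is malformed, but because it "makes no sense" in context.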

Physical and logical integrity often share many challenges such as human errors and design flaws, and both must appropriately deal with concurrent requests to record and retrieve data, the latter of which is entirely a subject on its own.

If a data sector only has a logical error, it can be reused by overwriting it with new data. In case of a physical error, the affected data sector is permanently unusable.

Databases

Data integrity encompasses guidelines for data retention, specifying or guaranteeing the length of time data can be retained in a particular database (typically a relational database). To achieve data integrity, these rules are consistently and routinely applied to all data entering the system, and any relaxation of enforcement could cause errors in the data. Implementing checks on the data as close as possible to the source of input (such as human data entry) causes less erroneous data to enter the system. Strict enforcement of data integrity rules results in lower error rates, and time saved troubleshooting and tracing erroneous data and the errors it causes in algorithms.

Data integrity also includes rules defining the relations a piece of data can have to other pieces of data, such as a Customer record being allowed to link to purchased Products, but not to unrelated data such as Corporate Assets. Data integrity often includes checks and correction for invalid data, based on a fixed schema or a predefined set of rules; an example is textual data entered where a date-time value is required. Rules for data derivation are also applicable, specifying how a data value is derived based on algorithm, contributors and conditions. Such rules also specify the conditions under which the data value could be re-derived.
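The example above (textual data where a date-time is required) can be sketched as a schema-style validation step. The function name and format string are illustrative:

```python
from datetime import datetime

def parse_or_reject(value):
    """Schema-style check: a field declared as a date-time must parse as one.
    Raises ValueError for anything that does not match the expected format."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")

print(parse_or_reject("2024-03-01 12:30:00"))  # parses into a datetime
try:
    parse_or_reject("next Tuesday")  # textual data where a date-time is required
except ValueError:
    print("rejected: not a valid date-time")
```

Rejecting the value at the point of entry keeps the invalid text from ever reaching the database column.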

Types of integrity constraints

Data integrity is normally enforced in a database system by a series of integrity constraints or rules. Three types of integrity constraints are an inherent part of the relational data model: entity integrity, referential integrity and domain integrity.

  • Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule which states that every table must have a primary key and that the column or columns chosen to be the primary key should be unique and not null.
  • Referential integrity concerns the concept of a foreign key. The referential integrity rule states that any foreign-key value can only be in one of two states. The usual state of affairs is that the foreign-key value refers to a primary key value of some table in the database. Occasionally, and this will depend on the rules of the data owner, a foreign-key value can be null. In this case, we are explicitly saying that either there is no relationship between the objects represented in the database or that this relationship is unknown.
  • Domain integrity specifies that all columns in a relational database must be declared upon a defined domain. The primary unit of data in the relational data model is the data item. Such data items are said to be non-decomposable or atomic. A domain is a set of values of the same type. Domains are therefore pools of values from which actual values appearing in the columns of a table are drawn.
  • User-defined integrity refers to a set of rules specified by a user, which do not belong to the entity, domain and referential integrity categories.
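The first three constraint types can be demonstrated with SQLite via Python's standard library (a minimal sketch; table and column names are made up for illustration). Note that SQLite enforces foreign keys only when the `foreign_keys` pragma is enabled:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,                              -- entity integrity
        name TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),  -- referential integrity
        amount      REAL CHECK (amount > 0)                    -- domain integrity
    )""")
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO purchase VALUES (1, 1, 9.99)")

try:
    conn.execute("INSERT INTO purchase VALUES (2, 99, 5.00)")  # no such customer
except sqlite3.IntegrityError as e:
    print("rejected:", e)

try:
    conn.execute("INSERT INTO purchase VALUES (3, 1, -1.00)")  # violates CHECK
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Both bad inserts raise `IntegrityError`, so invalid rows never reach the tables.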

If a database supports these features, it is the responsibility of the database to ensure data integrity as well as the consistency model for the data storage and retrieval. If a database does not support these features, it is the responsibility of the applications to ensure data integrity while the database supports the consistency model for the data storage and retrieval.

Having a single, well-controlled, and well-defined data-integrity system increases

  • stability (one centralized system performs all data integrity operations)
  • performance (all data integrity operations are performed in the same tier as the consistency model)
  • re-usability (all applications benefit from a single centralized data integrity system)
  • maintainability (one centralized system for all data integrity administration).

Modern databases support these features (see Comparison of relational database management systems), and it has become the de facto responsibility of the database to ensure data integrity. Companies, and indeed many database systems, offer products and services to migrate legacy systems to modern databases.

Examples

An example of a data-integrity mechanism is the parent-and-child relationship of related records. If a parent record owns one or more related child records, all of the referential-integrity processing is handled by the database itself, which automatically ensures the accuracy and integrity of the data: no child record can exist without a parent (also called being orphaned), and no parent loses its child records. The database also ensures that no parent record can be deleted while it still owns any child records. All of this is handled at the database level and does not require coding integrity checks into each application.
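This parent-and-child behavior can be seen in SQLite (a sketch; the table names are illustrative). With foreign-key enforcement on and no `ON DELETE` action declared, the default behavior blocks both orphaning operations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent(id)
    )""")
conn.execute("INSERT INTO parent VALUES (1)")
conn.execute("INSERT INTO child  VALUES (10, 1)")

try:
    conn.execute("DELETE FROM parent WHERE id = 1")  # would orphan the child
except sqlite3.IntegrityError as e:
    print("delete blocked:", e)

try:
    conn.execute("INSERT INTO child VALUES (11, 2)")  # parent 2 does not exist
except sqlite3.IntegrityError as e:
    print("orphan insert blocked:", e)
```

Neither the application nor any other client can bypass these rules, which is exactly the centralization benefit described above.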

File systems

Various research results show that neither widespread filesystems (including UFS, Ext, XFS, JFS and NTFS) nor hardware RAID solutions provide sufficient protection against data integrity problems.[5][6][7][8][9]

Some filesystems (including Btrfs and ZFS) provide internal data and metadata checksumming that is used for detecting silent data corruption and improving data integrity. If a corruption is detected that way and internal RAID mechanisms provided by those filesystems are also used, such filesystems can additionally reconstruct corrupted data in a transparent way.[10] This approach allows improved data integrity protection covering the entire data paths, which is usually known as end-to-end data protection.[11]
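The checksumming idea can be illustrated in miniature. This is only a sketch of the principle: a digest computed when a block is written and stored alongside it will no longer match if the block is silently altered. (ZFS and Btrfs use their own on-disk checksum formats; SHA-256 here is just a convenient stand-in.)

```python
import hashlib

def checksum(block: bytes) -> str:
    """Per-block digest, stored separately from the data it protects."""
    return hashlib.sha256(block).hexdigest()

block = b"important payload"
stored = checksum(block)            # kept alongside the data at write time

# Later, on read: recompute and compare to detect silent corruption.
assert checksum(b"important payload") == stored
corrupted = b"important payl0ad"    # a single flipped byte
print(checksum(corrupted) == stored)  # False: corruption is detected
```

Detection alone only flags the bad block; reconstruction additionally requires a redundant copy or parity, as in the filesystem-level RAID mechanisms mentioned above.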

Data integrity as applied to various industries

  • The U.S. Food and Drug Administration has created draft guidance on data integrity for the pharmaceutical manufacturers required to adhere to U.S. Code of Federal Regulations 21 CFR Parts 210–212.[12] Outside the U.S., similar data integrity guidance has been issued by the United Kingdom (2015), Switzerland (2016), and Australia (2017).[13]
  • Various standards for the manufacture of medical devices address data integrity either directly or indirectly, including ISO 13485, ISO 14155, and ISO 5840.[14]
  • In early 2017, the Financial Industry Regulatory Authority (FINRA), noting data integrity problems with automated trading and money movement surveillance systems, stated it would make "the development of a data integrity program to monitor the accuracy of the submitted data" a priority.[15] In early 2018, FINRA said it would expand its approach on data integrity to firms' "technology change management policies and procedures" and Treasury securities reviews.[16]
  • Other sectors such as mining[17] and product manufacturing[18] are increasingly focusing on the importance of data integrity in associated automation and production monitoring assets.
  • Cloud storage providers have long faced significant challenges ensuring the integrity or provenance of customer data and tracking violations.[19][20][21]

See also

  • End-to-end data integrity
  • Message authentication
  • National Information Assurance Glossary
  • Single version of the truth
  • Optical disc § Surface error scanning

References

  1. ^ Boritz, J. "IS Practitioners' Views on Core Concepts of Information Integrity". International Journal of Accounting Information Systems. Elsevier. Archived from the original on 5 October 2011. Retrieved 12 August 2011.
  2. ^ "What is Data Integrity? Learn How to Ensure Database Data Integrity via Checks, Tests, & Best Practices".
  3. ^ "What is Data Integrity?". Data Protection 101.
  4. ^ Uberveillance and the Social Implications of Microchip Implants: Emerging. p. 40.
  5. ^ Vijayan Prabhakaran (2006). "IRON FILE SYSTEMS" (PDF). Doctor of Philosophy in Computer Sciences. University of Wisconsin-Madison. Archived (PDF) from the original on 2022-10-09. Retrieved 9 June 2012.
  6. ^ "Parity Lost and Parity Regained".
  7. ^ "An Analysis of Data Corruption in the Storage Stack" (PDF). Archived (PDF) from the original on 2022-10-09.
  8. ^ "Impact of Disk Corruption on Open-Source DBMS" (PDF). Archived (PDF) from the original on 2022-10-09.
  9. ^ "Baarf.com". Baarf.com. Retrieved November 4, 2011.
  10. ^ Bierman, Margaret; Grimmer, Lenz (August 2012). "How I Use the Advanced Capabilities of Btrfs". Retrieved 2014-01-02.
  11. ^ Yupu Zhang; Abhishek Rajimwale; Andrea Arpaci-Dusseau; Remzi H. Arpaci-Dusseau (2010). "End-to-end data integrity for file systems: a ZFS case study" (PDF). USENIX Conference on File and Storage Technologies. CiteSeerX 10.1.1.154.3979. S2CID 5722163. Wikidata Q111972797. Retrieved 2014-01-02.
  12. ^ "Data Integrity and Compliance with CGMP: Guidance for Industry" (PDF). U.S. Food and Drug Administration. April 2016. Archived (PDF) from the original on 2022-10-09. Retrieved 20 January 2018.
  13. ^ Davidson, J. (18 July 2017). "Data Integrity Guidance Around the World". Contract Pharma. Rodman Media. Retrieved 20 January 2018.
  14. ^ Scannel, P. (12 May 2015). "Data Integrity: A perspective from the medical device regulatory and standards framework" (PDF). Data Integrity Seminar. Parenteral Drug Association. pp. 10–57. Retrieved 20 January 2018.
  15. ^ Cook, R. (4 January 2017). "2017 Regulatory and Examination Priorities Letter". Financial Industry Regulatory Authority. Retrieved 20 January 2018.
  16. ^ Cook, R. (8 January 2018). "2018 Regulatory and Examination Priorities Letter". Financial Industry Regulatory Authority. Retrieved 20 January 2018.
  17. ^ "Data Integrity: Enabling Effective Decisions in Mining Operations" (PDF). Accenture. 2016. Archived (PDF) from the original on 2022-10-09. Retrieved 20 January 2018.
  18. ^ "Industry 4.0 and Cyber-Physical Systems Raise the Data Integrity Imperative". Nymi Blog. Nymi, Inc. 24 October 2017. Retrieved 20 January 2018.[permanent dead link]
  19. ^ Priyadharshini, B.; Parvathi, P. (2012). "Data integrity in cloud storage". Proceedings from the 2012 International Conference on Advances in Engineering, Science and Management. ISBN 9788190904223.
  20. ^ Zafar, F.; Khan, A.; Malik, S.U.R.; et al. (2017). "A survey of cloud computing data integrity schemes: Design challenges, taxonomy and future trends". Computers & Security. 65 (3): 29–49. doi:10.1016/j.cose.2016.10.006.
  21. ^ Imran, M.; Hlavacs, H.; Haq, I.U.I.; et al. (2017). "Provenance based data integrity checking and verification in cloud environments". PLOS ONE. 12 (5): e0177576. Bibcode:2017PLoSO..1277576I. doi:10.1371/journal.pone.0177576. PMC 5435237. PMID 28545151.

Further reading

  •   This article incorporates public domain material from Federal Standard 1037C. General Services Administration. Archived from the original on 2022-01-22. (in support of MIL-STD-188).
  • Xiaoyun Wang; Hongbo Yu (2005). "How to Break MD5 and Other Hash Functions" (PDF). EUROCRYPT. ISBN 3-540-25910-4. Archived from the original (PDF) on 2009-05-21. Retrieved 2009-05-10.
