Data quality

Data quality refers to the state of qualitative or quantitative pieces of information. There are many definitions of data quality, but data is generally considered high quality if it is "fit for [its] intended uses in operations, decision making and planning".[1][2][3] Moreover, data is deemed of high quality if it correctly represents the real-world construct to which it refers. Furthermore, apart from these definitions, as the number of data sources increases, the question of internal data consistency becomes significant, regardless of fitness for use for any particular external purpose. People's views on data quality can often be in disagreement, even when discussing the same set of data used for the same purpose. When this is the case, data governance is used to form agreed upon definitions and standards for data quality. In such cases, data cleansing, including standardization, may be required in order to ensure data quality.[4]

Definitions

Defining data quality is difficult due to the many contexts data are used in, as well as the varying perspectives among end users, producers, and custodians of data.[5]

From a consumer perspective, data quality is:[5]

  • "data that are fit for use by data consumers"
  • data "meeting or exceeding consumer expectations"
  • data that "satisfies the requirements of its intended use"

From a business perspective, data quality is:

  • data that are "'fit for use' in their intended operational, decision-making and other roles" or that exhibits "'conformance to standards' that have been set, so that fitness for use is achieved"[6]
  • data that "are fit for their intended uses in operations, decision making and planning"[7]
  • "the capability of data to satisfy the stated business, system, and technical requirements of an enterprise"[8]

From a standards-based perspective, data quality is:

  • the "degree to which a set of inherent characteristics (quality dimensions) of an object (data) fulfills requirements"[9][5]
  • "the usefulness, accuracy, and correctness of data for its application"[10]

Arguably, in all these cases, "data quality" is a comparison of the actual state of a particular set of data to a desired state, with the desired state being typically referred to as "fit for use," "to specification," "meeting consumer expectations," "free of defect," or "meeting requirements." These expectations, specifications, and requirements are usually defined by one or more individuals or groups, standards organizations, laws and regulations, business policies, or software development policies.[5]

Dimensions of data quality

Drilling down further, those expectations, specifications, and requirements are stated in terms of characteristics or dimensions of the data, such as:[5][6][7][8][11]

  • accessibility or availability
  • accuracy or correctness
  • comparability
  • completeness or comprehensiveness
  • consistency, coherence, or clarity
  • credibility, reliability, or reputation
  • flexibility
  • plausibility
  • relevance, pertinence, or usefulness
  • timeliness or latency
  • uniqueness
  • validity or reasonableness

A systematic scoping review of the literature suggests that data quality dimensions and methods with real world data are not consistent in the literature, and as a result quality assessments are challenging due to the complex and heterogeneous nature of these data.[11]
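
As a rough illustration (not part of any cited framework), a few of these dimensions can be quantified directly on tabular data. The sketch below, in Python with pandas, computes completeness, uniqueness, and a simple validity rule; the column names and the age rule are hypothetical assumptions.

    import pandas as pd

    df = pd.DataFrame({
        "customer_id": [1, 2, 2, 4],
        "email": ["a@example.com", None, "b@example.com", "not-an-email"],
        "age": [34, 29, 29, 150],
    })

    # Completeness: share of non-missing values per column.
    completeness = df.notna().mean()

    # Uniqueness: share of rows whose key is not a duplicate.
    uniqueness = 1 - df["customer_id"].duplicated().mean()

    # Validity: share of values satisfying a simple domain rule.
    validity_age = df["age"].between(0, 120).mean()

    print(completeness, uniqueness, validity_age, sep="\n")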

History

Before the rise of inexpensive computer data storage, massive mainframe computers were used to maintain name and address data for delivery services so that mail could be properly routed to its destination. The mainframes used business rules to correct common misspellings and typographical errors in name and address data, as well as to track customers who had moved, died, gone to prison, married, divorced, or experienced other life-changing events. Government agencies began to make postal data available to a few service companies to cross-reference customer data with the National Change of Address registry (NCOA). This technology saved large companies millions of dollars in comparison to manual correction of customer data. Large companies saved on postage, as bills and direct marketing materials made their way to the intended customer more accurately. Initially sold as a service, data quality moved inside the walls of corporations as low-cost and powerful server technology became available.[citation needed]

Companies with an emphasis on marketing often focused their quality efforts on name and address information, but data quality is recognized[by whom?] as an important property of all types of data. Principles of data quality can be applied to supply chain data, transactional data, and nearly every other category of data found. For example, making supply chain data conform to a certain standard has value to an organization by: 1) avoiding overstocking of similar but slightly different stock; 2) avoiding false stock-out; 3) improving the understanding of vendor purchases to negotiate volume discounts; and 4) avoiding logistics costs in stocking and shipping parts across a large organization.[citation needed]

For companies with significant research efforts, data quality can include developing protocols for research methods, reducing measurement error, bounds checking of data, cross tabulation, modeling and outlier detection, verifying data integrity, etc.[citation needed]

Overview

There are a number of theoretical frameworks for understanding data quality. A systems-theoretical approach influenced by American pragmatism expands the definition of data quality to include information quality, and emphasizes the inclusiveness of the fundamental dimensions of accuracy and precision on the basis of the theory of science (Ivanov, 1972). One framework, dubbed "Zero Defect Data" (Hansen, 1991) adapts the principles of statistical process control to data quality. Another framework seeks to integrate the product perspective (conformance to specifications) and the service perspective (meeting consumers' expectations) (Kahn et al. 2002). Another framework is based in semiotics to evaluate the quality of the form, meaning and use of the data (Price and Shanks, 2004). One highly theoretical approach analyzes the ontological nature of information systems to define data quality rigorously (Wand and Wang, 1996).

A considerable amount of data quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. Nearly 200 such terms have been identified, and there is little agreement on their nature (are these concepts, goals, or criteria?), their definitions, or their measures (Wang et al., 1993). Software engineers may recognize this as a problem similar to that of "ilities".

MIT has an Information Quality (MITIQ) Program, led by Professor Richard Wang, which produces a large number of publications and hosts a significant international conference in this field (International Conference on Information Quality, ICIQ). This program grew out of the work done by Hansen on the "Zero Defect Data" framework (Hansen, 1991).

In practice, data quality is a concern for professionals involved with a wide range of information systems, ranging from data warehousing and business intelligence to customer relationship management and supply chain management. One industry study estimated the total cost of data quality problems to the U.S. economy at over US$600 billion per annum (Eckerson, 2002). Incorrect data, which includes invalid and outdated information, can originate from various data sources and can be introduced through data entry or through data migration and conversion projects.[12]

In 2002, the USPS and PricewaterhouseCoopers released a report stating that 23.6 percent of all U.S. mail sent is incorrectly addressed.[13]

One reason contact data becomes stale very quickly in the average database is that more than 45 million Americans change their address every year.[14]

In fact, the problem is such a concern that companies are beginning to set up a data governance team whose sole role in the corporation is to be responsible for data quality. In some[who?] organizations, this data governance function has been established as part of a larger Regulatory Compliance function - a recognition of the importance of Data/Information Quality to organizations.

Problems with data quality don't only arise from incorrect data; inconsistent data is a problem as well. Eliminating data shadow systems and centralizing data in a warehouse is one of the initiatives a company can take to ensure data consistency.

Enterprises, scientists, and researchers are starting to participate within data curation communities to improve the quality of their common data.[15]

The market is going some way toward providing data quality assurance. A number of vendors make tools for analyzing and repairing poor-quality data in situ, service providers can clean the data on a contract basis, and consultants can advise on fixing processes or systems to avoid data quality problems in the first place. Most data quality tools offer a series of capabilities for improving data, which may include some or all of the following:

  1. Data profiling - initially assessing the data to understand its current state, often including value distributions
  2. Data standardization - a business rules engine that ensures that data conforms to standards
  3. Geocoding - for name and address data; corrects data to U.S. and worldwide geographic standards
  4. Matching or Linking - a way to compare data so that similar but slightly different records can be aligned. Matching may use "fuzzy logic" to find duplicates in the data, recognizing, for example, that "Bob" and "Bbo" may be the same individual (see the sketch after this list). It may also be able to manage "householding", or finding links between spouses at the same address. Finally, it often can build a "best of breed" record, taking the best components from multiple data sources and building a single super-record.
  5. Monitoring - keeping track of data quality over time and reporting variations in the quality of data. Software can also auto-correct the variations based on pre-defined business rules.
  6. Batch and Real time - Once the data is initially cleansed (batch), companies often want to build the processes into enterprise applications to keep it clean.

There are several well-known authors and self-styled experts, with Larry English perhaps the most popular guru. In addition, IQ International - the International Association for Information and Data Quality was established in 2004 to provide a focal point for professionals and researchers in this field.

ISO 8000 is an international standard for data quality.[16]

Data quality assurance

Data quality assurance is the process of data profiling to discover inconsistencies and other anomalies in the data, as well as performing data cleansing[17][18] activities (e.g. removing outliers, missing data interpolation) to improve the data quality.

These activities can be undertaken as part of data warehousing or as part of the database administration of an existing piece of application software.[19]
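
As a minimal sketch of how profiling and cleansing might look in practice (the column name, the IQR rule, and linear interpolation are illustrative assumptions, not a prescribed method):

    import pandas as pd

    s = pd.Series([21.0, 21.5, None, 22.0, 95.0, 22.5], name="temperature")

    # Profiling: descriptive statistics and missing-value count.
    print(s.describe())
    print("missing:", s.isna().sum())

    # Cleansing step 1: null out values outside 1.5 * IQR of the quartiles.
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    cleaned = s.where(s.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr))

    # Cleansing step 2: interpolate the remaining missing values.
    cleaned = cleaned.interpolate()
    print(cleaned)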

Data quality control

Data quality control is the process of controlling the usage of data for an application or a process. This process is performed both before and after a Data Quality Assurance (QA) process, which consists of discovery of data inconsistency and correction.

Before:

  • Restricts inputs

After the QA process, the following statistics are gathered to guide the Quality Control (QC) process:

  • Severity of inconsistency
  • Incompleteness
  • Accuracy
  • Precision
  • Missing / Unknown

The Data QC process uses the information from the QA process to decide whether to use the data for analysis or in an application or business process. General example: if a Data QC process finds that the data contains too many errors or inconsistencies, then it prevents that data from being used for its intended process, which could cause disruption. Specific example: providing invalid measurements from several sensors to the automatic pilot feature on an aircraft could cause it to crash. Thus, establishing a QC process provides data usage protection.[citation needed]
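
A minimal sketch of such a QC gate follows; the statistic names and the 2 percent threshold are illustrative assumptions rather than any standard.

    # Statistics gathered by an upstream QA run (hypothetical values).
    qa_stats = {"rows": 10_000, "inconsistent": 180, "missing": 40, "inaccurate": 15}

    def qc_gate(stats, max_error_rate=0.02):
        # Allow the data into the downstream process only if the combined
        # error rate stays below the agreed threshold.
        errors = stats["inconsistent"] + stats["missing"] + stats["inaccurate"]
        return errors / stats["rows"] <= max_error_rate

    if qc_gate(qa_stats):
        print("data released to the downstream process")
    else:
        print("data held back for remediation")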

Optimum use of data quality

Data quality (DQ) is a niche area required for the integrity of data management, covering gaps in data issues. It is one of the key functions that aid data governance by monitoring data to find exceptions undiscovered by current data management operations. Data quality checks may be defined at the attribute level to allow full control over remediation steps.[citation needed]

DQ checks and business rules may easily overlap if an organization is not attentive to its DQ scope. Business teams should understand the DQ scope thoroughly in order to avoid overlap. Data quality checks are redundant if business logic covers the same functionality and fulfills the same purpose as DQ. The DQ scope of an organization should be defined in its DQ strategy and well implemented. Some data quality checks may be translated into business rules after repeated instances of exceptions in the past.[citation needed]

Below are a few areas of data flows that may need perennial DQ checks:

Completeness and precision DQ checks on all data may be performed at the point of entry for each mandatory attribute from each source system. Some attribute values are created long after the initial creation of the transaction; in such cases, administering these checks becomes tricky and should be done immediately after the defined event of that attribute's source and after the transaction's other core attribute conditions are met.
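
A minimal sketch of a point-of-entry completeness check is shown below; the mandatory attribute names are hypothetical.

    MANDATORY = ("order_id", "customer_id", "amount")

    def completeness_check(record):
        # Return the mandatory attributes that are missing or empty.
        return [field for field in MANDATORY if record.get(field) in (None, "")]

    incoming = {"order_id": "A-1001", "customer_id": None, "amount": 25.0}
    missing = completeness_check(incoming)
    if missing:
        print("rejected at entry, missing:", missing)  # ['customer_id']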

All data having attributes referring to Reference Data in the organization may be validated against the set of well-defined valid values of Reference Data to discover new or discrepant values through the validity DQ check. Results may be used to update Reference Data administered under Master Data Management (MDM).
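
As a minimal sketch of a validity check against reference data (the code set and column name are hypothetical), values not found in the reference set are flagged as candidates for MDM review:

    import pandas as pd

    reference_codes = {"US", "DE", "FR", "JP"}
    df = pd.DataFrame({"country_code": ["US", "DE", "UK", "de", "JP"]})

    # Values absent from the reference set are new or discrepant.
    discrepant = df.loc[~df["country_code"].isin(reference_codes), "country_code"]
    print(discrepant.tolist())  # ['UK', 'de']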

All data sourced from a third party to an organization's internal teams may undergo an accuracy (DQ) check against the third-party data. These DQ check results are valuable when administered on data that has made multiple hops after its point of entry but before it becomes authorized or stored for enterprise intelligence.

All data columns that refer to Master Data may be validated through a consistency check. A DQ check administered at the point of entry discovers new data for the MDM process, while a DQ check administered after the point of entry discovers failures (not exceptions) of consistency.
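
A minimal sketch of such a consistency check, comparing an attribute carried on transactions with the authoritative master record after a join on the key (table contents are illustrative assumptions):

    import pandas as pd

    master = pd.DataFrame({"customer_id": [1, 2], "segment": ["retail", "corporate"]})
    transactions = pd.DataFrame({
        "customer_id": [1, 2, 2],
        "segment": ["retail", "retail", "corporate"],
    })

    merged = transactions.merge(master, on="customer_id", suffixes=("_txn", "_master"))
    inconsistent = merged[merged["segment_txn"] != merged["segment_master"]]
    print(inconsistent)  # the transaction carrying "retail" for a "corporate" customer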

As data transforms, multiple timestamps (and their positions) are captured and may be compared against each other and against an allowed leeway to validate the data's value, decay, and operational significance against a defined SLA (service level agreement). This timeliness DQ check can be used to decrease the rate of data value decay and to optimize the policies governing the data movement timeline.
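
A minimal sketch of a timeliness check follows; the four-hour SLA and the column names are illustrative assumptions.

    import pandas as pd

    df = pd.DataFrame({
        "created_at": pd.to_datetime(["2024-01-01 08:00", "2024-01-01 09:00"]),
        "received_at": pd.to_datetime(["2024-01-01 09:30", "2024-01-01 15:00"]),
    })

    # Compare the lag between capture points against the agreed SLA.
    sla = pd.Timedelta(hours=4)
    df["lag"] = df["received_at"] - df["created_at"]
    df["sla_breached"] = df["lag"] > sla
    print(df[["lag", "sla_breached"]])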

In an organization, complex logic is usually segregated into simpler logic across multiple processes. Reasonableness DQ checks on such complex logic, which yields a logical result within a specific range of values or static interrelationships (aggregated business rules), may be used to discover complicated but crucial business processes, outliers in the data, and drift from BAU (business as usual) expectations, and may surface exceptions that would eventually result in data issues. Such a check may be a simple generic aggregation rule applied over a large chunk of data, or it may be complicated logic on a group of attributes of a transaction pertaining to the organization's core business. This DQ check requires a high degree of business knowledge and acumen. Discovery of reasonableness issues may aid policy and strategy changes by the business, data governance, or both.
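
As a minimal sketch, a reasonableness check can be as simple as testing an aggregated figure against a band derived from business-as-usual expectations; the daily totals and the band are illustrative assumptions.

    import pandas as pd

    daily_sales = pd.Series([102_000, 98_500, 101_200, 54_000], name="daily_total")

    # BAU band agreed with the business; values outside it are flagged.
    expected_low, expected_high = 90_000, 110_000
    outliers = daily_sales[~daily_sales.between(expected_low, expected_high)]
    print(outliers)  # flags the 54,000 day for investigation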

Conformity checks and integrity checks need not be covered in all business needs; they fall strictly under the database architect's discretion.

There are many places in the data movement where DQ checks may not be required. For instance, a DQ check for completeness and precision on not-null columns is redundant for data sourced from a database. Similarly, data should be validated for its accuracy with respect to time when the data is stitched across disparate sources; however, that is a business rule and should not be in the DQ scope.[citation needed]

Regretfully, from a software development perspective, DQ is often seen as a nonfunctional requirement, and as such, key data quality checks and processes are not factored into the final software solution. Within healthcare, wearable technologies and Body Area Networks generate large volumes of data.[20] The level of detail required to ensure data quality is extremely high and is often underestimated. This is also true for the vast majority of mHealth apps, EHRs, and other health-related software solutions. However, some open-source tools exist that examine data quality.[21] The primary reason for this situation stems from the extra cost involved in adding a higher degree of rigor within the software architecture.

Health data security and privacy

The use of mobile devices in health, or mHealth, creates new challenges to health data security and privacy, in ways that directly affect data quality.[2] mHealth is an increasingly important strategy for delivery of health services in low- and middle-income countries.[22] Mobile phones and tablets are used for collection, reporting, and analysis of data in near real time. However, these mobile devices are commonly used for personal activities, as well, leaving them more vulnerable to security risks that could lead to data breaches. Without proper security safeguards, this personal use could jeopardize the quality, security, and confidentiality of health data.[23]

Data quality in public health

Data quality has become a major focus of public health programs in recent years, especially as demand for accountability increases.[24] Work towards ambitious goals related to the fight against diseases such as AIDS, Tuberculosis, and Malaria must be predicated on strong Monitoring and Evaluation systems that produce quality data related to program implementation.[25] These programs, and program auditors, increasingly seek tools to standardize and streamline the process of determining the quality of data,[26] verify the quality of reported data, and assess the underlying data management and reporting systems for indicators.[27] An example is WHO and MEASURE Evaluation's Data Quality Review Tool.[28] WHO, the Global Fund, GAVI, and MEASURE Evaluation have collaborated to produce a harmonized approach to data quality assurance across different diseases and programs.[29]

Open data quality

There are a number of scientific works devoted to the analysis of data quality in open data sources, such as Wikipedia, Wikidata, DBpedia and others. In the case of Wikipedia, quality analysis may relate to the whole article.[30] Modeling of quality there is carried out by means of various methods, some of which use machine learning algorithms, including Random Forest,[31] Support Vector Machine,[32] and others. Methods for assessing data quality in Wikidata, DBpedia and other LOD sources differ.[33]

Professional associations

IQ International—the International Association for Information and Data Quality[34]
IQ International is a not-for-profit, vendor neutral, professional association formed in 2004, dedicated to building the information and data quality profession.

ECCMA (Electronic Commerce Code Management Association)

The Electronic Commerce Code Management Association (ECCMA) is a member-based, international not-for-profit association committed to improving data quality through the implementation of international standards. ECCMA is the current project leader for the development of ISO 8000 and ISO 22745, which are the international standards for data quality and the exchange of material and service master data, respectively. ECCMA provides a platform for collaboration amongst subject experts on data quality and data governance around the world to build and maintain global, open standard dictionaries that are used to unambiguously label information. The existence of these dictionaries of labels allows information to be passed from one computer system to another without losing meaning.[35]

See also

  • Data quality firewall
  • Data validation
  • Record linkage
  • Information quality
  • Master data management
  • Data governance
  • Database normalization
  • Data visualization
  • Data analysis
  • Clinical data management

References

  1. ^ Redman, Thomas C. (30 December 2013). Data Driven: Profiting from Your Most Important Business Asset. Harvard Business Press. ISBN 978-1-4221-6364-1.
  2. ^ a b Fadahunsi, Kayode Philip; Akinlua, James Tosin; O’Connor, Siobhan; Wark, Petra A; Gallagher, Joseph; Carroll, Christopher; Majeed, Azeem; O’Donoghue, John (March 2019). "Protocol for a systematic review and qualitative synthesis of information quality frameworks in eHealth". BMJ Open. 9 (3): e024722. doi:10.1136/bmjopen-2018-024722. ISSN 2044-6055. PMC 6429947. PMID 30842114.
  3. ^ Fadahunsi, Kayode Philip; O'Connor, Siobhan; Akinlua, James Tosin; Wark, Petra A.; Gallagher, Joseph; Carroll, Christopher; Car, Josip; Majeed, Azeem; O'Donoghue, John (2021-05-17). "Information Quality Frameworks for Digital Health Technologies: Systematic Review". Journal of Medical Internet Research. 23 (5): e23479. doi:10.2196/23479. PMC 8167621. PMID 33835034.
  4. ^ Smallwood, R.F. (2014). Information Governance: Concepts, Strategies, and Best Practices. John Wiley and Sons. p. 110. ISBN 9781118218303. from the original on 2020-07-30. Retrieved 2020-04-18. Having a standardized data governance program in place means cleaning up corrupted or duplicated data and providing users with clean, accurate data as a basis for line-of-business software applications and for decision support analytics in business intelligence (BI) applications.
  5. ^ a b c d e Fürber, C. (2015). "3. Data Quality". Data Quality Management with Semantic Technologies. Springer. pp. 20–55. ISBN 9783658122249. from the original on 31 July 2020. Retrieved 18 April 2020.
  6. ^ a b Herzog, T.N.; Scheuren, F.J.; Winkler, W.E. (2007). "Chapter 2: What is data quality and why should we care?". Data Quality and Record Linkage Techniques. Springer Science & Business Media. pp. 7–15. ISBN 9780387695020. from the original on 31 July 2020. Retrieved 18 April 2020.{{cite book}}: CS1 maint: multiple names: authors list (link)
  7. ^ a b Fleckenstein, M.; Fellows, L. (2018). "Chapter 11: Data Quality". Modern Data Strategy. Springer. pp. 101–120. ISBN 9783319689920. from the original on 31 July 2020. Retrieved 18 April 2020.{{cite book}}: CS1 maint: multiple names: authors list (link)
  8. ^ a b Mahanti, R. (2019). "Chapter 1: Data, Data Quality, and Cost of Poor Data Quality". Data Quality: Dimensions, Measurement, Strategy, Management, and Governance. Quality Press. pp. 5–6. ISBN 9780873899772. from the original on 23 November 2020. Retrieved 18 April 2020.
  9. ^ International Organization for Standardization (September 2015). "ISO 9000:2015(en) Quality management systems — Fundamentals and vocabulary". International Organization for Standardization. from the original on 19 May 2020. Retrieved 18 April 2020.
  10. ^ NIST Big Data Public Working Group, Definitions and Taxonomies Subgroup (October 2019). "NIST Big Data Interoperability Framework: Volume 4, Security and Privacy" (PDF). NIST Special Publication 1500-4r2 (3rd ed.). National Institute of Standards and Technology. doi:10.6028/NIST.SP.1500-4r2. (PDF) from the original on 9 May 2020. Retrieved 18 April 2020. Validity refers to the usefulness, accuracy, and correctness of data for its application. Traditionally, this has been referred to as data quality.
  11. ^ a b Bian, Jiang; Lyu, Tianchen; Loiacono, Alexander; Viramontes, Tonatiuh Mendoza; Lipori, Gloria; Guo, Yi; Wu, Yonghui; Prosperi, Mattia; George, Thomas J; Harle, Christopher A; Shenkman, Elizabeth A (2020-12-09). "Assessing the practice of data quality evaluation in a national clinical data research network through a systematic scoping review in the era of real-world data". Journal of the American Medical Informatics Association. 27 (12): 1999–2010. doi:10.1093/jamia/ocaa245. ISSN 1527-974X. PMC 7727392. PMID 33166397.
  12. ^ "Liability and Leverage - A Case for Data Quality". Information Management. August 2006. from the original on 2011-01-27. Retrieved 2010-06-25.
  13. ^ "Address Management for Mail Order and Retail". Directions Magazine. Archived from the original on 2005-04-28. Retrieved 2010-06-25.
  14. ^ "USPS | PostalPro" (PDF). (PDF) from the original on 2010-02-15. Retrieved 2010-06-25.
  15. ^ E. Curry, A. Freitas, and S. O'Riáin, "The Role of Community-Driven Data Curation for Enterprises", 2012-01-23 at the Wayback Machine in Linking Enterprise Data, D. Wood, Ed. Boston, Mass.: Springer US, 2010, pp. 25-47.
  16. ^ "ISO/TS 8000-1:2011 Data quality -- Part 1: Overview". International Organization for Standardization. from the original on 21 December 2016. Retrieved 8 December 2016.
  17. ^ "Can you trust the quality of your data?". spotlessdata.com. Archived from the original on 2017-02-11.
  18. ^ "What is Data Cleansing? - Experian Data Quality". 13 February 2015. from the original on 11 February 2017. Retrieved 9 February 2017.
  19. ^ "Lecture 23: Data Quality Concepts Tutorial - Data Warehousing". Watch Free Video Training Online. Archived from the original on 2016-12-21. Retrieved 8 December 2016.
  20. ^ O'Donoghue, John, and John Herbert. "Data management within mHealth environments: Patient sensors, mobile devices, and databases". Journal of Data and Information Quality (JDIQ) 4.1 (2012): 5.
  21. ^ Huser, Vojtech; DeFalco, Frank J; Schuemie, Martijn; Ryan, Patrick B; Shang, Ning; Velez, Mark; Park, Rae Woong; Boyce, Richard D; Duke, Jon; Khare, Ritu; Utidjian, Levon; Bailey, Charles (30 November 2016). "Multisite Evaluation of a Data Quality Tool for Patient-Level Clinical Datasets". eGEMs. 4 (1): 24. doi:10.13063/2327-9214.1239. PMC 5226382. PMID 28154833.
  22. ^ MEASURE Evaluation. (2017) Improving data quality in mobile community-based health information systems: Guidelines for design and implementation (tr-17-182). Chapel Hill, NC: MEASURE Evaluation, University of North Carolina. Retrieved from https://www.measureevaluation.org/resources/publications/tr-17-182 2017-08-08 at the Wayback Machine
  23. ^ Wambugu, S. & Villella, C. (2016). mHealth for health information systems in low- and middle-income countries: Challenges and opportunities in data quality, privacy, and security (tr-16-140). Chapel Hill, NC: MEASURE Evaluation, University of North Carolina. Retrieved from https://www.measureevaluation.org/resources/publications/tr-16-140 2017-08-08 at the Wayback Machine
  24. ^ MEASURE Evaluation. (2016) Data quality for monitoring and evaluation systems (fs-16-170). Chapel Hill, NC: MEASURE Evaluation, University of North Carolina. Retrieved from https://www.measureevaluation.org/resources/publications/fs-16-170-en 2017-08-08 at the Wayback Machine
  25. ^ MEASURE Evaluation. (2016). Routine health information systems: A curriculum on basic concepts and practice - Syllabus (sr-16-135a). Chapel Hill, NC: MEASURE Evaluation, University of North Carolina. Retrieved from https://www.measureevaluation.org/resources/publications/sr-16-135a 2017-08-08 at the Wayback Machine
  26. ^ "Data quality assurance tools". MEASURE Evaluation. from the original on 8 August 2017. Retrieved 8 August 2017.
  27. ^ "Module 4: RHIS data quality". MEASURE Evaluation. from the original on 8 August 2017. Retrieved 8 August 2017.
  28. ^ MEASURE Evaluation. "Data quality". MEASURE Evaluation. from the original on 8 August 2017. Retrieved 8 August 2017.
  29. ^ The World Health Organization (WHO). (2009). Monitoring and evaluation of health systems strengthening. Geneva, Switzerland: WHO. Retrieved from http://www.who.int/healthinfo/HSS_MandE_framework_Nov_2009.pdf 2017-08-28 at the Wayback Machine
  30. ^ Mesgari, Mostafa; Chitu, Okoli; Mehdi, Mohamad; Finn Årup, Nielsen; Lanamäki, Arto (2015). ""The Sum of All Human Knowledge": A Systematic Review of Scholarly Research on the Content of Wikipedia" (PDF). Journal of the Association for Information Science and Technology. 66 (2): 219–245. doi:10.1002/asi.23172. S2CID 218071987. (PDF) from the original on 2020-05-10. Retrieved 2020-01-21.
  31. ^ Warncke-Wang, Morten; Cosley, Dan; Riedl, John (2013). Tell me more: An actionable quality model for wikipedia. WikiSym '13 Proceedings of the 9th International Symposium on Open Collaboration. doi:10.1145/2491055.2491063. ISBN 9781450318525. S2CID 18523960.
  32. ^ Hasan Dalip, Daniel; André Gonçalves, Marcos; Cristo, Marco; Calado, Pável (2009). "Automatic quality assessment of content created collaboratively by web communities". Proceedings of the 2009 joint international conference on Digital libraries - JCDL '09. p. 295. doi:10.1145/1555400.1555449. ISBN 9781605583228. S2CID 14421291.
  33. ^ Färber, Michael; Bartscherer, Frederic; Menne, Carsten; Rettinger, Achim (2017-11-30). "Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO". Semantic Web. 9 (1): 77–129. doi:10.3233/SW-170275. from the original on 2018-01-22.
  34. ^ "IQ International - the International Association for Information and Data Quality". IQ International website. from the original on 2017-05-10. Retrieved 2016-08-05.
  35. ^ "Home". ECCMA. from the original on 2018-08-19. Retrieved 2018-10-03.

Further reading

  • Baškarada, S; Koronios, A (2014). "A Critical Success Factors Framework for Information Quality Management". Information Systems Management. 31 (4): 1–20. doi:10.1080/10580530.2014.958023. S2CID 33018618.
  • Baamann, Katharina, "Data Quality Aspects of Revenue Assurance", Article
  • Eckerson, W. (2002) "Data Warehousing Special Report: Data quality and the bottom line", Article
  • Ivanov, K. (1972) "Quality control of information: On the concept of accuracy of information in data banks and in management information systems". The University of Stockholm and The Royal Institute of Technology. Doctoral dissertation.
  • Hansen, M. (1991) Zero Defect Data, MIT. Masters thesis [1]
  • Kahn, B., Strong, D., Wang, R. (2002) "Information Quality Benchmarks: Product and Service Performance," Communications of the ACM, April 2002. pp. 184–192. Article
  • Price, R. and Shanks, G. (2004) A Semiotic Information Quality Framework, Proc. IFIP International Conference on Decision Support Systems (DSS2004): Decision Support in an Uncertain and Complex World, Prato.
  • Redman, T. C. (2008) Data Driven: Profiting From Our Most Important Business Asset
  • Wand, Y. and Wang, R. (1996) "Anchoring Data Quality Dimensions in Ontological Foundations," Communications of the ACM, November 1996. pp. 86–95. Article
  • Wang, R., Kon, H. & Madnick, S. (1993), Data Quality Requirements Analysis and Modelling, Ninth International Conference of Data Engineering, Vienna, Austria. Article
  • Fournel Michel, Accroitre la qualité et la valeur des données de vos clients, éditions Publibook, 2007. ISBN 978-2-7483-3847-8.
  • Daniel F., Casati F., Palpanas T., Chayka O., Cappiello C. (2008) "Enabling Better Decisions through Quality-aware Reports", International Conference on Information Quality (ICIQ), MIT. Article
  • Jack E. Olson (2003), "Data Quality: The Accuracy dimension", Morgan Kaufmann Publishers
  • Woodall P., Oberhofer M., and Borek A. (2014), "A Classification of Data Quality Assessment and Improvement Methods". International Journal of Information Quality 3 (4), 298–321. doi:10.1504/ijiq.2014.068656.
  • Woodall, P., Borek, A., and Parlikad, A. (2013), "Data Quality Assessment: The Hybrid Approach." Information & Management 50 (7), 369–382.

External links

  • Data quality course, from the Global Health Learning Center
