About

An Analysis of Data Quality Dimensions

Vimukthi Jayawardane, Marta Indulska, Shazia Sadiq

The University of Queensland

 


Data quality (DQ) has been studied in significant depth over the last two decades and has received attention from both the academic and the practitioner communities. Over that period, research and practice have identified a large number of data quality dimensions. While it is important to embrace the diversity of views of data quality, it is equally important for the data quality research and practitioner community to be united in a consistent interpretation of this foundational concept. In this paper, we provide a step towards this consistent interpretation. Through a systematic review of research and practitioner literature, we identify previously published data quality dimensions and analyse and consolidate their overlapping and inconsistent definitions. We stipulate that the shared understanding facilitated by this consolidation is a necessary prelude to generic and declarative forms of requirements modeling for data quality.

 

 

| Dimension | Characteristic | Description | References |
|---|---|---|---|
| Completeness | Completeness of mandatory attributes | The attributes which are necessary for a complete representation of a real world entity must contain values and cannot be null. | [1-5] |
| | Completeness of optional attributes | Optional attributes should not contain invalid null values. | [2, 6] |
| | Completeness of records | Every real world entity instance that is relevant for the organization can be found in the data. | [1, 7-10] |
| | Data volume | The volume of data is neither deficient nor overwhelming to perform an intended task. | [11, 12] |
| Availability & Accessibility | Continuity of data access | The technology infrastructure should not prohibit the speed and continuity of access to the data for the users. | [9, 11, 13] |
| | Data maintainability | Data should be accessible to perform necessary updates and maintenance operations in its entire lifecycle. | [11, 12] |
| | Data awareness | Data users should be aware of all available data and its location. | [8] |
| | Ease of data access | Data should be easily accessible in a form that is suitable for its intended use. | [14-16] |
| | Data punctuality | Data should be available at the time of its intended use. | [1, 4, 11, 16] |
| | Data access control | The access to the data should be controlled to ensure it is secure against damage or unauthorised access. | [9, 11, 14, 15] |
| Currency | Data timeliness | Data which refers to time should be available for use within an acceptable time relative to its time of creation. | [1, 3, 5, 8, 9, 12, 13, 15] |
| | Data freshness | Data which is subjected to changes over the time should be fresh and up-to-date with respect to its intended use. | [2, 4, 6, 7, 11, 12] |
| Accuracy | Accuracy to reference source | Data should agree with an identified source. | [1, 2, 4, 6, 8, 12-14] |
| | Accuracy to reality | Data should truly reflect the real world. | [1, 3, 5, 9-11, 15] |
| | Precision | Attribute values should be accurate as per linguistics and granularity. | [1, 2, 6, 7, 11, 15] |
| Validity | Business rules compliance | Calculations on data must comply with business rules. | [1, 3] |
| | Meta-data compliance | Data should comply with its metadata. | [1-6, 9] |
| | Standards and regulatory compliance | All data processing activities should comply with the policies, procedures, standards, industry benchmark practices and all regulatory requirements that the organization is bound by. | [1, 5, 8, 12] |
| | Statistical validity | Computed data must be statistically valid. | [8, 16] |
| Reliability | Source quality | Data used is from trusted and credible sources. | [1, 2, 13-15] |
| | Objectivity | Data are unbiased and impartial. | [1, 11, 13, 14] |
| | Traceability | The lineage of the data is verifiable. | [10, 11, 15] |
| Consistency | Uniqueness | The data is uniquely identifiable. | [4, 5, 9] |
| | Non-redundancy | The data is recorded in exactly one place. | [1, 3, 12] |
| | Semantic consistency | Data is semantically consistent. | [1, 7, 15] |
| | Value consistency | Data values are consistent and do not provide conflicting or heterogeneous instances. | [1-3, 5, 13] |
| | Format consistency | Data formats are consistently used. | [12, 15] |
| | Referential integrity | Data relationships are represented through referential integrity rules. | [1, 4, 9] |
| Usability and Interpretability | Usefulness and relevance | The data is useful and relevant for the task at hand. | [1, 8, 9, 11, 14-16] |
| | Understandability | The data is understandable. | [1, 2, 5, 6, 8, 13-15] |
| | Appropriate presentation | The data presentation is aligned with its use. | [1, 2, 6, 9, 12] |
| | Interpretability | Data should be interpretable. | [6-8, 16] |
| | Information value | The value that is delivered by quality information should be effectively evaluated and continuously monitored in the organizational context. | [2, 12-14] |
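
As an illustration only, a few of the characteristics in the classification above (completeness of mandatory attributes, uniqueness, referential integrity, and data freshness) can be phrased as simple executable checks. The following Python sketch is not taken from the cited publications: the record layout, field names, sample data, and function names are all assumptions made for the example, and a real deployment would derive its rules from the organization's own requirements.

```python
from datetime import datetime, timedelta

# Hypothetical sample data: customer and order records as plain dicts.
customers = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": None, "email": "bob@example.com"},
    {"id": 2, "name": "Bob", "email": None},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
]


def missing_mandatory(records, mandatory):
    """Completeness of mandatory attributes: records with a null mandatory field."""
    return [r for r in records if any(r.get(f) is None for f in mandatory)]


def duplicate_keys(records, key):
    """Uniqueness: key values that identify more than one record."""
    seen, dupes = set(), set()
    for r in records:
        k = r[key]
        if k in seen:
            dupes.add(k)
        seen.add(k)
    return dupes


def dangling_references(children, fk, parents, pk):
    """Referential integrity: child records whose foreign key matches no parent."""
    parent_keys = {p[pk] for p in parents}
    return [c for c in children if c[fk] not in parent_keys]


def is_fresh(last_updated, now, max_age):
    """Data freshness: True if the record was updated within max_age of now."""
    return now - last_updated <= max_age
```

With the sample data, `missing_mandatory(customers, ["id", "name"])` flags the second customer record, `duplicate_keys(customers, "id")` reports the repeated identifier, and `dangling_references(orders, "customer_id", customers, "id")` flags the order that points to a non-existent customer. Characteristics such as objectivity or information value are not directly checkable in this way and would need organizational assessment instead.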

 

Sources

[1] English, L.P., Information quality applied: Best practices for improving business information, processes and systems. 2009: Wiley Publishing.
[2] Loshin, D., Enterprise knowledge management: The data quality approach. 2001: Morgan Kaufmann Pub.
[3] Gatling G., Champlin C.B. R., Stefani H., Weigel G., Enterprise Information Management with SAP. 2007, Boston: Galileo Press Inc.
[4] Loshin, D., Monitoring Data Quality Performance using Data Quality Metrics. Informatica Corporation, 2006.
[5] Byrne, J.K., D. Mccarty, G. Sauter, H. Smith, P. Worcester, The information perspective of SOA design Part 6: The value of applying the data quality analysis pattern in SOA. 2008: IBM Corporation.
[6] Redman, T.C., Data quality for the information age. 1997: Artech House, Inc.
[7] Kimball, R. and J. Caserta, The data warehouse ETL toolkit: practical techniques for extracting, cleaning, conforming, and delivering data. Digitized format, originally published 2004.
[8] HIQA, International Review of Data Quality. Health Information and Quality Authority (HIQA), Ireland. http://www.hiqa.ie/press-release/2011-04-28-international-review-data-quality, 2011.
[9] Price, R.J. and G. Shanks, Empirical refinement of a semiotic information quality framework. In: System Sciences, 2005. HICSS’05. Proceedings of the 38th Annual Hawaii International Conference on. 2005. IEEE.
[10] ISO, ISO 8000-2 Data Quality-Part 2-Vocabulary. 2012, ISO.
[11] Eppler, M.J., Managing information quality: increasing the value of information in knowledge-intensive products and processes. 2006: Springer.
[12] McGilvray, D., Executing data quality projects: Ten steps to quality data and trusted information. 2008: Morgan Kaufmann.
[13] Scannapieco, M. and T. Catarci, Data quality under a computer science perspective. Archivi & Computer, 2002. 2: p. 1-15.
[14] Wang, R.Y. and D.M. Strong, Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 1996: p. 5-33.
[15] Stvilia, B., et al., A framework for information quality assessment. Journal of the American Society for Information Science and Technology, 2007. 58(12): p. 1720-1733.
[16] Lyon, M., Assessing Data Quality, Monetary and Financial Statistics. Bank of England. http://www.bankofengland.co.uk/statistics/Documents/ms/articles/art1mar08.pdf, 2008.

 

 

More details about the above classification and related work can be found in the following publications:

  • Shazia Sadiq, Naiem Khodabandehloo Yeganeh and Marta Indulska. 20 years of data quality research: Themes, trends and synergies. In: Heng Tao Shen and Yanchun Zhang, Conferences in Research and Practice in Information Technology. Proceedings of: The 22nd Australasian Database Conference (ADC 2011). Australasian Database Conference [ADC], Perth, WA, Australia, (1-10). 17-20 January 2011.

 

  • Shazia Sadiq, Naiem Khodabandehloo Yeganeh and Marta Indulska. An Analysis of Cross-Disciplinary Collaborations in Data Quality Research. European Conference on Information Systems (ECIS 2011), Helsinki, Finland, 9-11 June 2011.