
Association Relationships Study of Multi-Dimensional Data Quality (cited by: 33)
Abstract: As data volumes grow rapidly in the information age, users need multiple indicators to evaluate and improve data quality from different dimensions. In data quality management, however, the important factors that affect data availability are not fully isolated: they interact with one another in assessment mechanisms and in the rules that guide data cleaning. This paper studies a comprehensive data quality assessment method applicable to real information systems. The data quality dimensions proposed in the literature and commonly used in real information systems are summarized by definition and property, and a dimension-based framework for comprehensive data quality assessment is proposed. For four important dimensions that affect data availability (accuracy, completeness, consistency, and currency), the operation methods on data sets are collected, the violation pattern of each dimension is defined, and the relationships among the dimensions are proved. On this basis, a multi-dimensional data quality assessment strategy is determined, and experiments verify its effectiveness.
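Source: Journal of Software (软件学报); indexed in EI, CSCD, and the Peking University Core Journals list; 2016, Issue 7, pp. 1626-1644 (19 pages)
Funding: National Key Basic Research Program of China (973 Program) (2012CB316200); National Natural Science Foundation of China (U1509216, 61472099, 61133002); Heilongjiang Province Foundation for Returned Overseas Scholars (LC2016026)
Keywords: data quality; data quality dimension; relationship among dimensions; data cleaning; data management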

