Abstract
Data currency is an important factor in data quality, and reliable currency is critical to the accuracy of data retrieval and the credibility of data analysis. Imprecise currency and outdated data cause many problems in big data applications and greatly limit the value that can be extracted from the data. For data whose timestamps are missing or inaccurate, recovering the exact timestamps is difficult, but the temporal order of the data can be restored according to specific rules, which meets the needs of data cleaning and various applications. Based on an analysis of application requirements for data currency, this study first clarifies the concepts related to attribute currency rules and defines them formally. It then proposes a graph-model-based algorithm for discovering currency rules and an algorithm for repairing the temporal order of data. The algorithms are implemented and tested on real data for running efficiency and repair accuracy, and the factors that affect repair accuracy are analyzed. Experimental results show that the algorithms are efficient and repair data currency effectively.
Authors
段旭良
郭兵
沈艳
申云成
董祥千
张洪
DUAN Xu-Liang; GUO Bing; SHEN Yan; SHEN Yun-Cheng; DONG Xiang-Qian; ZHANG Hong (College of Computer Science, Sichuan University, Chengdu 610065, China; School of Control Engineering, Chengdu University of Information Technology, Chengdu 610054, China; College of Information Engineering, Sichuan Agricultural University, Ya'an 625014, China)
Source
Journal of Software (《软件学报》)
EI
CSCD
PKU Core Journals
2019, Issue 3, pp. 589-603 (15 pages)
Funding
National Natural Science Foundation of China (61332001, 61772352, 61472050)
Sichuan Science and Technology Program (2019ZDZX0045, 2019ZDZX0010, 2018ZDZX0010, 2017GZDZX0003, 2018JY0182)