Abstract
Traditional truth discovery algorithms scale poorly on multi-source heterogeneous big data and perform poorly at incremental truth discovery. To address these problems, this paper combines the Map-Reduce framework with a Bayesian truth discovery model and proposes a parallel truth discovery algorithm based on Map-Reduce (MPTF). Building on MPTF, it introduces the Incoop incremental framework and a voting-based classifier ensemble strategy, optimizes the Map and Reduce phases, and proposes an efficient incremental truth discovery algorithm for big data, IncooMPTF. Experiments show that the algorithm not only improves classifier accuracy but also supports truth discovery for newly added data sources. Theoretical analysis and experimental comparison show that the algorithm is efficient and widely applicable, and that it can accommodate a variety of complex real-world situations.
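To make the map and reduce roles concrete, the following is a minimal single-process sketch of iterative truth discovery. It is not the paper's MPTF or IncooMPTF: it assumes a simple trust-weighted vote in place of the Bayesian model, runs without Hadoop or Incoop, and every name in it (map_claims, shuffle, reduce_truth, discover) is hypothetical.

```python
from collections import defaultdict


def map_claims(claims):
    """Map step: key each (source, obj, value) claim by the object it describes."""
    for source, obj, value in claims:
        yield obj, (source, value)


def shuffle(pairs):
    """Group mapped pairs by key, as a Map-Reduce runtime would between phases."""
    grouped = defaultdict(list)
    for key, item in pairs:
        grouped[key].append(item)
    return grouped


def reduce_truth(votes, trust):
    """Reduce step: pick the value with the largest trust-weighted vote.

    Assumption: a weighted vote stands in for the paper's Bayesian model.
    Ties are broken by sorted order so the result is deterministic.
    """
    score = defaultdict(float)
    for source, value in votes:
        score[value] += trust[source]
    return max(sorted(score), key=score.get)


def discover(claims, rounds=5):
    """Alternate between estimating truths and re-estimating source trust."""
    sources = {source for source, _, _ in claims}
    trust = {source: 0.5 for source in sources}  # uninformed starting trust
    truths = {}
    for _ in range(rounds):
        grouped = shuffle(map_claims(claims))
        truths = {obj: reduce_truth(votes, trust) for obj, votes in grouped.items()}
        hits, total = defaultdict(int), defaultdict(int)
        for source, obj, value in claims:
            total[source] += 1
            hits[source] += int(value == truths[obj])
        # Smoothed fraction of each source's claims that match the current truths.
        trust = {s: (hits[s] + 1) / (total[s] + 2) for s in sources}
    return truths, trust


claims = [
    ("A", "x", "1"), ("B", "x", "1"), ("C", "x", "2"),
    ("A", "y", "red"), ("B", "y", "blue"), ("C", "y", "blue"),
]
truths, trust = discover(claims)
print(truths, trust)
```

In this toy input, source B agrees with the majority on both objects, so its estimated trust ends above that of A and C. An incremental variant would reuse the grouped results for unchanged objects rather than recomputing them, which is the kind of reuse the abstract attributes to the Incoop framework.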
Authors
谭龙
张晓琪
贾立
李建中
王宏志
TAN Long; ZHANG Xiaoqi; JIA Li; LI Jianzhong; WANG Hongzhi (School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China; Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Source
《哈尔滨工程大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals list)
2019, No. 4, pp. 805-812 (8 pages)
Journal of Harbin Engineering University
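Funding
National Natural Science Foundation of China, General Program (81273649)
Natural Science Foundation of Heilongjiang Province, General Program (F201434)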