Abstract
Most existing truth discovery algorithms assume that each data item has exactly one true value, so they cannot handle cases with multiple true values. To address multi-truth discovery over conflicting Deep Web data, this paper borrows the idea of the Hypertext-Induced Topic Search (HITS) algorithm and defines two mutually dependent measures: the authority of a view and the credibility of a view description. On this basis, it defines a view link graph and proposes an iterative multi-truth discovery algorithm named MTF. When the algorithm converges, the view with the highest authority is taken as the truth. Experiments on the Book-Authors dataset show that MTF achieves substantially higher accuracy than the baseline VOTE algorithm.
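The abstract does not give MTF's update formulas or say how the view link graph is built, so the following is only a minimal sketch of the generic HITS-style mutual reinforcement it alludes to, not the authors' algorithm. It assumes a bipartite structure in which each description supports one or more views. The function name `hits_style_scores`, the `links` layout, and the site and author labels are all hypothetical.

```python
import math


def hits_style_scores(links, iterations=50, tol=1e-9):
    """Generic HITS-style iteration (an assumption, not the paper's MTF).

    links: dict mapping a description id to the set of view ids it supports.
    Returns (authority, credibility) dicts, each L2-normalized.
    """
    views = sorted({v for supported in links.values() for v in supported})
    authority = {v: 1.0 for v in views}
    credibility = {d: 1.0 for d in links}

    for _ in range(iterations):
        # A view's authority is the summed credibility of its supporters.
        new_auth = {v: 0.0 for v in views}
        for d, supported in links.items():
            for v in supported:
                new_auth[v] += credibility[d]

        # A description's credibility is the summed authority of its views.
        new_cred = {d: sum(new_auth[v] for v in supported)
                    for d, supported in links.items()}

        # L2-normalize both score vectors so they stay bounded.
        norm_a = math.sqrt(sum(x * x for x in new_auth.values())) or 1.0
        norm_c = math.sqrt(sum(x * x for x in new_cred.values())) or 1.0
        new_auth = {v: x / norm_a for v, x in new_auth.items()}
        new_cred = {d: x / norm_c for d, x in new_cred.items()}

        # Stop once authority scores no longer change noticeably.
        delta = max((abs(new_auth[v] - authority[v]) for v in views),
                    default=0.0)
        authority, credibility = new_auth, new_cred
        if delta < tol:
            break

    return authority, credibility


# Hypothetical toy input: three sites describing the authors of one book.
links = {
    "site_a": {"Author X", "Author Y"},
    "site_b": {"Author X", "Author Y"},
    "site_c": {"Author Z"},
}
authority, credibility = hits_style_scores(links)
ranked_views = sorted(authority, key=authority.get, reverse=True)
```

In this toy input the two views backed by two mutually consistent sites end up with equal, dominant authority, while the view backed by a single isolated site decays toward zero. How MTF selects several true values from such scores is not stated in the abstract and is left open here.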
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, PKU Core (北大核心)
2016, Issue 9, pp. 158-162 (5 pages)
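Funding
National Social Science Fund of China project "Research on Air Quality Measurement Methods Based on Big Data Integration" (14GSD95)
National Statistical Science Research Fund key project "Research on Schemes for the Collection, Storage, and Analysis of Massive Multi-Source Heterogeneous Data" (2013LZ44)
Longyuan Innovative Talent Support Program project (14GSD95)
Gansu Provincial Department of Finance Fundamental Research Funds for Universities project (GZ14007, GZ14023)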