Quality evaluation algorithm for conflicting data sources based on true value finding (cited by: 2)
Abstract Existing quality evaluation models for conflicting data sources consider only accuracy and precision, and ignore the fact that a wrong description and an empty value affect the quality of a data source differently. In this paper, a wrong description provided by a data source is defined as an active error, and the absence of any description for an entity is defined as a passive error; a data source quality model is built from these two kinds of error. The model replaces accuracy and precision with sensitivity and specificity. To handle the multi-truth problem, the descriptions that the data sources give for an entity are merged in advance, and an inclusion relation between merged descriptions, together with a model for computing inclusion degrees, is defined. On the basis of this inclusion-degree model, a quality evaluation algorithm for conflicting data sources based on description inclusion degree (TFDQ) is proposed. Experiments on the commonly used Books-Authors data set show that the results of TFDQ are closer to the real situation than those of the Vote and TruthFinder algorithms.
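Source Journal of Zhejiang University (Engineering Science); indexed in EI, CAS, CSCD, PKU Core; 2015, No. 2, pp. 303-308 (6 pages)
Funding National Natural Science Foundation of China (51475097); National Science and Technology Support Program of the 12th Five-Year Plan (2012BAF12B14); Guizhou Province Science and Technology Projects (黔科合JZ字[2014]2001, 黔科合计Z字[2012]4009)
Keywords data integration; quality of data sources; truth finder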
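
The two error types named in the abstract can be illustrated with a minimal Python sketch. It is not the TFDQ algorithm: it uses a plain majority vote (the Vote baseline mentioned in the abstract) as a stand-in for the truth, and then counts, per source, active errors (a description that differs from the truth) and passive errors (no description at all). The toy `claims` data, the source and entity names, and the helper names `vote` and `count_errors` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy data: source -> {book: set of authors claimed for that book}.
# A book missing from a source's dict means the source gave no description.
claims = {
    "s1": {"b1": frozenset({"a"}), "b2": frozenset({"c", "d"})},
    "s2": {"b1": frozenset({"a"}), "b2": frozenset({"c", "d"})},
    "s3": {"b1": frozenset({"a", "b"})},
    "s4": {"b1": frozenset({"a"}), "b2": frozenset({"c"})},
}


def vote(claims):
    """For every entity, pick the description claimed by the most sources."""
    tally = {}
    for descriptions in claims.values():
        for entity, desc in descriptions.items():
            tally.setdefault(entity, Counter())[desc] += 1
    return {entity: counts.most_common(1)[0][0] for entity, counts in tally.items()}


def count_errors(claims, truth):
    """Count active errors (wrong description) and passive errors
    (no description) for each source, relative to the given truth."""
    report = {}
    for source, descriptions in claims.items():
        active = sum(
            1 for entity, true_desc in truth.items()
            if entity in descriptions and descriptions[entity] != true_desc
        )
        passive = sum(1 for entity in truth if entity not in descriptions)
        report[source] = {"active": active, "passive": passive}
    return report


truth = vote(claims)
report = count_errors(claims, truth)
for source, errors in sorted(report.items()):
    print(source, errors)
```

In this toy setting `s3` makes one active error (on `b1`) and one passive error (on `b2`), which a model based only on accuracy would not distinguish. Treating each description as a set is one simple way to represent multi-valued truths such as author lists; the paper's inclusion-degree model for merged descriptions is not reproduced here.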
References (12)

  • [1] BLEIHOLDER J, NAUMANN F. Conflict handling strategies in an integrated information system [C]// Proceedings of the IJCAI Workshop on Information on the Web. Edinburgh, Scotland, UK: ACM, 2006: 1-6.
  • [2] ABOULNAGA A, EL GEBALY K. μBE: User guided source selection and schema mediation for internet scale data integration [C]// Proceedings of International Conference on Data Engineering. Istanbul, Turkey: ACM, 2007: 186-195.
  • [3] WAN Changxuan, DENG Song, LIU Xiping, LIAO Guoqiong, LIU Dexi, JIANG Tengjiao. Web data source selection technologies [J]. Journal of Software, 2013, 24(4): 781-797. (in Chinese)
  • [4] YIN X, HAN J, YU P S. Truth discovery with multiple conflicting information providers on the web [J]. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(6): 796-808.
  • [5] DONG X L, BERTI-EQUILLE L, SRIVASTAVA D. Integrating conflicting data: the role of source dependence [C]// Proceedings of the VLDB Endowment. Lyon, France: ACM, 2009, 2(1): 550-561.
  • [6] DONG X L, BERTI-EQUILLE L, SRIVASTAVA D. Truth discovery and copying detection in a dynamic world [C]// Proceedings of the VLDB Endowment. Lyon, France: ACM, 2009, 2(1): 562-573.
  • [7] ZHANG Zhiqiang, LIU Lixia, XIE Xiaoqin, PAN Haiwei, FANG Yixiang. Research on information evaluation methods based on data source dependence [J]. Chinese Journal of Computers, 2012, 35(11): 2392-2402. (in Chinese)
  • [8] KAO Mingjun, ZHANG Kun, GAO Hong. Truth discovery algorithms for conflicting data [J]. Journal of Computer Research and Development, 2010, 47(Suppl.): 188-192. (in Chinese)
  • [9] GALLAND A, ABITEBOUL S, MARIAN A, et al. Corroborating information from disagreeing views [C]// Proceedings of the Third ACM International Conference on Web Search and Data Mining. ACM, 2010: 131-140.
  • [10] ZHAO B, RUBINSTEIN B I P, GEMMELL J, et al. A Bayesian approach to discovering truth from conflicting sources for data integration [C]// Proceedings of the VLDB Endowment. Istanbul, Turkey: ACM, 2012, 5(6): 550-561.

