
Incremental Clustering Method Based on Key Evidence and E2LSH for Person Name Disambiguation (基于关键证据与E^2LSH的增量式人名聚类消歧方法)
Abstract Search engines index large numbers of documents related to a queried person name. These documents are updated incrementally, and both the arrival time and the volume of new documents are uncertain. Most existing methods perform global clustering for person name disambiguation; they are usually inefficient on large-scale data and cannot support incremental clustering. This paper proposes an incremental clustering method for person name disambiguation based on key evidence and E2LSH. For the initial document set, a global clustering method is adopted, which preserves clustering performance while keeping the number of documents that global clustering must process under control, thereby improving clustering efficiency. For the incremental document set, the proposed key-evidence and E2LSH method generates a candidate document set, which greatly reduces the number of documents whose similarity must be computed and improves efficiency. Experimental results show that the proposed method improves the efficiency of person name clustering while achieving good clustering performance.
Source Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, Peking University Core Journal, 2016, No. 7, pp. 714-722 (9 pages)
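Funding Supported by the National Social Science Fund of China project "Research on System Modeling and Response Strategies for Online Public Opinion Struggles" (14BXW028)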
Keywords person name disambiguation; incremental clustering; key evidence; E2LSH; large-scale documents
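
The candidate-generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define "key evidence" or state any parameters, so the class `E2LSHIndex`, the function `candidate_documents`, the representation of key evidence as a set of strings, the choice to take the union of the two candidate sources, and all default values are assumptions made for illustration. The hash family itself is the standard p-stable scheme used by E2LSH, h(v) = floor((a · v + b) / w), with Gaussian a and uniform b.

```python
import math
import random
from collections import defaultdict


class E2LSHIndex:
    """Toy p-stable (Gaussian) LSH index for Euclidean distance (illustrative only)."""

    def __init__(self, dim, num_tables=4, hashes_per_table=3, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # Each table concatenates k hash functions h(v) = floor((a . v + b) / w),
        # with every component of a drawn from N(0, 1) and b drawn uniformly
        # between 0 and w.
        self.tables = []
        for _ in range(num_tables):
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, bucket_width))
                for _ in range(hashes_per_table)
            ]
            self.tables.append((funcs, defaultdict(set)))

    def _key(self, funcs, vec):
        # Bucket key for one table: the tuple of its k hash values.
        return tuple(
            math.floor((sum(a_i * v_i for a_i, v_i in zip(a, vec)) + b) / self.w)
            for a, b in funcs
        )

    def add(self, doc_id, vec):
        # Store the document id in one bucket per table.
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, vec)].add(doc_id)

    def candidates(self, vec):
        # Documents that share a bucket with vec in at least one table.
        found = set()
        for funcs, buckets in self.tables:
            found |= buckets.get(self._key(funcs, vec), set())
        return found


def candidate_documents(new_vec, new_evidence, index, evidence_index):
    """Assumed combination rule: documents sharing any key-evidence item,
    plus documents that collide with the new document in the LSH index."""
    by_evidence = set()
    for item in new_evidence:
        by_evidence |= evidence_index.get(item, set())
    return by_evidence | index.candidates(new_vec)


# Hypothetical data: d1 is indexed by its feature vector, d2 by a key-evidence item.
index = E2LSHIndex(dim=3)
index.add("d1", [1.0, 2.0, 3.0])
evidence_index = {"acme corp": {"d2"}}
result = candidate_documents([1.0, 2.0, 3.0], {"acme corp"}, index, evidence_index)
```

Only the documents in `result` would then need a full similarity comparison against the new document. In this kind of index, a larger `bucket_width` or fewer hashes per table produces more collisions (larger candidate sets), while more tables raises the chance that a true near neighbour is found.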

