Name Disambiguation Based on Clustering by Step (original title: 基于分步聚类的人名消歧算法)    Cited by: 3
Abstract: Two problems affect knowledge-base-driven name disambiguation: a single entity definition in the knowledge base has sparse features, and a manually set similarity threshold does not generalize well. To address them, this paper proposes a person-name disambiguation algorithm based on stepwise clustering. First, the person-attribute features in each name entity's knowledge-base definition are used as query features, and an initial knowledge-base-driven clustering is obtained through text retrieval, which compensates for the sparse features of a single entity definition. Then, starting from the initial clustering results, an agglomerative hierarchical clustering algorithm with an adaptive threshold disambiguates the names that exist in the knowledge base. Finally, conditional random fields (CRF) identify the Other class, and agglomerative hierarchical clustering with an adaptive threshold groups the S class, which completes the disambiguation of names that are not in the knowledge base. Experiments on the CLP2012 Chinese person-name disambiguation evaluation corpus show that the proposed algorithm disambiguates person names effectively.
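The two knowledge-base stages in the abstract (retrieval-based initial clustering, then agglomerative clustering with an adaptive threshold) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: texts are assumed to be pre-segmented, retrieval is plain TF-IDF cosine similarity, linkage is average linkage, and the "adaptive" threshold is assumed to be the mean plus one standard deviation of all pairwise document similarities. All function and parameter names (`disambiguate`, `min_score`, and so on) are hypothetical, and the CRF step for the Other class is omitted.

```python
import math
import re
import statistics
from collections import Counter


def tokenize(text):
    # Assumption: input is already word-segmented (Chinese text would first
    # need a segmenter); tokens are lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_idf(docs):
    # Smoothed inverse document frequency over the document collection.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1.0


def vectorize(text, idf):
    # Sparse TF-IDF vector stored as {term: weight}.
    return {t: c * idf(t) for t, c in Counter(tokenize(text)).items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def disambiguate(kb_queries, docs, min_score=0.1):
    """Hypothetical two-step sketch: KB retrieval, then adaptive-threshold HAC.

    kb_queries maps a KB entity id to the text of its person attributes.
    Returns a list of (entity id or None, sorted document indices).
    """
    idf = build_idf(docs)
    vecs = [vectorize(d, idf) for d in docs]
    query_vecs = {name: vectorize(q, idf) for name, q in kb_queries.items()}

    # Step 1: retrieval-based initial clustering. Each document joins the KB
    # entity whose attribute query scores highest, if it reaches min_score.
    members = {name: [] for name in kb_queries}
    unassigned = []
    for i, v in enumerate(vecs):
        name, score = max(((n, cosine(qv, v)) for n, qv in query_vecs.items()),
                          key=lambda p: p[1], default=(None, 0.0))
        if name is not None and score >= min_score:
            members[name].append(i)
        else:
            unassigned.append(i)
    groups = [(name, m) for name, m in members.items() if m]
    groups += [(None, [i]) for i in unassigned]

    # Step 2: average-linkage agglomerative clustering. Assumed adaptive
    # threshold: mean + one standard deviation of all pairwise similarities.
    sim = [[cosine(a, b) for b in vecs] for a in vecs]
    pairs = [sim[i][j] for i in range(len(docs)) for j in range(i + 1, len(docs))]
    threshold = (statistics.fmean(pairs) + statistics.pstdev(pairs)
                 if pairs else math.inf)

    def link(g, h):
        return statistics.fmean(sim[i][j] for i in g for j in h)

    while True:
        # Never merge two clusters already tied to different KB entities.
        candidates = [(link(groups[x][1], groups[y][1]), x, y)
                      for x in range(len(groups))
                      for y in range(x + 1, len(groups))
                      if groups[x][0] is None or groups[y][0] is None]
        if not candidates:
            break
        best, a, b = max(candidates)
        if best < threshold:
            break
        label = groups[a][0] if groups[a][0] is not None else groups[b][0]
        groups[a] = (label, groups[a][1] + groups[b][1])
        del groups[b]  # b > a, so index a is still valid
    return sorted(((label, sorted(m)) for label, m in groups), key=lambda p: p[1])
```

With two hypothetical KB entries (A described as a physics professor, B as a singer) and six toy documents, documents about A and B are first pulled into their entity clusters by retrieval; two football documents match neither entry, start as singletons, and are then merged into one non-KB cluster because their similarity exceeds the data-derived threshold. Deriving the threshold from the similarity distribution, rather than fixing it by hand, is the point the abstract makes about manual thresholds generalizing poorly.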
Source: Journal of Data Acquisition and Processing (数据采集与处理; indexed in CSCD and the Peking University Core Journals list), 2016, No. 1, pp. 213-222 (10 pages)
Funding: National Social Science Fund of China (14BXW028); All-Army Military Postgraduate Research Project (2011JY002k-158)
Keywords: name disambiguation; sparse features; text retrieval; agglomerative hierarchical clustering; similarity threshold