Abstract
A scalable and robust hierarchical clustering method based on heterogeneous knowledge bases is proposed for the problem of Chinese named entity disambiguation in natural language processing. A Chinese information extraction (IE) system extracts information from Chinese Wikipedia and other knowledge bases to build entity profiles, i.e. information objects that contain person attributes and entity relations. These profiles are then disambiguated by hierarchical clustering, computed in a distributed fashion on the Hadoop platform. The paper mainly studies how the selection of person-entity features for similarity measurement, and the use of Wikipedia and other knowledge bases, affect the disambiguation results. Results show that adding the encyclopedia knowledge base raises the F-measure from 91.33% to 92.68%.
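The abstract does not state the similarity measure, linkage criterion, stopping threshold, or which F-measure variant is reported, so the following is only a minimal single-machine sketch of the general technique under stated assumptions: each entity profile is a bag of features (`Counter`), profiles are compared with cosine similarity, clusters are merged by average linkage until no pair exceeds a hypothetical threshold of 0.2, and the result is scored with the B-cubed F-measure, a common choice for name-disambiguation clusterings. It is not the authors' Hadoop implementation, and the feature names in the usage example are invented for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-features profiles."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerative(profiles, threshold=0.2):
    """Average-link agglomerative clustering of profile indices.

    Merging stops once no pair of clusters is more similar than
    `threshold` (a hypothetical value, not taken from the paper).
    """
    clusters = [[i] for i in range(len(profiles))]

    def link(c1, c2):
        # Average linkage: mean pairwise similarity between two clusters.
        sims = [cosine(profiles[i], profiles[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair  # x < y, so deleting index y leaves x valid
        clusters[x].extend(clusters[y])
        del clusters[y]
    return clusters


def bcubed_f(clusters, gold):
    """B-cubed F-measure; `gold` maps item index -> true entity label."""
    assigned = {i: set(c) for c in clusters for i in c}
    items = list(assigned)
    p = r = 0.0
    for i in items:
        same_cluster = assigned[i]
        same_label = {j for j in items if gold[j] == gold[i]}
        correct = len(same_cluster & same_label)
        p += correct / len(same_cluster)
        r += correct / len(same_label)
    p /= len(items)
    r /= len(items)
    return 2 * p * r / (p + r)


# Hypothetical profiles for four mentions of the same person name.
profiles = [
    Counter({"professor": 1, "xian": 1, "computer_science": 1}),
    Counter({"professor": 1, "computer_science": 1, "hadoop": 1}),
    Counter({"footballer": 1, "striker": 1, "club": 1}),
    Counter({"footballer": 1, "club": 1, "goal": 1}),
]
gold = {0: "A", 1: "A", 2: "B", 3: "B"}

clusters = agglomerative(profiles)
print(clusters, bcubed_f(clusters, gold))
```

The naive pairwise search is quadratic per merge, which is exactly the cost that motivates distributing the similarity computation on a platform such as Hadoop once the number of profiles grows.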
Source
Journal of Xi’an University of Posts and Telecommunications (《西安邮电大学学报》)
2014, Issue 4, pp. 70-76 (7 pages)
Funding
Natural Science Project of the Scientific Research Program funded by the Shaanxi Provincial Department of Education (12JK0938)
Keywords
person name disambiguation (entity disambiguation)
Wikipedia
Chinese information extraction
hierarchical clustering
entity information