Abstract
Person name disambiguation falls within the scope of text clustering, but it has its own particularity: once the texts to be clustered are represented with the vector space model, they have high dimensionality, which makes clustering inefficient and memory-intensive. To analyze in depth how clustering algorithms have been applied in name disambiguation research, this paper collects the relevant literature published from 2006 to October 2018 in the CNKI (China National Knowledge Infrastructure) journal database and analyzes it statistically. It introduces the general workflow of name disambiguation based on clustering algorithms; discusses the application of clustering algorithms in name disambiguation research, clustering evaluation metrics, and the evaluation of clustering results; and presents the relevant research results and representative literature in detail, providing a reference for researchers.
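The dimensionality issue mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the three toy documents, the whitespace tokenization, and the helper names `build_vectors` and `cosine` are all assumptions made for illustration. Each document mentioning the same name becomes a term-frequency vector with one dimension per vocabulary term, so the vector length grows with the vocabulary, and cosine similarity gives the pairwise closeness that a clustering algorithm would group on.

```python
import math
from collections import Counter


def build_vectors(docs):
    """Return (vocabulary, term-frequency vectors) for whitespace-tokenized docs.

    Hypothetical helper: a real system for Chinese text would need word
    segmentation and usually TF-IDF weighting instead of raw counts.
    """
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)  # one dimension per vocabulary term
        for tok, count in Counter(doc.split()).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy documents (assumed data): all mention "li ming", but the third
# refers to a different person than the first two.
docs = [
    "li ming professor computer science clustering research",
    "li ming computer science lab clustering paper",
    "li ming football striker club season goals",
]

vocab, vectors = build_vectors(docs)
sim_same = cosine(vectors[0], vectors[1])   # likely the same person
sim_diff = cosine(vectors[0], vectors[2])   # likely different people

print("dimensions:", len(vocab))
print("similarity doc0/doc1:", round(sim_same, 3))
print("similarity doc0/doc2:", round(sim_diff, 3))
```

Even three short documents already need a 14-dimensional vector here. With real corpora the vocabulary, and therefore the vector length, reaches many thousands of terms, which is the source of the efficiency and memory cost described above.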
Authors
ZHAN Jinmei (展金梅)
CHEN Juntao (陈君涛)
ZHAN Jinmei; CHEN Juntao (Qiongtai Normal University, Haikou 571127, China; Hainan College of Economics and Business, Haikou 571127, China)
Source
Modern Information Technology (《现代信息科技》)
2019, Issue 10, pp. 88-91 (4 pages)
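Funding
Supported by the Hainan Provincial Higher Education Scientific Research Project "Research on the Application of Clustering Ensemble Algorithms to Person Name Disambiguation in Chinese Text" (Project No. Hnky2018-78); this paper is one of the project's interim research results.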
Keywords
clustering
name disambiguation
research review