Abstract
To address inaccurate self-citation statistics caused by duplicate personal names in Chinese retrieval systems, this paper designs a rule-based personal name disambiguation algorithm. The rules cover author institution, author name, discipline category, and source journal; the disambiguated names are then used to support self-citation statistics. Experiments show that the rule-based algorithm is more effective than a KMeans-based clustering algorithm, with the overall evaluation metric, the F value, reaching a maximum of 0.87, so it can be used by the self-citation statistics module.
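The abstract names four rule types (institution, author name, discipline category, source journal) but does not state how they are combined. The following is only an illustrative sketch under stated assumptions, not the paper's algorithm: it assumes two records refer to the same person when the author names match and at least one of the other three attributes also matches, and it groups records transitively. The names Record, same_person, disambiguate and f_measure are hypothetical, and the F value is assumed to be the usual harmonic mean of precision and recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One bibliographic record; the field names are illustrative assumptions."""
    author: str
    institution: str
    discipline: str
    journal: str


def same_person(a: Record, b: Record) -> bool:
    # Assumed rule combination (not taken from the paper): the names must
    # match, and at least one of institution, discipline, or journal must agree.
    if a.author != b.author:
        return False
    return (a.institution == b.institution
            or a.discipline == b.discipline
            or a.journal == b.journal)


def disambiguate(records: list[Record]) -> list[list[int]]:
    """Group record indices into clusters, one cluster per presumed person."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of records that the rules consider the same person.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if same_person(records[i], records[j]):
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    clusters: dict[int, list[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def f_measure(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall (assumed definition of the F value).
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, two records by the same name from the same institution fall into one cluster, while a same-name record sharing no institution, discipline, or journal with them stays separate. A self-citation counter could then compare cluster membership rather than raw name strings.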
Source
Information Research (《情报探索》)
2015, No. 5, pp. 57-59, 67 (4 pages in total)
Keywords
self-citation statistics
personal name disambiguation
clustering
rule