Abstract
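Among Chinese book authors, both "one person, many names" (a single author appearing under several names) and "many persons, one name" (different authors sharing the same name) are widespread, and the attribute descriptions in the records vary widely in completeness. Under these conditions, the precision of disambiguation algorithms that fuse multiple features drops. This paper divides author attributes into entity features, contextual relationship features, and social relationship features. Using the vector space model, it adjusts the weight coefficients of the attributes and of the feature matrix with two methods, attribute mutex amplification and feature-matrix vacancy reduction, and then computes author similarity. Disambiguation is performed with agglomerative hierarchical clustering, and a Chinese book author information model is constructed. Evaluated with the B-Cubed metric, the disambiguation results reach a precision of 89.42% and an F-value of 87.45%.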
One person having multiple names and multiple persons sharing the same name are both widespread among Chinese book authors, and the descriptions of their attributes are uneven. As a result, fusion-feature disambiguation algorithms lose accuracy. This paper divides the author's attributes into three categories: entity features, contextual relationships, and social relations. With the aid of the vector space model, attribute mutex amplification and matrix vacancy reduction are used to adjust the weights, after which the authors' similarity is calculated. Disambiguation is realized through hierarchical agglomerative clustering, from which a Chinese book author information model is constructed. The disambiguation results were evaluated with the B-Cubed metric; the precision and F-value were 89.42% and 90.47%, respectively.
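The abstract does not define how mutex amplification and vacancy reduction adjust the weights, so the Python sketch below is only one hypothetical reading, not the paper's method. A simplified weighted per-attribute comparison stands in for the full vector space model. When a value is missing on either side, that attribute's weight is shrunk (our interpretation of vacancy reduction). When a mutually exclusive attribute such as birth year conflicts, its weight is amplified (our interpretation of mutex amplification). The attribute names, weights, factors, toy records, and the 0.5 merge threshold are all invented for illustration. Records are then grouped with average-linkage agglomerative clustering.

```python
from itertools import combinations

# HYPOTHETICAL base weights per attribute, grouped by the paper's three
# feature categories; the names and values are assumptions.
BASE_WEIGHTS = {
    "name": 0.2,        # entity feature
    "birth_year": 0.2,  # entity feature, treated as mutually exclusive
    "publisher": 0.2,   # contextual relationship feature
    "subject": 0.2,     # contextual relationship feature
    "coauthors": 0.2,   # social relationship feature (set-valued)
}
EXCLUSIVE = {"birth_year"}  # assumed: conflicting values here argue strongly against a match
AMPLIFY = 3.0               # assumed amplification factor for exclusive conflicts
VACANCY = 0.1               # assumed shrink factor for missing values


def attr_sim(a, b):
    """Jaccard overlap for set-valued attributes, exact match otherwise."""
    if isinstance(a, set):
        union = a | b
        return len(a & b) / len(union) if union else 0.0
    return 1.0 if a == b else 0.0


def similarity(r1, r2):
    """Weighted average of per-attribute similarities, in [0, 1]."""
    num = den = 0.0
    for attr, w in BASE_WEIGHTS.items():
        v1, v2 = r1.get(attr), r2.get(attr)
        if v1 is None or v2 is None:
            w *= VACANCY  # vacancy reduction (our reading): missing data weighs less
            s = 0.0
        else:
            s = attr_sim(v1, v2)
            if attr in EXCLUSIVE and s == 0.0:
                w *= AMPLIFY  # mutex amplification (our reading): conflicts weigh more
        num += w * s
        den += w
    return num / den if den else 0.0


def agglomerative(records, threshold=0.5):
    """Average-linkage agglomerative clustering; stops when no pair reaches threshold."""
    n = len(records)
    sim = {(i, j): similarity(records[i], records[j])
           for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def link(c1, c2):
        # Average pairwise similarity between two clusters.
        pairs = [sim[(min(i, j), max(i, j))] for i in c1 for j in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            s = link(clusters[a], clusters[b])
            if s > best:
                best, pair = s, (a, b)
        if best < threshold:
            break  # no remaining pair is similar enough to be the same author
        a, b = pair  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy records: four books credited to "Wang Wei" (made-up data).
records = [
    {"name": "Wang Wei", "birth_year": 1965, "publisher": "P1",
     "subject": "history", "coauthors": {"Li Ming"}},
    {"name": "Wang Wei", "birth_year": 1965, "publisher": "P1",
     "subject": "history", "coauthors": {"Li Ming", "Zhang Qiang"}},
    {"name": "Wang Wei", "birth_year": 1980, "publisher": "P2",
     "subject": "physics", "coauthors": {"Zhao Lei"}},
    {"name": "Wang Wei", "birth_year": None, "publisher": "P2",
     "subject": "physics", "coauthors": {"Zhao Lei"}},
]

print(agglomerative(records))  # [[0, 1], [2, 3]]
```

Record 3 lacks a birth year, so vacancy reduction keeps that gap from separating it from record 2. Records 0 and 2 carry conflicting birth years, and mutex amplification pushes their similarity further down. The threshold is the main tuning knob: raising it yields more, smaller author clusters.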
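The reported scores use the B-Cubed metric. B-Cubed scores every record individually. Precision is the fraction of records in its predicted cluster that share its true identity. Recall is the fraction of records with its true identity that landed in its predicted cluster. Both are averaged over all records, and F is their harmonic mean. The minimal Python sketch below computes these values. The function name `b_cubed` and the toy labels are illustrative assumptions, not taken from the paper.

```python
def b_cubed(predicted, gold):
    """B-Cubed precision, recall and F over dicts mapping record -> cluster label."""
    items = list(gold)
    p_sum = r_sum = 0.0
    for i in items:
        pred_same = {j for j in items if predicted[j] == predicted[i]}
        gold_same = {j for j in items if gold[j] == gold[i]}
        correct = len(pred_same & gold_same)  # always includes i itself
        p_sum += correct / len(pred_same)
        r_sum += correct / len(gold_same)
    p, r = p_sum / len(items), r_sum / len(items)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy example: the system splits one true author (a, b, c) and wrongly
# merges record c with a different author d.
gold = {"a": 0, "b": 0, "c": 0, "d": 1}
predicted = {"a": 0, "b": 0, "c": 1, "d": 1}
print(b_cubed(predicted, gold))  # precision 0.75, recall 2/3, F 12/17 (about 0.706)
```

Because every record contributes equally, one wrong merge or split lowers the scores of all the records it affects. This per-record weighting is why B-Cubed is a common choice for evaluating name disambiguation.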
Source
《电脑知识与技术》
2018, Issue 4Z, pp. 182-184 (3 pages)
Computer Knowledge and Technology
Keywords
Chinese book author
name disambiguation
mutex amplification
vacancy reduction