Person Name Disambiguation Based on Revised Longest Common Subsequence (采用改进最长公共子序列的人名消歧)

Cited by: 5
Abstract: Using nouns, adjectives, gerunds, and named entities as text features, and taking word order, word frequency, and the semantics of the feature terms into account, this paper proposes a text clustering method based on a revised longest common subsequence (LCSC). Experimental results show that, compared with the traditional cosine-similarity clustering method, LCSC raises the average F-measure on the P-IP metric for person name disambiguation from 74.2% to 84.9%. Compared with the plain longest common subsequence method, overall performance also improves by 3.7%.
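Source: Journal of Huaqiao University (Natural Science) (《华侨大学学报(自然科学版)》), 2016, No. 2, pp. 201-206 (6 pages). Indexed in: CAS, PKU Core (北大核心).
Funding: Major Project of the Fujian Provincial Science and Technology Program (2011H6016); Key Project of the Fujian Provincial Science and Technology Program (2011H0028).
Keywords: person name disambiguation; text similarity; longest common subsequence; hierarchical clustering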
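
The abstract compares against a plain longest common subsequence (LCS) baseline, in which two documents are treated as ordered sequences of feature terms and their similarity grows with the length of the subsequence they share. The following Python sketch illustrates that baseline only. The normalization 2 * LCS / (len(a) + len(b)), the function names, and the sample tokens are assumptions made for illustration; the record does not give the paper's formula, and the revised method's use of word frequency and term semantics is not reproduced here.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                # Matching tokens extend the best subsequence ending before both.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbouring cells.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed normalization (not from the paper): 2 * LCS / (len(a) + len(b))."""
    if not a and not b:
        return 0.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


# Hypothetical feature sequences extracted from two documents about one name.
doc_a = ["professor", "university", "physics", "research"]
doc_b = ["university", "professor", "physics", "lab"]
print(lcs_similarity(doc_a, doc_b))
```

Because a subsequence must preserve order, the two sample documents share three terms but only a two-term common subsequence, which is how word order enters an LCS-based similarity where a bag-of-words cosine measure would ignore it.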