Cross-document Chinese personal name entity disambiguation based on hierarchical clustering
Cited by: 8
Abstract: Personal name disambiguation has become an important and pressing problem in natural language processing and information extraction applications. Cross-document entity disambiguation is the problem of identifying whether mentions from different documents refer to the same or distinct entities. This paper describes a Chinese natural language processing and information extraction system that performs both document-level and corpus-level IE, using a pipelined, multi-level modular approach to recognize named entities and entity relations and to generate Entity Profiles (EPs). It introduces novel features based on these document-level entity profiles: personal information features, entity relations, and context-related information drawn from the EP. Disambiguation is performed by agglomerative hierarchical clustering on the Hadoop platform. Using the whole-web news corpus compiled by Harbin Institute of Technology as training and test data, the study focuses on feature selection for Chinese personal name disambiguation and on the determination and validation of parameters. The method achieves an F-measure of 91.33% on the training set and 88.73% on the test set, which indicates that the proposed method is feasible.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2014, No. 6, pp. 106-111 (6 pages)
Keywords: personal name disambiguation (entity disambiguation); information extraction; similarity; hierarchical clustering
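
The abstract names agglomerative hierarchical clustering over entity profiles as the disambiguation step but gives no implementation details on this page. The following is a minimal single-machine sketch of that general technique, not the authors' system. Everything specific in it is an assumption made for illustration: the representation of each entity profile as a set of feature strings, the use of Jaccard similarity, average linkage, the threshold value, and all function and feature names. It does not use Hadoop.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_similarity(c1, c2, profiles):
    """Average-link similarity: mean pairwise similarity across two clusters."""
    total = sum(jaccard(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(profiles, threshold):
    """Repeatedly merge the most similar pair of clusters.

    Stops when the best pair falls below `threshold` (a hypothetical
    stopping parameter) or when only one cluster remains.
    Returns a list of clusters, each a list of profile indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            sim = cluster_similarity(clusters[a], clusters[b], profiles)
            if sim > best_sim:
                best_sim, best_pair = sim, (a, b)
        if best_sim < threshold:
            break
        a, b = best_pair  # a < b, so deleting index b leaves a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical profiles for four mentions of the same personal name.
profiles = [
    {"occupation:professor", "org:university_a", "coauthor:person_x"},
    {"occupation:professor", "org:university_a", "city:city_p"},
    {"occupation:athlete", "sport:swimming"},
    {"occupation:athlete", "sport:swimming", "city:city_q"},
]

print(agglomerative(profiles, threshold=0.3))  # prints [[0, 1], [2, 3]]
```

In this sketch the threshold plays the role of the parameter that decides when two groups of mentions are different people. Raising it to 0.6 leaves mentions 0 and 1 unmerged, which is the kind of trade-off that parameter selection and validation would need to settle on real data.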