
Term Extraction Combining C-Value and Mutual Information    Times cited: 7

TERM RECOGNITION BASED ON INTEGRATION OF C-VALUE AND MUTUAL INFORMATION
Abstract: In current term-extraction experiments on an open bioinformatics corpus, the top-ranked 2,000-plus two-character words already reach a precision of 90.36%, but the precision for words of three or more characters is only 66.63%. Extracting these multi-character words has therefore become a key difficulty in the automatic extraction of noun terms. To address it, this paper proposes a term-recognition strategy that combines the C-value parameter, which is well suited to extracting long terms, with the mutual information parameter used in term extraction. Experimental results show that, for long terms, the method achieves a precision of 75.7%, a recall of 68.4% and an F-measure of 71.9%, all higher than those of other methods on the same corpus.
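Source: Computer Applications and Software (CSCD-indexed), 2010, No. 4, pp. 108-110 (3 pages)
Funding: Jiangsu Province Modern Enterprise Informatization Application Support Software Engineering Technology R&D Project (SX200907); Heilongjiang Province Postdoctoral Fund (520415029); Jiangsu Province "Qing Lan" Project (2008)
Keywords: term recognition; C-value; mutual information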
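The abstract combines two statistics: C-value, which rewards long candidate strings and discounts strings that mostly occur nested inside longer candidates, and mutual information, which measures how strongly the parts of a string stick together. The sketch below computes both over character n-grams. It is a minimal illustration under stated assumptions, not the paper's method. C-value follows the general Frantzi and Ananiadou definition, with length measured in characters rather than words. Mutual information is taken here as pointwise mutual information at the weakest binary split of the string, with all n-gram probabilities sharing one denominator. The combination rule is not specified above, so `extract_terms` uses a hypothetical scheme: filter candidates by a PMI threshold, then rank the survivors by C-value. All function names, thresholds and defaults are illustrative.

```python
import math
from collections import Counter


def count_ngrams(corpus, max_len=6):
    """Count every character n-gram of length 1..max_len in a list of sentences."""
    freq = Counter()
    for sent in corpus:
        for n in range(1, max_len + 1):
            for i in range(len(sent) - n + 1):
                freq[sent[i:i + n]] += 1
    return freq


def c_value(candidates, freq):
    """C-value: log2(length) * frequency, discounted by the mean frequency of
    longer candidates that contain the string. Length is in characters (assumption)."""
    scores = {}
    for a in candidates:
        longer = [b for b in candidates if len(b) > len(a) and a in b]
        weight = math.log2(len(a))
        if longer:
            nested_mean = sum(freq[b] for b in longer) / len(longer)
            scores[a] = weight * (freq[a] - nested_mean)
        else:
            scores[a] = weight * freq[a]
    return scores


def min_split_pmi(s, freq, total):
    """Pointwise mutual information at the weakest binary split of s.
    Assumption: every n-gram probability uses the total character count."""
    p_s = freq[s] / total
    return min(
        math.log2(p_s / ((freq[s[:k]] / total) * (freq[s[k:]] / total)))
        for k in range(1, len(s))
    )


def extract_terms(corpus, min_freq=2, mi_threshold=0.0, max_len=6):
    """Hypothetical combination: keep candidates whose PMI passes the
    threshold, then rank them by C-value (ties broken alphabetically)."""
    freq = count_ngrams(corpus, max_len)
    total = sum(len(sent) for sent in corpus)
    candidates = [s for s, f in freq.items() if len(s) >= 2 and f >= min_freq]
    cv = c_value(candidates, freq)
    kept = [(s, cv[s]) for s in candidates
            if min_split_pmi(s, freq, total) >= mi_threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

As a consistency check on the reported figures, `f_measure(0.757, 0.684)` is about 0.7187, which rounds to the 71.9% F-measure given in the abstract. Ranking by C-value after a PMI filter is one simple way to let C-value favor long terms while PMI rejects loosely bound strings; a weighted sum of the two scores would be an equally plausible alternative.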