
Term Extraction Based on Information Entropy and Word Frequency Distribution Variety (cited by: 20)
Abstract Building on separate studies of term extraction based on information entropy and on word frequency distribution variety, this paper proposes a term extraction method that combines the two. Information entropy measures the completeness (integrity) of a term, while word frequency distribution variety measures its domain relevance. The method integrates information entropy into the word frequency distribution variety formula and applies simple linguistic rules as an additional filter to remove ordinary strings. In an experiment on an automotive-domain corpus, the method extracted 1,300 terms with a precision of 73.7%. The results indicate that the method extracts low-frequency terms more effectively and that the extracted terms are structurally more complete.
Source Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journals (北大核心), 2015, No. 1, pp. 82-87 (6 pages)
Funding National Natural Science Foundation of China (61173101, 61173100)
Keywords term extraction; information entropy; word frequency distribution variety
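
The abstract does not give the paper's actual formula, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses left/right boundary entropy (the entropy of the characters adjacent to a candidate string) as a stand-in for the completeness measure, and a smoothed domain-versus-general relative frequency ratio as a stand-in for word frequency distribution variety. The function names, the add-one smoothing, and the multiplicative combination are all hypothetical choices for illustration, not the authors' method.

```python
import math
from collections import Counter


def _entropy(counts):
    """Shannon entropy (base 2) of a Counter of neighbor characters."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def boundary_entropy(corpus, candidate):
    """Return (left_entropy, right_entropy) for `candidate` in `corpus`.

    High entropy on both sides suggests the string is used as a complete
    unit, because many different characters can precede and follow it.
    """
    left, right = Counter(), Counter()
    start = corpus.find(candidate)
    while start != -1:
        end = start + len(candidate)
        if start > 0:
            left[corpus[start - 1]] += 1
        if end < len(corpus):
            right[corpus[end]] += 1
        start = corpus.find(candidate, start + 1)
    return _entropy(left), _entropy(right)


def frequency_ratio(candidate, domain_corpus, general_corpus):
    """Assumed domain-relevance proxy: relative frequency in the domain
    corpus divided by an add-one-smoothed relative frequency in a
    general corpus."""
    domain_rate = domain_corpus.count(candidate) / max(len(domain_corpus), 1)
    general_rate = (general_corpus.count(candidate) + 1) / (len(general_corpus) + 1)
    return domain_rate / general_rate


def term_score(candidate, domain_corpus, general_corpus):
    """Hypothetical combined score: the weaker boundary entropy scales
    the domain-relevance ratio, so incomplete fragments score low."""
    left_h, right_h = boundary_entropy(domain_corpus, candidate)
    return min(left_h, right_h) * frequency_ratio(
        candidate, domain_corpus, general_corpus
    )
```

In this sketch, a fragment that always appears next to the same character gets zero boundary entropy on that side and therefore a zero score, however frequent it is in the domain corpus. A real system would also apply the linguistic filtering rules mentioned in the abstract, which are not specified there and are omitted here.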
