Applying Terminology Definition Pattern and Multiple Features to Identify Technical New Term and Its Definition

Cited by: 12
Abstract: Extracting new terms and their definitions is an important research topic in information extraction. Research shows that in scientific and technical literature a new term is often accompanied by its definition, and that term definitions in real text have distinctive linguistic features. Drawing on a large-scale real corpus, the authors examine the linguistic patterns that make up term definitions, the vocabulary used in definitions, and the statistical features of the context around terms. They propose using the linguistic pattern of terminology definition (LPTD) to produce the set of candidate new terms, then combining it with statistical features of the context in which new terms occur and applying an SVM classifier to identify new terms and their definitions in scientific and technical corpora. The method was validated on a large collection of scientific and technical journals; in open evaluation it reached a precision of 90.5% and a recall of 78.1%.

English abstract: The identification of new technical terms and their definitions is an important research topic in information extraction. Providing a scalable solution for large-scale term extraction remains a great challenge, because most previous approaches fail to explicitly define the linguistic constituents of terms and the function of their definition patterns. The authors' research shows that in real corpora new technical terms are in most cases accompanied by their definitions. Based on this intuition, the linguistic constituents of technical terms and the numerical function of their definitions are defined explicitly. Also presented is a novel statistical approach based on the linguistic pattern of terminology definition (LPTD) for extracting Chinese new technical terms and their definitions. LPTD is proposed in this paper to delimit the boundaries of technical terms. In the identification phase, both the statistical information of terms and the LPTD features obtained from the preceding filtering process are taken into account in the SVM classifier, so that they are integrated into one unified framework. The idea can also serve as a reference for collocation extraction (CE) and can be easily extended to other languages. Compared with previously reported results, this approach achieves competitive performance on real large-scale corpora, with 90.5% precision and 78.1% recall.
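The two-stage idea in the abstract (definition patterns propose candidates, then statistical features feed a classifier) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two regular expressions are hypothetical stand-ins and not the paper's LPTD inventory, and the function names and the three features are invented. The classifier step is left out; the feature dictionaries would be the input to an SVM or a similar classifier.

```python
import re

# Hypothetical definition patterns (illustrative only; NOT the paper's LPTD set).
DEFINITION_PATTERNS = [
    # "所谓 <term>, 是指/就是/是 <definition>"
    re.compile(r"所谓(?P<term>[^,,。]{2,20})[,,]?(?:是指|就是|是)(?P<definition>[^。]+)"),
    # "<term> 是指/定义为/指的是 <definition>"
    re.compile(r"(?P<term>[^,,。]{2,20})(?:是指|定义为|指的是)(?P<definition>[^。]+)"),
]


def split_sentences(text):
    """Split text into sentences that end with the Chinese full stop."""
    return re.findall(r"[^。]+。", text)


def extract_candidates(text):
    """Return (term, definition) candidates from sentences matching a pattern."""
    candidates = []
    for sentence in split_sentences(text):
        for pattern in DEFINITION_PATTERNS:
            match = pattern.search(sentence)
            if match:
                candidates.append({
                    "term": match.group("term"),
                    "definition": match.group("definition"),
                })
                break  # use only the first pattern that matches this sentence
    return candidates


def candidate_features(candidate, text):
    """Compute a few simple, assumed features for a downstream classifier."""
    return {
        "term_length": len(candidate["term"]),
        "definition_length": len(candidate["definition"]),
        "term_frequency": text.count(candidate["term"]),
    }


if __name__ == "__main__":
    sample = (
        "支持向量机是指一种基于统计学习理论的分类方法。"
        "所谓句子隶属度,是指句子属于某一类别的程度。"
        "今天天气很好。"
    )
    for cand in extract_candidates(sample):
        print(cand["term"], candidate_features(cand, sample))
```

Keeping candidate generation separate from feature computation mirrors the filtering-then-identification split described in the abstract, so the pattern set and the feature set can be changed independently.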
Authors: Xun Endong (荀恩东), Li Sheng (李晟)
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2009, No. 1, pp. 62-69 (8 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
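Funding: National High Technology Research and Development Program of China ("863" Program) (2006AA010101); National Natural Science Foundation of China (60572158)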
Keywords: information extraction; linguistic pattern of terminology definition; statistical language model; SVM classifiers; membership degree of sentence
