Abstract
A term is a conventional linguistic symbol that expresses or delimits a specialized concept through spoken or written language. This article first reviews the definition of terms, their characteristics, and the methods used to evaluate term extraction results. On that basis, it surveys the term extraction methods in common use today: rule-based, statistics-based, word-graph-model-based, topic-model-based, and deep-learning-based methods. For each method, the article describes the underlying principle and the workflow for applying it to term extraction. It closes by identifying the challenges that term extraction faces and outlining prospects for future research.
Authors
郑坤
薛明晰
纪传胤
Zheng Kun; Xue Ming-xi; Ji Chuan-yin (Troop 32180 of Chinese PLA, Beijing 100012, China)
Source
《科学与信息化》
2021, No. 29, pp. 118-121 (4 pages)
Technology and Information
Keywords
extraction
automatic term extraction
term recognition
keyword extraction
text processing