Abstract
The characteristics of terms in the science and technology domain are analyzed and, taking both linguistic and statistical features into account, a model for automatic term extraction in this domain is proposed. The model consists of three parts: preprocessing, string extension, and term filtering. Experiments examine the relationship between threshold selection and the evaluation metrics, and verify the effectiveness of the proposed model. Comparative experiments show that, while maintaining relatively high precision and recall, the model improves extraction speed by more than a factor of 2 over general-purpose methods.
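The three-part structure named in the abstract (preprocessing, string extension, term filtering) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual rules or thresholds, so everything here is an assumption for illustration: the stop-character list, the use of character n-gram counting as a stand-in for string extension, the nested-substring filter, and the names `preprocess`, `extend_strings`, `filter_terms`, `extract_terms`, `min_freq`, and `max_len` are all hypothetical.

```python
from collections import Counter

# Assumed stop characters; a real system would use a curated stop list.
STOP_CHARS = set("的了和与在是,。、;:,.;: \n")


def preprocess(text):
    """Cut raw text into segments at stop characters (assumed heuristic)."""
    segments, current = [], []
    for ch in text:
        if ch in STOP_CHARS:
            if current:
                segments.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))
    return segments


def extend_strings(segments, max_len=6):
    """Count every substring of length 2..max_len inside each segment.

    This is a simple stand-in for string extension: each 2-character
    seed is implicitly extended up to max_len characters.
    """
    counts = Counter()
    for seg in segments:
        for n in range(2, min(max_len, len(seg)) + 1):
            for i in range(len(seg) - n + 1):
                counts[seg[i:i + n]] += 1
    return counts


def filter_terms(counts, min_freq=2):
    """Keep frequent strings that are not nested in an equally frequent longer one."""
    frequent = {s: c for s, c in counts.items() if c >= min_freq}
    terms = []
    for s, c in frequent.items():
        nested = any(s != t and s in t and c == frequent[t] for t in frequent)
        if not nested:
            terms.append(s)
    return sorted(terms)


def extract_terms(text, min_freq=2, max_len=6):
    """Run the three stages in order and return the candidate terms."""
    return filter_terms(extend_strings(preprocess(text), max_len), min_freq)
```

In this sketch `min_freq` plays the role of a selection threshold: raising it tends to trade recall for precision, which is the kind of relationship the abstract says was studied experimentally.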
Source
Systems Engineering-Theory & Practice (《系统工程理论与实践》)
EI
CSSCI
CSCD
PKU Core Journals (北大核心)
2013, No. 1, pp. 230-235 (6 pages)
Keywords
technology domain
automatic term extraction
string extension
term filtering