Abstract
The article first introduces the basic features of terms and discusses methods for automatically identifying scientific and technical terms. It then proposes V-value, which improves on the two mainstream statistical indicators, TF-IDF and C-value, according to text characteristics. To distinguish how a word's position affects the content of a document, candidate terms in different positions are assigned different weights. Finally, a term extraction system combining statistics and rules is designed and implemented. The system identifies terms by jointly computing the position weight, C-value, and TF-IDF, which improves extraction precision.
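The joint calculation described in the abstract can be illustrated with a minimal sketch. The abstract gives neither the V-value formula nor the position weights, so everything specific here is an assumption: the textbook C-value and TF-IDF definitions stand in for the improved indicators, and the `POSITION_WEIGHTS` values, the `alpha` mixing parameter, and the `combined_score` function are hypothetical names chosen for illustration only.

```python
import math

# Hypothetical position weights; the abstract does not state the actual values.
POSITION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def contains(longer, shorter):
    """Return True if the word tuple `shorter` occurs contiguously in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(candidate, freq):
    """Textbook C-value for a candidate (tuple of words).

    `freq` maps every candidate tuple to its corpus frequency.
    """
    length_factor = math.log2(len(candidate))  # 0 for single-word candidates
    longer = [t for t in freq if len(t) > len(candidate) and contains(t, candidate)]
    if not longer:
        return length_factor * freq[candidate]
    # Discount by the mean frequency of the longer candidates that contain it.
    nested_mean = sum(freq[t] for t in longer) / len(longer)
    return length_factor * (freq[candidate] - nested_mean)


def tf_idf(term_freq, doc_freq, num_docs):
    """Classic TF-IDF: term frequency times log inverse document frequency."""
    return term_freq * math.log(num_docs / doc_freq)


def combined_score(candidate, freq, doc_freq, num_docs, position, alpha=0.5):
    """Assumed combination: position weight times a weighted sum of the two indicators."""
    statistical = (alpha * c_value(candidate, freq)
                   + (1 - alpha) * tf_idf(freq[candidate], doc_freq, num_docs))
    return POSITION_WEIGHTS[position] * statistical


freq = {("term", "extraction"): 5, ("automatic", "term", "extraction"): 2}
score = combined_score(("term", "extraction"), freq, doc_freq=10, num_docs=100, position="title")
```

A weighted sum scaled by the position weight is only one plausible way to combine the three signals. Note that the textbook C-value is zero for single-word candidates, which is one reason a system might pair it with TF-IDF.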
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
PKU Core Journal
2010, No. 12, pp. 28-33 (6 pages)
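Funding
One of the research outputs of the "Eleventh Five-Year Plan" Science and Technology Support Program project "Network Science and Technology Information Monitoring and Evaluation" (project no. 2006BAH03B05).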