Abstract
[Objective] To extract domain terms automatically, more accurately and more conveniently. [Methods] We used the CBOW model to build word vectors for the word components that make up each term, and used the cosine similarity between these vectors to measure how strongly the components within a term are associated. We then applied the PageRank algorithm to score and rank the candidates by domain representativeness, and set a threshold to extract the more domain-representative terms. [Results] In experiments on a dataset of paper abstracts from the field of natural language processing, the method achieved relatively high precision and recall. [Limitations] The training set was relatively small, and the quality of training on the dataset directly affects the experimental results. [Conclusions] The results indicate that using the CBOW model for term extraction is a reasonable and feasible approach.
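The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors stand in for trained CBOW embeddings, and the function names, the mean-pairwise cohesion score, the similarity-weighted candidate graph, and both thresholds are assumptions made only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cohesion(components, vectors):
    """Mean pairwise cosine similarity among a candidate's word components."""
    pairs = [(a, b) for i, a in enumerate(components) for b in components[i + 1:]]
    if not pairs:
        return 1.0  # assumption: a single-word candidate is trivially cohesive
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

def mean_vector(components, vectors):
    """Average the component vectors to represent a whole candidate."""
    dims = len(vectors[components[0]])
    return [sum(vectors[c][d] for c in components) / len(components) for d in range(dims)]

def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration over a weighted graph given as {node: {neighbor: weight}}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            total = sum(graph[node].values())
            if total == 0:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new[other] += damping * rank[node] / n
            else:
                for neighbor, weight in graph[node].items():
                    new[neighbor] += damping * rank[node] * weight / total
        rank = new
    return rank

# Hand-made toy vectors (hypothetical; real ones would come from a trained CBOW model).
vectors = {
    "machine": [0.9, 0.1, 0.0], "translation": [0.8, 0.2, 0.1],
    "language": [0.7, 0.3, 0.0], "model": [0.6, 0.4, 0.1],
    "this": [0.0, 0.1, 0.9], "paper": [0.1, 0.0, 0.8],
}
candidates = {
    "machine translation": ["machine", "translation"],
    "language model": ["language", "model"],
    "this paper": ["this", "paper"],
}

# Step 1: keep candidates whose components are strongly associated (assumed cutoff).
COHESION_THRESHOLD = 0.5
cohesive = {c: parts for c, parts in candidates.items()
            if cohesion(parts, vectors) >= COHESION_THRESHOLD}

# Step 2: link candidates by the similarity of their mean vectors (assumed graph design).
means = {c: mean_vector(parts, vectors) for c, parts in cohesive.items()}
graph = {a: {b: cosine(means[a], means[b]) for b in means if b != a} for a in means}

# Step 3: rank with PageRank and keep candidates above an assumed threshold.
rank = pagerank(graph)
RANK_THRESHOLD = 1.0 / len(rank)
terms = [c for c in sorted(rank, key=rank.get, reverse=True) if rank[c] > RANK_THRESHOLD]
print(terms)
```

In this toy data all three candidates pass the cohesion filter, since "this paper" is internally cohesive as well, so it is the ranking step that separates the domain-like candidates from the general phrase.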
Source
《现代图书情报技术》
CSSCI
2016, Issue 2, pp. 9-15 (7 pages)
New Technology of Library and Information Service
Funding
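This work is one of the research outcomes of the following projects:
Nanjing Agricultural University Humanities and Social Sciences Research Fund project "Construction of and Knowledge Mining from a Chunk-Level Chinese-English Parallel Corpus for the Humanities and Social Sciences" (No. SK2013023)
National Natural Science Foundation of China project "Construction of and Knowledge Mining from a CSSCI-Based Syntax-Level Chinese-English Parallel Corpus" (No. 71303120)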
Keywords
Terminology extraction
Neural network
Continuous Bag-of-Words (CBOW) model