Abstract
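This paper proposes an improved similarity-based algorithm for selecting keywords of scientific literature. It first extracts a domain lexicon with an N-gram algorithm, then uses the domain lexicon together with a common-sense lexicon to re-segment the initially selected keywords and to compare the given keywords semantically. Keywords whose semantic similarity exceeds a given threshold are treated as synonyms expressing the same meaning and are merged in the literature database, which resolves the keyword redundancy problem. Experimental results demonstrate the effectiveness of the method.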
Irregular keywords often cause high redundancy within the same research topic. To address this issue, this paper proposes an improved keyword selection algorithm based on similarity calculation. It re-segments keywords using a field dictionary and a common-sense knowledge database (thesaurus). When the total semantic similarity is greater than a given threshold, the two compared keywords are considered to express the same meaning; they are merged and only one of them is kept in the library, which achieves dimension reduction. Finally, experimental results show the effectiveness of the method.
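The threshold-based merging step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the semantic similarity measure, so the code below substitutes a Jaccard overlap of character bigrams purely as a stand-in; the function names (`char_ngrams`, `similarity`, `merge_keywords`) and the threshold value 0.5 are hypothetical choices for illustration, not the authors' method.

```python
from itertools import combinations


def char_ngrams(term: str, n: int = 2) -> set[str]:
    """Return the set of character n-grams of a term (the whole term if shorter than n)."""
    term = term.strip().lower()
    if len(term) < n:
        return {term}
    return {term[i:i + n] for i in range(len(term) - n + 1)}


def similarity(a: str, b: str) -> float:
    """Stand-in measure: Jaccard overlap of character bigrams.

    ASSUMPTION: this replaces the paper's semantic similarity, which the
    abstract does not define.
    """
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)


def merge_keywords(keywords: list[str], threshold: float = 0.5) -> dict[str, str]:
    """Map every keyword to one representative keyword.

    Pairs whose similarity is greater than `threshold` are treated as
    synonyms and merged; the earliest-listed keyword of each group is kept.
    """
    unique = list(dict.fromkeys(keywords))        # drop exact duplicates, keep order
    order = {k: i for i, k in enumerate(unique)}
    parent = {k: k for k in unique}               # simple union-find forest

    def find(k: str) -> str:
        while parent[k] != k:
            k = parent[k]
        return k

    for a, b in combinations(unique, 2):
        if similarity(a, b) > threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                keep, drop = (ra, rb) if order[ra] < order[rb] else (rb, ra)
                parent[drop] = keep

    return {k: find(k) for k in unique}


if __name__ == "__main__":
    sample = ["semantic similarity", "semantic similarities", "feature reduction"]
    for keyword, representative in merge_keywords(sample).items():
        print(f"{keyword} -> {representative}")
```

Merging is transitive in this sketch: if A matches B and B matches C, all three collapse to one representative. Whether the original algorithm behaves this way is not stated in the abstract.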
Source
《现代图书情报技术》
CSSCI
Peking University Core Journals (北大核心)
2012, Issue 1, pp. 34-39 (6 pages)
New Technology of Library and Information Service
Keywords
Scientific literature keywords
Redundancy
Semantic similarity
Feature reduction