Improved RAKE Algorithm for Automatic Keyword Extraction in Chinese Short Text

Cited by: 10
Abstract: When applied to keyword extraction from Chinese short texts, the RAKE (Rapid Automatic Keyword Extraction) algorithm ignores word semantics and tends to produce overly long candidate keywords. To address these problems, an improved method based on RAKE is proposed. In the word-score calculation stage, a co-occurrence matrix is constructed from term distance, inter-word relation frequency, and co-occurrence frequency, and a contextual-value formula is used to compute a score for each candidate keyword. Candidate keywords are then output in descending order of score; if a candidate contains more than n words, a window output algorithm limits its length. Experiments show that the proposed method outperforms RAKE and other algorithms at extracting keywords from Chinese short texts.
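The abstract does not give the contextual-value formula or the exact window output procedure, so the following is only a minimal Python sketch of the surrounding pipeline under stated assumptions: the text is already segmented into words (a Chinese word segmenter is assumed and not shown), candidate phrases are split at a caller-supplied stopword set, the classic RAKE word score deg(w)/freq(w) stands in for the paper's co-occurrence-matrix scoring, and `window_output` is a hypothetical reading of the window step that keeps the highest-scoring run of at most n consecutive words. All function names are illustrative, not taken from the paper.

```python
from collections import defaultdict


def candidate_phrases(tokens, stopwords):
    """Split a pre-segmented token list into candidate phrases at stopwords/punctuation."""
    phrases, current = [], []
    for tok in tokens:
        if tok in stopwords:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        phrases.append(current)
    return phrases


def rake_word_scores(phrases):
    """Classic RAKE score deg(w) / freq(w); a stand-in for the paper's contextual value."""
    freq = defaultdict(int)
    deg = defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            deg[w] += len(phrase)  # co-occurrences inside the phrase, counting w itself
    return {w: deg[w] / freq[w] for w in freq}


def window_output(phrase, scores, n):
    """Hypothetical window step: keep the best-scoring run of at most n consecutive words."""
    if len(phrase) <= n:
        return phrase
    best = max(range(len(phrase) - n + 1),
               key=lambda i: sum(scores[w] for w in phrase[i:i + n]))
    return phrase[best:best + n]


def extract_keywords(tokens, stopwords, n=3, top_k=5, sep=""):
    """Rank candidates by summed word score (descending), then cap their length at n words."""
    phrases = candidate_phrases(tokens, stopwords)
    scores = rake_word_scores(phrases)
    ranked = sorted(phrases, key=lambda p: sum(scores[w] for w in p), reverse=True)
    keywords = []
    for phrase in ranked:
        kw = sep.join(window_output(phrase, scores, n))
        if kw not in keywords:  # drop duplicates created by truncation
            keywords.append(kw)
    return keywords[:top_k]
```

Joining with `sep=""` suits Chinese, where keywords are written without spaces. Replacing `rake_word_scores` with a weighted co-occurrence score built from term distance, relation frequency, and co-occurrence frequency is where a scoring like the one the abstract describes would plug in.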
Authors: 陈可嘉 (CHEN Ke-jia), 黄思翌 (HUANG Si-yi); School of Economics and Management, Fuzhou University, Fuzhou 350108, China
Source: Journal of Chinese Computer Systems (小型微型计算机系统), 2021, No. 6, pp. 1171-1175 (5 pages); indexed in CSCD and the Peking University core journal list
Funding: Supported by the National Natural Science Foundation of China (71701019).
Keywords: RAKE algorithm; automatic keyword extraction; context; window output