Study on Keyword Extraction Using Word Position Weighted TextRank (词语位置加权TextRank的关键词抽取研究)
Cited by: 76
Abstract: This paper treats keyword extraction as the problem of ranking the words that make up a document by importance. Building on the basic idea of TextRank, it constructs a candidate keyword graph and introduces coverage influence, position influence, and frequency influence to compute the influence probability transition matrix between words. Candidate keyword scores are then computed iteratively, and the top N candidates are selected as the extracted keywords. Experimental results show that the word position weighted TextRank method outperforms both the traditional TextRank method and keyword extraction based on the LDA topic model.
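Author: Xia Tian (夏天)
Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, Peking University Core Journal, 2013, No. 9, pp. 30-34 (5 pages)
Funding: One of the research outcomes of the National Social Science Fund of China project "Collection and Analysis of Online Public Opinion in the Web 2.0 Environment" (No. 09CTQ027) and the National Social Science Fund of China major project "Research on Information Resource Integration and Services in the Cloud Computing Environment" (No. 12&ZD220)
Keywords: Keyword extraction; Word rank; TextRank; Graph model; LDA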
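
The pipeline summarized in the abstract (candidate keyword graph, a transition matrix blended from coverage, position, and frequency influence, iterative scoring, top-N selection) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the co-occurrence window, the title boost used as the position signal, the blend weights `alpha`, `beta`, `gamma`, the damping factor, and the function names. It should not be read as the paper's exact method.

```python
from collections import Counter, defaultdict


def build_graph(words, window=2):
    """Link words that co-occur within `window` positions (undirected)."""
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    return neighbors


def transition_probs(neighbors, importance):
    """P(u -> v) is proportional to importance[v] among u's neighbors."""
    probs = {}
    for u, nbrs in neighbors.items():
        total = sum(importance[v] for v in nbrs)
        probs[u] = {v: importance[v] / total for v in nbrs}
    return probs


def weighted_textrank(words, title_words, window=2,
                      alpha=0.34, beta=0.33, gamma=0.33,
                      title_boost=3.0, damping=0.85, iterations=50):
    """Score candidate words; all parameter defaults are illustrative assumptions."""
    neighbors = build_graph(words, window)
    freq = Counter(words)
    # Coverage: each neighbor is equally likely (plain TextRank behaviour).
    coverage = transition_probs(neighbors, {w: 1.0 for w in neighbors})
    # Position: assumed here to mean "words in the title receive a boost".
    position = transition_probs(
        neighbors,
        {w: title_boost if w in title_words else 1.0 for w in neighbors})
    # Frequency: more frequent neighbors attract more probability mass.
    frequency = transition_probs(neighbors, {w: float(freq[w]) for w in neighbors})

    n = len(neighbors)
    score = {w: 1.0 / n for w in neighbors}
    for _ in range(iterations):
        new = {w: (1.0 - damping) / n for w in neighbors}
        for u, nbrs in neighbors.items():
            for v in nbrs:
                p = (alpha * coverage[u][v]
                     + beta * position[u][v]
                     + gamma * frequency[u][v])
                new[v] += damping * score[u] * p
        score = new
    return score


def top_keywords(score, n):
    """Return the n highest-scoring words."""
    return sorted(score, key=score.get, reverse=True)[:n]


if __name__ == "__main__":
    tokens = ["keyword", "extraction", "graph", "keyword", "ranking", "graph"]
    scores = weighted_textrank(tokens, title_words={"keyword"})
    print(top_keywords(scores, 2))
```

Because the three transition tables are each row-normalized and the blend weights sum to 1, the combined matrix stays row-stochastic, so the scores remain a probability distribution across iterations.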