A Method of Keyword Extraction and Extension in Short Text
Abstract: To improve the accuracy and recall of keyword extraction from short texts, and to discover keywords that do not appear in the text but express its topic well, a method for short-text keyword extraction and extension is proposed. During extraction, the method considers several kinds of features, including the statistical features, topic features, and collocation features of words, and corrects the word scores step by step to arrive at relatively accurate keywords. During extension, it computes the similarity between the extracted keywords and topic feature words, and derives extended keywords that reflect the topic of the short text well. Both the topic features and the keyword extension require the support of a long-text corpus that is strongly related to the topic. When such a corpus is available, the method performs well.
Author: XU Li (徐立), School of Software, Shangqiu Polytechnic, Shangqiu, Henan 476100, China
Source: Journal of Hebei Software Institute (《河北软件职业技术学院学报》), 2021, No. 2, pp. 8-11 (4 pages)
Keywords: short text; keyword extraction; word frequency; topic; word collocation
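The abstract gives no formulas, so the following is only a minimal illustrative sketch of the two stages it describes, not the author's actual method. It assumes that word scores start from relative term frequency, are then scaled by a topic weight, and are finally boosted for known collocations; it also assumes that extension uses cosine similarity over word vectors. The names `extract_keywords`, `expand_keywords`, `topic_weights`, `collocations`, `vectors`, and `threshold` are hypothetical, and the topic weights and vectors are assumed to come from the topic-related long-text corpus the abstract mentions.

```python
import math
from collections import Counter


def extract_keywords(tokens, topic_weights, collocations, top_k=3):
    """Score words in three passes (all weighting choices are assumptions)."""
    counts = Counter(tokens)
    total = len(tokens)
    # Pass 1: statistical feature, here relative term frequency.
    scores = {w: c / total for w, c in counts.items()}
    # Pass 2: topic feature; words without a topic weight are left unchanged.
    for w in scores:
        scores[w] *= 1.0 + topic_weights.get(w, 0.0)
    # Pass 3: collocation feature; boost both words of each distinct
    # adjacent pair that appears in the (hypothetical) collocation table.
    for a, b in set(zip(tokens, tokens[1:])):
        bonus = collocations.get((a, b), 0.0)
        scores[a] *= 1.0 + bonus
        scores[b] *= 1.0 + bonus
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(keywords, vectors, topic_words, threshold=0.8):
    """Return topic feature words whose best similarity to any extracted
    keyword reaches the threshold (the threshold value is an assumption)."""
    expanded = []
    for t in topic_words:
        if t in keywords or t not in vectors:
            continue
        best = max(
            (cosine(vectors[k], vectors[t]) for k in keywords if k in vectors),
            default=0.0,
        )
        if best >= threshold:
            expanded.append(t)
    return expanded
```

In this sketch, the extension step can return words that never occur in the short text, which is the behavior the abstract attributes to keyword extension. The quality of both steps depends entirely on how well the supplied topic weights and vectors match the short text's topic.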