Abstract
Information entropy is incorporated into the word segmentation stage of the TextRank algorithm to improve the accuracy of keyword extraction. Before segmentation, the key new words of the article are extracted using information entropy and added to the segmentation dictionary, so that the segmenter can recognize new words on its own. When the corpus contains new words, keyword extraction accuracy improves markedly; for corpora without new words, there is no noticeable improvement. Improving word segmentation can improve keyword extraction accuracy.
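The abstract does not spell out how information entropy is used to find new words. One common reading is left/right branching entropy: a substring that is followed and preceded by many different characters is likely a free-standing word. The sketch below illustrates that idea only, under that assumption; the function names, `max_len`, `min_count`, and `min_entropy` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def branching_entropy(neighbors):
    """Shannon entropy (in nats) of a neighbor-character frequency table."""
    total = sum(neighbors.values())
    return -sum((c / total) * math.log(c / total) for c in neighbors.values())


def candidate_new_words(text, max_len=4, min_count=3, min_entropy=1.0):
    """Return {substring: score} for substrings that look like standalone words.

    The score is the smaller of the left and right branching entropies.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    counts = Counter()
    left = defaultdict(Counter)   # substring -> characters seen just before it
    right = defaultdict(Counter)  # substring -> characters seen just after it
    n = len(text)
    for i in range(n):
        for k in range(2, max_len + 1):
            if i + k > n:
                break
            w = text[i:i + k]
            counts[w] += 1
            if i > 0:
                left[w][text[i - 1]] += 1
            if i + k < n:
                right[w][text[i + k]] += 1

    result = {}
    for w, c in counts.items():
        # Skip rare substrings and those with no observed neighbor on a side.
        if c < min_count or not left[w] or not right[w]:
            continue
        score = min(branching_entropy(left[w]), branching_entropy(right[w]))
        if score >= min_entropy:
            result[w] = score
    return result


# Toy input: "XY" occurs four times, each time with different neighbors.
words = candidate_new_words("aXYb cXYd eXYf gXYh")
```

In a full pipeline, the returned candidates would be added to the segmenter's dictionary before TextRank runs on the segmented text. With a segmenter such as jieba, for example, this could be done through `jieba.add_word`; the abstract does not say which segmenter the authors used.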
Authors
YU Lamei (于腊梅); YANG Liangbin (杨良斌)
School of Information Science and Technology, University of International Relations, Beijing 100091
Source
Computer & Digital Engineering (《计算机与数字工程》)
2022, No. 3, pp. 516-519, 579 (5 pages)
Funding
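Supported by the Special Research Project for the Development of High-Level, Cutting-Edge National Security Disciplines (university fund), "Research on Entity Recognition and Influence Mechanisms in Social Networks from a National Security Perspective" (No. 2019GA37).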