Abstract
To improve the accuracy and recall of keyword extraction from short texts, and to discover keywords that do not appear in a text but express its topic well, a method for short-text keyword extraction and expansion is proposed. During extraction, the method considers several kinds of word features, including statistical, topic, and collocation features, and revises each word's score step by step to obtain more accurate keywords. During expansion, it computes the similarity between the extracted keywords and topic feature words, and adds as expanded keywords those words that reflect the topic of the short text well. Both the topic features and the keyword expansion depend on an auxiliary long-text corpus that is strongly related to the topic; the method performs well when such a corpus is available.
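The abstract gives no formulas, so the following Python sketch is only one possible reading of the pipeline it describes: a score built in steps from frequency, topic, and collocation features, followed by similarity-based expansion. The function names, the boost parameters, the neighbor-based collocation rule, the cosine similarity measure, the threshold, and the toy `topic_weights` and `topic_vectors` (standing in for values learned from a related long-text corpus) are all assumptions for illustration, not the author's implementation.

```python
import math
from collections import Counter


def extract_keywords(tokens, topic_weights, top_k=3,
                     topic_boost=1.0, colloc_boost=0.5):
    """Score words in three passes (all weights are illustrative assumptions)."""
    # Pass 1, statistical feature: relative term frequency.
    counts = Counter(tokens)
    scores = {w: c / len(tokens) for w, c in counts.items()}

    # Pass 2, topic feature: scale up words that are topic feature words.
    for w in scores:
        scores[w] *= 1.0 + topic_boost * topic_weights.get(w, 0.0)

    # Pass 3, collocation feature: each word gains a share of the
    # pass-2 score of its immediate neighbors (an assumed rule).
    base = dict(scores)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            scores[left] += colloc_boost * base[right] / counts[left]
            scores[right] += colloc_boost * base[left] / counts[right]

    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(keywords, present, topic_vectors, threshold=0.8):
    """Return topic feature words absent from the text but close to a keyword."""
    expanded = []
    for candidate, vec in topic_vectors.items():
        if candidate in present:
            continue
        if any(k in topic_vectors and
               cosine(topic_vectors[k], vec) >= threshold
               for k in keywords):
            expanded.append(candidate)
    return sorted(expanded)


# Toy data: in practice these would come from a topic-related long-text corpus.
tokens = ["phone", "battery", "drains", "fast", "battery", "life", "poor"]
topic_weights = {"battery": 0.9, "phone": 0.6, "life": 0.3}
topic_vectors = {
    "battery": (0.9, 0.1),
    "phone": (0.7, 0.3),
    "charger": (0.85, 0.15),
    "recipe": (0.0, 1.0),
}

keywords = extract_keywords(tokens, topic_weights)
expanded = expand_keywords(keywords, set(tokens), topic_vectors)
print(keywords, expanded)
```

In this toy run, "charger" never occurs in the short text but is proposed as an expanded keyword because its vector is close to that of "battery", while the unrelated "recipe" is rejected.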
Author
徐立
XU Li (School of Software, Shangqiu Polytechnic, Shangqiu, Henan 476100, China)
Source
《河北软件职业技术学院学报》
2021, No. 2, pp. 8-11 (4 pages)
Journal of Hebei Software Institute
Keywords
short text
keyword extraction
word frequency
topic
word collocation