Abstract
This paper addresses shortcomings of the TFIDF algorithm for topic word extraction from Chinese text. Taking into account both how often a keyword appears in the text and a weight for the position where it appears, it designs a hybrid topic word extraction algorithm that combines Bayesian reasoning with TFIDF. Extraction was tested in three directions over the ranked list of candidate words: forward, backward, and from the middle outward toward both ends. The results show that, for forward extraction, the hybrid algorithm improves average accuracy by 6.2% over plain TFIDF.
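The abstract does not give the formulas, so the following is only a minimal sketch of the position-weighted TFIDF part of the idea, not the paper's implementation. The weight values in `POSITION_WEIGHTS`, the function names, and the smoothed IDF formula are all assumptions made for illustration; the Bayesian reasoning step is not specified in the abstract and is left out. Input text is assumed to be already segmented into words.

```python
import math
from collections import Counter

# Hypothetical position weights: the abstract does not state the actual values.
POSITION_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_tf(doc):
    """Position-weighted term frequency.

    `doc` maps a position label (e.g. "title", "body") to a list of
    already-segmented tokens. Each occurrence counts as its position weight,
    and the result is normalised so the values sum to 1.
    """
    scores = Counter()
    for position, tokens in doc.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for token in tokens:
            scores[token] += weight
    total = sum(scores.values())
    return {token: score / total for token, score in scores.items()}


def idf(corpus):
    """Smoothed inverse document frequency over a list of documents."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        seen = {token for tokens in doc.values() for token in tokens}
        doc_freq.update(seen)
    # Assumed smoothing: ln((1 + N) / (1 + df)) + 1, so no term scores zero.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def rank_candidates(doc, corpus):
    """Return the document's terms sorted by weighted TF x IDF, best first."""
    idf_table = idf(corpus)
    tf = weighted_tf(doc)
    scored = {t: tf[t] * idf_table.get(t, 1.0) for t in tf}
    # Ties are broken alphabetically to keep the ordering deterministic.
    return sorted(scored, key=lambda t: (-scored[t], t))
```

Reading the returned list from the front corresponds to the forward extraction described above; backward and middle-outward extraction would read the same ranked list from the end or from its midpoint.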
Source
《南京师大学报(自然科学版)》
CAS
CSCD
Peking University Core Journals
2014, No. 1, pp. 57-60, 65 (5 pages)
Journal of Nanjing Normal University (Natural Science Edition)
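Funding
National Innovation Fund for Small and Medium-sized Enterprises, Ministry of Science and Technology (11C26213204533)
Xuzhou Science and Technology Program (XF11C052)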
Keywords
Bayesian reasoning
position weight
topic word extraction
TFIDF algorithm