Text Topic Feature Extraction Based on Neighborhood Rough Set (基于邻域粗糙集的文本主题特征提取) Cited by: 5
Abstract: The LDA topic model is an effective tool for extracting semantic information from text. By exploiting term co-occurrence at the document level, it transforms the term matrix into a topic matrix and thereby yields topic features; however, the document generation process introduces redundant topics. To address this redundancy in the topic features extracted by LDA, an LDA topic model reduction algorithm based on the neighborhood rough set, NRS-LDA, is proposed. A topic decision system is constructed using the neighborhood rough set; with the number of topics set in advance, the importance of each topic is computed, the topics are sorted by importance, and the topics with low importance are deleted. The NRS-LDA algorithm is applied to K-means text clustering and compared with traditional text feature extraction algorithms and improved variants. The results show that NRS-LDA achieves higher clustering accuracy.
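The reduction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the code below assumes the standard neighborhood rough set definitions: a document lies in the positive region if every document within distance `delta` of it carries the same decision label, the dependency is the fraction of documents in the positive region, and the importance of a topic is the drop in dependency when that topic is removed. The function names, the Euclidean distance, the radius `delta`, the source of the decision labels `y`, and the toy document-topic matrix are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def dependency(theta, y, delta):
    """Fraction of documents whose delta-neighborhood is label-consistent."""
    if theta.shape[1] == 0:
        return 0.0
    # Pairwise Euclidean distances between document-topic vectors.
    diff = theta[:, None, :] - theta[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    neighbors = dist <= delta
    same_label = y[:, None] == y[None, :]
    # A document is in the positive region if all its neighbors share its label.
    consistent = np.all(~neighbors | same_label, axis=1)
    return float(consistent.mean())


def topic_importance(theta, y, delta):
    """Importance of each topic: dependency lost when that topic is dropped."""
    full = dependency(theta, y, delta)
    return np.array([
        full - dependency(np.delete(theta, j, axis=1), y, delta)
        for j in range(theta.shape[1])
    ])


def reduce_topics(theta, y, delta, n_keep):
    """Keep the n_keep most important topics (ties keep the lower index)."""
    sig = topic_importance(theta, y, delta)
    order = np.argsort(-sig, kind="stable")
    keep = np.sort(order[:n_keep])
    return theta[:, keep], keep, sig


# Toy document-topic matrix (hypothetical): topic 1 is constant, hence redundant.
theta = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.2, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
])
y = np.array([0, 0, 1, 1])  # assumed decision labels

reduced, keep, sig = reduce_topics(theta, y, delta=0.45, n_keep=2)
print("importance:", sig)
print("kept topics:", keep)
```

In this toy case the constant topic receives zero importance and is deleted. The reduced matrix would then be passed to a K-means routine in place of the full topic matrix.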
Authors: JIN Hong-wei (靳红伟); XIE Jun (谢珺); XU Xin-ying (续欣莹) (College of Information and Computer, Taiyuan University of Technology, Jinzhong 030600, China; College of Electrical and Power Engineering, Taiyuan 030024, China)
Source: Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal list (北大核心), 2019, Issue 22, pp. 208-214 (7 pages)
Funding: Supported by the Shanxi Province Research Project for Returned Overseas Scholars (2015-045)
Keywords: LDA topic model; neighborhood rough set; text feature extraction; topic reduction