Abstract
With the spread of social software, mining useful information from massive collections of digital text has become an active research problem. The classic topic models LDA and LSA both capture topic information from word co-occurrence and ignore the positional (contextual) information of words. To address this, the paper designs an attention mechanism between topics and words, integrates topic information and word information into the LDA framework, and builds a topic model named JEA-LDA. The model uses the word-topic attention mechanism to fuse word information and topic information into a feature representation, which is then used for topic extraction in the LDA model. Experimental results show that, compared with LDA, DMM, and other models, the proposed model achieves higher topic coherence and classification performance and yields better topic extraction results.
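The abstract does not give the form of the word-topic attention, so the following is only a minimal sketch of the general idea, not the JEA-LDA implementation. It assumes scaled dot-product scoring, a softmax over topics, and fusion by concatenating each word vector with its attention-weighted topic context. The function name `word_topic_attention` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def word_topic_attention(word_vecs, topic_vecs):
    """Attend from each word to all topics (assumed scaled dot-product form).

    Returns (weights, fused):
      weights[i][k] is the attention of word i on topic k,
      fused[i] is word i's vector concatenated with its topic context vector.
    """
    dim = len(topic_vecs[0])
    all_weights, fused = [], []
    for w in word_vecs:
        # Score the word against every topic, then normalise over topics.
        weights = softmax([dot(w, t) / math.sqrt(dim) for t in topic_vecs])
        # Topic context: attention-weighted sum of the topic vectors.
        context = [
            sum(a * t[j] for a, t in zip(weights, topic_vecs))
            for j in range(dim)
        ]
        all_weights.append(weights)
        # Assumed fusion step: simple concatenation of word and topic context.
        fused.append(list(w) + context)
    return all_weights, fused


# Toy data (hypothetical): three 2-d word vectors and two 2-d topic vectors.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
topic_vecs = [[1.0, 0.0], [0.0, 1.0]]
weights, fused = word_topic_attention(word_vecs, topic_vecs)
```

In this sketch the rows of `fused` stand in for the feature representation that the abstract says is passed on to LDA for topic extraction. How the paper actually combines the two sources of information may differ.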
Authors
QIN Tingting (覃婷婷)
LIU Zheng (刘峥)
CHEN Kejia (陈可佳)
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, PKU Core (北大核心)
2020, Issue 11, pp. 104-108 (5 pages)
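Funding
Nanjing University of Posts and Telecommunications Scientific Research Start-up Fund for Introduced Talents (NY215045)
Nanjing University of Posts and Telecommunications National Natural Science Foundation Incubation Project (NY219084)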