Short Text Emotion Classification Method Combining LDA and Self-Attention (cited by: 8)
Abstract: In short-text emotion classification, the brevity of the text makes the data sparse, which lowers classification accuracy. To address this problem, this paper proposes a short-text emotion classification method based on Latent Dirichlet Allocation (LDA) and Self-Attention. LDA is used to obtain the topic-word distribution of each review, which serves as an extension of that review's information. The extended information and the original review text are fed together into a word2vec model to train word vectors, so that review texts on the same topic cluster together in the high-dimensional vector space. Self-Attention is then used to assign weights dynamically and to perform classification. Experiments on the Tan Songbo hotel review dataset show that the method improves classification performance compared with current mainstream short-text emotion classification algorithms.
Authors: CHEN Huan; HUANG Bo; ZHU Yimin; YU Lei; YU Yuxin (School of Electrical and Electronic Engineering, Shanghai University of Engineering and Technology, Shanghai 201620, China; Jiangxi Collaborative Innovation Center for Economic Crime Detection and Prevention and Control, Nanchang 330103, China; School of Economics and Finance, Shanghai International Studies University, Shanghai 201620, China)
Source: Computer Engineering and Applications, CSCD, Peking University Core Journal, 2020, No. 18, pp. 165-170 (6 pages)
Funding: National Natural Science Foundation of China, Youth Fund (No. 61603242); Open Fund of the Jiangxi Collaborative Innovation Center for Economic Crime Detection and Prevention and Control (No. JXJZXTCX-030).
Keywords: topic word; short text; Self-Attention; Latent Dirichlet Allocation (LDA); word2vec
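
The pipeline summarized in the abstract (expand each review with LDA topic words, embed the expanded text, then weight the words with Self-Attention) can be illustrated with a minimal sketch. This is not the authors' implementation. The topic distribution, topic-word lists, and 3-dimensional embeddings below are made-up toy values standing in for the outputs of a trained LDA model and a trained word2vec model; choosing only the single most probable topic, and using attention with no learned projections (Q = K = V), are simplifying assumptions; the function names are hypothetical.

```python
import math

def expand_with_topic_words(tokens, topic_dist, topic_words, top_k=2):
    """Append the top_k words of the review's most probable topic (assumed rule)."""
    best_topic = max(range(len(topic_dist)), key=lambda t: topic_dist[t])
    return tokens + topic_words[best_topic][:top_k]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors."""
    d = len(vectors[0])
    attended, weights = [], []
    for q in vectors:
        # Similarity of this word to every word, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of all word vectors.
        attended.append(
            [sum(w[j] * vectors[j][i] for j in range(len(vectors))) for i in range(d)]
        )
    return attended, weights

# Toy inputs (hypothetical values, standing in for LDA and word2vec outputs).
tokens = ["room", "clean"]
topic_dist = [0.2, 0.8]                        # P(topic | review)
topic_words = [["price", "cheap"], ["service", "staff", "tidy"]]
embeddings = {
    "room":    [0.9, 0.1, 0.0],
    "clean":   [0.8, 0.2, 0.1],
    "service": [0.7, 0.3, 0.2],
    "staff":   [0.6, 0.4, 0.1],
}

expanded = expand_with_topic_words(tokens, topic_dist, topic_words)
vectors = [embeddings[t] for t in expanded]
attended, weights = self_attention(vectors)

# Mean-pool the attended vectors into one review vector for a downstream classifier.
sentence_vec = [sum(col) / len(attended) for col in zip(*attended)]
print(expanded)
print(sentence_vec)
```

In a fuller version the review vector would be passed to a classifier such as a softmax layer; that step is omitted here because the record does not specify it.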