Abstract
Because short texts are brief, classifying their sentiment accurately is difficult. To improve that accuracy, this paper proposes the T-CLSTM (Topic-based Context CLSTM) model. The model uses an LDA model to generate word-topic vectors. It then builds sliding-window word-topic contexts and hierarchical word-topic contexts, which extend the information carried by a short text. The paper examines how word topics and word-topic contexts are composed, and how the sliding-window size affects the word-topic context. Word vectors and word-topic context vectors are used together as input features to train the sentiment classification model. Experiments on the COAE2014 corpus show that the proposed model reaches a classification accuracy of 92.3%. This is 2% higher than the SVM baseline and 4% higher than the LSTM baseline.
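The abstract does not say exactly how the word-topic contexts are computed. The following Python sketch is a rough illustration under stated assumptions, not the authors' definitions:

- LDA has already produced one topic-probability vector per word.
- The sliding-window word-topic context of a word is the mean of the topic vectors within `window` positions on either side of it.
- That context is concatenated with the word vector to form the per-word input for a sequence classifier such as an LSTM.

The function names `window_topic_context` and `build_inputs` and the averaging rule are all hypothetical.

```python
from typing import List

Vector = List[float]


def window_topic_context(topics: List[Vector], window: int) -> List[Vector]:
    """Hypothetical sliding-window word-topic context.

    Assumption: the context of word i is the element-wise mean of the
    topic vectors at positions i-window .. i+window, clipped to the text.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    n = len(topics)
    contexts: List[Vector] = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        span = topics[lo:hi]  # topic vectors of the word and its neighbours
        dim = len(span[0])
        contexts.append([sum(v[d] for v in span) / len(span) for d in range(dim)])
    return contexts


def build_inputs(word_vecs: List[Vector], topics: List[Vector], window: int) -> List[Vector]:
    """Concatenate each word vector with its (hypothetical) window topic context."""
    if len(word_vecs) != len(topics):
        raise ValueError("one topic vector is needed per word")
    contexts = window_topic_context(topics, window)
    return [w + c for w, c in zip(word_vecs, contexts)]
```

For example, take `topics = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]` and `window=1`. The resulting contexts are `[[0.5, 0.5], [0.5, 0.5], [0.25, 0.75]]`. A wider window averages the topic signal over more neighbours, which is one way to see why the window size affects the context.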
Authors
QIN Feng
HUANG Chao
ZHENG Xiao
SHAO Guangmei
College of Computer Science and Technology, Anhui University of Technology, Ma'anshan 243032, China
Source
Journal of Anhui University of Technology (Natural Science)
CAS
2017, Issue 3, pp. 289-295 (7 pages)
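Funding
National Natural Science Foundation of China (61402008, 61402009)
Major Science and Technology Project of Anhui Province (16030901060)
Major Natural Science Research Project of Anhui Provincial Universities (KJ2014ZD05)
Excellent Young Talents Support Program of Anhui Provincial Universities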