Abstract
This work introduces a CNN-LSTM deep learning model into sentiment analysis of Tibetan micro-blogs, helping to address the shortage of natural language processing research on minority languages and supporting further work on Tibetan. Because Tibetan corpora are not publicly available, a Tibetan synonym and antonym sentiment dictionary is used to replace sentiment words in the annotated Tibetan micro-blog corpus with their synonyms and antonyms, expanding the corpus to better meet the data requirements of deep learning. After word segmentation, the Word2vec tool is used to train a word vector model on the Tibetan micro-blogs, so that the feature vectors better express the deep semantic information of the text. The trained word vectors and the corresponding sentiment polarity labels are then fed directly into a CNN-LSTM model built from layers including convolutional, pooling, flatten, LSTM, and fully connected layers, with batch normalization applied to the output of each layer. Finally, a Softmax classifier assigns a sentiment polarity to each Tibetan micro-blog, and the model is compared experimentally with an LSTM model and a traditional sentiment-lexicon method. The comparison shows that the proposed algorithm achieves better classification results.
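The corpus expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `augment`, the dictionary layout, and the label handling are all assumptions. In particular, the abstract does not say how labels are treated, so the sketch assumes that synonym substitution keeps the label and that antonym substitution flips it. Placeholder English tokens stand in for segmented Tibetan words.

```python
from typing import Dict, List, Tuple

# (segmented tokens, label), where 1 = positive and 0 = negative (assumed encoding)
Sample = Tuple[List[str], int]


def augment(samples: List[Sample],
            synonyms: Dict[str, List[str]],
            antonyms: Dict[str, str]) -> List[Sample]:
    """Return the original samples plus synonym- and antonym-substituted copies."""
    out: List[Sample] = list(samples)
    for tokens, label in samples:
        # Synonym substitution: swap one sentiment word at a time, keep the label.
        for i, tok in enumerate(tokens):
            for syn in synonyms.get(tok, []):
                out.append((tokens[:i] + [syn] + tokens[i + 1:], label))
        # Antonym substitution (assumption): swap every sentiment word that has
        # an antonym entry and flip the label of the resulting copy.
        if any(tok in antonyms for tok in tokens):
            flipped = [antonyms.get(tok, tok) for tok in tokens]
            out.append((flipped, 1 - label))
    return out


# Hypothetical toy corpus and dictionaries, for illustration only.
corpus = [(["movie", "good"], 1)]
augmented = augment(corpus,
                    synonyms={"good": ["great", "fine"]},
                    antonyms={"good": "bad"})
```

Here one annotated post yields four training samples: the original, two synonym variants with the same label, and one antonym variant with the opposite label. Whether flipping the label is appropriate for posts with mixed sentiment words would depend on the annotation scheme, which the abstract does not specify.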
Authors
孙本旺
田芳
SUN Ben-wang; TIAN Fang (Department of Computer Technology and Applications, Qinghai University, Xining 810016, China; Information Technology Center, Qinghai University, Xining 810016, China)
Source
Computer Technology and Development (《计算机技术与发展》)
2019, No. 10, pp. 55-58, 99 (5 pages)
Funding
National Natural Science Foundation of China (61461045)
Qinghai Provincial Science and Technology Program (2016-ZJ-743)
Keywords
deep learning
Tibetan micro-blog
word vector
emotional calculation