Abstract
This paper proposes an improved text representation model for extracting feature word vectors from text. First, word-embedding vectors are built from a double word-embedding list based on dictionary indexes and the corresponding part-of-speech indexes. Next, a Bi-LSTM recurrent neural network extracts further features from the generated word vectors. Finally, a mean-pooling layer processes the sentence vectors and a softmax layer performs text classification. Experiments verify the training effect and feature extraction performance of the model that combines Bi-LSTM with double word-embedding. The results show that the model not only extracts high-quality text feature vectors and represents sequences well, but also clearly outperforms three other neural networks: LSTM, LSTM+context window, and Bi-LSTM.
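The pipeline summarized in the abstract (double word-embedding, Bi-LSTM, mean-pooling, softmax) can be illustrated with a minimal forward-pass sketch in NumPy. This is not the authors' implementation: the function names, the dimensions, the random initialization, and in particular the choice to concatenate the word and part-of-speech embeddings are assumptions made for illustration, and no training step is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(rng, input_dim, hidden_dim):
    # One stacked weight matrix for the four gates (input, forget, output, candidate).
    W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
    b = np.zeros(4 * hidden_dim)
    return W, b

def lstm_pass(xs, W, b, hidden_dim):
    # Run a standard LSTM over the rows of xs and return every hidden state.
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    outputs = []
    for x in xs:
        z = W @ np.concatenate([x, h]) + b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def init_params(vocab_size, n_pos_tags, n_classes,
                word_dim=8, pos_dim=4, hidden_dim=6, seed=0):
    # All sizes here are hypothetical defaults, not values from the paper.
    rng = np.random.default_rng(seed)
    input_dim = word_dim + pos_dim
    return {
        "hidden_dim": hidden_dim,
        "word_emb": rng.normal(0.0, 0.1, (vocab_size, word_dim)),
        "pos_emb": rng.normal(0.0, 0.1, (n_pos_tags, pos_dim)),
        "fwd": init_lstm(rng, input_dim, hidden_dim),
        "bwd": init_lstm(rng, input_dim, hidden_dim),
        "W_out": rng.normal(0.0, 0.1, (n_classes, 2 * hidden_dim)),
        "b_out": np.zeros(n_classes),
    }

def classify(word_ids, pos_ids, params):
    # Double word-embedding: look up the dictionary index and the POS index
    # of each token, then concatenate the two vectors (assumed combination).
    x = np.concatenate(
        [params["word_emb"][word_ids], params["pos_emb"][pos_ids]], axis=1
    )
    H = params["hidden_dim"]
    # Bi-LSTM: one pass left-to-right, one right-to-left (re-reversed to align).
    fwd = lstm_pass(x, *params["fwd"], H)
    bwd = lstm_pass(x[::-1], *params["bwd"], H)[::-1]
    states = np.concatenate([fwd, bwd], axis=1)
    # Mean-pooling over time steps gives a fixed-size sentence vector.
    sentence = states.mean(axis=0)
    # Softmax layer over the class logits.
    logits = params["W_out"] @ sentence + params["b_out"]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

params = init_params(vocab_size=10, n_pos_tags=5, n_classes=3)
probs = classify([1, 4, 2], [0, 2, 1], params)
```

Here `probs` is a probability vector over the three hypothetical classes for a three-token sentence. Mean-pooling is what lets sentences of different lengths share the same softmax layer, since the pooled vector always has size `2 * hidden_dim`.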
Authors
ZENG Shui-fei (曾谁飞)
ZHANG Xiao-yan (张笑燕)
DU Xiao-feng (杜晓峰)
LU Tian-bo (陆天波)
School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China
Source
Journal on Communications (《通信学报》)
Indexed in: EI, CSCD, PKU Core (北大核心)
2017, Issue 4, pp. 86-98 (13 pages)