Abstract
Text feature extraction (text input representation) is a key component of text classification, and its quality directly affects classification performance. Word vectors, currently the most popular text input representation, capture word similarity but ignore local word-order features, which in some cases loses or distorts the semantics of the text. To address this, this paper proposes WordNG-Vec, a word vector model that combines N-Gram features with Word2vec; the word vectors it extracts (Word-NG vectors) serve as the input to a dual-channel convolutional neural network model (DC-CNN). Multiple groups of comparative experiments show that the proposed method effectively improves text classification performance as measured by precision, recall, and F1 score.
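The abstract does not say how WordNG-Vec merges N-Gram features with Word2vec vectors, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one simple scheme: each word's vector is concatenated with the vectors of the n-1 words that precede it, so the same word gets a different representation when its local context changes. The function name `ngram_augmented_vectors` and the toy two-dimensional embeddings are hypothetical; in practice the embeddings would come from a trained Word2vec model.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy embeddings standing in for trained Word2vec vectors.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [1.0, 1.0],
}


def ngram_augmented_vectors(
    tokens: Sequence[str],
    embeddings: Dict[str, Vector],
    dim: int,
    n: int = 2,
) -> List[Vector]:
    """Concatenate each word vector with the vectors of its n-1 predecessors.

    Assumed scheme for illustration only. Unknown words and positions
    before the start of the sentence are represented by zero vectors.
    """
    zero = [0.0] * dim
    word_vecs = [embeddings.get(token, zero) for token in tokens]
    augmented: List[Vector] = []
    for i, vec in enumerate(word_vecs):
        row = list(vec)  # the word's own vector comes first
        for offset in range(1, n):
            j = i - offset  # index of the word `offset` positions earlier
            row.extend(word_vecs[j] if j >= 0 else zero)
        augmented.append(row)
    return augmented


forward = ngram_augmented_vectors(["dog", "bites", "man"], TOY_EMBEDDINGS, dim=2)
reverse = ngram_augmented_vectors(["man", "bites", "dog"], TOY_EMBEDDINGS, dim=2)
```

With plain word vectors, "bites" has the same representation in both sentences. Under this assumed scheme, `forward[1]` and `reverse[1]` differ because the preceding word differs, which is the kind of local word-order information the abstract says plain word vectors miss.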
Authors
王勇
何养明
邹辉
黎春
陈荟西
WANG Yong; HE Yang-ming; ZOU Hui; LI Chun; CHEN Hui-xi (College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2019, Issue 3, pp. 499-502 (4 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Social Science Fund of China (17XXW005)