Abstract
Text feature extraction is critical to the quality of short text clustering. Traditional feature extraction methods based on statistical learning operate only at the level of individual feature words and cannot represent the contextual semantics of a text. To address this problem, the authors propose a text feature extraction method for short text clustering based on word2vec word vectors and convolutional neural networks (CNN). First, the word2vec tool is trained on a large-scale corpus so that each word is represented as a low-dimensional vector. A CNN then extracts deep semantic features from the text, yielding a text feature vector that can be used for clustering. Experimental results show that the method effectively improves the accuracy of short text clustering.
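The abstract describes the pipeline (word vectors, then CNN features, then clustering) but not the network configuration. The following is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation. It assumes a single convolutional layer with untrained, randomly initialised filters (the paper's CNN is presumably trained), an assumed filter width `WINDOW` and filter count `NUM_FILTERS`, ReLU with max-over-time pooling, and k-means as the clustering step (the abstract does not name the clustering algorithm). The word vectors are a hypothetical hard-coded toy table standing in for vectors learned by word2vec (for example with gensim's `Word2Vec` class). All function and variable names are illustrative.

```python
import random

# Hypothetical toy word vectors (dim = 3). In the described method these would be
# learned by word2vec on a large corpus; they are hard-coded here for illustration.
EMBEDDINGS = {
    "football": [0.9, 0.1, 0.0], "match": [0.8, 0.2, 0.1], "goal": [0.85, 0.05, 0.1],
    "stock": [0.1, 0.9, 0.0], "market": [0.0, 0.8, 0.2], "price": [0.05, 0.85, 0.1],
    "today": [0.3, 0.3, 0.3],
}
DIM = 3
WINDOW = 2        # assumed filter width (number of consecutive words)
NUM_FILTERS = 4   # assumed number of convolution filters


def make_filters(num_filters, window, dim, seed=0):
    """Untrained, randomly initialised filters: a window x dim weight matrix plus a bias."""
    rng = random.Random(seed)
    return [([[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(window)], 0.0)
            for _ in range(num_filters)]


def text_features(tokens, embeddings, filters, window, dim):
    """Convolution + ReLU + max-over-time pooling -> one value per filter."""
    # Look up word vectors; out-of-vocabulary words are skipped.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    # Zero-pad so that even very short texts contain at least one full window.
    while len(vecs) < window:
        vecs.append([0.0] * dim)
    features = []
    for weights, bias in filters:
        # Starting from 0.0 makes the running max equal to max-pooling the ReLU outputs.
        pooled = 0.0
        for start in range(len(vecs) - window + 1):
            activation = bias + sum(
                weights[k][d] * vecs[start + k][d]
                for k in range(window) for d in range(dim)
            )
            pooled = max(pooled, activation)
        features.append(pooled)
    return features


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on the extracted text feature vectors."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


texts = [
    ["football", "match", "today"],
    ["goal", "football"],
    ["stock", "market", "today"],
    ["price", "stock", "market"],
]
filters = make_filters(NUM_FILTERS, WINDOW, DIM)
features = [text_features(t, EMBEDDINGS, filters, WINDOW, DIM) for t in texts]
labels = kmeans(features, k=2)
print(labels)
```

Max-over-time pooling is what turns texts of different lengths into feature vectors of the same length, which a vector-space clustering algorithm such as k-means requires. With random filters the resulting clusters are not meaningful; reproducing the described method would require trained CNN filters and real word2vec vectors.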
Authors
Yang Junfeng; Yin Guanghua (School of Computer, Zhongyuan University of Technology, Zhengzhou, Henan 450007, China)
Source
Information & Computer (信息与电脑), 2019, Issue 24, pp. 20-22 (3 pages)