Abstract
To address the shortcomings of traditional semantic similarity calculation for Chinese text, such as the loss of semantic and syntactic information and the large errors introduced by manually extracted features, a method fusing word vectors with a convolutional neural network is proposed to construct a semantic similarity calculation model for Chinese text, termed WV-CNN (Word Vector-Convolutional Neural Network). Words are vectorized through an Embedding layer, and the result serves as the input to the CNN, which is configured with a four-layer network of convolution, Dropout, pooling, and Flatten; results are output after parameter selection, training, and tuning. The data set provided by the 6th National Data Mining Contest and Baidu's WebQA data set were selected as experimental subjects, and four evaluation indexes (Accuracy, F1, AUC, and KS) were used for comparative experiments. The results show that WV-CNN achieves better computational accuracy and overall effect.
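The Embedding → convolution → Dropout → pooling → Flatten pipeline described in the abstract can be sketched as a minimal numpy forward pass. This is an illustrative sketch only, not the authors' implementation: the vocabulary size, embedding dimension, kernel width, filter count, and pool size below are all assumed, and Dropout is omitted because it is inactive at inference time.

```python
import numpy as np

# Assumed hyperparameters (not taken from the paper).
vocab_size, embed_dim = 1000, 50   # word-vector lookup table size
n_filters, kernel_w = 8, 3         # 1-D convolution: 8 filters over 3-word windows
pool_size = 2                      # max-pooling window
seq_len = 10                       # length of the input word-id sequence

rng = np.random.default_rng(0)

# Embedding layer: maps word ids to dense word vectors.
E = rng.standard_normal((vocab_size, embed_dim))

# Convolution kernels and biases.
W = rng.standard_normal((n_filters, kernel_w, embed_dim)) * 0.1
b = np.zeros(n_filters)

def forward(word_ids):
    x = E[word_ids]                               # Embedding: (seq_len, embed_dim)
    n_windows = len(word_ids) - kernel_w + 1
    conv = np.empty((n_windows, n_filters))
    for i in range(n_windows):                    # 1-D convolution + ReLU
        window = x[i:i + kernel_w]                # (kernel_w, embed_dim)
        conv[i] = np.maximum(np.tensordot(W, window, axes=2) + b, 0)
    # Max-pooling over non-overlapping windows of `pool_size` positions.
    pooled = conv.reshape(n_windows // pool_size, pool_size, n_filters).max(axis=1)
    return pooled.reshape(-1)                     # Flatten to a feature vector

word_ids = rng.integers(0, vocab_size, size=seq_len)
features = forward(word_ids)
print(features.shape)                             # (32,) = (n_windows // 2) * n_filters
```

In a similarity model, such a feature vector would be computed for each of the two sentences and the pair compared by a downstream classifier; that comparison step is outside the scope of this sketch.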
Authors
张春英
李春虎
付其峰
ZHANG Chun-ying; LI Chun-hu; FU Qi-feng (College of Science, North China University of Science and Technology, Tangshan, Hebei 063210, China; College of Information Engineering, North China University of Science and Technology, Tangshan, Hebei 063210, China)
Source
《华北理工大学学报(自然科学版)》
CAS
2019, No. 1, pp. 123-132 (10 pages)
Journal of North China University of Science and Technology: Natural Science Edition
Fund
Natural Science Foundation of Hebei Province (F2016209344, F2018209374)