Abstract
This paper proposes a deep neural network model that combines lexical relevance matching and semantic relevance matching to compute the matching score between a query and a document in information retrieval. The lexical relevance matching model is built on the word co-occurrence matrix between the query and the document, and mainly captures their lexical matching information together with the positions of the matched words. The semantic relevance matching model is built on pre-trained word vectors and uses a convolutional neural network to extract semantic matching information between the query and different positions of the document. The final matching score is the superposition of the two sub-models. The loss function is the hinge loss, and parameters are updated by maximizing the score difference between positive and negative samples. Experimental results show that the model reaches an NDCG@3 of 0.7904 and an NDCG@5 of 0.8183 on the validation set, a large improvement over BM25 and over the lexical-only and semantic-only matching models, which confirms the importance of both lexical and semantic matching for information retrieval.
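The building blocks named in the abstract (a query-document word matrix, a pairwise hinge loss, and NDCG@k) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The binary exact-match form of the matrix, the margin of 1.0, the gain `2^rel - 1`, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math


def match_matrix(query_tokens, doc_tokens):
    """Binary matrix: entry [i][j] is 1 when query term i equals document term j.

    Assumption: a plain exact-match indicator; the paper's co-occurrence
    matrix may be built differently.
    """
    return [[1 if q == d else 0 for d in doc_tokens] for q in query_tokens]


def pairwise_hinge_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss on one (positive, negative) document pair.

    The loss is zero once the positive document outscores the negative
    one by at least `margin` (an assumed value, not taken from the paper).
    """
    return max(0.0, margin - (pos_score - neg_score))


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the given ranking divided by DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` is the list of graded relevance labels of the documents in the order the model ranked them, so `ndcg_at_k(labels, 3)` and `ndcg_at_k(labels, 5)` correspond to the two metrics reported above. Minimizing the hinge loss pushes the score of a relevant document above that of an irrelevant one, which is the score difference the abstract refers to.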
Authors
张芳芳
曹兴超
ZHANG Fang-fang1,2, CAO Xing-chao1,2 (1. School of Information Science and Technology, Peking University, Beijing 100871, China; 2. Computer Center, Peking University, Beijing 100871, China)
Source
《山东大学学报(理学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2018, No. 3, pp. 46-53 (8 pages)
Journal of Shandong University (Natural Science)