Abstract
In Chinese text, the same meaning can often be expressed in many different ways, and different individuals may understand the same word somewhat differently. As a result, information retrieval suffers from term mismatch between queries and documents. Fuzzy retrieval is one effective way to mitigate this problem, but fuzzy retrieval that relies only on known information cannot meet the retrieval needs of the web era, with its large-scale unlabeled text. This paper proposes a query expansion method for fuzzy document retrieval based on word embeddings. The word embeddings, trained on a large corpus with the continuous bag-of-words model, are used to find words similar to each query term, and the fuzzy query is then expanded with those words. Compared with the traditional fuzzy retrieval method on the same test set, the proposed method improves recall, precision, and their harmonic mean.
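The expansion step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy vectors, the `expand_query` helper, the `top_k` and `min_sim` parameters, and the use of cosine similarity as an expansion weight are all assumptions made for the example. In practice the vectors would come from a model trained on a large corpus, such as the continuous bag-of-words model the abstract mentions.

```python
import math

# Toy embeddings invented for illustration only; real vectors would be
# learned from a large corpus (e.g. with a CBOW model).
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(term, embeddings, top_k=2, min_sim=0.8):
    """Return the query term plus its most similar words, each with a weight.

    Hypothetical helper: the weight is the cosine similarity to the query
    term, which a fuzzy retrieval system could treat as a degree of match
    (an assumption, not something the abstract specifies).
    """
    if term not in embeddings:
        # No vector for the term: fall back to the unexpanded query.
        return [(term, 1.0)]
    target = embeddings[term]
    scored = [
        (word, cosine(target, vec))
        for word, vec in embeddings.items()
        if word != term
    ]
    # Keep only sufficiently similar words, best first.
    scored = [(word, sim) for word, sim in scored if sim >= min_sim]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(term, 1.0)] + scored[:top_k]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for word, weight in expand_query("car", EMBEDDINGS):
        print(f"{word}\t{weight:.3f}")
```

With these toy vectors, the query "car" is expanded with "automobile" and "vehicle", while "banana" falls below the similarity threshold. The `f_measure` function computes the harmonic mean of precision and recall, which is the third metric the abstract reports.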
Authors
陈淑巧
邱东
江海欢
CHEN Shuqiao; QIU Dong; JIANG Haihuan (College of Mathematics and Physics, Chongqing University of Posts and Telecommunications, Chongqing 400065)
Source
《四川师范大学学报(自然科学版)》
CAS
Peking University Core Journal
2019, No. 1, pp. 92-97 (6 pages)
Journal of Sichuan Normal University(Natural Science)
Funding
National Natural Science Foundation of China (11671001 and 61472056)
Keywords
word embedding
fuzzy query expansion
information retrieval