
WordNet and Word Embedding Based Sentence Retrieval Method

Cited by: 3
Abstract: To address the vocabulary mismatch problem that data sparsity causes in current sentence retrieval methods, this paper proposes a sentence retrieval method that combines WordNet and word embeddings. First, the personalized PageRank algorithm is run over the WordNet semantic relation graph to find the synsets most relevant to the query; these are used for query expansion, which partially alleviates the sparsity of the query. Second, the query and the sentences are represented with word embeddings obtained by training a neural network language model (the Continuous Skip-gram Model) on a large-scale corpus. Finally, Word Mover's Distance (WMD) is used to compute the semantic similarity between the query and each sentence, which further reduces the impact of vocabulary mismatch, and the sentences are ranked by similarity from high to low to produce the retrieval result. In an evaluation on the TREC 2003 and TREC 2004 tasks, the proposed method outperforms the baseline sentence retrieval method, improving MAP and R-Precision by 13.29% and 13.54%, respectively, over the second-best result.
Source: Journal of Information Engineering University (《信息工程大学学报》), 2017, No. 4, pp. 486-491 (6 pages)
Funding: National Social Science Fund of China (14BXW028)
Keywords: WordNet; query expansion; word embedding; semantic similarity; sentence retrieval
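
The three steps described in the abstract (graph-based query expansion, embedding representation, distance-based ranking) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: `TOY_GRAPH` is a hand-made stand-in for the WordNet relation graph, `TOY_VECTORS` are invented 2-D vectors rather than trained Skip-gram embeddings, and `relaxed_wmd` computes the relaxed lower bound of WMD (each word moves to its nearest counterpart) instead of solving the full transport problem. All function and variable names are hypothetical.

```python
import math

def personalized_pagerank(graph, seeds, damping=0.85, iters=50):
    """Power iteration with the restart mass placed only on the seed nodes."""
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * restart[n] for n in graph}
        for node, neighbors in graph.items():
            if neighbors:
                share = damping * rank[node] / len(neighbors)
                for m in neighbors:
                    nxt[m] += share
            else:
                # Dangling node: hand its mass back to the seeds.
                for m in graph:
                    nxt[m] += damping * rank[node] * restart[m]
        rank = nxt
    return rank

def expand_query(query, graph, top_k=2):
    """Append the top_k highest-ranked non-query nodes to the query."""
    seeds = {w for w in query if w in graph}
    if not seeds:
        return list(query)
    rank = personalized_pagerank(graph, seeds)
    candidates = [n for n in graph if n not in seeds and rank[n] > 0.0]
    candidates.sort(key=lambda n: rank[n], reverse=True)
    return list(query) + candidates[:top_k]

def relaxed_wmd(query, sentence, vectors):
    """Relaxed WMD lower bound with uniform word weights (assumption)."""
    def one_way(src, dst):
        total = sum(min(math.dist(vectors[a], vectors[b]) for b in dst)
                    for a in src)
        return total / len(src)
    return max(one_way(query, sentence), one_way(sentence, query))

def rank_sentences(query, sentences, vectors):
    """Smaller distance means higher similarity, so sort ascending."""
    return sorted(sentences, key=lambda s: relaxed_wmd(query, s, vectors))

# Toy stand-in for the WordNet graph (undirected, stored as adjacency lists).
TOY_GRAPH = {
    "car": ["automobile", "vehicle"],
    "automobile": ["car", "vehicle"],
    "vehicle": ["car", "automobile", "truck"],
    "truck": ["vehicle"],
    "fruit": ["banana", "apple"],
    "banana": ["fruit"],
    "apple": ["fruit"],
}

# Invented 2-D vectors; real embeddings would come from a trained model.
TOY_VECTORS = {
    "car": (1.0, 0.0), "automobile": (0.9, 0.1), "vehicle": (0.8, 0.2),
    "truck": (0.7, 0.3), "fruit": (0.0, 0.8), "banana": (0.0, 1.0),
    "apple": (0.1, 0.9),
}

expanded = expand_query(["car"], TOY_GRAPH, top_k=2)
sentences = [["banana", "apple"], ["truck", "automobile"]]
ranked = rank_sentences(expanded, sentences, TOY_VECTORS)
```

In this toy setup the query `["car"]` is expanded with its graph neighbours before ranking, so a sentence that never mentions "car" can still rank first when its words lie close to the expanded query in the embedding space. A fuller version would swap in the real WordNet graph, trained embeddings, and an exact WMD solver.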
