Using Topic Content Ranking for Pseudo Relevance Feedback (利用主题内容排序的伪相关反馈)

Abstract: Traditional pseudo relevance feedback (PRF) methods use the whole document as the basic extraction unit for query expansion. This coarse granularity increases the amount of noise in the expansion source. This paper uses topic analysis techniques to alleviate the low quality of the expansion source. It obtains the semantic information hidden in the content of each document in the pseudo-relevant set, and from it extracts the abstract topic content that is relevant to the user query, which then serves as the basic extraction unit for query expansion. Experiments on the NTCIR-8 Chinese corpus, comparing against traditional PRF methods and topic-model-based PRF methods, show that the proposed method extracts expansion terms that better match the user query. The results also show that performing query expansion at the finer granularity of topic content effectively improves retrieval performance.
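The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea: rank topics by their relevance to the query, then draw expansion terms from the top-ranked topics rather than from whole documents. It assumes that topic-word distributions have already been estimated from the pseudo-relevant set (for example with a topic model such as LDA). The toy distributions, the function names `rank_topics` and `expansion_terms`, and the scoring rule (sum of query-term probabilities) are all illustrative assumptions, not the authors' method.

```python
from collections import defaultdict

# Toy topic-word distributions, assumed to come from a topic model
# fitted on the pseudo-relevant documents (hypothetical values).
TOPIC_WORD = {
    "finance": {"bank": 0.30, "loan": 0.25, "interest": 0.20,
                "credit": 0.15, "rate": 0.10},
    "river": {"bank": 0.20, "river": 0.35, "water": 0.25, "flood": 0.20},
    "sports": {"match": 0.40, "team": 0.35, "goal": 0.25},
}


def rank_topics(query_terms, topic_word):
    """Return (topic, score) pairs sorted by descending relevance.

    Assumed scoring rule: the total probability a topic assigns
    to the query terms.
    """
    scored = [
        (topic, sum(words.get(term, 0.0) for term in query_terms))
        for topic, words in topic_word.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def expansion_terms(query_terms, topic_word, top_topics=1, top_terms=3):
    """Pick expansion terms from the highest-ranked topics only."""
    weights = defaultdict(float)
    for topic, score in rank_topics(query_terms, topic_word)[:top_topics]:
        for word, prob in topic_word[topic].items():
            if word not in query_terms:
                # Weight each candidate by topic relevance times word probability.
                weights[word] += score * prob
    # Sort by weight (descending), breaking ties alphabetically.
    return sorted(weights, key=lambda w: (-weights[w], w))[:top_terms]


query = ["bank", "loan"]
print(rank_topics(query, TOPIC_WORD)[0][0])   # -> finance
print(expansion_terms(query, TOPIC_WORD))     # -> ['interest', 'credit', 'rate']
```

In this toy case the ambiguous query term "bank" also appears in the "river" topic, but restricting extraction to the top-ranked topic keeps terms such as "water" and "flood" out of the expanded query. This illustrates the noise reduction that the abstract attributes to a finer extraction granularity.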

Authors: YAN Rong (闫蓉); GAO Guanglai (高光来) (College of Computer Science, Inner Mongolia University, Hohhot 010021, China)

Affiliation: College of Computer Science, Inner Mongolia University

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), indexed in CSCD and the PKU Core Journals list, 2017, No. 5, pp. 814-821 (8 pages)

Funding: National Natural Science Foundation of China, No. 61263037; Natural Science Foundation of Inner Mongolia, Nos. 2014BS0604 and 2014MS0603

Keywords: topic model; topic content; pseudo relevance feedback (PRF)

Classification: TP391.3 [Automation and Computer Technology: Computer Application Technology]
