INFORMATION RETRIEVAL FOR SHORT DOCUMENTS (Cited by: 2)

Abstract: The major problem with most current information retrieval models is that individual words provide unreliable evidence about the content of a text. When the document is short, e.g. when only the abstract is available, this word-use variability problem has a substantial impact on Information Retrieval (IR) performance. To address the problem, this letter proposes a new approach to short document retrieval named the Reference Document Model (RDM). RDM obtains the statistical semantics of the query and the document through pseudo feedback from reference documents, applied to both the query and the document. The contributions of the model are threefold: (1) pseudo feedback for both the query and the document; (2) building the query model and the document model from reference documents; (3) flexible indexing units, which can be any linguistic elements such as documents, paragraphs, sentences, n-grams, terms, or characters. For short document retrieval, RDM achieves significant improvements over the classical probabilistic models on the ad hoc retrieval task on Text REtrieval Conference (TREC) test sets. Results also show that the shorter the document, the better RDM performs.
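The abstract gives no formulas, so the following is only a minimal sketch of the general idea of applying pseudo feedback from reference documents to both the query and the document. Everything in it is an assumption made for illustration rather than the authors' formulation: the function names (`reference_model`, `rdm_score`), whitespace tokenization, cosine similarity over raw term counts, the top-`k` cutoff, and the similarity-weighted mixture.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens serve as the indexing unit.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse term-weight mappings.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def reference_model(text, references, k=2):
    # Pseudo feedback: retrieve the k reference documents most similar to
    # `text` and mix their term distributions, weighted by similarity.
    tf = Counter(tokenize(text))
    ref_tfs = [Counter(tokenize(r)) for r in references]
    ranked = sorted(((cosine(tf, r), i) for i, r in enumerate(ref_tfs)),
                    reverse=True)[:k]
    model = Counter()
    for sim, i in ranked:
        if sim <= 0:
            continue  # ignore reference documents that share no terms
        total = sum(ref_tfs[i].values())
        for term, count in ref_tfs[i].items():
            model[term] += sim * count / total
    return model


def rdm_score(query, document, references, k=2):
    # Compare query and document through their reference-based models
    # instead of through their own (possibly non-overlapping) words.
    return cosine(reference_model(query, references, k),
                  reference_model(document, references, k))


# Hypothetical toy reference collection.
references = [
    "car automobile engine vehicle",
    "automobile vehicle road car",
    "apple banana fruit juice",
    "fruit apple orchard",
]

print(rdm_score("car", "automobile", references))
print(rdm_score("car", "banana", references))
```

In this toy collection the query "car" and the one-word document "automobile" share no terms, so a direct word match scores zero, yet both retrieve the same reference documents and therefore receive a positive score. The unrelated document "banana" still scores zero.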

Authors: Qi Haoliang, Li Mu, Gao Jianfeng, Li Sheng

Affiliations: Ministry of Education - Microsoft Key Laboratory of Natural Language Processing and Speech (Harbin Institute of Technology); Microsoft Research Asia; Microsoft Research

Source: Journal of Electronics (China) (电子科学学刊（英文版）), 2006, No. 6, pp. 933-936 (4 pages)

Funding: Supported by the Funds of Heilongjiang Outstanding Young Teacher (1151G037).

Keywords: Information retrieval; Short documents; Reference Document Model (RDM); Information theory

Classification number: TN911.2 [Electronics and Telecommunications: Communication and Information Systems]
