We present a statistical method called Covering Topic Score (CTS) to predict query performance for information retrieval. Estimation is based on how well the topic of a user's query is covered by documents retrieve...We present a statistical method called Covering Topic Score (CTS) to predict query performance for information retrieval. Estimation is based on how well the topic of a user's query is covered by documents retrieved from a certain retrieval system. Our approach is conceptually simple and intuitive, and can be easily extended to incorporate features beyond bag- of-words such as phrases and proximity of terms. Experiments demonstrate that CTS significantly correlates with query performance in a variety of TREC test collections, and in particular CTS gains more prediction power benefiting from features of phrases and proximity of terms. We compare CTS with previous state-of-the-art methods for query performance prediction including clarity score and robustness score. Our experimental results show that CTS consistently performs better than, or at least as well as, these other methods. In addition to its high effectiveness, CTS is also shown to have very low computational complexity, meaning that it can be practical for real applications.展开更多
基金Supported by the National Natural Science Foundation of China under Grant No.60603094 (国家自然科学基金) the National Basic Research Program of China under Grant No.2004CB318109 (国家重点基础研究发展计划(973)) the Beijing Science and Technology Planning Program of China under Grant No.D0106008040291 (北京市科技计划)
基金the National Natural Science Foundation of China under Grant No.60603094the National Grand Fundamental Research 973 Program of China under Grant No.2004CB318109
文摘We present a statistical method called Covering Topic Score (CTS) to predict query performance for information retrieval. Estimation is based on how well the topic of a user's query is covered by documents retrieved from a certain retrieval system. Our approach is conceptually simple and intuitive, and can be easily extended to incorporate features beyond bag- of-words such as phrases and proximity of terms. Experiments demonstrate that CTS significantly correlates with query performance in a variety of TREC test collections, and in particular CTS gains more prediction power benefiting from features of phrases and proximity of terms. We compare CTS with previous state-of-the-art methods for query performance prediction including clarity score and robustness score. Our experimental results show that CTS consistently performs better than, or at least as well as, these other methods. In addition to its high effectiveness, CTS is also shown to have very low computational complexity, meaning that it can be practical for real applications.