Funding: Supported by the National Natural Science Foundation of China (60503020, 60503033, 60703086), the Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University (KJS0714), the Research Foundation of Nanjing University of Posts and Telecommunications (NY207052, NY207082), and the National Natural Science Foundation of Jiangsu (BK2006094).
Abstract: A new common-phrase scoring method is proposed, based on term frequency-inverse document frequency (TF-IDF) and the independence of the phrase. Combining the two properties helps identify more reasonable common phrases, which improves clustering accuracy. An equation for measuring the independence of a phrase is also proposed. The new algorithm, which improves on the suffix tree clustering (STC) algorithm, is named improved suffix tree clustering (ISTC). To validate the proposed algorithm, a prototype system was implemented and used to cluster several groups of web search results obtained from the Google search engine. Experimental results show that the improved algorithm offers higher accuracy than traditional suffix tree clustering.
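The TF-IDF half of the phrase score can be illustrated with a minimal sketch. The abstract does not give the paper's exact formula, and the independence measure is not reproduced here at all; the function name `phrase_tfidf`, the tokenized-document input format, and the plain `tf * log(N / df)` weighting are assumptions chosen for illustration only.

```python
import math

def phrase_tfidf(phrase, documents):
    """Textbook TF-IDF for a multi-word phrase (illustrative, not the paper's formula).

    phrase    : sequence of tokens, e.g. ("suffix", "tree")
    documents : list of token lists (e.g. tokenized search result snippets)
    tf  = total occurrences of the phrase across all documents
    idf = log(N / df), with df = number of documents containing the phrase
    """
    phrase = tuple(phrase)
    n = len(phrase)
    if n == 0 or not documents:
        return 0.0

    def count_in(doc):
        # Count contiguous occurrences of the phrase inside one document.
        return sum(1 for i in range(len(doc) - n + 1)
                   if tuple(doc[i:i + n]) == phrase)

    counts = [count_in(doc) for doc in documents]
    tf = sum(counts)
    df = sum(1 for c in counts if c > 0)
    if df == 0:
        return 0.0
    return tf * math.log(len(documents) / df)
```

A phrase that occurs in every document gets an IDF of zero under this weighting, which is one reason a second signal such as phrase independence may be useful when ranking candidate cluster labels.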
Abstract: In search engines, different users may search for different information by issuing the same query. To satisfy more users with a limited number of search results, search result diversification re-ranks the results to cover as many user intents as possible. Most existing intent-aware diversification algorithms represent user intents as subtopics, each of which is usually a word, a phrase, or a short description. In this paper, we leverage query facets to understand user intents in diversification, where each facet contains a group of words or phrases that explain an underlying intent of a query. We generate subtopics based on query facets and propose faceted diversification approaches. Experimental results on the public TREC 2009 dataset show that our faceted approaches outperform state-of-the-art diversification models.
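The idea of re-ranking to cover as many intents as possible can be sketched with a generic greedy procedure. This is not the paper's faceted model: the function name `diversify`, the binary "document contains any facet term" coverage test, and the trade-off parameter `lam` are all assumptions made for illustration.

```python
def diversify(docs, relevance, facets, k, lam=0.5):
    """Greedy re-ranking that trades relevance against coverage of new facets.

    docs      : dict doc_id -> set of terms in the document
    relevance : dict doc_id -> relevance score in [0, 1]
    facets    : list of term sets, each set standing for one query intent
    k         : number of results to return
    lam       : weight on relevance (1.0 means relevance-only ranking)
    """
    selected = []
    covered = set()          # indices of facets already covered by selected docs
    remaining = set(docs)
    while remaining and len(selected) < k:
        def gain(d):
            # Number of not-yet-covered facets this document would add.
            new = sum(1 for i, f in enumerate(facets)
                      if i not in covered and docs[d] & f)
            novelty = new / len(facets) if facets else 0.0
            return lam * relevance[d] + (1 - lam) * novelty
        # sorted() makes tie-breaking deterministic (lexicographic by id).
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
        covered.update(i for i, f in enumerate(facets) if docs[best] & f)
    return selected
```

In this sketch each facet is a set of terms rather than a single word or phrase, so one document can cover an intent by matching any term in the group.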
Funding: Supported by the National Natural Science Foundation of China (No. 90818007).
Abstract: Result merging for multiple Independent Resource Retrieval Systems (IRRSs), a key component in developing a meta-search engine, is a difficult problem that has still not been solved effectively. Most existing result merging methods are strongly influenced by the usefulness weights of the different IRRS results and by the overlap rate among them. In this paper, we propose a scheme that uses Discrete Particle Swarm Optimization (DPSO) to coalesce and optimize a group of existing multi-source retrieval merging results. The experimental results show that DPSO not only outperforms, overall, all the other result merging algorithms it employs, but also adapts better in application, because it does not need to take into account the usefulness weights of the different IRRSs or their overlap rate for a concrete query. Compared with the other result merging algorithms it employs, DPSO increases recognition precision by nearly 24.6% and decreases the standard deviation of precision across queries by about 68.3%.
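To make the terms "usefulness weight" and "overlap" concrete, here is a minimal sketch of a simple weighted rank-fusion baseline of the kind that depends on both. It is not the DPSO scheme, whose details the abstract does not give; the function name `weighted_merge`, the reciprocal-rank scoring, and the example weights are assumptions.

```python
from collections import defaultdict

def weighted_merge(result_lists, weights):
    """Merge ranked lists by a weighted reciprocal-rank sum (illustrative baseline).

    result_lists : dict system_name -> list of doc ids, best first
    weights      : dict system_name -> assumed usefulness weight of that system
    """
    scores = defaultdict(float)
    for system, ranking in result_lists.items():
        for rank, doc in enumerate(ranking, start=1):
            # A document returned by several systems accumulates score,
            # so the overlap between lists directly affects the final order.
            scores[doc] += weights[system] / rank
    # Sort by descending score, breaking ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

merged = weighted_merge(
    {"s1": ["a", "b", "c"], "s2": ["b", "d"]},
    {"s1": 1.0, "s2": 0.8},
)
```

Here `b` is returned by both systems and overtakes `a`, which shows how such a baseline is sensitive to the chosen weights and to how much the lists overlap.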
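Abstract: Users' visual gaze behavior on search engine result pages has long been an important research topic in information retrieval; it helps optimize the layout of the Search Engine Result Page (SERP) and improve users' search efficiency. However, few studies have examined users' SERP gaze behavior in cross-device search contexts. Through a cross-device search experiment, this study investigates the distribution of users' visual behavior on SERPs in different cross-device contexts. The study finds that after switching devices, users' visual attention is more dispersed than before and the number of points of focus decreases. After switching devices, users' "eye-movement entropy" values in the SERP's search result list show an overall upward trend. After switching devices, users pay the most attention to result snippets within the search result area of the first screen of the SERP, and their attention to the area that records cross-device history information increases the most. This suggests that the cross-device history information provided by search engines effectively helps users resume search tasks and improves their search efficiency. Within the area of a single search result, there is no significant difference in users' visual distribution before and after switching devices.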
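The abstract does not define how "eye-movement entropy" is computed. One common way to quantify how dispersed gaze is across page regions is the Shannon entropy of the fixation distribution; the sketch below assumes that definition, and the function name `fixation_entropy` and the per-region fixation counts are hypothetical.

```python
import math

def fixation_entropy(fixation_counts):
    """Shannon entropy (in bits) of fixations spread over page regions.

    fixation_counts : list of non-negative fixation counts, one per region
                      (assumed input; the study's own measure may differ)
    """
    total = sum(fixation_counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in fixation_counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy
```

Under this definition, attention concentrated on a single region gives an entropy of zero, and attention spread evenly over more regions gives a higher value, which matches the reading that rising entropy indicates more dispersed attention.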