At present, how to enable Search Engine to construct user personal interest model initially, master user's personalized information timely and provide personalized services accurately have become the hotspot in the r...At present, how to enable Search Engine to construct user personal interest model initially, master user's personalized information timely and provide personalized services accurately have become the hotspot in the research of Search Engine area. Aiming at the problems of user model's construction and combining techniques of manual customization modeling and automatic analytical modeling, a User Interest Model (UIM) is proposed in the paper. On the basis of it, the corresponding establishment and update algorithms of User lnterest Profile (UIP) are presented subsequently. Simulation tests proved that the UIM proposed and corresponding algorithms could enhance the retrieval precision effectively and have superior adaptability.展开更多
1 引言 World Wide Web是目前全球最大的信息系统,在WWW上查询Web文档主要依赖于Internet上的索引信息系统,如Yahoo、Infoseek、AltaVista、WebCrawler、Excite、Lycos等等。由于WWW太大又没有良好的结构且Web服务器的自治性,所以Web文...1 引言 World Wide Web是目前全球最大的信息系统,在WWW上查询Web文档主要依赖于Internet上的索引信息系统,如Yahoo、Infoseek、AltaVista、WebCrawler、Excite、Lycos等等。由于WWW太大又没有良好的结构且Web服务器的自治性,所以Web文档的查询难以做到全面而精确。衡量Web文档查询的质量主要有两个方面:①是否能把所有相关的文档资源找出来,不要有所遗漏。展开更多
In this paper, first studied are the distribution characteristics of user behaviors based on log data from a massive web search engine. Analysis shows that stochastic distribution of user queries accords with the char...In this paper, first studied are the distribution characteristics of user behaviors based on log data from a massive web search engine. Analysis shows that stochastic distribution of user queries accords with the characteristics of power-law function and exhibits strong similarity, and the user' s queries and clicked URLs present dramatic locality, which implies that query cache and 'hot click' cache can be employed to improve system performance. Then three typical cache replacement policies are compared, including LRU, FIFO, and LFU with attenuation. In addition, the distribution character-istics of web information are also analyzed, which demonstrates that the link popularity and replica pop-ularity of a URL have positive influence on its importance. Finally, variance between the link popularity and user popularity, and variance between replica popularity and user popularity are analyzed, which give us some important insight that helps us improve the ranking algorithms in a search engine.展开更多
基金Supported by the National Natural Science Foundation of China (50674086)the Doctoral Foundation of Ministry of Education of China (20060290508)the Youth Scientific Research Foundation of CUMT (0D060125)
文摘At present, how to enable Search Engine to construct user personal interest model initially, master user's personalized information timely and provide personalized services accurately have become the hotspot in the research of Search Engine area. Aiming at the problems of user model's construction and combining techniques of manual customization modeling and automatic analytical modeling, a User Interest Model (UIM) is proposed in the paper. On the basis of it, the corresponding establishment and update algorithms of User lnterest Profile (UIP) are presented subsequently. Simulation tests proved that the UIM proposed and corresponding algorithms could enhance the retrieval precision effectively and have superior adaptability.
文摘1 引言 World Wide Web是目前全球最大的信息系统,在WWW上查询Web文档主要依赖于Internet上的索引信息系统,如Yahoo、Infoseek、AltaVista、WebCrawler、Excite、Lycos等等。由于WWW太大又没有良好的结构且Web服务器的自治性,所以Web文档的查询难以做到全面而精确。衡量Web文档查询的质量主要有两个方面:①是否能把所有相关的文档资源找出来,不要有所遗漏。
基金This work was supported by the National Grand Fundamental Research of China ( Grant No. G1999032706).
文摘In this paper, first studied are the distribution characteristics of user behaviors based on log data from a massive web search engine. Analysis shows that stochastic distribution of user queries accords with the characteristics of power-law function and exhibits strong similarity, and the user' s queries and clicked URLs present dramatic locality, which implies that query cache and 'hot click' cache can be employed to improve system performance. Then three typical cache replacement policies are compared, including LRU, FIFO, and LFU with attenuation. In addition, the distribution character-istics of web information are also analyzed, which demonstrates that the link popularity and replica pop-ularity of a URL have positive influence on its importance. Finally, variance between the link popularity and user popularity, and variance between replica popularity and user popularity are analyzed, which give us some important insight that helps us improve the ranking algorithms in a search engine.