Funding: Supported by the Shaanxi Provincial Educational Department Special-Purpose Technology and Research of China (06JK229)
Abstract: The procedure of hypertext induced topic search (HITS) is analyzed on the basis of a semantic relation model, and the topic drift of the HITS algorithm is shown to arise because Web pages are projected onto a wrong latent semantic basis. A new concept, generalized similarity, is introduced, and based on it a new topic distillation algorithm, GSTDA (generalized similarity based topic distillation algorithm), is presented to improve the quality of topic distillation. GSTDA not only avoids topic drift but also explores topics related to the user query. Experimental results on 10 queries show that GSTDA reduces the topic drift rate by 10% to 58% compared with the HITS algorithm and discovers several related topics for queries that have multiple meanings.
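For orientation, the baseline HITS iteration named in the abstract above can be sketched as follows. This is a minimal illustration of the standard hub/authority updates only, not of GSTDA, whose details the abstract does not give. The function name `hits`, the edge-list input format, and the fixed iteration count are assumptions made for this example.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of (source, target) links.

    Illustrative sketch: plain HITS with L2 normalisation after every round.
    """
    nodes = sorted({node for edge in edges for node in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages that link in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]

        # Hub update: sum the fresh authority scores of the pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]

        # Normalise so the scores stay bounded across iterations.
        a_norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in new_auth.items()}
        hub = {n: v / h_norm for n, v in new_hub.items()}

    return hub, auth


if __name__ == "__main__":
    # Hypothetical toy graph: pages "a" and "b" both link to "c"; "a" also links to "d".
    hub, auth = hits([("a", "c"), ("b", "c"), ("a", "d")])
    print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Because the scores are driven purely by link structure, a densely interlinked group of pages can come to dominate the ranking even when it is off-topic, which is the commonly cited intuition behind topic drift in HITS.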
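Abstract: The hyperlink-induced topic search (HITS) algorithm is analyzed from the perspective of semantic relevance, and its topic drift is found to arise because pages are projected onto a wrong semantic basis; the concept of a local density factor (LDF) is therefore introduced. To handle the overlapping nature of Web content, a new topic distillation algorithm (CPTDA) is proposed based on the concept of a cutting plane (切平面). CPTDA can discover not only the set of topic pages the user is most interested in but also other sets of pages related to the query. Experimental results on 10 queries show that, compared with the HITS algorithm, CPTDA reduces the topic drift rate by 30% to 52% and can also discover multiple topics related to the query.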
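Abstract: Based on the characteristics of specialized search engines, a novel query recommendation algorithm built on word co-occurrence and the HITS algorithm, QR-CH (Query Recommendation algorithm based on word Co-occurrence and HITS algorithm), is proposed. On the one hand, the algorithm uses HITS to rank the associated words selected by word co-occurrence according to their semantic relatedness and takes the top-ranked associated words as recommendations, which improves the relevance of the recommended words to the original query. On the other hand, it uses HITS to rank the associated documents and judges whether a recommendation is redundant from the perspective of the query result document set, which reduces the redundancy of the recommended words. The algorithm stores recommendation-related information in a knowledge tree and uses the knowledge tree to produce query recommendations. Experimental results show that QR-CH outperforms similar algorithms in the literature in both the relevance of recommended words and the identification of redundant words.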
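The first step described above, selecting associated words by co-occurrence with the query word, might look roughly like the sketch below. The abstract does not specify how QR-CH counts or thresholds co-occurrence, so the document-level counting, the `min_count` threshold, and the function name `cooccurring_terms` are all assumptions made for illustration. The subsequent HITS-based ranking and the knowledge tree are not shown.

```python
from collections import Counter


def cooccurring_terms(documents, query_term, min_count=2):
    """Return (term, count) pairs for terms sharing a document with query_term.

    Illustrative sketch: each document is a list of tokens, and a term is
    counted at most once per document in which the query term also appears.
    """
    counts = Counter()
    for doc in documents:
        terms = set(doc)
        if query_term in terms:
            # Count every other distinct term in this document once.
            counts.update(terms - {query_term})
    # Keep only terms that co-occur often enough, most frequent first.
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Hypothetical tokenised documents.
    docs = [["hits", "link", "rank"], ["hits", "link"], ["page", "rank"]]
    print(cooccurring_terms(docs, "hits"))
```

Candidates that pass such a filter would then be the input to the ranking stage the abstract attributes to HITS.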
Funding: Supported by the National Natural Science Foundation of China (Nos. 61502209 and 61502207), the Natural Science Foundation of Jiangsu Province of China (No. BK20130528), and the Visiting Research Fellow Program of Tongji University (No. 8105142504)
Abstract: Microblogging, a popular social media service platform, has become a new information channel for users to receive and exchange the most up-to-date information on current events. Consequently, it is a crucial platform for detecting newly emerging events and for identifying influential spreaders who have the potential to actively disseminate knowledge about events through microblogs. However, traditional event detection models require human intervention to determine the number of topics to be explored, which significantly reduces the efficiency and accuracy of event detection. In addition, most existing methods focus only on event detection and are unable to identify either influential spreaders or key event-related posts, thus making it challenging to track momentous events in a timely manner. To address these problems, we propose a Hypertext-Induced Topic Search (HITS) based Topic-Decision method (TD-HITS) and a Latent Dirichlet Allocation (LDA) based Three-Step model (TS-LDA). TD-HITS can automatically detect the number of topics as well as identify associated key posts in a large number of posts. TS-LDA can identify influential spreaders of hot event topics based on both post and user information. The experimental results, using a Twitter dataset, demonstrate the effectiveness of our proposed methods for both detecting events and identifying influential spreaders.