Funding: Acknowledgements. The authors thank the anonymous reviewers for their valuable and constructive comments. The work was partially supported by the National Natural Science Foundation of China (Grant No. 61502502), the National Basic Research Program (973 Program) of China (2014CB340403), Beijing Natural Science Foundation (4162032), and the Open Fund of Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, North China University of Technology, China.
Abstract: Detecting and using bursty patterns to analyze text streams has been one of the fundamental approaches in many temporal text mining applications. So far, most existing studies have focused on developing methods to detect bursty features based purely on term frequency changes. Few have taken the semantic contexts of bursty features into consideration, and as a result the detected bursty features may not always be interesting and can be hard to interpret. In this article, we propose to model the contexts of bursty features using a language modeling approach. We propose two methods to estimate the context language models, one based on sentence-level context and one on document-level context. We then propose a novel topic diversity-based metric that uses the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of news articles, we quantitatively show that the proposed context language models can effectively help rank bursty features by newsworthiness and assign meaningful tags to annotate them. We also use two example text mining applications to qualitatively demonstrate the usefulness of bursty feature ranking and tagging.
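The abstract does not give the estimation formulas, so the following is only a minimal sketch of what a sentence-level context language model could look like: a maximum-likelihood unigram distribution over the words that share a sentence with a bursty feature, with the most probable context words used as candidate tags. The function names `sentence_context_model` and `top_tags`, the whitespace tokenization, and the lack of smoothing are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def sentence_context_model(sentences, feature):
    """Estimate P(word | feature) from sentences that contain the feature.

    Hypothetical helper: plain maximum-likelihood estimate, no smoothing.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        if feature in tokens:
            # Count every co-occurring word except the feature itself.
            counts.update(t for t in tokens if t != feature)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def top_tags(model, k=3):
    """Return the k most probable context words (ties broken alphabetically)."""
    ranked = sorted(model.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


sentences = [
    "quake hits city",
    "city council meets",
    "quake damages city bridge",
]
model = sentence_context_model(sentences, "quake")
print(top_tags(model, k=1))
```

A document-level variant would, under the same assumptions, pool all words of every document that mentions the feature instead of only the matching sentences.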
Funding: This work was supported by the National Basic Research 973 Program of China under Grant No. 2013CB329604, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of Ministry of Education of China under Grant No. IRT13059, and the National Natural Science Foundation of China under Grant Nos. 61273297, 61229301 and 61503114.
Abstract: Contents, layout styles, and parse structures of web news pages differ greatly from one page to another. In addition, the layout style and the parse structure of a web news page may change from time to time. For these reasons, designing features that achieve excellent extraction performance on massive and heterogeneous web news pages is a challenging issue. Our extensive case studies indicate that there is a potential relevance between web content layouts and their tag paths. Inspired by this observation, we design a series of tag path extraction features to extract web news. Because each feature has its own strength, we fuse all those features with the DS (Dempster-Shafer) evidence theory, and then design a content extraction method, CEDS. Experimental results on both CleanEval datasets and web news pages selected randomly from well-known websites show that the F1-score with CEDS is 8.08% and 3.08% higher than those of the existing popular content extraction methods CETR and CEPR-TPR, respectively.
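To illustrate the fusion step, here is a small sketch of Dempster's rule of combination, the standard way to merge two mass functions in DS evidence theory. The two-hypothesis frame (`content`, `noise`) and the example masses are invented for illustration; the abstract does not say how CEDS turns its tag path features into mass functions.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    """
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            intersection = set_a & set_b
            if intersection:
                combined[intersection] = (
                    combined.get(intersection, 0.0) + mass_a * mass_b
                )
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += mass_a * mass_b
    norm = 1.0 - conflict
    if norm < 1e-12:
        raise ValueError("sources are in total conflict")
    return {s: v / norm for s, v in combined.items()}


# Hypothetical evidence from two features about one text node.
CONTENT = frozenset({"content"})
NOISE = frozenset({"noise"})
EITHER = CONTENT | NOISE  # mass left on the whole frame (uncertainty)

feature_a = {CONTENT: 0.6, NOISE: 0.1, EITHER: 0.3}
feature_b = {CONTENT: 0.5, NOISE: 0.2, EITHER: 0.3}

fused = dempster_combine(feature_a, feature_b)
print(round(fused[CONTENT], 3))
```

With these made-up inputs, the fused belief in `content` is higher than either feature alone assigns, which is the behavior that makes evidence fusion attractive when individual features are each only partly reliable.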
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62172149, 61632009, 62172159, and 62172372, the Natural Science Foundation of Hunan Province of China under Grant No. 2021JJ30137, the Open Project of ZHEJIANG LAB under Grant No. 2019KE0AB02, and the Natural Science Foundation of Zhejiang Province of China under Grant No. LZ21F030001.
Abstract: Friend recommendation plays a key role in promoting user experience in online social networks (OSNs). However, existing studies usually neglect users’ fine-grained interest as well as the evolving nature of interest, which may lead to unsuitable recommendations. In particular, some OSNs, such as the online learning community, have seen little work on friend recommendation. To this end, we strive to improve friend recommendation with fine-grained evolving interest in this paper. We take the online learning community as an application scenario, which is a special type of OSN for people to learn courses online. Learning partners can help improve learners’ learning outcomes and increase the attractiveness of platforms. We propose a learning partner recommendation framework based on the evolution of fine-grained learning interest (LPRF-E for short). We extract a sequence of learning interest tags that changes over time. Then, we explore the time feature to predict evolving learning interest. Next, we recommend learning partners by fine-grained interest similarity. We also refine the learning partner recommendation framework with users’ social influence (denoted as LPRF-F for differentiation). Extensive experiments on two real datasets crawled from Chinese University MOOC and Douban Book validate that the proposed LPRF-E and LPRF-F models achieve high accuracy (i.e., approximately 50% improvement in precision and recall) and can recommend high-quality learning partners (e.g., more experienced and helpful ones).
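The pipeline described above (time-stamped interest tags, a time-aware interest estimate, then similarity-based recommendation) can be sketched roughly as follows. This is not the LPRF-E model: the exponential decay with a 30-day half-life, the cosine similarity, and the names `interest_vector`, `cosine`, and `recommend` are all assumptions chosen to keep the example short.

```python
import math
from collections import defaultdict


def interest_vector(tag_events, now, half_life=30.0):
    """Weight each (tag, day) event so that older interests count less.

    Assumed decay: an event loses half its weight every `half_life` days.
    """
    weights = defaultdict(float)
    for tag, day in tag_events:
        weights[tag] += 0.5 ** ((now - day) / half_life)
    return dict(weights)


def cosine(u, v):
    """Cosine similarity between two sparse tag-weight vectors."""
    dot = sum(w * v.get(tag, 0.0) for tag, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(target_events, candidates, now, k=2):
    """Rank candidate learners by similarity to the target's current interest."""
    target = interest_vector(target_events, now)
    scored = [
        (cosine(target, interest_vector(events, now)), name)
        for name, events in candidates.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Hypothetical learners; days are counted from an arbitrary start.
target = [("python", 100), ("statistics", 40)]
candidates = {
    "ann": [("python", 95)],
    "bob": [("statistics", 10)],
    "cai": [("history", 99)],
}
print(recommend(target, candidates, now=100))
```

Because the target's `python` interest is recent and the `statistics` interest is 60 days old, the learner sharing the recent tag ranks first under this decay assumption.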