Abstract: Question answering (QA) over a knowledge base (KB) aims to provide a structured answer from the knowledge base to a natural language question. A key step in this task is representing and understanding the natural language query. In this paper, we propose to use tree-structured neural networks built on the constituency tree to model natural language queries. We make an interesting observation about the constituency tree: different constituents have their own semantic characteristics and may be suited to different subtasks in a QA system. Based on this observation, we incorporate answer type information as an auxiliary supervision signal to improve QA performance. We call our approach type-aware QA. We jointly model both the answer and its answer type in a unified neural network with an attention mechanism. Instead of simply using the root representation, we represent the query by combining the representations of different constituents using task-specific attention weights. Extensive experiments on public datasets demonstrate the effectiveness of the proposed model. More specifically, the learned attention weights are useful for understanding the query, and the representations produced for intermediate nodes can be used to analyze the effectiveness of components in a QA system.
Funding: This work was partially supported by the National Natural Science Foundation of China under grant numbers 61872369, 61832017, and 61502502.
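The attention-based query representation described in the type-aware QA abstract above can be illustrated with a minimal sketch. It assumes dot-product scoring followed by a softmax, which the abstract does not specify; the names `attend`, `answer_task`, and `type_task` and the toy vectors are hypothetical, and in a real model the task vectors would be learned rather than set by hand.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(node_vectors, task_vector):
    """Pool constituent vectors with weights from one task-specific vector.

    node_vectors: one vector per constituency-tree node (hypothetical inputs).
    task_vector:  a learned parameter in a real model; fixed by hand here.
    Dot-product scoring is an assumption, not taken from the source.
    """
    scores = [sum(n * t for n, t in zip(node, task_vector)) for node in node_vectors]
    weights = softmax(scores)
    dim = len(node_vectors[0])
    pooled = [sum(w * node[i] for w, node in zip(weights, node_vectors))
              for i in range(dim)]
    return weights, pooled

# Toy 2-d representations for three constituents of one query.
nodes = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
answer_task = [2.0, 0.0]  # hypothetical vector for the answer-prediction task
type_task = [0.0, 2.0]    # hypothetical vector for the answer-type task

answer_weights, answer_query = attend(nodes, answer_task)
type_weights, type_query = attend(nodes, type_task)
```

With these toy inputs the two tasks put their largest weight on different constituents, which is the behavior the abstract attributes to task-specific attention.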
Abstract: To develop a knowledge-aware recommender system, a key issue is how to obtain rich and structured knowledge base (KB) information for recommender system (RS) items. Existing data sets and methods either use side information from the original RSs (which contains very few kinds of useful information) or rely on a private KB. In this paper, we present KB4Rec v1.0, a data set linking KB information to RSs. It links three widely used RS data sets with two popular KBs, namely Freebase and YAGO. Based on our linked data set, we first perform qualitative analysis experiments, and then discuss the effect of two important factors (i.e., popularity and recency) on whether an RS item can be linked to a KB entity. Finally, we compare several knowledge-aware recommendation algorithms on our linked data set.
Abstract: Timeline generation is an important research task that helps users quickly understand the overall evolution of a given topic. Previous methods simply split the time span into fixed, equal intervals without studying the role that the evolutionary patterns of the underlying topic play in timeline generation. In addition, few of these methods take users' collective interests into consideration when generating timelines. We propose using social media attention to address these two problems, for two reasons: 1) social media is an important pool of real users' collective interests; 2) the information cascades generated in it might be good indicators of the boundaries of topic phases. Using Twitter as a basis, we incorporate topic phases and users' collective interests, both learnt from social media, into a unified timeline generation algorithm. We construct one informativeness-oriented and three interestingness-oriented evaluation sets over five topics. We demonstrate that our approach is effective at generating timelines that are both informative and interesting. In addition, our idea naturally leads to a novel presentation of timelines, i.e., phase-based timelines, which can potentially improve user experience.
Funding and acknowledgements: The authors thank the anonymous reviewers for their valuable and constructive comments. The work was partially supported by the National Natural Science Foundation of China (Grant No. 61502502), the National Basic Research Program (973 Program) of China (2014CB340403), Beijing Natural Science Foundation (4162032), and the Open Fund of Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, North China University of Technology, China.
Abstract: Detecting and using bursty patterns to analyze text streams has been one of the fundamental approaches in many temporal text mining applications. So far, most existing studies have focused on developing methods to detect bursty features based purely on term frequency changes. Few have taken the semantic contexts of bursty features into consideration, and as a result the detected bursty features may not always be interesting and can be hard to interpret. In this article, we propose to model the contexts of bursty features using a language modeling approach. We propose two methods to estimate the context language models, based on sentence-level context and document-level context respectively. We then propose a novel topic diversity-based metric that uses the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of news articles, we quantitatively show that the proposed context language models can effectively help rank bursty features by newsworthiness and assign meaningful tags to annotate them. We also use two example text mining applications to qualitatively demonstrate the usefulness of bursty feature ranking and tagging.
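As a rough illustration of what a context language model for a bursty feature might look like, the sketch below builds a unigram distribution over the words that co-occur with a feature. The maximum-likelihood estimate, the function name `context_language_model`, and the toy sentences are all assumptions made for illustration; the abstract does not give the authors' actual estimators. Passing sentences as units corresponds to sentence-level context, and passing whole documents corresponds to document-level context.

```python
from collections import Counter

def context_language_model(units, feature):
    """Unigram model over words that co-occur with `feature`.

    units: list of token lists; each is a sentence (sentence-level context)
           or a whole document (document-level context).
    Returns a dict mapping word -> probability, using a plain
    maximum-likelihood estimate (an assumption, not the source's method).
    """
    counts = Counter()
    for tokens in units:
        if feature in tokens:
            # Count every co-occurring token except the feature itself.
            counts.update(t for t in tokens if t != feature)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

# Hypothetical toy corpus, already tokenized.
sentences = [
    ["earthquake", "hits", "coastal", "city"],
    ["rescue", "teams", "reach", "earthquake", "zone"],
    ["markets", "rally", "after", "earnings"],
]
model = context_language_model(sentences, "earthquake")

# Candidate tags: the highest-probability context words.
top_tags = sorted(model, key=model.get, reverse=True)[:3]
```

A practical estimator would likely add smoothing and stop-word filtering before using the high-probability words as tags.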