Funding: Supported by the National Key Research and Development Program of China (No. 2016YFB1000902), the National Natural Science Foundation of China (Nos. 61472040, 61751217, and 61866038), the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2016JM6082), and the PhD start project of Yan'an University (No. YDBK2018-09).
Abstract: Knowledge Bases (KBs) are valuable resources of human knowledge that contribute to many applications. However, because they are manually maintained, their contents lag well behind the up-to-date information about entities. Given a target entity in a KB, this paper investigates how Cumulative Citation Recommendation (CCR) can be used to effectively detect citation-worthy documents for that entity in large volumes of stream data. Most global relevance models consider only semantic and temporal features of entity-document instances and therefore do not sufficiently exploit the prior knowledge underlying those instances. To tackle this problem, we present a Mixture of Experts (ME) model that introduces a latent layer to capture relationships between the entity-document instances and their latent class information. An extensive set of experiments was conducted on the TREC-KBA-2013 dataset. The results show that the model achieves a significant performance gain over state-of-the-art models in CCR.
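To make the latent-layer idea concrete, the following is a minimal sketch of a generic mixture-of-experts prediction, not the authors' model. It assumes a softmax gate over latent classes and one logistic expert per class; the function names, feature vector, and all weights are hypothetical values chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function used by each expert."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mixture_of_experts(x, gate_weights, expert_weights):
    """Return sum_k gate_k(x) * expert_k(x) for a binary relevance decision.

    gate_weights[k] scores latent class k (the latent layer);
    expert_weights[k] is the logistic expert tied to that class.
    """
    gates = softmax([dot(v, x) for v in gate_weights])
    experts = [sigmoid(dot(w, x)) for w in expert_weights]
    return sum(g * p for g, p in zip(gates, experts))


# Hypothetical entity-document feature vector and weights (illustration only).
x = [1.0, 0.5, -0.2]
gate_weights = [[0.3, -0.1, 0.2], [-0.4, 0.6, 0.1]]
expert_weights = [[1.2, -0.7, 0.5], [-0.3, 0.9, 0.4]]

p_relevant = mixture_of_experts(x, gate_weights, expert_weights)
print(f"P(citation-worthy | x) = {p_relevant:.4f}")
```

In this sketch the gate depends on the same features as the experts; a model of the kind the abstract describes could instead feed the gate with separate entity or document class features, which is a design choice the abstract does not specify.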
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61672101, U1636119, 6186603S, 61962059) and the 2018 College Students' Innovation and Entrepreneurship Training Program (D2018127).
Abstract: It is common for application service providers (ASPs) to have their mobile applications actively collect non-critical personally identifiable information (PII) from users' devices to the cloud in order to provide precise services and recommendations. Meanwhile, Internet service providers (ISPs) and local network providers also have strong requirements to collect PII for finer-grained traffic control and security services. However, accurately locating PII in massive volumes of network traffic is like looking for a needle in a haystack. In this paper, we address this challenge by presenting an efficient and lightweight approach, TPII, which locates and tracks PII in the HTTP layer rebuilt from raw network traffic. The approach collects only three features from HTTP fields as users' behaviors and then builds a tree-based decision model to mine PII efficiently and accurately. Without any prior knowledge, TPII can identify any type of PII from any mobile application, which gives it broad applicability. We evaluate the proposed approach on a real dataset collected from a campus network with more than 13k users. The experimental results show that the precision and recall of TPII are 91.72% and 94.51%, respectively, and that a parallel implementation of TPII can mine and label 213 million records within one hour, which comes close to supporting 1 Gbps wire-speed inspection in practice. Our approach provides network service providers with a practical way to collect PII for better services.
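The reported figures can be put on a common footing with a little arithmetic. The sketch below derives two numbers that the abstract does not state itself: the F1 score implied by the reported precision and recall, and the per-second rate implied by 213 million records per hour. The function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def records_per_second(records, hours):
    """Average processing rate over the given number of hours."""
    return records / (hours * 3600)


# Values reported in the abstract.
precision = 0.9172
recall = 0.9451
records_in_one_hour = 213_000_000

f1 = f1_score(precision, recall)
rate = records_per_second(records_in_one_hour, 1)

print(f"Implied F1 score: {f1:.4f}")
print(f"Implied throughput: {rate:,.0f} records/s")
```

Under these figures the implied F1 score is roughly 0.93 and the implied throughput is about 59,000 records per second; both are derived here and are not values reported by the authors.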
Funding: Supported by the National Key Basic Research and Development (973) Program of China (No. 2013CB329600), the National Natural Science Foundation of China (Nos. 61472040 and 60873237), and the Beijing Higher Education Young Elite Teacher Project (No. YETP1198).
Abstract: Intuitively, not only ratings but also the reviews that accompany them carry abundant information for learning user preferences. However, most existing recommender systems take rating scores for granted and discard the wealth of information in the accompanying reviews. In this paper, in order to fully exploit the user profile information embedded in both ratings and reviews, we propose a Bayesian model that seamlessly links a traditional Collaborative Filtering (CF) technique with a topic model. By applying a topic model to the review text and aligning user review topics with 'user attitudes' (i.e., abstract rating patterns) over the same distribution, our method achieves greater accuracy than the traditional approach on the rating prediction task. Moreover, with review text information involved, latent user rating attitudes become interpretable and the 'cold-start' problem can be alleviated. This property qualifies our method for recommendation tasks on very sparse datasets. Furthermore, unlike most related works, we treat each review as a document, rather than treating all reviews of a user or item together as one document, in order to fully exploit the information in the reviews. Experimental results on 25 real-world datasets demonstrate the superiority of our model over state-of-the-art methods.