Funding: Project (61104164) supported by the National Natural Science Foundation of China; Project (2012AA112401) supported by the National High Technology Research and Development Program of China; Project (2012YJS059) supported by the Fundamental Research Funds for the Central Universities of China.
Abstract: With the wide application of sensor network technology in traffic information acquisition systems, a new measure is needed to evaluate the spatially related properties of traffic information credibility. The heterogeneity of the spatial distribution of information credibility from sensor networks is analyzed, and a new measure, the information credibility function (ICF), is proposed to describe this heterogeneity. Three possible functional forms of the sensor ICF and their corresponding expressions are presented. Then, two feasible operations for the spatial superposition of sensor ICFs are discussed. Finally, a numerical example is introduced to show the calibration method for the sensor ICF and to obtain the spatially related properties of an expressway in Beijing. The results show that the sensor ICF of the expressway in Beijing follows a negative exponential form. Traffic information is more abundant at or near the sensor locations, while the credibility of traffic information declines exponentially as the distance from the sensor increases. The new measure provides a theoretical basis for the optimal placement of traffic sensor networks and for research on the mechanism underlying the spatial distribution of traffic information credibility.
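The negative exponential behavior described in the abstract above can be illustrated with a small sketch. It assumes a form ICF(d) = c0 * exp(-decay * d), where d is the distance from the sensor. The parameter names `c0` and `decay`, the log-linear least-squares fit, and the synthetic observations are illustrative assumptions, not the calibration method or data used in the paper.

```python
import math


def icf(distance, c0, decay):
    """Assumed negative exponential credibility: c0 * exp(-decay * distance)."""
    return c0 * math.exp(-decay * distance)


def calibrate_icf(distances, credibilities):
    """Estimate (c0, decay) by ordinary least squares on ln(credibility).

    Taking logs turns c0 * exp(-decay * d) into a straight line:
    ln(c) = ln(c0) - decay * d.
    """
    n = len(distances)
    logs = [math.log(c) for c in credibilities]
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_d
    return math.exp(intercept), -slope


# Synthetic, noise-free observations (hypothetical values, not from the paper).
true_c0, true_decay = 1.0, 0.8
distances = [0.0, 0.5, 1.0, 1.5, 2.0]
observed = [icf(d, true_c0, true_decay) for d in distances]

c0_hat, decay_hat = calibrate_icf(distances, observed)
print(f"c0 ~ {c0_hat:.3f}, decay ~ {decay_hat:.3f}")
```

With real, noisy measurements a nonlinear least-squares fit may be preferable, because the log transform reweights the errors.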
Abstract: In this paper, we propose to detect a special group of microblog users: "marionette" users, who are created or employed by backstage "puppeteers", either through programs or manually. Unlike normal users, who access microblogs for information sharing or social communication, marionette users perform specific tasks for financial profit. For example, they follow certain users to increase their "statistical popularity", or retweet certain tweets to amplify their "statistical impact". The fabricated follower or retweet counts not only mislead normal users with wrong information, but also seriously impair microblog-based applications such as hot tweet selection and expert finding. Detecting marionette users is challenging because puppeteers employ complicated strategies to generate marionette users that behave similarly to normal users. To tackle this challenge, we take into account two types of discriminative information: 1) individual user tweeting behavior and 2) the social interactions among users. By integrating both types of information into a semi-supervised probabilistic model, we can effectively distinguish marionette users from normal ones. Applying the proposed model to Sina Weibo, one of the most popular microblog platforms in China, we find that it detects marionette users with an F-measure close to 0.9. In addition, we apply the model to calculate the marionette ratio of the top 200 most followed microbloggers and the top 50 most retweeted posts on Sina Weibo. To accelerate detection and reduce feature generation cost, we further propose a lightweight model that uses fewer features to identify marionettes among retweeters.
Funding: This research is supported by the National Key Research and Development Program of China (Nos. 2021YFF0307203, 2019QY1300), the Youth Innovation Promotion Association CAS (No. 2021156), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC02040100), and the National Natural Science Foundation of China (No. 61802404).
Abstract: The Domain Name System (DNS), one of the most critical pieces of internet infrastructure, has been abused in various cyber attacks. Current malicious domain detection capabilities are limited by insufficient credible label information, severe class imbalance, and the non-compact distribution of domain samples across different malicious activities. This paper proposes a malicious domain detection framework named PUMD, which introduces a Positive and Unlabeled (PU) learning solution to address the problem of insufficient label information, adopts customized sample weights to mitigate the impact of class imbalance, and constructs evidence features based on resource overlapping to reduce the intra-class distance of malicious samples. In addition, a feature selection strategy based on permutation importance and binning is proposed to screen the most informative detection features. Finally, we conduct experiments on the open-source real DNS traffic dataset provided by QI-ANXIN Technology Group to evaluate the PUMD framework's ability to capture potential command and control (C&C) domains for malicious activities. The experimental results show that PUMD achieves the best detection performance under different label frequencies and class imbalance ratios.
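As a rough illustration of the permutation-importance idea mentioned in the abstract above, the sketch below shuffles one feature column at a time and records how much a fixed model's accuracy drops. It is a generic textbook version under stated assumptions: the toy data, the threshold classifier `predict`, and the helper names are hypothetical, and the binning step of the PUMD feature selection strategy is not reproduced here.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (hypothetical): feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]


def predict(row):
    """Stand-in for a trained classifier; it only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


importances = permutation_importance(predict, rows, labels)
print(importances)
```

Features with an importance near zero are candidates for removal, since the model's accuracy does not depend on them. In this toy setup the noise feature scores exactly zero because the stand-in classifier never reads it.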