Abstract: Rumor detection has become an emerging and active research field in recent years. At its core is modeling the rumor characteristics inherent in rich information, such as propagation patterns in the social network and semantic patterns in post content, and differentiating them from the truth. However, existing works on rumor detection fall short in modeling heterogeneous information, either using a single information source only (e.g., social network or post content) or ignoring the relations among multiple sources (e.g., fusing social and content features via simple concatenation). They may therefore fall short in comprehensively understanding rumors and detecting them accurately. In this work, we explore contrastive self-supervised learning on heterogeneous information sources, so as to reveal their relations and characterize rumors better. Technically, we supplement the main supervised task of detection with an auxiliary self-supervised task, which enriches post representations via post self-discrimination. Specifically, given two heterogeneous views of a post (i.e., representations encoding social patterns and semantic patterns), the discrimination is done by maximizing the mutual information between different views of the same post compared to that of other posts. We devise cluster-wise and instance-wise approaches to generate the views and conduct the discrimination, considering different relations of information sources. We term this framework self-supervised rumor detection (SRD). Extensive experiments on three real-world datasets validate the effectiveness of SRD for automatic rumor detection on social media.
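The post self-discrimination idea in the rumor detection abstract can be illustrated with a small sketch. The abstract does not say which mutual-information estimator SRD uses; the InfoNCE loss below is a common contrastive lower-bound objective and is swapped in here purely as an assumption. The function name `info_nce`, the `temperature` parameter, and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(social_views, semantic_views, temperature=0.5):
    """Average InfoNCE loss over a batch of posts.

    social_views[i] and semantic_views[i] are two views of post i
    (assumed here to be plain lists of floats). The positive pair is
    (i, i); the semantic views of all other posts act as negatives.
    Lower loss means the two views of the same post agree more.
    """
    n = len(social_views)
    if n == 0 or n != len(semantic_views):
        raise ValueError("both view lists must be non-empty and equal in length")
    total = 0.0
    for i in range(n):
        logits = [dot(social_views[i], semantic_views[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Toy check: views that agree per post score better than shuffled views.
social = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(social, aligned) < info_nce(social, shuffled))
```

In a setup like the one the abstract describes, such a term would be added to the supervised detection loss as an auxiliary objective; how SRD weights the two is not stated in the abstract.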
Funding: The research is funded by the Researchers Supporting Project at King Saud University (Project #RSP-2021/305).
Abstract: Graphs are used in various disciplines, such as telecommunication, biological networks, and social networks. In large-scale networks, it is challenging to detect communities by learning the distinct properties of the graph. As deep learning has made contributions in a variety of domains, we use deep learning techniques to mine knowledge from large-scale graph networks. In this paper, we aim to provide a strategy for detecting communities using deep autoencoders and to obtain generic neural attention for graphs. The advantages of neural attention are widely seen in the fields of NLP and computer vision, and it has low computational complexity for large-scale graphs. The contributions of the paper are summarized as follows. Firstly, a transformer is utilized to downsample the first-order proximities of the graph into a latent space, which captures the structural properties and eventually assists in detecting the communities. Secondly, fine-tuning is conducted by carefully tuning various hyperparameters, and the model is applied to multiple social networks (Facebook and Twitch). Furthermore, the objective function (cross-entropy) is tuned by L0 regularization. Lastly, the reconstructed model forms communities that present the relationship between the groups. The proposed robust model provides good generalization and is applicable not only to obtaining community structures in social networks but also to node classification. The proposed graph-transformer shows advanced performance on social networks, with average NMIs of 0.67±0.04, 0.198±0.02, 0.228±0.02, and 0.68±0.03 on the Wikipedia crocodiles, Github Developers, Twitch England, and Facebook Page-Page networks, respectively.
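The NMI (normalized mutual information) scores reported in this abstract compare detected communities against reference communities. As a rough illustration of what the metric measures, here is a minimal sketch that normalizes by the arithmetic mean of the two entropies. The abstract does not state which normalization the authors used, so that choice, and the function name `normalized_mutual_info`, are assumptions.

```python
import math
from collections import Counter


def normalized_mutual_info(labels_a, labels_b):
    """NMI between two flat community assignments over the same nodes.

    Returns a value in [0, 1]: 1 when the partitions match up to a
    renaming of the labels, 0 when they are statistically independent.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and equal in length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Mutual information from the joint and marginal label frequencies.
    mutual_info = 0.0
    for (a, b), n_ab in joint.items():
        mutual_info += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))

    entropy_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if entropy_a == 0.0 and entropy_b == 0.0:
        return 1.0  # both partitions place every node in one community
    return mutual_info / ((entropy_a + entropy_b) / 2)


# Same partition with renamed labels, then two unrelated partitions.
print(normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0]))
print(normalized_mutual_info([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because NMI ignores label names, it suits community detection, where the numbering of detected communities is arbitrary.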
Funding: This work was supported in part by the National Natural Science Foundation of China under Grants 62273272, 62303375, and 61873277; in part by the Key Research and Development Program of Shaanxi Province under Grant 2023-YBGY-243; in part by the Natural Science Foundation of Shaanxi Province under Grants 2022JQ-606 and 2020-JQ758; in part by the Research Plan of Department of Education of Shaanxi Province under Grant 21JK0752; and in part by the Youth Innovation Team of Shaanxi Universities.
Abstract: Social robot accounts controlled by artificial intelligence or humans are active in social networks, bringing negative impacts to network security and social life. Existing social robot detection methods based on graph neural networks struggle with the large number of social network nodes and their complex relationships, which makes it difficult to accurately describe the differences between the topological relations of nodes and results in low detection accuracy. This paper proposes a social robot detection method that uses an improved neural network. First, social relationship subgraphs are constructed by leveraging the user’s social network to disentangle intricate social relationships effectively. Then, a linear modulated graph attention residual network model is devised to extract the node and network topology features of the social relationship subgraph, thereby generating comprehensive social relationship subgraph features; the feature-wise linear modulation module of the model can better learn the differences between the nodes. Next, user text content and behavioral gene sequences are extracted to construct social behavioral features, which are combined with the social relationship subgraph features. Finally, social robots can be more accurately identified by combining user behavioral and relationship features. In experiments on the publicly available datasets TwiBot-20 and Cresci-15, the proposed method achieves detection accuracies of 86.73% and 97.86%, respectively. Compared with existing mainstream approaches, the accuracy of the proposed method is 2.2% and 1.35% higher on the two datasets. The results show that the proposed method can effectively detect social robots and help maintain a healthy social network ecosystem.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60873158, the National Basic Research 973 Program of China under Grant No. 2010CB327902, the Fundamental Research Funds for the Central Universities of China, and the Opening Funding of the State Key Laboratory of Virtual Reality Technology and Systems of China.
Abstract: Short message service (SMS) has become an indispensable means of social communication, and the problem of mobile spam is getting increasingly serious. We propose a novel approach for spam message detection. Instead of conventional methods that focus on keywords or flow rate filtering, our system is based on mining a more robust structure: the social network constructed with SMS. Several features, including static features, dynamic features, and graph features, are proposed for describing the activities of nodes in the network in various ways. Experimental results on a real dataset demonstrate the validity of our approach.
Funding: Funded by the Natural Science Foundation of China Grant No. 202204120017, the Autonomous Region Science and Technology Program Grant No. 2022B01008-2, and the Autonomous Region Science and Technology Program Grant No. 2020A02001-1.
Abstract: With the development of social media and the prevalence of mobile devices, an increasing number of people use social media platforms to express their opinions and attitudes, leading to many online controversies. These online controversies can severely threaten social stability, making automatic detection of controversies particularly necessary. Most controversy detection methods currently focus on mining features from text semantics and propagation structures. However, these methods have two drawbacks: 1) limited ability to capture structural features and failure to learn deeper structural features, and 2) neglect of the influence of topic information and ineffective utilization of topic features. To address these drawbacks, this paper proposes a social media controversy detection method called Dual Feature Enhanced Graph Convolutional Network (DFE-GCN). This method explores structural information at different scales from global and local perspectives to capture deeper structural features, enhancing the expressive power of structural features. Furthermore, to strengthen the influence of topic information, this paper utilizes attention mechanisms to enhance topic features after each graph convolutional layer, effectively using topic information. We validated our method on two different public datasets, and the experimental results demonstrate that our method achieves state-of-the-art performance compared to baseline methods. On the Weibo and Reddit datasets, the accuracy is improved by 5.92% and 3.32%, respectively, and the F1 score is improved by 1.99% and 2.17%, demonstrating the positive impact of enhanced structural features and topic features on controversy detection.
Funding: This research was funded by the National Key R&D Program of China [Grant Number 2017YFB0802703], Beijing Natural Science Foundation [Grant Number 4202002], the research project of the Department of Computer Science in BJUT [Grant Number 2019JSJKY004], Beijing Municipal Postdoc Science Foundation [No Grant Number], and Beijing Chaoyang District Postdoc Science Foundation [No Grant Number].
Abstract: Malicious social robots are disseminators of malicious information on social networks and seriously affect information security and the network environment. Efficient and reliable classification of social robots is crucial for detecting information manipulation in social networks. Supervised classification based on manual feature extraction has been widely used in social robot detection. However, these methods not only involve the privacy of users but also ignore hidden feature information, especially graph features, and the label utilization rate of semi-supervised algorithms is low. To address the problems of shallow feature extraction and low label utilization in existing social network robot detection methods, this paper proposes a robot detection scheme based on weighted network topology. The scheme introduces an improved network representation learning algorithm to extract the local structural features of the network and combines it with a graph convolution network (GCN) algorithm based on the graph filter to obtain the global structural features of the network. An end-to-end semi-supervised combination model (Semi-GSGCN) is established to detect malicious social robots. Experiments on a social network dataset (cresci-rtbust-2019) show that the proposed method has high versatility and effectiveness in detecting social robots. In addition, this method provides stronger insight into robots in social networks than other methods.
Funding: Supported by the Guangzhou Government Project (Grant No. 62216235) and the National Natural Science Foundation of China (Grant Nos. 61573328, 622260-1).
Abstract: Spammer detection aims to identify and block users who perform malicious activities. Such users should be identified and removed from social media to keep the social media process organic and to maintain the integrity of online social spaces. Previous research aimed to find spammers with hybrid approaches combining graph mining, posted content, and metadata, using small and manually labeled datasets. However, such hybrid approaches are unscalable, not robust, and dependent on particular datasets, and they require numerous parameters, complex graphs, and natural language processing (NLP) resources to make decisions, which makes them impractical for real-time detection. For example, graph mining requires neighbors’ information, while posted content-based approaches require multiple tweets from user profiles and then NLP resources to make decisions, neither of which is feasible in a real-time environment. To fill this gap, firstly, we propose a REal-time Metadata based Spammer detection (REMS) model that identifies spammers from metadata features only, takes the least number of parameters, and provides adequate results. REMS is a scalable and robust model that uses only 19 metadata features of Twitter users to achieve an F1-score of 73.81% using a balanced training dataset (50% spam and 50% genuine users). The 19 features comprise 8 original features of Twitter users and 11 features derived from them, identified through extensive experiments and analysis. Secondly, we present the largest and most diverse dataset in published research, comprising 211K spam users and 1 million genuine users. The diversity of the dataset is reflected in its users, who posted 2.1 million Tweets on seven topics (100 hashtags) from 6 different geographical locations. The superior classification performance of REMS with multiple machine learning and deep learning methods indicates that metadata features alone have the potential to identify spammers, without relying on volatile posted content and complex graph structures. The dataset and REMS code are available on GitHub (www.github.com/mhadnanali/REMS).
Funding: Supported by the National High Technology Research and Development 863 Program of China under Grant No. 2015AA124102, the Hebei Natural Science Foundation of China under Grant No. F2015203280, and the National Natural Science Foundation of China under Grant Nos. 61303130, 61272466, and 61303233.
Abstract: In this paper, we systematically study how to perform doctor recommendation in medical social networks (MSNs). Specifically, employing a real-world medical dataset as the source in our work, we propose iBole, a novel hybrid multi-layer architecture, to solve this problem. First, we mine doctor-patient relationships/ties via a time-constraint probability factor graph model (TPFG). Second, we extract network features for ranking nodes. Finally, we propose RWR-Model, a doctor recommendation model based on the random walk with restart method. Our real-world experiments validate the effectiveness of the proposed methods. Experimental results show that we obtain good accuracy in mining doctor-patient relationships from the network, and the doctor recommendation performance is better than that of the baseline algorithms: traditional Ranking SVM (RSVM) and the individual doctor recommendation model (IDR-Model). The results of our RWR-Model are more reasonable and satisfactory than those of the baseline approaches.
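The random walk with restart method named in this abstract can be sketched generically. The sketch below is not the authors' RWR-Model, whose edge weights and features are not given in the abstract; it runs a plain unweighted walk on a hypothetical toy graph, and the names `random_walk_with_restart`, `restart`, `p1`, and `d1` are illustrative assumptions.

```python
def random_walk_with_restart(adjacency, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at `seed`.

    adjacency maps each node to a list of its neighbors; every neighbor
    must itself be a key of the mapping. At each step the walker jumps
    back to `seed` with probability `restart`, otherwise it moves to a
    uniformly chosen neighbor. Higher scores mean closer to the seed.
    """
    nodes = list(adjacency)
    scores = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for _ in range(max_iter):
        updated = {v: (restart if v == seed else 0.0) for v in nodes}
        for u in nodes:
            neighbors = adjacency[u]
            if not neighbors:
                # Dangling node: send its remaining mass back to the seed.
                updated[seed] += (1.0 - restart) * scores[u]
                continue
            share = (1.0 - restart) * scores[u] / len(neighbors)
            for v in neighbors:
                updated[v] += share
        change = sum(abs(updated[v] - scores[v]) for v in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical patient-doctor ties: patient p1 has seen doctors d1 and d2,
# and d1 is also linked to patient p2.
graph = {"p1": ["d1", "d2"], "d1": ["p1", "p2"], "d2": ["p1"], "p2": ["d1"]}
scores = random_walk_with_restart(graph, seed="p1")
doctors = sorted(["d1", "d2"], key=scores.get, reverse=True)
print(doctors)
```

Ranking candidate doctors by their score relative to a seed patient is one natural way to turn the walk into a recommendation list; the restart probability controls how local the ranking stays.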
Funding: Supported by the Chinese Academy of Sciences through the Strategic Priority Research Program under Grant No. XDC02020400.
Abstract: In industrial control systems, deep learning-based methods have improved anomaly detection. However, most current methods ignore the associations among internal components. In industrial control systems, an anomalous component may affect its neighboring components; therefore, the connective relationships can help detect anomalies effectively. In this paper, we propose a centrality-aware graph convolution network (CAGCN) for anomaly detection in industrial control systems. Unlike the traditional graph convolution network (GCN) model, we utilize the concept of centrality to enhance the ability of graph convolution networks to deal with the internal relationships in industrial control systems. Our experiments show that, compared with GCN, CAGCN is better able to utilize the relationships between components. The performance of the model is evaluated on the Secure Water Treatment (SWaT) dataset and the Water Distribution (WADI) dataset, the two most common industrial control system datasets in the field of industrial anomaly detection. The experimental results show that CAGCN achieves better precision, recall, and F1 score than the state-of-the-art methods.
Funding: Supported by the National Key Basic Research and Development (973) Program of China (No. 2013CB329603) and the National Natural Science Foundation of China (Nos. 61074128, 61375058, and 71231002).
Abstract: The design and implementation of a scalable parallel mining system targeted at big graph analysis has proven to be challenging. In this study, we propose BSP-based Parallel Graph Mining (BPGM), a parallel data mining system for analyzing big graph data that is built on the Bulk Synchronous Parallel (BSP) computing model. The system has four sets of parallel graph mining algorithms programmed in the BSP parallel model and a well-designed workflow engine, optimized for cloud computing, to invoke these algorithms. Experimental results show that the graph mining algorithm components in BPGM are efficient and perform better than big cloud-based parallel data miner and BC-BSP.
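As a rough illustration of the BSP computing model that such systems build on, the sketch below simulates supersteps in a single Python process. It is not one of BPGM's algorithms: the function name `bsp_min_label`, the toy graph, and the choice of connected-component labelling are assumptions. In each superstep every vertex computes using only its own state and the messages delivered to it, sends new messages, and then waits at a barrier; messages become visible only in the next superstep.

```python
def bsp_min_label(adj):
    """Label each vertex with the smallest vertex id in its connected component.

    adj: dict mapping a vertex id to a list of neighbour ids (undirected toy graph).
    Returns the labels and the number of supersteps executed.
    """
    label = {v: v for v in adj}

    # Superstep 0: every vertex sends its own label to its neighbours.
    inbox = {v: [] for v in adj}
    for v in adj:
        for w in adj[v]:
            inbox[w].append(label[v])
    supersteps = 1

    # Keep running supersteps until no vertex receives a message.
    while any(inbox.values()):
        outbox = {v: [] for v in adj}
        # Local computation: a vertex reads only its own label and inbox.
        for v, messages in inbox.items():
            if messages and min(messages) < label[v]:
                label[v] = min(messages)
                # Communication: notify neighbours of the improved label.
                for w in adj[v]:
                    outbox[w].append(label[v])
        # Barrier: messages sent now are delivered in the next superstep.
        inbox = outbox
        supersteps += 1
    return label, supersteps


toy_graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
labels, steps = bsp_min_label(toy_graph)
print(labels, steps)
```

In a real BSP system the per-vertex loop would run in parallel across workers, with the barrier implemented by the framework rather than by swapping two dictionaries.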
Funding: Supported by ZTE Industry-Academia-Research Cooperation Funds.
Abstract: This paper proposes an analytical mining tool for big graph data based on MapReduce and the bulk synchronous parallel (BSP) computing model. The tool is named MapReduce and BSP based Graph-mining tool (MBGM). The core of this mining system consists of four sets of parallel graph-mining algorithms programmed in the BSP parallel model and one set of data extraction-transformation-loading (ETL) algorithms implemented in MapReduce. To invoke these algorithm sets, we designed a workflow engine optimized for cloud computing. Finally, a well-designed data management function enables users to view, delete, and input data in the Hadoop distributed file system (HDFS). Experiments on artificial data show that the graph-mining algorithm components in MBGM are efficient.
Abstract: Community-based churn prediction, or the task of recognising the influence of a customer’s community in churn prediction, has become an important concern for firms in many different industries. Until recently, churn prediction has focused only on transactional datasets (the targeted approach), while the untargeted approach, through product advisement, digital marketing, and customers’ opinions expressed on social media such as Twitter, has not been fully harnessed, although this data source has become an important influencing factor with lasting impact on churn management. Since Social Network Analysis (SNA) has become a blended approach for churn prediction and management in the modern era, customers residing online predominantly and collectively determine the momentum of churn prediction, retention, and decision support. In existing SNA approaches, customers are classified as churner or non-churner (1 or 0). Often, the customer’s opinion is also neglected and the network structure of community members is not exploited. Consequently, the pattern and influence of customers’ opinions on related members of the community are not analysed. Thus, this research developed a Churn Service Information Graph (CSIG) to define a quadruple churn category (churner, potential churner, inertia customer, premium customer) for non-opinionated customers via the power of relative affinity around opinionated customers in a direct node-to-node SNA. The essence is to use data mining techniques to investigate the patterns of opinion between people in a network or group. As a result, every member of the online social network community is dynamically classified into a churn category for improved targeted customer acquisition, retention, and/or decision support in churn management.
Funding: Supported by the National Key Research and Development Program of China (2020AAA0106000), the National Natural Science Foundation of China (Grant Nos. U21B2026, 62121002), and the CCCD Key Lab of Ministry of Culture and Tourism.
Abstract: Rumor detection has become an emerging and active research field in recent years. The core task is to model the rumor characteristics inherent in rich information, such as propagation patterns in the social network and semantic patterns in post content, and to differentiate them from the truth. However, existing works on rumor detection fall short in modeling heterogeneous information, either using a single information source only (e.g., social network or post content) or ignoring the relations among multiple sources (e.g., fusing social and content features via simple concatenation). Therefore, they may be limited in comprehensively understanding rumors and detecting them accurately. In this work, we explore contrastive self-supervised learning on heterogeneous information sources, so as to reveal their relations and characterize rumors better. Technically, we supplement the main supervised task of detection with an auxiliary self-supervised task, which enriches post representations via post self-discrimination. Specifically, given two heterogeneous views of a post (i.e., representations encoding social patterns and semantic patterns), the discrimination is done by maximizing the mutual information between different views of the same post compared to that of other posts. We devise cluster-wise and instance-wise approaches to generate the views and conduct the discrimination, considering different relations of information sources. We term this framework self-supervised rumor detection (SRD). Extensive experiments on three real-world datasets validate the effectiveness of SRD for automatic rumor detection on social media.
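The post self-discrimination described above can be sketched with an InfoNCE-style contrastive loss, a commonly used objective for contrasting paired views. Whether SRD uses exactly this objective is an assumption, since the abstract does not specify it; the function names, the `temperature` value, and the toy vectors below are likewise hypothetical. For each post, the social view is pulled towards the semantic view of the same post and pushed away from the semantic views of the other posts in the batch.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # avoid division by zero for a null vector
    return [x / norm for x in v]


def info_nce(social_views, semantic_views, temperature=0.5):
    """Average InfoNCE loss over a batch of posts.

    social_views[i] and semantic_views[i] are two representations of post i;
    every other semantic view in the batch serves as a negative for post i.
    """
    s = [normalize(v) for v in social_views]
    t = [normalize(v) for v in semantic_views]
    total = 0.0
    for i, anchor in enumerate(s):
        # Cosine similarity of the anchor to every semantic view, scaled by temperature.
        logits = [dot(anchor, other) / temperature for other in t]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the matching view.
        total += log_denominator - logits[i]
    return total / len(s)


# Toy batch of two posts with two-dimensional views.
social = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
mismatched = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(social, aligned), info_nce(social, mismatched))
```

The loss is low when the two views of each post agree and high when they match a different post, which is the signal a self-discrimination task would use to shape the post representations.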