Community detection is an important methodology for understanding the intrinsic structure and function of a realworld network. In this paper, we propose an effective and efficient algorithm, called Dominant Label Prop...Community detection is an important methodology for understanding the intrinsic structure and function of a realworld network. In this paper, we propose an effective and efficient algorithm, called Dominant Label Propagation Algorithm(Abbreviated as DLPA), to detect communities in complex networks. The algorithm simulates a special voting process to detect overlapping and non-overlapping community structure in complex networks simultaneously. Our algorithm is very efficient, since its computational complexity is almost linear to the number of edges in the network. Experimental results on both real-world and synthetic networks show that our algorithm also possesses high accuracies on detecting community structure in networks.展开更多
A large number of community discovery algorithms have been proposed in the last decade. Recently, the sharp increase of network scale has become a great challenge for traditional community discovery algorithms. Label ...A large number of community discovery algorithms have been proposed in the last decade. Recently, the sharp increase of network scale has become a great challenge for traditional community discovery algorithms. Label propagation algorithm is a semi-supervised machine learning method, which has linear time complexity when coping with large scale networks. However, the output result has less stability and the quality of the output communities still remains to be improved. Therefore, we propose a novel coreleader based label propagation algorithm for community detection called CLBLPA. Firstly, we find core leaders of potential community by using a greedy method. Then we utilize the label influence potential to guide the process of label propagation. Thus we can accelerate the convergence of algorithm and improve the stability of the output. Experimental results on synthetic datasets and real networks show that CLBLPA can significantly improve the quality of the output communities.展开更多
Community detection is a fundamental work to analyse the structural and functional properties of complex networks. The label propagation algorithm (LPA) is a near linear time algorithm to find a good community struc...Community detection is a fundamental work to analyse the structural and functional properties of complex networks. The label propagation algorithm (LPA) is a near linear time algorithm to find a good community structure. Despite various subsequent advances, an important issue of this algorithm has not yet been properly addressed. Random update orders within the algorithm severely hamper the stability of the identified community structure. In this paper, we executed the basic label propagation algorithm on networks multiple times, to obtain a set of consensus partitions. Based on these consensus partitions, we created a consensus weighted graph. In this consensus weighted graph, the weight value of the edge was the proportion value that the number of node pairs allocated in the same cluster was divided by the total number of partitions. Then, we introduced consensus weight to indicate the direction of label propagation. In label update steps, by computing the mixing value of consensus weight and label frequency, a node adopted the label which has the maximum mixing value instead of the most frequent one. For extending to different networks, we introduced a proportion parameter to adjust the proportion of consensus weight and label frequency in computing mixing value. Finally, we proposed an approach named the label propagation algorithm with consensus weight (LPAcw), and the experimental results showed that the LPAcw could enhance considerably both the stability and the accuracy of community partitions.展开更多
The increased capacity and availability of the Intemet has led to a wide variety of applications. Intemet traffic characterization and application identification is important for network management. In this paper, bas...The increased capacity and availability of the Intemet has led to a wide variety of applications. Intemet traffic characterization and application identification is important for network management. In this paper, based on detailed flow data collected from the public networks of Intemet Service Providers, we construct a flow graph to model the interactions among users. Considering traffic from different applications, we analyze the community structure of the flow graph in terms of cormmunity size, degree distribution of the community, community overlap, and overlap modularity. The near linear time community detection algorithm in complex networks, the Label Propagation Algorithm (LPA), is extended to the flow graph for application identification. We propose a new initialization and label propagation and update scheme. Experimental results show that the proposed algorithm has high accuracy and efficiency.展开更多
Sparse coding and supervised dictionary learning have rapidly developed in recent years,and achieved impressive performance in image classification. However, there is usually a limited number of labeled training sampl...Sparse coding and supervised dictionary learning have rapidly developed in recent years,and achieved impressive performance in image classification. However, there is usually a limited number of labeled training samples and a huge amount of unlabeled data in practical image classification,which degrades the discrimination of the learned dictionary. How to effectively utilize unlabeled training data and explore the information hidden in unlabeled data has drawn much attention of researchers. In this paper, we propose a novel discriminative semisupervised dictionary learning method using label propagation(SSD-LP). Specifically, we utilize a label propagation algorithm based on class-specific reconstruction errors to accurately estimate the identities of unlabeled training samples, and develop an algorithm for optimizing the discriminative dictionary and discriminative coding vectors simultaneously.Extensive experiments on face recognition, digit recognition, and texture classification demonstrate the effectiveness of the proposed method.展开更多
Knowledge of noun phrase anaphoricity might be profitably exploited in coreference resolution to bypass the resolution of non-anaphoric noun phrases. However, it is surprising to notice that recent attempts to incorpo...Knowledge of noun phrase anaphoricity might be profitably exploited in coreference resolution to bypass the resolution of non-anaphoric noun phrases. However, it is surprising to notice that recent attempts to incorporate automatically acquired anaphoricity information into coreferenee resolution systems have been far from expectation. This paper proposes a global learning method in determining the anaphoricity of noun phrases via a label propagation algorithm to improve learning-based coreference resolution. In order to eliminate the huge computational burden in the label propagation algorithm, we employ the weighted support vectors as the critical instances in the training texts. In addition, two kinds of kernels, i.e instances to represent all the anaphoricity-labeled NP , the feature-based RBF (Radial Basis Function) kernel and the convolution tree kernel with approximate matching, are explored to compute the anaphoricity similarity between two noun phrases. Experiments on the ACE2003 corpus demonstrate the great effectiveness of our method in anaphoricity determination of noun phrases and its application in learning-based coreference resolution.展开更多
The spread of rumors and diseases threatens the development of society,it is of great practical significance to locate propagation source quickly and accurately when rumors or epidemic outbreaks occur.However,the topo...The spread of rumors and diseases threatens the development of society,it is of great practical significance to locate propagation source quickly and accurately when rumors or epidemic outbreaks occur.However,the topological structure of online social network changes with time,which makes it very difficult to locate the propagation source.There are few studies focus on propagation source identification in dynamic networks.However,it is usually necessary to know the propagation model in advance.In this paper the label propagation algorithm is proposed to locate propagation source in temporal network.Then the propagation source was identified by hierarchical processing of dynamic networks and label propagation backwards without any underlying information dissemination model.Different propagation models were applied for comparative experiments on static and dynamic networks.Experimental results verify the effectiveness of the algorithm on temporal networks.展开更多
Numerous works prove that existing neighbor-averaging graph neural networks(GNNs)cannot efficiently catch structure features,and many works show that injecting structure,distance,position,or spatial features can signi...Numerous works prove that existing neighbor-averaging graph neural networks(GNNs)cannot efficiently catch structure features,and many works show that injecting structure,distance,position,or spatial features can significantly improve the performance of GNNs,however,injecting high-level structure and distance into GNNs is an intuitive but untouched idea.This work sheds light on this issue and proposes a scheme to enhance graph attention networks(GATs)by encoding distance and hop-wise structure statistics.Firstly,the hop-wise structure and distributional distance information are extracted based on several hop-wise ego-nets of every target node.Secondly,the derived structure information,distance information,and intrinsic features are encoded into the same vector space and then added together to get initial embedding vectors.Thirdly,the derived embedding vectors are fed into GATs,such as GAT and adaptive graph diffusion network(AGDN)to get the soft labels.Fourthly,the soft labels are fed into correct and smooth(C&S)to conduct label propagation and get final predictions.Experiments show that the distance and hop-wise structures encoding enhanced graph attention networks(DHSEGATs)achieve a competitive result.展开更多
Proteins usually bind together to form complexes, which play an important role in cellular activities. Many graph clustering methods have been proposed to identify protein complexes by finding dense regions in protein...Proteins usually bind together to form complexes, which play an important role in cellular activities. Many graph clustering methods have been proposed to identify protein complexes by finding dense regions in protein-protein interaction networks. We present a novel framework (CPL) that detects protein complexes by propagating labels through interactions in a network, in which labels denote complex identifiers. With proper propagation in CPL, proteins in the same complex will be assigned with the same labels. CPL does not make any strong assumptions about the topological structures of the complexes, as in previous methods. Tile CPL algorithm is tested on several publicly available yeast protein-protein interaction networks and compared with several state-of-the-art methods. The results suggest that CPL performs better than the existing methods. An analysis of the functional homogeneity based on a gene ontology analysis shows that the detected complexes of CPL are highly biologically relevant.展开更多
Image segmentation is one of the most basic tasks in computer vision and remains an initial step of many applications. In this paper, we focus on interactive image segmentation(IIS), often referred to as foreground–b...Image segmentation is one of the most basic tasks in computer vision and remains an initial step of many applications. In this paper, we focus on interactive image segmentation(IIS), often referred to as foreground–background separation or object extraction, guided by user interaction. We provide an overview of the IIS literature by covering more than 150 publications, especially recent works that have not been surveyed before. Moreover, we try to give a comprehensive classification of them according to different viewpoints and present a general and concise comparison of the most recent published works. Furthermore, we survey widely used datasets,evaluation metrics, and available resources in the field of IIS.展开更多
Non-negative Tucker decomposition(NTD) has been developed as a crucial method for non-negative tensor data representation.However, NTD is essentially an unsupervised method and cannot take advantage of label informati...Non-negative Tucker decomposition(NTD) has been developed as a crucial method for non-negative tensor data representation.However, NTD is essentially an unsupervised method and cannot take advantage of label information. In this paper, we claim that the low-dimensional representation extracted by NTD can be treated as the predicted soft-clustering coefficient matrix and can therefore be learned jointly with label propagation in a unified framework. The proposed method can extract the physicallymeaningful and parts-based representation of tensor data in their natural form while fully exploring the potential ability of the given labels with a nearest neighbors graph. In addition, an efficient accelerated proximal gradient(APG) algorithm is developed to solve the optimization problem. Finally, the experimental results on five benchmark image data sets for semi-supervised clustering and classification tasks demonstrate the superiority of this method over state-of-the-art methods.展开更多
Graph clustering,i.e.,partitioning nodes or data points into non-overlapping clusters,can be beneficial in a large varieties of computer vision and machine learning applications.However,main graph clustering schemes,s...Graph clustering,i.e.,partitioning nodes or data points into non-overlapping clusters,can be beneficial in a large varieties of computer vision and machine learning applications.However,main graph clustering schemes,such as spectral clustering,cannot be applied to a large network due to prohibitive computational complexity required.While there exist methods applicable to large networks,these methods do not offer convincing comparisons against known ground truth.For the first time,this work conducts clustering algorithm performance evaluations on large networks(consisting of one million nodes)with ground truth information.Ideas and concepts from game theory are applied towards graph clustering to formulate a new proposed algorithm,Game Theoretical Approach for Clustering(GTAC).This theoretical framework is shown to be a generalization of both the Label Propagation and Louvain methods,offering an additional means of derivation and analysis.GTAC introduces a tuning parameter which allows variable algorithm performance in accordance with application needs.Experimentation shows that these GTAC algorithms offer scalability and tunability towards big data applications.展开更多
Carbon stars and DZ white dwarfs are two types of rare objects in the Galaxy. In this paper, we have applied the label propagation algorithm to search for these two types of stars from Data Release Eight (DR8) of th...Carbon stars and DZ white dwarfs are two types of rare objects in the Galaxy. In this paper, we have applied the label propagation algorithm to search for these two types of stars from Data Release Eight (DR8) of the Sloan Digital Sky Survey (SDSS), which is verified to be efficient by calculating precision and recall. From nearly two million spectra including stars, galaxies and QSOs, we have found 260 new carbon stars in which 96 stars have been identified as dwarfs and 7 identified as giants, and 11 composition spectrum systems (each of them consists of a white dwarf and a carbon star). Similarly, using the label propagation method, we have obtained 29 new DZ white dwarfs from SDSS DR8. Compared with PCA reconstructed spectra, the 29 findings are typical DZ white dwarfs. We have also investigated their proper motions by comparing them with proper motion distribution of 9,374 white dwarfs, and fotmd that they satisfy the current observed white dwarfs by SDSS generally have large proper motions. In addition, we have estimated their effective temperatures by fitting the polynomial relationship between effective temperature and g-r color of known DZ white dwarfs, and found 12 of the 29 new DZ white dwarfs are cool, in which nine are between 6,000 K and 6,600 K, and three are below 6,000 K.展开更多
基金Project supported by the National Natural Science Foundation of China(Grant Nos.61173093 and 61202182)the Postdoctoral Science Foundation of China(Grant No.2012 M521776)+2 种基金the Fundamental Research Funds for the Central Universities of Chinathe Postdoctoral Science Foundation of Shannxi Province,Chinathe Natural Science Basic Research Plan of Shaanxi Province,China(Grant Nos.2013JM8019 and 2014JQ8359)
文摘Community detection is an important methodology for understanding the intrinsic structure and function of a realworld network. In this paper, we propose an effective and efficient algorithm, called Dominant Label Propagation Algorithm(Abbreviated as DLPA), to detect communities in complex networks. The algorithm simulates a special voting process to detect overlapping and non-overlapping community structure in complex networks simultaneously. Our algorithm is very efficient, since its computational complexity is almost linear to the number of edges in the network. Experimental results on both real-world and synthetic networks show that our algorithm also possesses high accuracies on detecting community structure in networks.
基金supported by the National Natural Science Foundation of China under Grant No. 61272277, 41301409, 41571390the Fundamental Research Funds for the Central Universities under Grant No. 274742
文摘A large number of community discovery algorithms have been proposed in the last decade. Recently, the sharp increase of network scale has become a great challenge for traditional community discovery algorithms. Label propagation algorithm is a semi-supervised machine learning method, which has linear time complexity when coping with large scale networks. However, the output result has less stability and the quality of the output communities still remains to be improved. Therefore, we propose a novel coreleader based label propagation algorithm for community detection called CLBLPA. Firstly, we find core leaders of potential community by using a greedy method. Then we utilize the label influence potential to guide the process of label propagation. Thus we can accelerate the convergence of algorithm and improve the stability of the output. Experimental results on synthetic datasets and real networks show that CLBLPA can significantly improve the quality of the output communities.
基金supported by the National Natural Science Foundation of China(Grant No.61370073)the China Scholarship Council,China(Grant No.201306070037)
文摘Community detection is a fundamental work to analyse the structural and functional properties of complex networks. The label propagation algorithm (LPA) is a near linear time algorithm to find a good community structure. Despite various subsequent advances, an important issue of this algorithm has not yet been properly addressed. Random update orders within the algorithm severely hamper the stability of the identified community structure. In this paper, we executed the basic label propagation algorithm on networks multiple times, to obtain a set of consensus partitions. Based on these consensus partitions, we created a consensus weighted graph. In this consensus weighted graph, the weight value of the edge was the proportion value that the number of node pairs allocated in the same cluster was divided by the total number of partitions. Then, we introduced consensus weight to indicate the direction of label propagation. In label update steps, by computing the mixing value of consensus weight and label frequency, a node adopted the label which has the maximum mixing value instead of the most frequent one. For extending to different networks, we introduced a proportion parameter to adjust the proportion of consensus weight and label frequency in computing mixing value. Finally, we proposed an approach named the label propagation algorithm with consensus weight (LPAcw), and the experimental results showed that the LPAcw could enhance considerably both the stability and the accuracy of community partitions.
基金the National Natural Science Foundation of China under Grant No.61171098,the Fundamental Research Funds for the Central Universities of China,the 111 Project of China under Grant No.B08004
文摘The increased capacity and availability of the Intemet has led to a wide variety of applications. Intemet traffic characterization and application identification is important for network management. In this paper, based on detailed flow data collected from the public networks of Intemet Service Providers, we construct a flow graph to model the interactions among users. Considering traffic from different applications, we analyze the community structure of the flow graph in terms of cormmunity size, degree distribution of the community, community overlap, and overlap modularity. The near linear time community detection algorithm in complex networks, the Label Propagation Algorithm (LPA), is extended to the flow graph for application identification. We propose a new initialization and label propagation and update scheme. Experimental results show that the proposed algorithm has high accuracy and efficiency.
基金partially supported by the National Natural Science Foundation for Young Scientists of China(No.61402289)the National Science Foundation of Guangdong Province(No.2014A030313558)
文摘Sparse coding and supervised dictionary learning have rapidly developed in recent years,and achieved impressive performance in image classification. However, there is usually a limited number of labeled training samples and a huge amount of unlabeled data in practical image classification,which degrades the discrimination of the learned dictionary. How to effectively utilize unlabeled training data and explore the information hidden in unlabeled data has drawn much attention of researchers. In this paper, we propose a novel discriminative semisupervised dictionary learning method using label propagation(SSD-LP). Specifically, we utilize a label propagation algorithm based on class-specific reconstruction errors to accurately estimate the identities of unlabeled training samples, and develop an algorithm for optimizing the discriminative dictionary and discriminative coding vectors simultaneously.Extensive experiments on face recognition, digit recognition, and texture classification demonstrate the effectiveness of the proposed method.
基金Supported by the National Natural Science Foundation of China under Grant Nos.60873150,90920004 and 61003153
文摘Knowledge of noun phrase anaphoricity might be profitably exploited in coreference resolution to bypass the resolution of non-anaphoric noun phrases. However, it is surprising to notice that recent attempts to incorporate automatically acquired anaphoricity information into coreferenee resolution systems have been far from expectation. This paper proposes a global learning method in determining the anaphoricity of noun phrases via a label propagation algorithm to improve learning-based coreference resolution. In order to eliminate the huge computational burden in the label propagation algorithm, we employ the weighted support vectors as the critical instances in the training texts. In addition, two kinds of kernels, i.e instances to represent all the anaphoricity-labeled NP , the feature-based RBF (Radial Basis Function) kernel and the convolution tree kernel with approximate matching, are explored to compute the anaphoricity similarity between two noun phrases. Experiments on the ACE2003 corpus demonstrate the great effectiveness of our method in anaphoricity determination of noun phrases and its application in learning-based coreference resolution.
文摘The spread of rumors and diseases threatens the development of society,it is of great practical significance to locate propagation source quickly and accurately when rumors or epidemic outbreaks occur.However,the topological structure of online social network changes with time,which makes it very difficult to locate the propagation source.There are few studies focus on propagation source identification in dynamic networks.However,it is usually necessary to know the propagation model in advance.In this paper the label propagation algorithm is proposed to locate propagation source in temporal network.Then the propagation source was identified by hierarchical processing of dynamic networks and label propagation backwards without any underlying information dissemination model.Different propagation models were applied for comparative experiments on static and dynamic networks.Experimental results verify the effectiveness of the algorithm on temporal networks.
文摘Numerous works prove that existing neighbor-averaging graph neural networks(GNNs)cannot efficiently catch structure features,and many works show that injecting structure,distance,position,or spatial features can significantly improve the performance of GNNs,however,injecting high-level structure and distance into GNNs is an intuitive but untouched idea.This work sheds light on this issue and proposes a scheme to enhance graph attention networks(GATs)by encoding distance and hop-wise structure statistics.Firstly,the hop-wise structure and distributional distance information are extracted based on several hop-wise ego-nets of every target node.Secondly,the derived structure information,distance information,and intrinsic features are encoded into the same vector space and then added together to get initial embedding vectors.Thirdly,the derived embedding vectors are fed into GATs,such as GAT and adaptive graph diffusion network(AGDN)to get the soft labels.Fourthly,the soft labels are fed into correct and smooth(C&S)to conduct label propagation and get final predictions.Experiments show that the distance and hop-wise structures encoding enhanced graph attention networks(DHSEGATs)achieve a competitive result.
基金supported by the National Natural Science Foundation of China under Grant Nos.61271346,61172098,and91335112the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant No.20112302110040the Fundamental Research Funds for the Central Universities of China under Grant No.HIT.KISTP.201418
文摘Proteins usually bind together to form complexes, which play an important role in cellular activities. Many graph clustering methods have been proposed to identify protein complexes by finding dense regions in protein-protein interaction networks. We present a novel framework (CPL) that detects protein complexes by propagating labels through interactions in a network, in which labels denote complex identifiers. With proper propagation in CPL, proteins in the same complex will be assigned with the same labels. CPL does not make any strong assumptions about the topological structures of the complexes, as in previous methods. Tile CPL algorithm is tested on several publicly available yeast protein-protein interaction networks and compared with several state-of-the-art methods. The results suggest that CPL performs better than the existing methods. An analysis of the functional homogeneity based on a gene ontology analysis shows that the detected complexes of CPL are highly biologically relevant.
文摘Image segmentation is one of the most basic tasks in computer vision and remains an initial step of many applications. In this paper, we focus on interactive image segmentation(IIS), often referred to as foreground–background separation or object extraction, guided by user interaction. We provide an overview of the IIS literature by covering more than 150 publications, especially recent works that have not been surveyed before. Moreover, we try to give a comprehensive classification of them according to different viewpoints and present a general and concise comparison of the most recent published works. Furthermore, we survey widely used datasets,evaluation metrics, and available resources in the field of IIS.
基金This work was supported by the National Natural Science Foundation of China(Grant Nos.62073087,U191140003,6197309 and 61973090)the Key-Area Research and Development Program of Guangdong Province(Grant Nos.2019B010154002 and 2019B010118001)。
文摘Non-negative Tucker decomposition(NTD) has been developed as a crucial method for non-negative tensor data representation.However, NTD is essentially an unsupervised method and cannot take advantage of label information. In this paper, we claim that the low-dimensional representation extracted by NTD can be treated as the predicted soft-clustering coefficient matrix and can therefore be learned jointly with label propagation in a unified framework. The proposed method can extract the physicallymeaningful and parts-based representation of tensor data in their natural form while fully exploring the potential ability of the given labels with a nearest neighbors graph. In addition, an efficient accelerated proximal gradient(APG) algorithm is developed to solve the optimization problem. Finally, the experimental results on five benchmark image data sets for semi-supervised clustering and classification tasks demonstrate the superiority of this method over state-of-the-art methods.
文摘Graph clustering,i.e.,partitioning nodes or data points into non-overlapping clusters,can be beneficial in a large varieties of computer vision and machine learning applications.However,main graph clustering schemes,such as spectral clustering,cannot be applied to a large network due to prohibitive computational complexity required.While there exist methods applicable to large networks,these methods do not offer convincing comparisons against known ground truth.For the first time,this work conducts clustering algorithm performance evaluations on large networks(consisting of one million nodes)with ground truth information.Ideas and concepts from game theory are applied towards graph clustering to formulate a new proposed algorithm,Game Theoretical Approach for Clustering(GTAC).This theoretical framework is shown to be a generalization of both the Label Propagation and Louvain methods,offering an additional means of derivation and analysis.GTAC introduces a tuning parameter which allows variable algorithm performance in accordance with application needs.Experimentation shows that these GTAC algorithms offer scalability and tunability towards big data applications.
基金funded by the National Natural Science Foundation of China(Grant Nos.10973021 and 11303036)SDSS-III has been provided by the Alfred P.Sloan Foundation+2 种基金the Participating Institutionsthe National Science Foundationthe U.S.Department of Energy Ofce of Science
文摘Carbon stars and DZ white dwarfs are two types of rare objects in the Galaxy. In this paper, we have applied the label propagation algorithm to search for these two types of stars from Data Release Eight (DR8) of the Sloan Digital Sky Survey (SDSS), which is verified to be efficient by calculating precision and recall. From nearly two million spectra including stars, galaxies and QSOs, we have found 260 new carbon stars in which 96 stars have been identified as dwarfs and 7 identified as giants, and 11 composition spectrum systems (each of them consists of a white dwarf and a carbon star). Similarly, using the label propagation method, we have obtained 29 new DZ white dwarfs from SDSS DR8. Compared with PCA reconstructed spectra, the 29 findings are typical DZ white dwarfs. We have also investigated their proper motions by comparing them with proper motion distribution of 9,374 white dwarfs, and fotmd that they satisfy the current observed white dwarfs by SDSS generally have large proper motions. In addition, we have estimated their effective temperatures by fitting the polynomial relationship between effective temperature and g-r color of known DZ white dwarfs, and found 12 of the 29 new DZ white dwarfs are cool, in which nine are between 6,000 K and 6,600 K, and three are below 6,000 K.