Funding: This work is funded by the Newton Institutional Links 2020-21 project 623718881, jointly by the British Council and the National Research Council of Thailand (www.britishcouncil.org). The first author is the project PI, with the other participating as a Co-I.
Abstract: Many modern data collections contain a mix of numerical and nominal data types. Examples include banking data, sales history and healthcare records, where both continuous attributes like age and nominal ones like blood type are used to characterize account details, business transactions or individuals. However, only a few standard clustering techniques and consensus clustering methods have been provided to examine such data thus far. Given this insight, the paper introduces two novel extensions of the link-based cluster ensemble, LCEWCT and LCEWTQ, which are accurate for analyzing mixed-type data. They promote diversity within an ensemble by using different initializations of the k-prototypes algorithm as base clusterings, and then refine the summarized data using a link-based approach. Based on the evaluation metric of NMI (Normalized Mutual Information) averaged across different combinations of benchmark datasets and experimental settings, these new models reach an improved level of 0.34, while the best model found in the literature obtains only around 0.24. In addition, the parameter analysis included herein helps enhance their performance even further, given the relations between clustering quality and the algorithmic variables specific to the underlying link-based models. Moreover, another significant factor, ensemble size, is examined to justify a tradeoff between complexity and accuracy.
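The Python sketch below shows two ingredients named above.

- **Mixed-type dissimilarity:** k-prototypes minimizes Huang's measure, which is the squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of nominal mismatches.
- **NMI:** the score used for evaluation.

It is a sketch under stated assumptions. The value of `gamma`, the attribute indices, and the square-root normalization of NMI are all assumptions; the abstract does not say which NMI variant was used. The link-based refinement (LCEWCT, LCEWTQ) is not shown.

```python
import math
from collections import Counter


def kproto_dissimilarity(x, y, numeric_idx, nominal_idx, gamma=1.0):
    """Huang-style k-prototypes dissimilarity between two mixed-type records.

    Squared Euclidean distance over numeric attributes plus gamma times the
    number of nominal attributes whose values differ. gamma is assumed here.
    """
    numeric = sum((x[i] - y[i]) ** 2 for i in numeric_idx)
    nominal = sum(1 for i in nominal_idx if x[i] != y[i])
    return numeric + gamma * nominal


def nmi(labels_a, labels_b):
    """NMI with square-root normalization: I(A;B) / sqrt(H(A) * H(B))."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information from joint and marginal cluster counts.
    mi = sum((n_ab / n) * math.log(n * n_ab / (count_a[a] * count_b[b]))
             for (a, b), n_ab in joint.items())
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if h_a == 0.0 or h_b == 0.0:
        # Degenerate case: at least one labelling is a single cluster.
        return 1.0 if h_a == h_b else 0.0
    return mi / math.sqrt(h_a * h_b)
```

For example, records `(34, "A")` and `(40, "O")` (age, blood type) with `gamma=0.5` have dissimilarity 36 + 0.5 = 36.5. NMI is invariant to relabelling, so `[0, 0, 1, 1]` and `[1, 1, 0, 0]` score 1.0.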
Funding: Supported in part by the National Natural Science Foundation of China (No. 71471009).
Abstract: Consensus clustering aims to fuse several existing basic partitions into an integrated one, and it has been widely recognized as a promising tool for multi-source and heterogeneous data clustering. Owing to its robust, high-quality performance compared with traditional clustering methods, consensus clustering attracts much attention, and much effort has been devoted to developing this field. In the literature, K-means-based Consensus Clustering (KCC) transforms the consensus clustering problem into a classical K-means clustering problem with theoretical support and shows advantages over state-of-the-art methods. Although KCC inherits the merits of K-means, it also suffers from K-means' sensitivity to initialization. Moreover, the current consensus clustering framework separates basic partition generation and fusion into two disconnected parts. To address these two challenges, a novel clustering algorithm named Greedy optimization of K-means-based Consensus Clustering (GKCC) is proposed. Inspired by the well-known greedy K-means, which aims to resolve the initialization sensitivity of K-means, GKCC seamlessly combines greedy K-means and KCC, inherits the merits of both, and overcomes the drawbacks of its precursors. Moreover, a 59-sampling strategy is adopted to provide high-quality basic partitions and accelerate the algorithm. Extensive experiments on 36 benchmark datasets demonstrate the significant advantages of GKCC over KCC and KCC++ in terms of objective function values, standard deviations, and external cluster validity.
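The core KCC idea, reducing consensus clustering to K-means, can be sketched in its simplest form: one-hot encode every basic partition, concatenate the encodings into a binary matrix, and run K-means on that matrix with squared Euclidean distance. This assumes equal partition weights and the utility function for which this reduction holds.

This is a sketch under stated assumptions, not GKCC itself. The helper names are hypothetical. The deterministic farthest-first initialization is a plain stand-in; it is not the greedy K-means or 59-sampling procedure described above.

```python
import numpy as np


def binary_matrix(partitions):
    """Concatenate one-hot encodings of each basic partition (n x sum of K_i)."""
    blocks = []
    for labels in partitions:
        labels = np.asarray(labels)
        clusters = np.unique(labels)
        blocks.append((labels[:, None] == clusters[None, :]).astype(float))
    return np.hstack(blocks)


def farthest_first_init(X, k):
    """Deterministic seeding: start at row 0, then repeatedly add the farthest row."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)


def kcc(partitions, k, max_iter=100):
    """Consensus labels from K-means (Lloyd iterations) on the binary matrix."""
    X = binary_matrix(partitions)
    centers = farthest_first_init(X, k)
    for _ in range(max_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

With three basic partitions that mostly agree, for example `[[0,0,1,1,2,2], [1,1,0,0,2,2], [0,0,1,1,1,1]]`, `kcc(..., k=3)` groups the objects in pairs (0,1), (2,3), (4,5).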
Funding: Supported in part by the National Key R&D Program of China (No. 2016YFB0900100) and the Major Smart Grid Joint Project of the National Natural Science Foundation of China and State Grid (No. U1766212).
Abstract: Partitioning a complex power network into a number of sub-zones can help realize a divide-and-conquer management structure for the whole system, supporting tasks such as voltage and reactive power control, coherency identification, and power system restoration. Many partitioning methods have been proposed that define various distances, apply different clustering methods, or formulate various optimization models for one specific objective. However, a power network partition may serve two or more objectives, which requires a trade-off among them. This paper proposes a novel weighted consensus clustering-based approach for bi-objective power network partition. By varying the weights of the partitions obtained for different objectives, Pareto improvement can be explored with both node-based and subset-based consensus clustering methods. Case studies on the IEEE 300-bus test system verify the effectiveness and superiority of the proposed method.
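One way to picture node-based weighted consensus is a weighted co-association matrix. Entry (i, j) is the weighted share of partitions that put buses i and j in the same zone. Zones are then read off as connected groups above a threshold.

This is a minimal sketch, assuming a normalized-weight rule and a 0.5 threshold. Both are illustrative choices, not the paper's formulation, and the subset-based variant is not shown. Shifting weight between the two objectives' partitions moves the consensus toward one or the other.

```python
import numpy as np


def weighted_coassociation(partitions, weights):
    """Entry (i, j) = weighted share of partitions placing nodes i and j together."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so entries lie in [0, 1]
    n = len(partitions[0])
    C = np.zeros((n, n))
    for labels, w in zip(partitions, weights):
        labels = np.asarray(labels)
        C += w * (labels[:, None] == labels[None, :])
    return C


def threshold_consensus(C, threshold=0.5):
    """Connected components of the graph keeping co-association > threshold."""
    n = C.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search from an unlabeled node.
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and C[i, j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Take partition `a = [0,0,0,1,1,1]` (best for one objective) and `b = [0,0,1,1,2,2]` (best for the other). Weights (0.7, 0.3) reproduce `a`, while (0.3, 0.7) reproduce `b`. Intermediate weights and thresholds trace other trade-offs.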
Funding: Supported by the National Natural Science Foundation of China (No. 51477054) and the Guangdong Innovative Research Team Program (No. 201001N0104744201).
Abstract: This paper presents a novel consensus clustering (CC) approach for a document repository concerning power substations (PSD), contributing to the intangible asset management of power systems. A domain ontology model, i.e., the substation ontology (SONT), is applied to modify the traditional vector space model (VSM) for document representation so that semantic relationships between terms are taken into account. A new document representation is generated using a term mutual information matrix with the aid of SONT. In addition, the weighted partition via kernel-based CC algorithm (WPK-CC) is utilised to solve the CC problem for PSD and is compared with two other novel CC algorithms, i.e., non-negative matrix factorisation-based CC (NNMF-CC) and information theory-based CC (INT-CC). Meanwhile, genetic algorithms (GA) are applied to WPK-CC for PSD, because the original WPK-CC has limitations for document clustering. Subsequently, selected mechanisms in each GA procedure are compared and improved, resulting in comprehensive parameter settings for PSD CC. Four simulation studies have been designed. Their results, evaluated with the purity validation method, show that the SONT-based document representation and the improved WPK-CC, via the modified GA, significantly improve the performance of PSD CC.
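The sketch below illustrates two pieces of the pipeline described above.

- **Semantically smoothed VSM:** multiply the document-term matrix by a term-term similarity matrix, so related terms share weight.
- **Purity:** the score used for evaluation, (1/N) Σ_k max_j |ω_k ∩ c_j|.

The matrix `term_sim` is a hypothetical placeholder. How the paper derives its term mutual information matrix from SONT is not specified here, and WPK-CC and the GA tuning are not shown.

```python
from collections import Counter

import numpy as np


def semantic_vsm(tf, term_sim):
    """Smooth a document-term matrix with a term-term similarity matrix: D' = D @ S."""
    return np.asarray(tf, dtype=float) @ np.asarray(term_sim, dtype=float)


def purity(cluster_labels, class_labels):
    """Fraction of documents assigned to the majority class of their cluster."""
    n = len(cluster_labels)
    total = 0
    for k in set(cluster_labels):
        members = [c for cl, c in zip(cluster_labels, class_labels) if cl == k]
        total += Counter(members).most_common(1)[0][1]
    return total / n
```

Two documents that each mention only one of two related terms (similarity 0.5) become partially similar after smoothing. A clustering `[0, 0, 0, 1]` of documents with classes `["a", "a", "b", "b"]` has purity 3/4.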
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 70571059).
Abstract: This paper investigates the cluster consensus problem for second-order multi-agent systems by applying the pinning control method to a small subset of the agents. Consensus is attained independently for different agent clusters according to the community structure generated by the group partition of the underlying graph. Sufficient conditions for both cluster and general consensus are obtained using results from algebraic graph theory and the LaSalle Invariance Principle. Finally, some simple simulations are presented to illustrate the technique.
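The pinning idea can be simulated in a few lines. Double-integrator agents are coupled to neighbours within their cluster, and only one agent per cluster sees a cluster-specific reference.

This sketch rests on several simplifications that are assumptions rather than the paper's setting:

- each cluster has a static reference (`reference`);
- all gains equal 1;
- there are no inter-cluster links;
- time is stepped with forward Euler.

```python
def simulate_pinned_clusters(adjacency, pin_gain, reference, x0, v0,
                             dt=0.01, steps=10_000):
    """Forward-Euler simulation of x' = v, v' = u with pinning-based cluster control.

    u_i = -sum_j a_ij[(x_i - x_j) + (v_i - v_j)] - p_i[(x_i - r_i) + v_i],
    where p_i > 0 only for pinned agents and r_i is the (hypothetical) static
    reference of agent i's cluster.
    """
    n = len(x0)
    x, v = list(x0), list(v0)
    for _ in range(steps):
        u = []
        for i in range(n):
            # Consensus term: relative position and velocity errors to neighbours.
            ui = -sum(adjacency[i][j] * ((x[i] - x[j]) + (v[i] - v[j]))
                      for j in range(n))
            # Pinning term: nonzero only for pinned agents.
            ui -= pin_gain[i] * ((x[i] - reference[i]) + v[i])
            u.append(ui)
        x = [x[i] + dt * v[i] for i in range(n)]
        v = [v[i] + dt * u[i] for i in range(n)]
    return x, v
```

Consider two clusters {0, 1} and {2, 3} with one pinned agent each (gains `[1, 0, 1, 0]`) and references 5 and -3. After 100 simulated seconds the positions settle near `[5, 5, -3, -3]` and the velocities near zero.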
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61370073) and the China Scholarship Council, China (Grant No. 201306070037).
Abstract: Community detection is a fundamental task in analysing the structural and functional properties of complex networks. The label propagation algorithm (LPA) is a near-linear-time algorithm for finding a good community structure. Despite various subsequent advances, an important issue with this algorithm has not yet been properly addressed: random update orders within the algorithm severely hamper the stability of the identified community structure. In this paper, we executed the basic label propagation algorithm on networks multiple times to obtain a set of consensus partitions. Based on these partitions, we created a consensus weighted graph. In this graph, the weight of each edge was the number of partitions that assigned its two endpoint nodes to the same cluster, divided by the total number of partitions. We then used this consensus weight to guide the direction of label propagation. In each label update step, a node computed a mixing value of consensus weight and label frequency for each candidate label, and adopted the label with the maximum mixing value instead of the most frequent one. To extend the approach to different networks, we introduced a proportion parameter that adjusts the relative contributions of consensus weight and label frequency to the mixing value. Together, these steps form an approach named the label propagation algorithm with consensus weight (LPAcw). The experimental results showed that LPAcw could considerably enhance both the stability and the accuracy of community partitions.
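The consensus weight and the modified label update can be sketched as follows. An edge's weight is the fraction of LPA runs that put its endpoints in the same community. A node then picks the neighbour label with the highest mixing score.

The exact mixing formula is not given above. This sketch assumes `alpha` times the normalized summed consensus weight plus `(1 - alpha)` times the normalized label frequency, with `alpha` standing in for the proportion parameter. Ties are broken by first occurrence, not randomly.

```python
from collections import defaultdict


def consensus_weights(edges, partitions):
    """Weight of edge (u, v) = fraction of partitions placing u and v together."""
    m = len(partitions)
    return {(u, v): sum(p[u] == p[v] for p in partitions) / m for u, v in edges}


def choose_label(node, neighbors, labels, weights, alpha=0.5):
    """Return the neighbour label with the largest (hypothetical) mixing value."""
    weight_sum = defaultdict(float)
    freq = defaultdict(int)
    for nb in neighbors[node]:
        # Edges are undirected, so look the weight up in either orientation.
        w = weights.get((node, nb), weights.get((nb, node), 0.0))
        weight_sum[labels[nb]] += w
        freq[labels[nb]] += 1
    total_w = sum(weight_sum.values()) or 1.0
    total_f = sum(freq.values())
    scores = {lab: alpha * weight_sum[lab] / total_w + (1 - alpha) * freq[lab] / total_f
              for lab in freq}
    return max(scores, key=scores.get)
```

Suppose node 0 has one neighbour labelled "A" and two labelled "B", but earlier runs almost always grouped it with the "A" neighbour. Then `alpha=0.8` picks "A", while `alpha=0.0` (plain LPA frequency) picks "B".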
Funding: Supported by the National High Technology Research and Development Program (863 Program) (No. 2013AA040301-3), the National Natural Science Foundation of China (Nos. 61473319 and 61104135), the Key Project of the National Natural Science Foundation of China (Nos. 61621062 and 61134006), and the Innovation Research Funds of Central South University (No. 2016CX014).
Abstract: Consensus clustering is the problem of reconciling clustering information about the same data set coming from different runs of the same algorithm. It is becoming a state-of-the-art approach in a growing number of applications. However, determining the optimal number of clusters remains an open problem. In this paper, we propose a novel consensus clustering algorithm based on the Minkowski distance. By incorporating the Newman greedy algorithm from complex network analysis, the proposed algorithm can set the number of clusters automatically. It is less sensitive to noise and can integrate solutions from multiple samples of data or attributes when processing data in the process industry. A numerical simulation demonstrates the effectiveness of the proposed algorithm. Finally, the consensus clustering algorithm is applied to a froth flotation process.
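The two ingredients named above can be sketched separately.

- **Minkowski distance:** (Σ|x_i - y_i|^p)^(1/p).
- **Newman modularity:** Q = Σ_c [L_c/m - (d_c/2m)^2]. Newman's greedy algorithm repeatedly merges the pair of communities giving the largest increase in Q, then takes the level of the merge sequence with the highest Q. This is what lets the number of clusters be set automatically.

How the paper turns Minkowski distances into the graph that is partitioned is not stated. This sketch does not assume a particular construction and does not implement the greedy merging itself.

```python
from collections import defaultdict


def minkowski(x, y, p=2.0):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def modularity(edges, communities):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2], undirected and unweighted.

    L_c is the number of edges inside community c, d_c is the total degree of
    its nodes, and m is the total number of edges.
    """
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    q = 0.0
    for comm in communities:
        comm = set(comm)
        internal = sum(1 for u, v in edges if u in comm and v in comm)
        d = sum(degree[node] for node in comm)
        q += internal / m - (d / (2 * m)) ** 2
    return q
```

For the graph with edges (0, 1) and (2, 3), the partition {0, 1}, {2, 3} gives Q = 2 × (1/2 - 1/4) = 0.5.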
Funding: Supported in part by the Natural Science Foundation of Xinjiang Uygur Autonomous Region under Grant No. 2023D01C162, in part by the National Natural Science Foundation of China under Grant Nos. 62003289 and 62163035, in part by the China Postdoctoral Science Foundation under Grant No. 2021M690400, and in part by the Special Project for Local Science and Technology Development Guided by the Central Government under Grant No. ZYYD2022A05.
Abstract: This paper studies the cluster consensus of multi-agent systems (MASs) with objective optimization on directed and detail-balanced networks, in which the global optimization objective function is a linear combination of the local objective functions of all agents. First, a directed and detail-balanced network is constructed that depends on the weights of the global objective function. Two kinds of novel continuous-time optimization algorithms are proposed, for time-invariant and time-varying objective functions respectively. Second, using fixed-time stability theory and convex optimization theory, sufficient conditions are obtained that ensure all agents' states reach cluster consensus within a fixed time and asymptotically converge to the optimal solution of the global objective function. Finally, two examples are presented to show the efficacy of the theoretical results.
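Two notions above can be made concrete: detail balance and the optimum of a weighted sum of local objectives.

- **Detail balance:** a network is detail-balanced if there are positive weights ξ_i with ξ_i a_ij = ξ_j a_ji for all pairs.
- **Weighted optimum:** for quadratic local objectives f_i(x) = (x - b_i)^2 combined as F(x) = Σ c_i f_i(x), the minimizer is x* = Σ c_i b_i / Σ c_i.

This is a minimal sketch for a single cluster. It assumes the ξ_i and c_i are given, and the quadratic objectives are purely illustrative. The fixed-time algorithms themselves are not shown.

```python
def is_detail_balanced(adj, xi, tol=1e-9):
    """Check xi_i * a_ij == xi_j * a_ji for every ordered pair (i, j)."""
    n = len(adj)
    return all(abs(xi[i] * adj[i][j] - xi[j] * adj[j][i]) <= tol
               for i in range(n) for j in range(n))


def weighted_quadratic_optimum(weights, targets):
    """Minimizer of F(x) = sum_i c_i (x - b_i)^2 with all c_i > 0."""
    return sum(c * b for c, b in zip(weights, targets)) / sum(weights)
```

The directed pair with a_01 = 2 and a_10 = 1 is detail-balanced for ξ = (1, 2), but the symmetric pair a_01 = a_10 = 1 is not. With weights (1, 3) and targets (0, 4), the optimum is 3.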
Funding: Supported by the Cancer Genomics, Tumor Tissue Repository, and Bioinformatics Shared Resources under the NCI Cancer Center Support Grant to the Comprehensive Cancer Center of Wake Forest University Health Sciences, USA (Grant No. P30CA012197).
Abstract: In gene expression profiling studies, including single-cell RNA sequencing (scRNA-seq) analyses, the identification and characterization of co-expressed genes provide critical information on cell identity and function. Gene co-expression clustering in scRNA-seq data presents certain challenges. We show that commonly used methods for single-cell data cannot identify co-expressed genes accurately, and that they produce results falling substantially short of biological expectations for co-expressed genes. Herein, we present the single-cell Latent-variable Model (scLM), a gene co-clustering algorithm tailored to single-cell data that performs well at detecting gene clusters with significant biological context. Importantly, scLM can cluster multiple single-cell datasets simultaneously (i.e., perform consensus clustering), enabling users to leverage single-cell data from multiple sources for novel comparative analysis. scLM takes raw count data as input and preserves biological variation without being influenced by batch effects across datasets. Results on both simulated and experimental data demonstrate that scLM outperforms existing methods with considerably improved accuracy. To illustrate the biological insights provided by scLM, we apply it to our in-house and public experimental scRNA-seq datasets. scLM identifies novel functional gene modules and refines cell states, which facilitates mechanism discovery and the understanding of complex biosystems such as cancers. A user-friendly R package with all the key features of the scLM method is available at https://github.com/QSong-github/scLM.
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grant No. 2017JBM067, the National Natural Science Foundation of China under Grant Nos. 61503016 and 61403019, and the National Key Research and Development Program of China under Grant No. 2017YFB0103202.
Abstract: This paper investigates cluster-delay consensus for second-order nonlinear multi-agent systems. The systems consist of a leader and agents, all with second-order nonlinear dynamics. The objective is for the agents to track the leader asymptotically with different time delays, i.e., agents in different groups reach delay consensus, while agents in the same group reach identical consensus. To guarantee cluster-delay consensus for these systems, a new control protocol is proposed. Corresponding conditions for cluster-delay consensus are then derived using the Lyapunov direct method and matrix theory. Finally, the effectiveness of the theoretical results is verified by numerical simulations.
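One common way to formalize the cluster-delay consensus objective stated above uses assumed notation that the abstract does not define:

- $(x_0, v_0)$ is the leader's position and velocity;
- $\sigma(i)$ is the cluster of agent $i$;
- $\tau_k$ is the delay assigned to cluster $k$.

```latex
\lim_{t \to \infty} \bigl\| x_i(t) - x_0\bigl(t - \tau_{\sigma(i)}\bigr) \bigr\| = 0,
\qquad
\lim_{t \to \infty} \bigl\| v_i(t) - v_0\bigl(t - \tau_{\sigma(i)}\bigr) \bigr\| = 0,
\qquad i = 1, \dots, N.
```

Agents with the same $\sigma(i)$ track the same delayed leader trajectory, so they reach identical consensus with each other. Agents in clusters with distinct delays $\tau_k \neq \tau_l$ reach delay consensus relative to one another.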
Funding: Beijing Health Technologies Promotion Program (Grant/Award Number: BHTPP202007); Research and Development Foundation of Peking University People's Hospital (Grant/Award Number: RDL2021-05); Capital Health Research and Development of Special Fund (Grant/Award Number: 2020-1-2051).
Abstract: Objectives: Allergic rhinitis (AR) is a form of respiratory inflammation that mainly affects the sinonasal mucosa. The purpose of this study was to explore the level of immune cell infiltration and the pathogenesis of AR. Methods: We performed a comprehensive analysis of two gene expression profiles (GSE50223 and GSE50101, a total of 30 patients with AR and 31 healthy controls). CIBERSORT was used to evaluate immune cell infiltration levels. Weighted gene co-expression network analysis was applied to explore potential genes or gene modules related to immune status. Enrichment analyses, including Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, gene set enrichment analysis, and gene set variation analysis, were performed to analyze the potential mechanisms in AR. A protein–protein interaction network was constructed to investigate the hub genes, and consensus clustering was conducted to identify the molecular subtypes of AR. Results: Compared with the healthy controls, patients with AR had higher abundance levels and proportions of CD4+ memory-activated T cells. One hundred and eight immune-related differentially expressed genes were identified. Enrichment analysis suggested that AR was mainly related to leukocyte cell-cell adhesion, cytokine-cytokine receptor interaction, T-cell activation, and the T-cell receptor signaling pathway. Ten hub genes related to immune response, namely TYROBP, CSF1R, TLR8, FCER1G, SPI1, ITGAM, CYBB, FCGR2A, CCR1, and HCK, might be crucial to the pathogenesis of AR. Three molecular subtypes with significantly different immune statuses were identified. Conclusion: This study improves our understanding of the molecular mechanisms in AR through comprehensive strategies and provides potential diagnostic biomarkers and therapeutic targets for AR.
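Consensus clustering for molecular subtyping is often done in the resampling style of Monti et al., as popularized by the ConsensusClusterPlus R package. The procedure has three steps:

1. Repeatedly subsample the patients.
2. Cluster each subsample.
3. Record, for every pair of samples, how often they land in the same cluster when both are drawn.

This is a sketch only. The study's actual settings are not given above, so k-means with random initialization, 80% subsampling, and 50 resamples are assumptions.

```python
import numpy as np


def kmeans_labels(X, k, rng, iters=50):
    """Plain Lloyd's k-means with random initial centers drawn from the rows of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy index -> copy
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters' centers unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def consensus_matrix(X, k, n_resamples=50, frac=0.8, seed=0):
    """M[i, j] = (# resamples with i, j co-clustered) / (# resamples containing both)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    rng = np.random.default_rng(seed)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans_labels(X[idx], k, rng)
        together[np.ix_(idx, idx)] += (labels[:, None] == labels[None, :])
        sampled[np.ix_(idx, idx)] += 1.0
    # Pairs never drawn together get 0 instead of a division by zero.
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)
```

In practice the consensus matrix is computed for several candidate k values. A k whose matrix is close to block-diagonal (entries near 0 or 1) suggests a stable subtype structure.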