Clustering a social network is a process of grouping social actors into clusters where intra-cluster similarities among actors are higher than inter-cluster similarities. Clustering approaches, i.e., k-medoids or hierarchical, use a distance function to measure the dissimilarities among actors. These distance functions need to fulfill various properties, including the triangle inequality (TI). However, in some cases, the triangle inequality might be violated, impacting the quality of the resulting clusters. Through experiments, this paper explains how TI is violated when traditional clustering techniques (k-medoids, hierarchical, DENGRAPH, and spectral clustering) are performed on social networks, and how the violation of TI affects the quality of the resulting clusters.
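To make the property at issue concrete, the following sketch checks a precomputed dissimilarity matrix for triangle-inequality violations; the random symmetric matrix is only a hypothetical stand-in for an actor dissimilarity matrix, not data from the paper.

```python
import numpy as np

def triangle_inequality_violations(D, tol=1e-9):
    """Return all (i, j, k) index triples where d(i, k) > d(i, j) + d(j, k)."""
    n = D.shape[0]
    return [(i, j, k)
            for i in range(n) for j in range(n) for k in range(n)
            if D[i, k] > D[i, j] + D[j, k] + tol]

# Hypothetical actor dissimilarities: symmetric, zero diagonal, not guaranteed to be metric.
rng = np.random.default_rng(0)
D = rng.random((6, 6))
D = (D + D.T) / 2
np.fill_diagonal(D, 0.0)
print(len(triangle_inequality_violations(D)), "violating triples found")
```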
Many fields, such as neuroscience, are experiencing the vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
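A minimal sketch of the general idea (not the authors' published protocol): split a cell-by-feature matrix with agglomerative clustering and accept the split only if a statistical test finds between-cluster distances larger than within-cluster distances; the synthetic data, Ward linkage, and Mann-Whitney test are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform
from scipy.stats import mannwhitneyu

def accept_split(X, labels, alpha=0.05):
    """Accept a two-way split only if between-cluster distances exceed within-cluster ones."""
    D = squareform(pdist(X))
    a, b = labels == 1, labels == 2
    within = np.concatenate([D[a][:, a][np.triu_indices(a.sum(), 1)],
                             D[b][:, b][np.triu_indices(b.sum(), 1)]])
    between = D[a][:, b].ravel()
    _, p = mannwhitneyu(between, within, alternative="greater")
    return p < alpha

# Hypothetical cell-by-feature matrix with two well-separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(4, 1, (30, 5))])
labels = fcluster(linkage(X, method="ward"), t=2, criterion="maxclust")
print("subdivide further:", accept_split(X, labels))
```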
In this paper, we introduce a novel Multi-scale and Auto-tuned Semi-supervised Deep Subspace Clustering (MAS-DSC) algorithm, aimed at addressing the challenges of deep subspace clustering in high-dimensional real-world data, particularly in the field of medical imaging. Traditional deep subspace clustering algorithms, which are mostly unsupervised, are limited in their ability to effectively utilize the inherent prior knowledge in medical images. Our MAS-DSC algorithm incorporates a semi-supervised learning framework that uses a small amount of labeled data to guide the clustering process, thereby enhancing the discriminative power of the feature representations. Additionally, the multi-scale feature extraction mechanism is designed to adapt to the complexity of medical imaging data, resulting in more accurate clustering performance. To address the difficulty of hyperparameter selection in deep subspace clustering, this paper employs a Bayesian optimization algorithm for adaptive tuning of hyperparameters related to subspace clustering, prior knowledge constraints, and model loss weights. Extensive experiments on standard clustering datasets, including ORL, Coil20, and Coil100, validate the effectiveness of the MAS-DSC algorithm. The results show that with its multi-scale network structure and Bayesian hyperparameter optimization, MAS-DSC achieves excellent clustering results on these datasets. Furthermore, tests on a brain tumor dataset demonstrate the robustness of the algorithm and its ability to leverage prior knowledge for efficient feature extraction and enhanced clustering performance within a semi-supervised learning framework.
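As a hedged illustration of this kind of hyperparameter tuning, the sketch below uses Optuna, whose default sampler is a sequential model-based optimizer in the same spirit as Bayesian optimization; the parameter names and the synthetic objective are placeholders, not the MAS-DSC configuration.

```python
import optuna

def objective(trial):
    # Hypothetical hyperparameters standing in for loss weights and prior-knowledge constraints.
    lambda_se = trial.suggest_float("lambda_self_expression", 1e-3, 1e1, log=True)
    lambda_prior = trial.suggest_float("lambda_prior", 1e-4, 1e0, log=True)
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    # In practice this would train the clustering network and return a validation score;
    # a synthetic quadratic stands in for that signal here.
    return (lambda_se - 0.5) ** 2 + (lambda_prior - 0.01) ** 2 + (lr - 1e-3) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=30)
print(study.best_params)
```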
Path-based clustering algorithms typically generate clusters by optimizing a benchmark function. Most optimization methods in clustering algorithms often offer solutions close to the general optimal value. This study achieves the global optimum value for the criterion function in a shorter time using the minimax distance, the Maximum Spanning Tree (MST), and meta-heuristic algorithms, including the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The Fast Path-based Clustering (FPC) algorithm proposed in this paper can find cluster centers correctly in most datasets and quickly perform clustering operations. The FPC does this operation using the MST, the minimax distance, and a new hybrid meta-heuristic algorithm in a few rounds of algorithm iterations. This algorithm can achieve the global optimal value, and the main clustering process of the algorithm has a computational complexity of O(k²×n). However, due to the complexity of the minimax distance algorithm, the total computational complexity is O(n²). Experimental results of FPC on synthetic datasets with arbitrary shapes demonstrate that the algorithm is resistant to noise and outliers and can correctly identify clusters of varying sizes and numbers. In addition, the FPC requires the number of clusters as the only parameter to perform the clustering process. A comparative analysis of FPC and other clustering algorithms in this domain indicates that FPC exhibits superior speed, stability, and performance.
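For background, the path-based (minimax) distance between two points is the smallest achievable maximum edge weight over all paths connecting them, and it can be read off a spanning tree; the sketch below uses SciPy's minimum spanning tree as a generic construction, not the FPC implementation.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def minimax_distances(X):
    """All-pairs path-based (minimax) distances computed on a spanning tree."""
    D = squareform(pdist(X))
    T = minimum_spanning_tree(D).toarray()
    T = np.maximum(T, T.T)                      # symmetric tree adjacency
    n = len(X)
    out = np.zeros((n, n))
    for s in range(n):                          # widest-path traversal from each source
        best, stack, seen = np.zeros(n), [s], {s}
        while stack:
            u = stack.pop()
            for v in np.nonzero(T[u])[0]:
                if v not in seen:
                    seen.add(v)
                    best[v] = max(best[u], T[u, v])
                    stack.append(v)
        out[s] = best
    return out

X = np.random.default_rng(2).random((8, 2))
print(minimax_distances(X).round(3))
```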
Multi-view Subspace Clustering (MVSC) emerges as an advanced clustering method, designed to integrate diverse views to uncover a common subspace, enhancing the accuracy and robustness of clustering results. The significance of the low-rank prior in MVSC is emphasized, highlighting its role in capturing the global data structure across views for improved performance. However, it faces challenges with outlier sensitivity due to its reliance on the Frobenius norm for error measurement. Addressing this, our paper proposes a Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization (LMVSC-Sparse) approach. Sparse regularization helps in selecting the most relevant features or views for clustering while ignoring irrelevant or noisy ones. This leads to a more efficient and effective representation of the data, improving the clustering accuracy and robustness, especially in the presence of outliers or noisy data. By incorporating sparse regularization, LMVSC-Sparse can effectively handle outlier sensitivity, which is a common challenge in traditional MVSC methods relying solely on low-rank priors. The Alternating Direction Method of Multipliers (ADMM) algorithm is then employed to solve the proposed optimization problems. Our comprehensive experiments demonstrate the efficiency and effectiveness of LMVSC-Sparse, offering a robust alternative to traditional MVSC methods.
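For reference, ADMM handles an l1 (sparsity) term through its proximal operator, elementwise soft-thresholding; the sketch below shows only that generic building block, not the full LMVSC-Sparse solver.

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal operator of tau * ||X||_1: elementwise soft-thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

# One generic ADMM-style sparse update: shrink a noisy coefficient matrix toward sparsity.
Z = np.random.default_rng(3).normal(size=(5, 5))
print(soft_threshold(Z, tau=0.5).round(2))
```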
The study delves into the expanding role of network platforms in our daily lives, encompassing various mediums like blogs, forums, online chats, and prominent social media platforms such as Facebook, Twitter, and Instagram. While these platforms offer avenues for self-expression and community support, they concurrently harbor negative impacts, fostering antisocial behaviors like phishing, impersonation, hate speech, cyberbullying, cyberstalking, cyberterrorism, fake news propagation, spamming, and fraud. Notably, individuals also leverage these platforms to connect with authorities and seek aid during disasters. The overarching objective of this research is to address the dual nature of network platforms by proposing innovative methodologies aimed at enhancing their positive aspects and mitigating their negative repercussions. To achieve this, the study introduces a weight learning method grounded in multi-linear attribute ranking. This approach serves to evaluate the significance of attribute combinations across all feature spaces. Additionally, a novel clustering method based on tensors is proposed to elevate the quality of clustering while effectively distinguishing selected features. The methodology incorporates a weighted average similarity matrix and optionally integrates weighted Euclidean distance, contributing to a more nuanced understanding of attribute importance. The analysis of the proposed methods yields significant findings. The weight learning method proves instrumental in discerning the importance of attribute combinations, shedding light on key aspects within feature spaces. Simultaneously, the clustering method based on tensors exhibits improved efficacy in enhancing clustering quality and feature distinction. This not only advances our understanding of attribute importance but also paves the way for more nuanced data analysis methodologies. In conclusion, this research underscores the pivotal role of network platforms in contemporary society, emphasizing their potential for both positive contributions and adverse consequences. The proposed methodologies offer novel approaches to address these dualities, providing a foundation for future research and practical applications. Ultimately, this study contributes to the ongoing discourse on optimizing the utility of network platforms while minimizing their negative impacts.
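As a small illustration of the distance ingredient mentioned above, the sketch below builds a weighted Euclidean distance and a weighted similarity matrix from per-attribute weights; the weights are hypothetical, not those produced by the proposed weight-learning method.

```python
import numpy as np

def weighted_euclidean(x, y, w):
    """Euclidean distance with non-negative per-attribute weights w."""
    d = x - y
    return np.sqrt(np.sum(w * d * d))

def weighted_similarity_matrix(X, w, sigma=1.0):
    """Gaussian similarities computed on the weighted distance."""
    n = len(X)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = np.exp(-weighted_euclidean(X[i], X[j], w) ** 2 / (2 * sigma ** 2))
    return S

X = np.random.default_rng(4).random((5, 3))
w = np.array([0.6, 0.3, 0.1])        # hypothetical learned attribute weights
print(weighted_similarity_matrix(X, w).round(2))
```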
Traditional Fuzzy C-Means (FCM) and Possibilistic C-Means (PCM) clustering algorithms are data-driven, and their objective function minimization process is based on the available numeric data. Recently, knowledge hints have been introduced to form knowledge-driven clustering algorithms, which reveal a data structure that considers not only the relationships between data but also the compatibility with knowledge hints. However, these algorithms cannot produce the optimal number of clusters by the clustering algorithm itself; they require the assistance of evaluation indices. Moreover, knowledge hints are usually used as part of the data structure (directly replacing some clustering centers), which severely limits the flexibility of the algorithm and can lead to knowledge misguidance. To solve this problem, this study designs a new knowledge-driven clustering algorithm called the PCM clustering with High-density Points (HP-PCM), in which domain knowledge is represented in the form of so-called high-density points. First, a new data-density calculation function is proposed. The Density Knowledge Points Extraction (DKPE) method is established to filter out high-density points from the dataset to form knowledge hints. Then, these hints are incorporated into the PCM objective function so that the clustering algorithm is guided by high-density points to discover the natural data structure. Finally, the initial number of clusters is set to be greater than the true one based on the number of knowledge hints. Then, the HP-PCM algorithm automatically determines the final number of clusters during the clustering process by considering the cluster elimination mechanism. Through experimental studies, including some comparative analyses, the results highlight the effectiveness of the proposed algorithm, such as the increased success rate in clustering, the ability to determine the optimal cluster number, and the faster convergence speed.
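A minimal sketch of the high-density-point idea, using an inverse mean k-nearest-neighbor distance as a stand-in density; the paper defines its own density function and DKPE filtering rule, so the choices of k and of the top-density cutoff here are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_density(X, k=5):
    """Density proxy: inverse of the mean distance to the k nearest neighbors."""
    dist, _ = cKDTree(X).query(X, k=k + 1)   # column 0 is the point itself
    return 1.0 / dist[:, 1:].mean(axis=1)

def high_density_points(X, k=5, top=3):
    """Select the 'top' densest points as knowledge hints."""
    rho = knn_density(X, k)
    return X[np.argsort(rho)[::-1][:top]]

X = np.random.default_rng(5).normal(size=(200, 2))
print(high_density_points(X, k=8, top=3))
```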
Hyperspectral imagery encompasses spectral and spatial dimensions, reflecting the material properties of objects. Its application proves crucial in search and rescue, concealed target identification, and crop growth analysis. Clustering is an important method of hyperspectral analysis. The vast data volume of hyperspectral imagery, coupled with redundant information, poses significant challenges in swiftly and accurately extracting features for subsequent analysis. The current hyperspectral feature clustering methods, which are mostly studied from space or spectrum, do not have strong interpretability, resulting in poor comprehensibility of the algorithm. Therefore, this research introduces a feature clustering algorithm for hyperspectral imagery from an interpretability perspective. It commences with a simulated perception process, proposing an interpretable band selection algorithm to reduce data dimensions. Following this, a multi-dimensional clustering algorithm, rooted in fuzzy and kernel clustering, is developed to highlight intra-class similarities and inter-class differences. An optimized P system is then introduced to enhance computational efficiency. This system coordinates all cells within a mapping space to compute optimal cluster centers, facilitating parallel computation. This approach diminishes sensitivity to initial cluster centers and augments global search capabilities, thus preventing entrapment in local minima and enhancing clustering performance. Experiments were conducted on 300 datasets, comprising both real and simulated data. The results show that the average accuracy (ACC) of the proposed algorithm is 0.86 and the combination measure (CM) is 0.81.
The scale and complexity of big data are growing continuously, posing severe challenges to traditional data processing methods, especially in the field of clustering analysis. To address this issue, this paper introduces a new method named Big Data Tensor Multi-Cluster Distributed Incremental Update (BDTMCDIncreUpdate), which combines distributed computing, storage technology, and incremental update techniques to provide an efficient and effective means for clustering analysis. Firstly, the original dataset is divided into multiple sub-blocks, and distributed computing resources are utilized to process the sub-blocks in parallel, enhancing efficiency. Then, initial clustering is performed on each sub-block using tensor-based multi-clustering techniques to obtain preliminary results. When new data arrives, incremental update technology is employed to update the core tensor and factor matrix, ensuring that the clustering model can adapt to changes in data. Finally, by combining the updated core tensor and factor matrix with historical computational results, refined clustering results are obtained, achieving real-time adaptation to dynamic data. Through experimental simulation on the Aminer dataset, the BDTMCDIncreUpdate method has demonstrated outstanding performance in terms of accuracy (ACC) and normalized mutual information (NMI) metrics, achieving an accuracy rate of 90% and an NMI score of 0.85, which outperforms existing methods such as TClusInitUpdate and TKLClusUpdate in most scenarios. Therefore, the BDTMCDIncreUpdate method offers an innovative solution to the field of big data analysis, integrating distributed computing, incremental updates, and tensor-based multi-clustering techniques. It not only improves the efficiency and scalability in processing large-scale high-dimensional datasets but also has been validated for its effectiveness and accuracy through experiments. This method shows great potential in real-world applications where dynamic data growth is common, and it is of significant importance for advancing the development of data analysis technology.
Data stream clustering is integral to contemporary big data applications. However, addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research. This paper aims to elevate the efficiency and precision of data stream clustering. Leveraging the TEDA (Typicality and Eccentricity Data Analysis) algorithm as a foundation, we introduce improvements by integrating a nearest neighbor search algorithm to enhance both the efficiency and accuracy of the algorithm. The original TEDA algorithm, grounded in the concept of "Typicality and Eccentricity Data Analytics", represents an evolving and recursive method that requires no prior knowledge. While the algorithm autonomously creates and merges clusters as new data arrives, its efficiency is significantly hindered by the need to traverse all existing clusters upon the arrival of further data. This work presents the NS-TEDA (Neighbor Search Based Typicality and Eccentricity Data Analysis) algorithm by incorporating a KD-Tree (K-Dimensional Tree) algorithm integrated with the Scapegoat Tree. This ensures that newly arrived data points interact solely with clusters in very close proximity. This significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and mitigating the merging of clusters with high overlap to some extent. We apply the NS-TEDA algorithm to several well-known datasets, comparing its performance with other data stream clustering algorithms and the original TEDA algorithm. The results demonstrate that the proposed algorithm achieves higher accuracy, and its runtime exhibits almost linear dependence on the volume of data, making it more suitable for large-scale data stream analysis research.
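A minimal sketch of the neighbor-search step: index the current cluster centers in a KD-tree and let a newly arrived point interact only with centers inside a small radius. A plain SciPy KD-tree is used purely for illustration; the paper pairs the KD-tree with a scapegoat tree, and the radius here is a hypothetical parameter.

```python
import numpy as np
from scipy.spatial import cKDTree

centers = np.random.default_rng(6).random((50, 3))   # current cluster centers
tree = cKDTree(centers)

new_point = np.random.default_rng(7).random(3)        # newly arrived stream sample
nearby = tree.query_ball_point(new_point, r=0.2)      # only these clusters get updated
print("clusters considered for this sample:", nearby)
```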
Although many multi-view clustering (MVC) algorithms with acceptable performances have been presented, to the best of our knowledge, nearly all of them need to be fed with the correct number of clusters. In addition, these existing algorithms create only the hard and fuzzy partitions for multi-view objects, which are often located in highly-overlapping areas of multi-view feature space. The adoption of hard and fuzzy partitions ignores the ambiguity and uncertainty in the assignment of objects, likely leading to performance degradation. To address these issues, we propose a novel sparse reconstructive multi-view evidential clustering algorithm (SRMVEC). Based on a sparse reconstructive procedure, SRMVEC learns a shared affinity matrix across views, and maps multi-view objects to a 2-dimensional human-readable chart by calculating 2 newly defined mathematical metrics for each object. From this chart, users can detect the number of clusters and select several objects existing in the dataset as cluster centers. Then, SRMVEC derives a credal partition under the framework of evidence theory, improving the fault tolerance of clustering. Ablation studies show the benefits of adopting the sparse reconstructive procedure and evidence theory. Besides, SRMVEC delivers effectiveness on benchmark datasets by outperforming some state-of-the-art methods.
In recent times, various power control and clustering approaches have been proposed to enhance overall performance for cell-free massive multiple-input multiple-output (CF-mMIMO) networks. With the emergence of deep reinforcement learning (DRL), significant progress has been made in the field of network optimization, as DRL holds great promise for improving network performance and efficiency. In this work, our focus delves into the intricate challenge of joint cooperation clustering and downlink power control within CF-mMIMO networks. Leveraging the potent deep deterministic policy gradient (DDPG) algorithm, our objective is to maximize the proportional fairness (PF) for user rates, thereby aiming to achieve optimal network performance and resource utilization. Moreover, we harness the concept of a "divide and conquer" strategy, introducing two innovative methods termed alternating DDPG (A-DDPG) and hierarchical DDPG (H-DDPG). These approaches aim to decompose the intricate joint optimization problem into more manageable sub-problems, thereby facilitating a more efficient resolution process. Our findings unequivocally showcase the superior efficacy of our proposed DDPG approach over the baseline schemes in both clustering and downlink power control. Furthermore, A-DDPG and H-DDPG obtain higher performance gains than DDPG with lower computational complexity.
Implementing machine learning algorithms in the non-conducive environment of the vehicular network requires some adaptations due to the high computational complexity of these algorithms. K-clustering algorithms are simplistic, with fast performance and relative accuracy. However, their implementation depends on the initial selection of the number of clusters (K), the initial clusters' centers, and the clustering metric. This paper investigated using Scott's histogram formula to estimate the K number and the Link Expiration Time (LET) as a clustering metric. Realistic traffic flows were considered for three maps, namely Highway, Traffic Light junction, and Roundabout junction, to study the effect of road layout on estimating the K number. A fast version of the PAM algorithm was used for clustering, with a modification to reduce time complexity. The Affinity Propagation algorithm sets the baseline for the estimated K number, and the Medoid Silhouette method is used to quantify the clustering. OMNET++, Veins, and SUMO were used to simulate the traffic, while the related algorithms were implemented in Python. Scott's formula estimation of the K number only matched the baseline when the road layout was simple. Moreover, the clustering algorithm required one iteration on average to converge when used with LET.
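For reference, Scott's histogram rule chooses a bin width h = 3.49·σ·n^(−1/3); taking the resulting bin count over a one-dimensional feature as the K estimate is sketched below, with hypothetical vehicle positions standing in for the clustered feature.

```python
import numpy as np

def scott_k_estimate(values):
    """Estimate a cluster count from Scott's rule: bin width h = 3.49 * sigma * n**(-1/3)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    h = 3.49 * values.std(ddof=1) * n ** (-1 / 3)
    if h == 0:
        return 1
    return max(1, int(np.ceil((values.max() - values.min()) / h)))

# Hypothetical one-dimensional vehicle positions (meters) along a highway segment.
rng = np.random.default_rng(8)
positions = np.concatenate([rng.normal(m, 15, 40) for m in (100, 400, 800)])
print("estimated K:", scott_k_estimate(positions))
```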
Target maneuver recognition is a prerequisite for air combat situation awareness, trajectory prediction, threat assessment, and maneuver decision. To get rid of the dependence of current target maneuver recognition methods on empirical criteria and sample data, and to automatically and adaptively complete the task of extracting the target maneuver pattern, this paper proposes an air combat maneuver pattern extraction method based on time series segmentation and clustering analysis, combining an autoencoder, the G-G clustering algorithm, and the selective ensemble clustering analysis algorithm. Firstly, the autoencoder is used to extract key features of the maneuvering trajectory to remove the impact of redundant variables and reduce the data dimension. Then, taking the time information into account, the segmentation of maneuver characteristic time series is realized with the improved FSTS-AEGG algorithm, and a large number of maneuver primitives are extracted. Finally, the maneuver primitives are grouped into categories by using the selective ensemble multiple time series clustering algorithm, such that each class represents a maneuver action. The maneuver pattern extraction method is applied to small-scale air combat trajectories and can recognize and correctly partition at least 71.3% of maneuver actions, indicating that the method is effective and satisfies the requirements for engineering accuracy. In addition, this method can provide data support for various target maneuvering recognition methods proposed in the literature, greatly reduce the workload, and improve the recognition accuracy.
The observation error model of the underwater acoustic positioning system is an important factor influencing the positioning accuracy of the underwater target. For the position inconsistency error caused by considering the underwater target as a mass point, as well as the observation system error, the traditional error model best estimation trajectory (EMBET) with little observed data and too many parameters can lead to the ill-condition of the parameter model. In this paper, a multi-station fusion system error model based on the optimal polynomial constraint is constructed, and the corresponding observation system error identification based on improved spectral clustering is designed. Firstly, the reduced parameter unified modeling for the underwater target position parameters and the system error is achieved through polynomial optimization. Then a multi-station non-oriented graph network is established, which can address the problem of the inaccurate identification of the system errors. Moreover, the similarity matrix of the spectral clustering is improved, and the iterative identification of the system errors based on the improved spectral clustering is proposed. Finally, the comprehensive measured data of the long baseline lake test and sea test show that the proposed method can accurately identify the system errors, and moreover can improve the positioning accuracy for the underwater target positioning.
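As generic background for the spectral-clustering step, the sketch below builds a Gaussian similarity matrix and runs off-the-shelf spectral clustering with scikit-learn; the paper's contribution is an improved similarity matrix and an iterative identification loop, neither of which is reproduced here, and the features and cluster count are assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical per-station residual features.
X = np.random.default_rng(9).normal(size=(30, 4))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
S = np.exp(-D2 / (2 * np.median(D2)))                 # Gaussian (RBF) similarity matrix

labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(S)
print(labels)
```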
Traditional clustering algorithms often struggle to produce satisfactory results when dealing with datasets with uneven density. Additionally, they incur substantial computational costs when applied to high-dimensional data due to calculating similarity matrices. To alleviate these issues, we employ the KD-Tree to partition the dataset and compute the K-nearest neighbors (KNN) density for each point, thereby avoiding the computation of similarity matrices. Moreover, we apply the rules of voting elections, treating each data point as a voter and casting a vote for the point with the highest density among its KNN. By utilizing the vote counts of each point, we develop the strategy for classifying noise points and potential cluster centers, allowing the algorithm to identify clusters with uneven density and complex shapes. Additionally, we define the concept of "adhesive points" between two clusters to merge adjacent clusters that have similar densities. This process helps us identify the optimal number of clusters automatically. Experimental results indicate that our algorithm not only improves the efficiency of clustering but also increases its accuracy.
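A minimal sketch of the voting step described above: each point computes a KNN density and casts one vote for the densest point among its K nearest neighbors, with high vote counts marking candidate centers and zero votes marking noise candidates; the density proxy and thresholds are assumptions, not the paper's exact rules.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_density_votes(X, k=10):
    """Each point votes for the densest point among its k nearest neighbors."""
    dist, idx = cKDTree(X).query(X, k=k + 1)      # column 0 is the point itself
    density = 1.0 / dist[:, 1:].mean(axis=1)      # simple KNN density proxy
    votes = np.zeros(len(X), dtype=int)
    for i in range(len(X)):
        neighbors = idx[i, 1:]
        votes[neighbors[np.argmax(density[neighbors])]] += 1
    return votes

rng = np.random.default_rng(10)
X = np.vstack([rng.normal(m, 0.3, (100, 2)) for m in (0, 3)])
votes = knn_density_votes(X, k=15)
print("candidate centers:", np.argsort(votes)[::-1][:2])
print("noise candidates (zero votes):", int((votes == 0).sum()))
```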
Energy efficiency is the prime concern in Wireless Sensor Networks (WSNs), as maximized energy consumption essentially limits the energy stability and network lifetime. Clustering is the significant approach essential for minimizing unnecessary transmission energy consumption with a sustained network lifetime. This clustering process is identified as a Non-deterministic Polynomial (NP)-hard optimization problem, which has a maximized probability of being solved through metaheuristic algorithms. The adoption of a hybrid metaheuristic algorithm concentrates on the identification of optimal or near-optimal solutions, which aids in better energy stability during Cluster Head (CH) selection. In this paper, the Hybrid Seagull and Whale Optimization Algorithm-based Dynamic Clustering Protocol (HSWOA-DCP) is proposed with the exploitation benefits of WOA and the exploration merits of SEOA for optimal CH selection, maintaining energy stability with prolonged network lifetime. HSWOA-DCP adopts a modified version of the SEagull Optimization Algorithm (SEOA) to handle the problems of premature convergence and computational accuracy that are maximally possible during CH selection. The inclusion of SEOA into WOA improves the global searching capability during the selection of CHs and prevents worst-fitness nodes from being selected as CHs, since the spiral attacking behavior of SEOA is similar to the bubble-net characteristics of WOA. This CH selection integrates the spiral attacking principles of SEOA and the contraction surrounding mechanism of WOA to improve computational accuracy and prevent frequent election processes. It also incorporates a Lévy flight strategy into SEOA to potentially avoid premature convergence and attain a better trade-off between the rates of exploration and exploitation in a more effective manner. The simulation results of the proposed HSWOA-DCP confirmed a better network survivability rate, network residual energy, and network overall throughput on par with the competitive CH selection schemes under different numbers of data transmission rounds. The statistical analysis of the proposed HSWOA-DCP scheme also confirmed its energy stability with respect to the ANOVA test.
Deep multi-view subspace clustering (DMVSC) based on self-expression has attracted increasing attention due to its outstanding performance and nonlinear application. However, most existing methods neglect that view-private meaningless information or noise may interfere with the learning of self-expression, which may lead to the degeneration of clustering performance. In this paper, we propose a novel framework of Contrastive Consistency and Attentive Complementarity (CCAC) for DMVSC. CCAC aligns all the self-expressions of multiple views and fuses them based on their discrimination, so that it can effectively explore consistent and complementary information for achieving precise clustering. Specifically, the view-specific self-expression is learned by a self-expression layer embedded into the auto-encoder network for each view. To guarantee consistency across views and reduce the effect of view-private information or noise, we align all the view-specific self-expressions by contrastive learning. The aligned self-expressions are assigned adaptive weights by a channel attention mechanism according to their discrimination. Then they are fused by a convolution kernel to obtain a consensus self-expression with maximum complementarity of multiple views. Extensive experimental results on four benchmark datasets and one large-scale dataset show that the CCAC method outperforms other state-of-the-art methods, demonstrating its clustering effectiveness.
In clustering algorithms, the selection of neighbors significantly affects the quality of the final clustering results. While various neighbor relationships exist, such as K-nearest neighbors, natural neighbors, and shared neighbors, most neighbor relationships can only handle single structural relationships, and the identification accuracy is low for datasets with multiple structures. In life, people's first instinct when facing complex things is to divide them into multiple parts to handle. Partitioning the dataset into more sub-graphs is a good approach to identifying complex structures. Taking inspiration from this, we propose a novel neighbor method: Shared Natural Neighbors (SNaN). To demonstrate the superiority of this neighbor method, we propose a shared natural neighbors-based hierarchical clustering algorithm for discovering arbitrary-shaped clusters (HC-SNaN). Our algorithm excels in identifying both spherical clusters and manifold clusters. Tested on synthetic datasets and real-world datasets, HC-SNaN demonstrates significant advantages over existing clustering algorithms, particularly when dealing with datasets containing arbitrary shapes.
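As background for the neighbor relation, the sketch below counts shared k-nearest neighbors between point pairs, a simpler fixed-k relative of the shared natural neighbors (SNaN) used by HC-SNaN, whose neighborhood sizes are determined adaptively rather than by a preset k.

```python
import numpy as np
from scipy.spatial import cKDTree

def shared_knn_counts(X, k=10):
    """Pairwise counts of shared members between k-nearest-neighbor sets."""
    _, idx = cKDTree(X).query(X, k=k + 1)
    neighbor_sets = [set(row[1:]) for row in idx]   # drop the point itself
    n = len(X)
    shared = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            shared[i, j] = shared[j, i] = len(neighbor_sets[i] & neighbor_sets[j])
    return shared

X = np.random.default_rng(11).random((40, 2))
print("max shared neighbors over all pairs:", shared_knn_counts(X, k=8).max())
```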
文摘Clustering a social network is a process of grouping social actors into clusters where intra-cluster similarities among actors are higher than inter-cluster similarities. Clustering approaches, i.e. , k-medoids or hierarchical, use the distance function to measure the dissimilarities among actors. These distance functions need to fulfill various properties, including the triangle inequality (TI). However, in some cases, the triangle inequality might be violated, impacting the quality of the resulting clusters. With experiments, this paper explains how TI violates while performing traditional clustering techniques: k-medoids, hierarchical, DENGRAPH, and spectral clustering on social networks and how the violation of TI affects the quality of the resulting clusters.
基金supported in part by NIH grants R01NS39600,U01MH114829RF1MH128693(to GAA)。
文摘Many fields,such as neuroscience,are experiencing the vast prolife ration of cellular data,underscoring the need fo r organizing and interpreting large datasets.A popular approach partitions data into manageable subsets via hierarchical clustering,but objective methods to determine the appropriate classification granularity are missing.We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters.Here we present the corresponding protocol to classify cellular datasets by combining datadriven unsupervised hierarchical clustering with statistical testing.These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values,including molecula r,physiological,and anatomical datasets.We demonstrate the protocol using cellular data from the Janelia MouseLight project to chara cterize morphological aspects of neurons.
基金supported in part by the National Natural Science Foundation of China under Grant 62171203in part by the Jiangsu Province“333 Project”High-Level Talent Cultivation Subsidized Project+2 种基金in part by the SuzhouKey Supporting Subjects for Health Informatics under Grant SZFCXK202147in part by the Changshu Science and Technology Program under Grants CS202015 and CS202246in part by Changshu Key Laboratory of Medical Artificial Intelligence and Big Data under Grants CYZ202301 and CS202314.
文摘In this paper,we introduce a novel Multi-scale and Auto-tuned Semi-supervised Deep Subspace Clustering(MAS-DSC)algorithm,aimed at addressing the challenges of deep subspace clustering in high-dimensional real-world data,particularly in the field of medical imaging.Traditional deep subspace clustering algorithms,which are mostly unsupervised,are limited in their ability to effectively utilize the inherent prior knowledge in medical images.Our MAS-DSC algorithm incorporates a semi-supervised learning framework that uses a small amount of labeled data to guide the clustering process,thereby enhancing the discriminative power of the feature representations.Additionally,the multi-scale feature extraction mechanism is designed to adapt to the complexity of medical imaging data,resulting in more accurate clustering performance.To address the difficulty of hyperparameter selection in deep subspace clustering,this paper employs a Bayesian optimization algorithm for adaptive tuning of hyperparameters related to subspace clustering,prior knowledge constraints,and model loss weights.Extensive experiments on standard clustering datasets,including ORL,Coil20,and Coil100,validate the effectiveness of the MAS-DSC algorithm.The results show that with its multi-scale network structure and Bayesian hyperparameter optimization,MAS-DSC achieves excellent clustering results on these datasets.Furthermore,tests on a brain tumor dataset demonstrate the robustness of the algorithm and its ability to leverage prior knowledge for efficient feature extraction and enhanced clustering performance within a semi-supervised learning framework.
文摘Path-based clustering algorithms typically generate clusters by optimizing a benchmark function.Most optimiza-tion methods in clustering algorithms often offer solutions close to the general optimal value.This study achieves the global optimum value for the criterion function in a shorter time using the minimax distance,Maximum Spanning Tree“MST”,and meta-heuristic algorithms,including Genetic Algorithm“GA”and Particle Swarm Optimization“PSO”.The Fast Path-based Clustering“FPC”algorithm proposed in this paper can find cluster centers correctly in most datasets and quickly perform clustering operations.The FPC does this operation using MST,the minimax distance,and a new hybrid meta-heuristic algorithm in a few rounds of algorithm iterations.This algorithm can achieve the global optimal value,and the main clustering process of the algorithm has a computational complexity of O�k2×n�.However,due to the complexity of the minimum distance algorithm,the total computational complexity is O�n2�.Experimental results of FPC on synthetic datasets with arbitrary shapes demonstrate that the algorithm is resistant to noise and outliers and can correctly identify clusters of varying sizes and numbers.In addition,the FPC requires the number of clusters as the only parameter to perform the clustering process.A comparative analysis of FPC and other clustering algorithms in this domain indicates that FPC exhibits superior speed,stability,and performance.
文摘Multi-view Subspace Clustering (MVSC) emerges as an advanced clustering method, designed to integrate diverse views to uncover a common subspace, enhancing the accuracy and robustness of clustering results. The significance of low-rank prior in MVSC is emphasized, highlighting its role in capturing the global data structure across views for improved performance. However, it faces challenges with outlier sensitivity due to its reliance on the Frobenius norm for error measurement. Addressing this, our paper proposes a Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization (LMVSC- Sparse) approach. Sparse regularization helps in selecting the most relevant features or views for clustering while ignoring irrelevant or noisy ones. This leads to a more efficient and effective representation of the data, improving the clustering accuracy and robustness, especially in the presence of outliers or noisy data. By incorporating sparse regularization, LMVSC-Sparse can effectively handle outlier sensitivity, which is a common challenge in traditional MVSC methods relying solely on low-rank priors. Then Alternating Direction Method of Multipliers (ADMM) algorithm is employed to solve the proposed optimization problems. Our comprehensive experiments demonstrate the efficiency and effectiveness of LMVSC-Sparse, offering a robust alternative to traditional MVSC methods.
基金sponsored by the National Natural Science Foundation of P.R.China(Nos.62102194 and 62102196)Six Talent Peaks Project of Jiangsu Province(No.RJFW-111)Postgraduate Research and Practice Innovation Program of Jiangsu Province(Nos.KYCX23_1087 and KYCX22_1027).
文摘The study delves into the expanding role of network platforms in our daily lives, encompassing various mediums like blogs, forums, online chats, and prominent social media platforms such as Facebook, Twitter, and Instagram. While these platforms offer avenues for self-expression and community support, they concurrently harbor negative impacts, fostering antisocial behaviors like phishing, impersonation, hate speech, cyberbullying, cyberstalking, cyberterrorism, fake news propagation, spamming, and fraud. Notably, individuals also leverage these platforms to connect with authorities and seek aid during disasters. The overarching objective of this research is to address the dual nature of network platforms by proposing innovative methodologies aimed at enhancing their positive aspects and mitigating their negative repercussions. To achieve this, the study introduces a weight learning method grounded in multi-linear attribute ranking. This approach serves to evaluate the significance of attribute combinations across all feature spaces. Additionally, a novel clustering method based on tensors is proposed to elevate the quality of clustering while effectively distinguishing selected features. The methodology incorporates a weighted average similarity matrix and optionally integrates weighted Euclidean distance, contributing to a more nuanced understanding of attribute importance. The analysis of the proposed methods yields significant findings. The weight learning method proves instrumental in discerning the importance of attribute combinations, shedding light on key aspects within feature spaces. Simultaneously, the clustering method based on tensors exhibits improved efficacy in enhancing clustering quality and feature distinction. This not only advances our understanding of attribute importance but also paves the way for more nuanced data analysis methodologies. In conclusion, this research underscores the pivotal role of network platforms in contemporary society, emphasizing their potential for both positive contributions and adverse consequences. The proposed methodologies offer novel approaches to address these dualities, providing a foundation for future research and practical applications. Ultimately, this study contributes to the ongoing discourse on optimizing the utility of network platforms while minimizing their negative impacts.
基金supported by the National Key Research and Development Program of China(No.2022YFB3304400)the National Natural Science Foundation of China(Nos.6230311,62303111,62076060,61932007,and 62176083)the Key Research and Development Program of Jiangsu Province of China(No.BE2022157).
文摘Traditional Fuzzy C-Means(FCM)and Possibilistic C-Means(PCM)clustering algorithms are data-driven,and their objective function minimization process is based on the available numeric data.Recently,knowledge hints have been introduced to formknowledge-driven clustering algorithms,which reveal a data structure that considers not only the relationships between data but also the compatibility with knowledge hints.However,these algorithms cannot produce the optimal number of clusters by the clustering algorithm itself;they require the assistance of evaluation indices.Moreover,knowledge hints are usually used as part of the data structure(directly replacing some clustering centers),which severely limits the flexibility of the algorithm and can lead to knowledgemisguidance.To solve this problem,this study designs a newknowledge-driven clustering algorithmcalled the PCM clusteringwith High-density Points(HP-PCM),in which domain knowledge is represented in the form of so-called high-density points.First,a newdatadensitycalculation function is proposed.The Density Knowledge Points Extraction(DKPE)method is established to filter out high-density points from the dataset to form knowledge hints.Then,these hints are incorporated into the PCM objective function so that the clustering algorithm is guided by high-density points to discover the natural data structure.Finally,the initial number of clusters is set to be greater than the true one based on the number of knowledge hints.Then,the HP-PCM algorithm automatically determines the final number of clusters during the clustering process by considering the cluster elimination mechanism.Through experimental studies,including some comparative analyses,the results highlight the effectiveness of the proposed algorithm,such as the increased success rate in clustering,the ability to determine the optimal cluster number,and the faster convergence speed.
基金Yulin Science and Technology Bureau production Project“Research on Smart Agricultural Product Traceability System”(No.CXY-2022-64)Light of West China(No.XAB2022YN10)+1 种基金The China Postdoctoral Science Foundation(No.2023M740760)Shaanxi Province Key Research and Development Plan(No.2024SF-YBXM-678).
文摘Hyperspectral imagery encompasses spectral and spatial dimensions,reflecting the material properties of objects.Its application proves crucial in search and rescue,concealed target identification,and crop growth analysis.Clustering is an important method of hyperspectral analysis.The vast data volume of hyperspectral imagery,coupled with redundant information,poses significant challenges in swiftly and accurately extracting features for subsequent analysis.The current hyperspectral feature clustering methods,which are mostly studied from space or spectrum,do not have strong interpretability,resulting in poor comprehensibility of the algorithm.So,this research introduces a feature clustering algorithm for hyperspectral imagery from an interpretability perspective.It commences with a simulated perception process,proposing an interpretable band selection algorithm to reduce data dimensions.Following this,amulti-dimensional clustering algorithm,rooted in fuzzy and kernel clustering,is developed to highlight intra-class similarities and inter-class differences.An optimized P systemis then introduced to enhance computational efficiency.This system coordinates all cells within a mapping space to compute optimal cluster centers,facilitating parallel computation.This approach diminishes sensitivity to initial cluster centers and augments global search capabilities,thus preventing entrapment in local minima and enhancing clustering performance.Experiments conducted on 300 datasets,comprising both real and simulated data.The results show that the average accuracy(ACC)of the proposed algorithm is 0.86 and the combination measure(CM)is 0.81.
基金sponsored by the National Natural Science Foundation of China(Nos.61972208,62102194 and 62102196)National Natural Science Foundation of China(Youth Project)(No.62302237)+3 种基金Six Talent Peaks Project of Jiangsu Province(No.RJFW-111),China Postdoctoral Science Foundation Project(No.2018M640509)Postgraduate Research and Practice Innovation Program of Jiangsu Province(Nos.KYCX22_1019,KYCX23_1087,KYCX22_1027,KYCX23_1087,SJCX24_0339 and SJCX24_0346)Innovative Training Program for College Students of Nanjing University of Posts and Telecommunications(No.XZD2019116)Nanjing University of Posts and Telecommunications College Students Innovation Training Program(Nos.XZD2019116,XYB2019331).
文摘The scale and complexity of big data are growing continuously,posing severe challenges to traditional data processing methods,especially in the field of clustering analysis.To address this issue,this paper introduces a new method named Big Data Tensor Multi-Cluster Distributed Incremental Update(BDTMCDIncreUpdate),which combines distributed computing,storage technology,and incremental update techniques to provide an efficient and effective means for clustering analysis.Firstly,the original dataset is divided into multiple subblocks,and distributed computing resources are utilized to process the sub-blocks in parallel,enhancing efficiency.Then,initial clustering is performed on each sub-block using tensor-based multi-clustering techniques to obtain preliminary results.When new data arrives,incremental update technology is employed to update the core tensor and factor matrix,ensuring that the clustering model can adapt to changes in data.Finally,by combining the updated core tensor and factor matrix with historical computational results,refined clustering results are obtained,achieving real-time adaptation to dynamic data.Through experimental simulation on the Aminer dataset,the BDTMCDIncreUpdate method has demonstrated outstanding performance in terms of accuracy(ACC)and normalized mutual information(NMI)metrics,achieving an accuracy rate of 90%and an NMI score of 0.85,which outperforms existing methods such as TClusInitUpdate and TKLClusUpdate in most scenarios.Therefore,the BDTMCDIncreUpdate method offers an innovative solution to the field of big data analysis,integrating distributed computing,incremental updates,and tensor-based multi-clustering techniques.It not only improves the efficiency and scalability in processing large-scale high-dimensional datasets but also has been validated for its effectiveness and accuracy through experiments.This method shows great potential in real-world applications where dynamic data growth is common,and it is of significant importance for advancing the development of data analysis technology.
基金This research was funded by the National Natural Science Foundation of China(Grant No.72001190)by the Ministry of Education’s Humanities and Social Science Project via the China Ministry of Education(Grant No.20YJC630173)by Zhejiang A&F University(Grant No.2022LFR062).
文摘Data stream clustering is integral to contemporary big data applications.However,addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research.This paper aims to elevate the efficiency and precision of data stream clustering,leveraging the TEDA(Typicality and Eccentricity Data Analysis)algorithm as a foundation,we introduce improvements by integrating a nearest neighbor search algorithm to enhance both the efficiency and accuracy of the algorithm.The original TEDA algorithm,grounded in the concept of“Typicality and Eccentricity Data Analytics”,represents an evolving and recursive method that requires no prior knowledge.While the algorithm autonomously creates and merges clusters as new data arrives,its efficiency is significantly hindered by the need to traverse all existing clusters upon the arrival of further data.This work presents the NS-TEDA(Neighbor Search Based Typicality and Eccentricity Data Analysis)algorithm by incorporating a KD-Tree(K-Dimensional Tree)algorithm integrated with the Scapegoat Tree.Upon arrival,this ensures that new data points interact solely with clusters in very close proximity.This significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and mitigating the merging of clusters with high overlap to some extent.We apply the NS-TEDA algorithm to several well-known datasets,comparing its performance with other data stream clustering algorithms and the original TEDA algorithm.The results demonstrate that the proposed algorithm achieves higher accuracy,and its runtime exhibits almost linear dependence on the volume of data,making it more suitable for large-scale data stream analysis research.
基金supported in part by NUS startup grantthe National Natural Science Foundation of China (52076037)。
文摘Although many multi-view clustering(MVC) algorithms with acceptable performances have been presented, to the best of our knowledge, nearly all of them need to be fed with the correct number of clusters. In addition, these existing algorithms create only the hard and fuzzy partitions for multi-view objects,which are often located in highly-overlapping areas of multi-view feature space. The adoption of hard and fuzzy partition ignores the ambiguity and uncertainty in the assignment of objects, likely leading to performance degradation. To address these issues, we propose a novel sparse reconstructive multi-view evidential clustering algorithm(SRMVEC). Based on a sparse reconstructive procedure, SRMVEC learns a shared affinity matrix across views, and maps multi-view objects to a 2-dimensional humanreadable chart by calculating 2 newly defined mathematical metrics for each object. From this chart, users can detect the number of clusters and select several objects existing in the dataset as cluster centers. Then, SRMVEC derives a credal partition under the framework of evidence theory, improving the fault tolerance of clustering. Ablation studies show the benefits of adopting the sparse reconstructive procedure and evidence theory. Besides,SRMVEC delivers effectiveness on benchmark datasets by outperforming some state-of-the-art methods.
基金supported by Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515012015supported in part by the National Natural Science Foundation of China under Grant 62201336+4 种基金in part by Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515011541supported in part by the National Natural Science Foundation of China under Grant 62371344in part by the Fundamental Research Funds for the Central Universitiessupported in part by Knowledge Innovation Program of Wuhan-Shuguang Project under Grant 2023010201020316in part by Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515010247。
文摘In recent times,various power control and clustering approaches have been proposed to enhance overall performance for cell-free massive multipleinput multiple-output(CF-mMIMO)networks.With the emergence of deep reinforcement learning(DRL),significant progress has been made in the field of network optimization as DRL holds great promise for improving network performance and efficiency.In this work,our focus delves into the intricate challenge of joint cooperation clustering and downlink power control within CF-mMIMO networks.Leveraging the potent deep deterministic policy gradient(DDPG)algorithm,our objective is to maximize the proportional fairness(PF)for user rates,thereby aiming to achieve optimal network performance and resource utilization.Moreover,we harness the concept of“divide and conquer”strategy,introducing two innovative methods termed alternating DDPG(A-DDPG)and hierarchical DDPG(H-DDPG).These approaches aim to decompose the intricate joint optimization problem into more manageable sub-problems,thereby facilitating a more efficient resolution process.Our findings unequivo-cally showcase the superior efficacy of our proposed DDPG approach over the baseline schemes in both clustering and downlink power control.Furthermore,the A-DDPG and H-DDPG obtain higher performance gain than DDPG with lower computational complexity.
文摘Implementing machine learning algorithms in the non-conducive environment of the vehicular network requires some adaptations due to the high computational complexity of these algorithms.K-clustering algorithms are simplistic,with fast performance and relative accuracy.However,their implementation depends on the initial selection of clusters number(K),the initial clusters’centers,and the clustering metric.This paper investigated using Scott’s histogram formula to estimate the K number and the Link Expiration Time(LET)as a clustering metric.Realistic traffic flows were considered for three maps,namely Highway,Traffic Light junction,and Roundabout junction,to study the effect of road layout on estimating the K number.A fast version of the PAM algorithm was used for clustering with a modification to reduce time complexity.The Affinity propagation algorithm sets the baseline for the estimated K number,and the Medoid Silhouette method is used to quantify the clustering.OMNET++,Veins,and SUMO were used to simulate the traffic,while the related algorithms were implemented in Python.The Scott’s formula estimation of the K number only matched the baseline when the road layout was simple.Moreover,the clustering algorithm required one iteration on average to converge when used with LET.
基金supported by the National Natural Science Foundation of China (Project No.72301293)。
文摘Target maneuver recognition is a prerequisite for air combat situation awareness,trajectory prediction,threat assessment and maneuver decision.To get rid of the dependence of the current target maneuver recognition method on empirical criteria and sample data,and automatically and adaptively complete the task of extracting the target maneuver pattern,in this paper,an air combat maneuver pattern extraction based on time series segmentation and clustering analysis is proposed by combining autoencoder,G-G clustering algorithm and the selective ensemble clustering analysis algorithm.Firstly,the autoencoder is used to extract key features of maneuvering trajectory to remove the impacts of redundant variables and reduce the data dimension;Then,taking the time information into account,the segmentation of Maneuver characteristic time series is realized with the improved FSTS-AEGG algorithm,and a large number of maneuver primitives are extracted;Finally,the maneuver primitives are grouped into some categories by using the selective ensemble multiple time series clustering algorithm,which can prove that each class represents a maneuver action.The maneuver pattern extraction method is applied to small scale air combat trajectory and can recognize and correctly partition at least 71.3%of maneuver actions,indicating that the method is effective and satisfies the requirements for engineering accuracy.In addition,this method can provide data support for various target maneuvering recognition methods proposed in the literature,greatly reduce the workload and improve the recognition accuracy.
Funding: This work was supported by the National Natural Science Foundation of China (61903086, 61903366, 62001115), the Natural Science Foundation of Hunan Province (2019JJ50745, 2020JJ4280, 2021JJ40133), and the Fundamentals and Basic of Applications Research Foundation of Guangdong Province (2019A1515110136).
Abstract: The observation error model of an underwater acoustic positioning system is an important factor influencing the positioning accuracy of an underwater target. For the position inconsistency error introduced by treating the underwater target as a point mass, together with the observation system error, the traditional error model best estimation trajectory (EMBET) approach, with little observed data and too many parameters, leads to an ill-conditioned parameter model. In this paper, a multi-station fusion system error model based on an optimal polynomial constraint is constructed, and a corresponding observation system error identification method based on improved spectral clustering is designed. First, reduced-parameter unified modeling of the underwater target position parameters and the system error is achieved through polynomial optimization. Then a multi-station undirected graph network is established, which addresses the inaccurate identification of the system errors. Moreover, the similarity matrix of the spectral clustering is improved, and an iterative identification of the system errors based on the improved spectral clustering is proposed. Finally, measured data from a long-baseline lake test and a sea test show that the proposed method accurately identifies the system errors and improves the accuracy of underwater target positioning.
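As a simplified illustration of the spectral-clustering step, the sketch below groups stations by a plain Gaussian similarity over per-station residual statistics; the paper improves this similarity matrix and iterates the identification, so the residual values, sigma, and function names here are toy assumptions.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def station_similarity(residuals, sigma=0.2):
        # Gaussian (RBF) similarity between per-station residual feature vectors.
        d2 = np.sum((residuals[:, None, :] - residuals[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    # Toy residual statistics for six stations: [mean, std] of range residuals (m).
    residuals = np.array([[0.10, 0.02], [0.12, 0.03], [0.11, 0.02],
                          [0.45, 0.10], [0.47, 0.12], [0.44, 0.09]])
    W = station_similarity(residuals)
    labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit_predict(W)
    print(labels)  # stations grouped by similar systematic-error behavior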
Funding: National Natural Science Foundation of China (Nos. 61962054 and 62372353).
Abstract: Traditional clustering algorithms often struggle to produce satisfactory results on datasets with uneven density. In addition, they incur substantial computational costs on high-dimensional data because of the need to compute similarity matrices. To alleviate these issues, we use a KD-Tree to partition the dataset and compute the K-nearest-neighbor (KNN) density of each point, thereby avoiding the computation of similarity matrices. Moreover, we borrow the rules of voting elections: each data point acts as a voter and casts a vote for the highest-density point among its KNN. Using the vote counts of each point, we develop a strategy for classifying noise points and potential cluster centers, allowing the algorithm to identify clusters with uneven density and complex shapes. Additionally, we define the concept of "adhesive points" between two clusters to merge adjacent clusters of similar density, which helps identify the optimal number of clusters automatically. Experimental results indicate that our algorithm improves both the efficiency and the accuracy of clustering.
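A minimal sketch of the KD-tree KNN-density and voting idea described above, using SciPy's cKDTree; the density definition, the vote rule, and the variable names are simplified assumptions rather than the paper's exact formulation.

    import numpy as np
    from scipy.spatial import cKDTree

    def knn_density_votes(X, k=10):
        # KNN density via a KD-tree, then each point votes for the densest
        # point among its K nearest neighbors.
        tree = cKDTree(X)
        dist, idx = tree.query(X, k=k + 1)        # first neighbor is the point itself
        dist, idx = dist[:, 1:], idx[:, 1:]
        density = k / (dist.sum(axis=1) + 1e-12)  # simple inverse-distance density
        votes = np.zeros(len(X), dtype=int)
        for i in range(len(X)):
            winner = idx[i][np.argmax(density[idx[i]])]
            votes[winner] += 1
        return density, votes

    # Points with many votes are candidate cluster centers; low-density points
    # with no votes are candidate noise (the paper's exact rules differ).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
    density, votes = knn_density_votes(X)
    print(np.argsort(votes)[-2:])  # indices of the two most-voted points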
Abstract: Energy efficiency is the prime concern in Wireless Sensor Networks (WSNs), as excessive energy consumption severely limits energy stability and network lifetime. Clustering is a key approach to minimizing unnecessary transmission energy consumption while sustaining network lifetime. The clustering process is an NP-hard optimization problem, which is most likely to be solved effectively by metaheuristic algorithms. Adopting a hybrid metaheuristic focuses the search on optimal or near-optimal solutions, which aids energy stability during Cluster Head (CH) selection. In this paper, the Hybrid Seagull and Whale Optimization Algorithm-based Dynamic Clustering Protocol (HSWOA-DCP) is proposed, combining the exploitation benefits of WOA with the exploration merits of SEOA for optimal CH selection, maintaining energy stability and prolonging network lifetime. HSWOA-DCP adopts a modified SEagull Optimization Algorithm (SEOA) to handle the premature convergence and limited computational accuracy that can occur during CH selection. Including SEOA in WOA improves the global search capability during CH selection and prevents the worst-fitness nodes from being selected as CH, since the spiral attacking behavior of SEOA resembles the bubble-net behavior of WOA. The CH selection integrates the spiral attacking principle of SEOA and the contraction-encircling mechanism of WOA to improve computational accuracy and avoid frequent re-election. A Lévy flight strategy is also incorporated into SEOA to avoid premature convergence and attain a better trade-off between exploration and exploitation. Simulation results confirm that the proposed HSWOA-DCP achieves better network survivability rate, residual energy, and overall throughput than competitive CH selection schemes over different numbers of data transmission rounds. A statistical analysis based on an ANOVA test also confirms the energy stability of the proposed HSWOA-DCP scheme.
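To make the CH-selection objective concrete, the sketch below shows a hypothetical fitness that rewards high residual energy and penalizes distance to the base station and intra-cluster distance; the weights, terms, and names are illustrative assumptions, not the fitness actually optimized by HSWOA-DCP.

    import numpy as np

    def ch_fitness(residual_energy, dist_to_bs, mean_dist_to_members,
                   w_energy=0.5, w_bs=0.3, w_intra=0.2):
        # Higher is better: favor candidate CHs with high residual energy,
        # short distance to the base station, and compact clusters. A hybrid
        # SEOA/WOA search would evaluate such a fitness for each candidate.
        e = residual_energy / (residual_energy.max() + 1e-12)
        d_bs = dist_to_bs / (dist_to_bs.max() + 1e-12)
        d_in = mean_dist_to_members / (mean_dist_to_members.max() + 1e-12)
        return w_energy * e - w_bs * d_bs - w_intra * d_in

    # Toy evaluation for three candidate cluster heads.
    print(ch_fitness(np.array([0.9, 0.5, 0.7]),
                     np.array([40.0, 25.0, 60.0]),
                     np.array([12.0, 18.0, 9.0])))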
Abstract: Deep multi-view subspace clustering (DMVSC) based on self-expression has attracted increasing attention due to its outstanding performance and nonlinear applicability. However, most existing methods neglect that view-private, meaningless information or noise may interfere with the learning of self-expression, which may degrade clustering performance. In this paper, we propose Contrastive Consistency and Attentive Complementarity (CCAC), a novel framework for DMVSC. CCAC aligns the self-expressions of all views and fuses them according to their discrimination, so that it can effectively explore consistent and complementary information for precise clustering. Specifically, the view-specific self-expression is learned by a self-expression layer embedded in the auto-encoder network of each view. To guarantee consistency across views and reduce the effect of view-private information or noise, we align all view-specific self-expressions by contrastive learning. The aligned self-expressions are assigned adaptive weights by a channel attention mechanism according to their discrimination, and are then fused by a convolution kernel to obtain a consensus self-expression with maximum complementarity across views. Extensive experiments on four benchmark datasets and one large-scale dataset show that CCAC outperforms other state-of-the-art methods, demonstrating its clustering effectiveness.
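The sketch below writes out the basic per-view self-expression objective that such a layer optimizes, X ≈ CX with a zero-diagonal coefficient matrix C; the regularization weight and plain NumPy formulation are simplified assumptions, since CCAC learns C inside an auto-encoder and adds contrastive and attention terms on top.

    import numpy as np

    def self_expression_loss(X, C, lam=1e-2):
        # X: (n, d) view features; C: (n, n) self-expression coefficients.
        # Each sample is reconstructed as a combination of the other samples.
        C = C - np.diag(np.diag(C))   # forbid trivial self-reconstruction
        recon = C @ X
        return np.linalg.norm(X - recon) ** 2 + lam * np.linalg.norm(C) ** 2

    # Toy check: 5 samples with 3 features and a random coefficient matrix.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))
    C = rng.normal(scale=0.1, size=(5, 5))
    print(self_expression_loss(X, C))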
Funding: This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M202300502, KJQN201800539).
Abstract: In clustering algorithms, the selection of neighbors significantly affects the quality of the final clustering results. While various neighbor relationships exist, such as K-nearest neighbors, natural neighbors, and shared neighbors, most of them can only handle a single structural relationship, and their identification accuracy is low on datasets with multiple structures. When faced with something complex, people's first instinct is to divide it into several parts and handle them separately; likewise, partitioning the dataset into more sub-graphs is a good approach to identifying complex structures. Inspired by this, we propose a novel neighbor method: Shared Natural Neighbors (SNaN). To demonstrate the superiority of this neighbor method, we propose a shared-natural-neighbors-based hierarchical clustering algorithm for discovering arbitrarily shaped clusters (HC-SNaN). The algorithm excels at identifying both spherical and manifold clusters. Tested on synthetic and real-world datasets, HC-SNaN demonstrates significant advantages over existing clustering algorithms, particularly on datasets containing clusters of arbitrary shape.
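As a rough sketch of the shared-neighbor idea, the code below counts the common members of the k-NN sets of every pair of points; a fixed k stands in for the adaptively determined natural neighbors, so this is a simplified stand-in for SNaN rather than the paper's definition.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def shared_neighbor_counts(X, k=10):
        # For every pair of points, count how many k-nearest neighbors they share.
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        idx = nn.kneighbors(X, return_distance=False)[:, 1:]   # drop the point itself
        neighbor_sets = [set(row) for row in idx]
        n = len(X)
        shared = np.zeros((n, n), dtype=int)
        for i in range(n):
            for j in range(i + 1, n):
                shared[i, j] = shared[j, i] = len(neighbor_sets[i] & neighbor_sets[j])
        return shared

    # Pairs with a high shared-neighbor count are likely to lie in the same cluster,
    # which is the intuition a hierarchical merge step such as HC-SNaN builds on.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])
    print(shared_neighbor_counts(X).max())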