The study delves into the expanding role of network platforms in our daily lives, encompassing various mediums like blogs, forums, online chats, and prominent social media platforms such as Facebook, Twitter, and Instagram. While these platforms offer avenues for self-expression and community support, they concurrently harbor negative impacts, fostering antisocial behaviors like phishing, impersonation, hate speech, cyberbullying, cyberstalking, cyberterrorism, fake news propagation, spamming, and fraud. Notably, individuals also leverage these platforms to connect with authorities and seek aid during disasters. The overarching objective of this research is to address the dual nature of network platforms by proposing innovative methodologies aimed at enhancing their positive aspects and mitigating their negative repercussions. To achieve this, the study introduces a weight learning method grounded in multi-linear attribute ranking. This approach serves to evaluate the significance of attribute combinations across all feature spaces. Additionally, a novel clustering method based on tensors is proposed to elevate the quality of clustering while effectively distinguishing selected features. The methodology incorporates a weighted average similarity matrix and optionally integrates weighted Euclidean distance, contributing to a more nuanced understanding of attribute importance. The analysis of the proposed methods yields significant findings. The weight learning method proves instrumental in discerning the importance of attribute combinations, shedding light on key aspects within feature spaces. Simultaneously, the clustering method based on tensors exhibits improved efficacy in enhancing clustering quality and feature distinction. This not only advances our understanding of attribute importance but also paves the way for more nuanced data analysis methodologies. In conclusion, this research underscores the pivotal role of network platforms in contemporary society, emphasizing their potential for both positive contributions and adverse consequences. The proposed methodologies offer novel approaches to address these dualities, providing a foundation for future research and practical applications. Ultimately, this study contributes to the ongoing discourse on optimizing the utility of network platforms while minimizing their negative impacts.
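A minimal sketch of the two ingredients named above, under the assumption that attribute weights `w` are already produced by the weight-learning step: a weighted average of per-attribute similarity matrices and the optional weighted Euclidean distance. This is an interpretation for illustration, not the paper's implementation.

```python
# Sketch: weighted-average similarity matrix and weighted Euclidean distance.
# The attribute weights `w` are assumed to come from the weight-learning step.
import numpy as np

def weighted_average_similarity(X, w, sigma=1.0):
    """Average the Gaussian similarity matrix of each attribute, weighted by w."""
    S = np.zeros((len(X), len(X)))
    for f, wf in enumerate(w):
        d = X[:, f][:, None] - X[:, f][None, :]
        S += wf * np.exp(-d ** 2 / (2 * sigma ** 2))
    return S

def weighted_euclidean(X, w):
    """Pairwise Euclidean distance with per-attribute weights."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
w = np.array([0.5, 0.3, 0.2])   # hypothetical attribute weights (sum to 1)
print(weighted_average_similarity(X, w).shape, weighted_euclidean(X, w).shape)
```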
In clustering algorithms, the selection of neighbors significantly affects the quality of the final clustering results. While various neighbor relationships exist, such as K-nearest neighbors, natural neighbors, and shared neighbors, most of them can only handle a single structural relationship, and their identification accuracy is low for datasets with multiple structures. In everyday life, people's first instinct when facing something complex is to divide it into multiple parts; partitioning the dataset into more sub-graphs is likewise a good approach to identifying complex structures. Taking inspiration from this, we propose a novel neighbor method: Shared Natural Neighbors (SNaN). To demonstrate the superiority of this neighbor method, we propose a shared natural neighbors-based hierarchical clustering algorithm for discovering arbitrary-shaped clusters (HC-SNaN). Our algorithm excels in identifying both spherical clusters and manifold clusters. Tested on synthetic and real-world datasets, HC-SNaN demonstrates significant advantages over existing clustering algorithms, particularly when dealing with datasets containing arbitrary shapes.
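To make the "shared neighbors" idea concrete, here is an illustrative sketch that counts how many neighbors two points have in common. Plain K-nearest neighbors stand in for natural neighbors, which SNaN actually combines with sharing, so treat this as a simplification.

```python
# Shared-neighbor counts from K-nearest-neighbor sets (simplified stand-in for SNaN).
import numpy as np

def knn_indices(X, k):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)              # a point is not its own neighbor
    return np.argsort(D, axis=1)[:, :k]      # (n, k) neighbor indices

def shared_neighbor_counts(X, k=5):
    nbrs = [set(row) for row in knn_indices(X, k)]
    n = len(nbrs)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = len(nbrs[i] & nbrs[j])   # shared-neighbor strength
    return S

X = np.random.default_rng(1).normal(size=(20, 2))
print(shared_neighbor_counts(X, k=5).max())
```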
In this paper, we introduce a novel Multi-scale and Auto-tuned Semi-supervised Deep Subspace Clustering (MAS-DSC) algorithm, aimed at addressing the challenges of deep subspace clustering in high-dimensional real-world data, particularly in the field of medical imaging. Traditional deep subspace clustering algorithms, which are mostly unsupervised, are limited in their ability to effectively utilize the inherent prior knowledge in medical images. Our MAS-DSC algorithm incorporates a semi-supervised learning framework that uses a small amount of labeled data to guide the clustering process, thereby enhancing the discriminative power of the feature representations. Additionally, the multi-scale feature extraction mechanism is designed to adapt to the complexity of medical imaging data, resulting in more accurate clustering performance. To address the difficulty of hyperparameter selection in deep subspace clustering, this paper employs a Bayesian optimization algorithm for adaptive tuning of the hyperparameters related to subspace clustering, prior knowledge constraints, and model loss weights. Extensive experiments on standard clustering datasets, including ORL, Coil20, and Coil100, validate the effectiveness of the MAS-DSC algorithm. The results show that, with its multi-scale network structure and Bayesian hyperparameter optimization, MAS-DSC achieves excellent clustering results on these datasets. Furthermore, tests on a brain tumor dataset demonstrate the robustness of the algorithm and its ability to leverage prior knowledge for efficient feature extraction and enhanced clustering performance within a semi-supervised learning framework.
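The Bayesian hyperparameter tuning step can be sketched with a tiny Gaussian-process loop over a single hyperparameter (here a hypothetical loss weight). The objective below is a dummy stand-in for "train the network and return clustering error"; the search space and acquisition rule are assumptions, not the paper's configuration.

```python
# Toy Bayesian optimization loop over one hyperparameter (lower-confidence-bound acquisition).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(w):                     # placeholder for "train model, return validation error"
    return (w - 0.3) ** 2

candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
X_obs = np.array([[0.1], [0.9]])                      # two initial evaluations
y_obs = np.array([objective(x[0]) for x in X_obs])

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_obs, y_obs)
    mu, sigma = gp.predict(candidates, return_std=True)
    next_x = candidates[np.argmin(mu - 1.96 * sigma)]  # explore where the lower bound is smallest
    X_obs = np.vstack([X_obs, next_x])
    y_obs = np.append(y_obs, objective(next_x[0]))

print("best loss weight ≈", float(X_obs[np.argmin(y_obs)][0]))
```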
Traditional Fuzzy C-Means (FCM) and Possibilistic C-Means (PCM) clustering algorithms are data-driven, and their objective function minimization process is based on the available numeric data. Recently, knowledge hints have been introduced to form knowledge-driven clustering algorithms, which reveal a data structure that considers not only the relationships between data but also the compatibility with knowledge hints. However, these algorithms cannot produce the optimal number of clusters by themselves; they require the assistance of evaluation indices. Moreover, knowledge hints are usually used as part of the data structure (directly replacing some clustering centers), which severely limits the flexibility of the algorithm and can lead to knowledge misguidance. To solve this problem, this study designs a new knowledge-driven clustering algorithm called PCM clustering with High-density Points (HP-PCM), in which domain knowledge is represented in the form of so-called high-density points. First, a new data-density calculation function is proposed, and the Density Knowledge Points Extraction (DKPE) method is established to filter out high-density points from the dataset to form knowledge hints. Then, these hints are incorporated into the PCM objective function so that the clustering algorithm is guided by high-density points to discover the natural data structure. Finally, the initial number of clusters is set to be greater than the true one, based on the number of knowledge hints, and the HP-PCM algorithm automatically determines the final number of clusters during the clustering process through a cluster elimination mechanism. Through experimental studies, including comparative analyses, the results highlight the effectiveness of the proposed algorithm, such as an increased success rate in clustering, the ability to determine the optimal cluster number, and faster convergence.
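A hedged sketch of how high-density points might be extracted as knowledge hints: a Gaussian-kernel density stands in for the paper's data-density function, and a simple top-m selection stands in for the DKPE filtering rule.

```python
# Extract the m densest points as candidate knowledge hints (illustrative assumptions).
import numpy as np

def point_density(X, bandwidth=1.0):
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-D2 / (2 * bandwidth ** 2)).sum(axis=1)   # kernel density per point

def high_density_points(X, m=4, bandwidth=1.0):
    rho = point_density(X, bandwidth)
    return X[np.argsort(rho)[::-1][:m]]                     # m densest points as hints

X = np.random.default_rng(2).normal(size=(100, 2))
hints = high_density_points(X)
print(hints.shape)   # (4, 2); clusters beyond the true number would later be
                     # eliminated during the PCM iterations
```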
Many fields, such as neuroscience, are experiencing the vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
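A minimal sketch of the stopping idea "split only if cells differ more between than within the candidate sub-clusters". The specific test and threshold below are illustrative assumptions, not the published protocol.

```python
# Decide whether a 2-way split of a cell-by-feature matrix is statistically justified.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform
from scipy.stats import mannwhitneyu

def should_split(X, alpha=0.05):
    Z = linkage(X, method="ward")
    labels = fcluster(Z, t=2, criterion="maxclust")              # candidate 2-way split
    D = squareform(pdist(X))
    same_cluster = np.equal.outer(labels, labels)
    within = D[same_cluster & ~np.eye(len(X), dtype=bool)]       # within-cluster distances
    between = D[~same_cluster]                                   # between-cluster distances
    _, p = mannwhitneyu(between, within, alternative="greater")
    return p < alpha                                             # split only if between > within

X = np.vstack([np.random.default_rng(3).normal(0, 1, (30, 5)),
               np.random.default_rng(4).normal(5, 1, (30, 5))])
print(should_split(X))   # True: the two groups genuinely differ
```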
Path-based clustering algorithms typically generate clusters by optimizing a benchmark function. Most optimization methods in clustering algorithms offer solutions that are only close to the global optimal value. This study achieves the global optimum of the criterion function in a shorter time using the minimax distance, the Maximum Spanning Tree (MST), and meta-heuristic algorithms, including the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The Fast Path-based Clustering (FPC) algorithm proposed in this paper can find cluster centers correctly in most datasets and quickly perform clustering operations. FPC does this using the MST, the minimax distance, and a new hybrid meta-heuristic algorithm within a few iterations. The algorithm can reach the global optimal value, and its main clustering process has a computational complexity of O(k²×n); however, due to the complexity of the minimax distance computation, the total computational complexity is O(n²). Experimental results of FPC on synthetic datasets with arbitrary shapes demonstrate that the algorithm is resistant to noise and outliers and can correctly identify clusters of varying sizes and numbers. In addition, FPC requires the number of clusters as the only parameter for the clustering process. A comparative analysis of FPC and other clustering algorithms in this domain indicates that FPC exhibits superior speed, stability, and performance.
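For reference, the minimax (path-based) distance between two points is the smallest possible "largest edge" over all paths joining them; the standard construction computes it on a minimum spanning tree of the distance graph. The sketch below follows that standard construction as an assumption, not necessarily the exact FPC procedure.

```python
# Minimax (bottleneck) distances via a spanning tree and a max-based Floyd-Warshall pass.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def minimax_distances(X):
    D = squareform(pdist(X))
    T = minimum_spanning_tree(D).toarray()
    T = np.maximum(T, T.T)                        # symmetrize tree edge weights
    M = np.where(T > 0, T, np.inf)                # non-edges start at infinity
    np.fill_diagonal(M, 0.0)
    for k in range(len(X)):                       # relax with max instead of sum
        M = np.minimum(M, np.maximum(M[:, [k]], M[[k], :]))
    return M

X = np.random.default_rng(5).normal(size=(15, 2))
print(minimax_distances(X).max())
```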
Contrastive learning is a significant research direction in the field of deep learning. However, existing data augmentation methods often lead to issues such as semantic drift in generated views, while the complexity of model pre-training limits further improvement in the performance of existing methods. To address these challenges, we propose the Efficient Clustering Network based on Matrix Factorization (ECN-MF). Specifically, we design a batched low-rank Singular Value Decomposition (SVD) algorithm for data augmentation to eliminate redundant information and uncover the major patterns of variation and key information in the data. Additionally, we design a Mutual Information-Enhanced Clustering Module (MI-ECM) to accelerate the training process by leveraging a simple architecture that brings samples from the same cluster closer while pushing samples from other clusters apart. Extensive experiments on six datasets demonstrate that ECN-MF performs more effectively than state-of-the-art algorithms.
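A sketch of low-rank SVD used as an augmentation view: each batch is reconstructed from its top-r singular components, suppressing redundant detail while keeping the dominant variation. The batching, rank, and pairing with the original view are assumptions for illustration.

```python
# Rank-r reconstruction of each mini-batch as an augmented view for contrastive training.
import numpy as np

def low_rank_view(batch, r=8):
    U, s, Vt = np.linalg.svd(batch, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]            # keep only the top-r singular components

X = np.random.default_rng(6).normal(size=(256, 64))   # one batch of features
for start in range(0, len(X), 64):                    # batched processing
    view = low_rank_view(X[start:start + 64], r=8)
    # `view` and the corresponding original rows would form a positive pair
print(low_rank_view(X).shape)   # (256, 64)
```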
Data stream clustering is integral to contemporary big data applications. However, addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research. This paper aims to elevate the efficiency and precision of data stream clustering. Leveraging the TEDA (Typicality and Eccentricity Data Analysis) algorithm as a foundation, we introduce improvements by integrating a nearest neighbor search algorithm to enhance both the efficiency and accuracy of the algorithm. The original TEDA algorithm, grounded in the concept of “Typicality and Eccentricity Data Analytics”, is an evolving, recursive method that requires no prior knowledge. While the algorithm autonomously creates and merges clusters as new data arrives, its efficiency is significantly hindered by the need to traverse all existing clusters whenever further data arrive. This work presents the NS-TEDA (Neighbor Search Based Typicality and Eccentricity Data Analysis) algorithm by incorporating a KD-Tree (K-Dimensional Tree) integrated with a Scapegoat Tree. This ensures that, upon arrival, new data points interact solely with clusters in very close proximity, which significantly enhances efficiency while preventing a single data point from joining too many clusters and mitigating, to some extent, the merging of clusters with high overlap. We apply the NS-TEDA algorithm to several well-known datasets, comparing its performance with other data stream clustering algorithms and the original TEDA algorithm. The results demonstrate that the proposed algorithm achieves higher accuracy, and its runtime exhibits an almost linear dependence on the volume of data, making it more suitable for large-scale data stream analysis.
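For orientation, TEDA-style methods rely on recursive updates of a mean and a scalar variance, from which each arriving sample's eccentricity and typicality are computed. The sketch below follows the commonly cited recursive form; treat the exact expressions as an assumption rather than this paper's code.

```python
# Recursive mean/variance with per-sample eccentricity and typicality (assumed TEDA-style form).
import numpy as np

class RunningTEDA:
    def __init__(self, dim):
        self.k = 0
        self.mean = np.zeros(dim)
        self.var = 0.0

    def update(self, x):
        self.k += 1
        if self.k == 1:
            self.mean = np.array(x, dtype=float)
            return 0.0, 1.0                        # eccentricity undefined for one sample
        self.mean = self.mean + (x - self.mean) / self.k
        self.var = (self.k - 1) / self.k * self.var \
            + np.dot(x - self.mean, x - self.mean) / (self.k - 1)
        ecc = 1.0 / self.k + np.dot(self.mean - x, self.mean - x) / (self.k * self.var + 1e-12)
        return ecc, 1.0 - ecc                      # typicality = 1 - eccentricity

stream = np.random.default_rng(7).normal(size=(200, 3))
teda = RunningTEDA(dim=3)
for x in stream:
    ecc, typ = teda.update(x)
print(round(ecc, 4), round(typ, 4))
```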
Hyperspectral imagery encompasses spectral and spatial dimensions, reflecting the material properties of objects. Its application proves crucial in search and rescue, concealed target identification, and crop growth analysis. Clustering is an important method of hyperspectral analysis. The vast data volume of hyperspectral imagery, coupled with redundant information, poses significant challenges in swiftly and accurately extracting features for subsequent analysis. Current hyperspectral feature clustering methods, which are mostly studied from the spatial or spectral perspective alone, lack strong interpretability, resulting in poor comprehensibility of the algorithms. This research therefore introduces a feature clustering algorithm for hyperspectral imagery from an interpretability perspective. It commences with a simulated perception process, proposing an interpretable band selection algorithm to reduce data dimensions. Following this, a multi-dimensional clustering algorithm, rooted in fuzzy and kernel clustering, is developed to highlight intra-class similarities and inter-class differences. An optimized P system is then introduced to enhance computational efficiency. This system coordinates all cells within a mapping space to compute optimal cluster centers, facilitating parallel computation. This approach diminishes sensitivity to the initial cluster centers and augments global search capabilities, thus preventing entrapment in local minima and enhancing clustering performance. Experiments were conducted on 300 datasets comprising both real and simulated data. The results show that the average accuracy (ACC) of the proposed algorithm is 0.86 and the combination measure (CM) is 0.81.
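The fuzzy, kernel-based membership step that such a clustering builds on can be sketched with plain kernel fuzzy C-means; the band selection and the P-system parallelization are not reproduced, and the kernel choice is an assumption.

```python
# Kernel fuzzy C-means membership update (Gaussian kernel), as an illustrative building block.
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    return np.exp(-((a - b) ** 2).sum(-1) / (2 * sigma ** 2))

def kernel_fcm_memberships(X, centers, m=2.0, sigma=1.0):
    # Kernel-induced squared distance: 2 * (1 - K(x, v)) for a Gaussian kernel.
    d2 = 2.0 * (1.0 - gaussian_kernel(X[:, None, :], centers[None, :, :], sigma)) + 1e-12
    ratio = (d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)            # (n, c) fuzzy memberships; rows sum to 1

X = np.random.default_rng(8).normal(size=(50, 10))   # 50 pixels, 10 selected bands
centers = X[:3]                                      # 3 initial cluster centers
U = kernel_fcm_memberships(X, centers)
print(U.sum(axis=1)[:5])                             # each row sums to 1
```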
The scale and complexity of big data are growing continuously, posing severe challenges to traditional data processing methods, especially in the field of clustering analysis. To address this issue, this paper introduces a new method named Big Data Tensor Multi-Cluster Distributed Incremental Update (BDTMCDIncreUpdate), which combines distributed computing, storage technology, and incremental update techniques to provide an efficient and effective means for clustering analysis. Firstly, the original dataset is divided into multiple sub-blocks, and distributed computing resources are utilized to process the sub-blocks in parallel, enhancing efficiency. Then, initial clustering is performed on each sub-block using tensor-based multi-clustering techniques to obtain preliminary results. When new data arrive, incremental update technology is employed to update the core tensor and factor matrix, ensuring that the clustering model can adapt to changes in the data. Finally, by combining the updated core tensor and factor matrix with historical computational results, refined clustering results are obtained, achieving real-time adaptation to dynamic data. Through experimental simulation on the Aminer dataset, the BDTMCDIncreUpdate method demonstrates outstanding performance in terms of accuracy (ACC) and normalized mutual information (NMI), achieving an accuracy rate of 90% and an NMI score of 0.85, outperforming existing methods such as TClusInitUpdate and TKLClusUpdate in most scenarios. The BDTMCDIncreUpdate method therefore offers an innovative solution for big data analysis, integrating distributed computing, incremental updates, and tensor-based multi-clustering techniques. It not only improves efficiency and scalability in processing large-scale, high-dimensional datasets but has also been validated for its effectiveness and accuracy through experiments. The method shows great potential in real-world applications where dynamic data growth is common and is of significant importance for advancing the development of data analysis technology.
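A greatly simplified stand-in for the pipeline's structure: split the data into sub-blocks, cluster each block independently, and fold new arrivals in incrementally. K-means replaces the tensor multi-clustering here, so this only mirrors the partition-then-incrementally-update flow, not the core-tensor/factor-matrix updates themselves.

```python
# Block-wise clustering followed by an incremental update when new data arrive.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(9)
X = rng.normal(size=(10_000, 16))
blocks = np.array_split(X, 4)                 # sub-blocks processed independently

model = MiniBatchKMeans(n_clusters=5, random_state=0, n_init=3)
for block in blocks:                          # initial clustering, block by block
    model.partial_fit(block)

X_new = rng.normal(size=(1_000, 16))          # data arriving later
model.partial_fit(X_new)                      # incremental update, no full refit
print(model.cluster_centers_.shape)           # (5, 16)
```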
Implementing machine learning algorithms in the non-conducive environment of a vehicular network requires some adaptations due to the high computational complexity of these algorithms. K-clustering algorithms are simple, with fast performance and reasonable accuracy. However, their implementation depends on the initial selection of the number of clusters (K), the initial cluster centers, and the clustering metric. This paper investigated using Scott's histogram formula to estimate the K number and the Link Expiration Time (LET) as the clustering metric. Realistic traffic flows were considered for three maps, namely a Highway, a Traffic Light junction, and a Roundabout junction, to study the effect of road layout on estimating the K number. A fast version of the PAM algorithm was used for clustering, with a modification to reduce time complexity. The Affinity Propagation algorithm sets the baseline for the estimated K number, and the Medoid Silhouette method is used to quantify the clustering. OMNET++, Veins, and SUMO were used to simulate the traffic, while the related algorithms were implemented in Python. Scott's formula estimate of the K number matched the baseline only when the road layout was simple. Moreover, the clustering algorithm required one iteration on average to converge when used with LET.
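As a worked example, Scott's histogram rule sets the bin width to h = 3.49·σ·n^(−1/3), and dividing the data range by h gives a bin count; reading that bin count as the cluster count K is one plausible interpretation of the estimate described above, and the vehicle-position input below is a made-up illustration.

```python
# Scott's-rule estimate of K from one-dimensional values (e.g., vehicle positions).
import numpy as np

def scott_k(values):
    values = np.asarray(values, dtype=float)
    h = 3.49 * values.std(ddof=1) * len(values) ** (-1 / 3)   # Scott's bin width
    return int(np.ceil((values.max() - values.min()) / h))    # number of bins, read as K

positions = np.random.default_rng(10).uniform(0, 1000, size=200)   # hypothetical positions in metres
print(scott_k(positions))
```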
The exploration of exotic shapes and properties of atomic nuclei, e.g., α clusters and toroidal shapes, is a fascinating field in nuclear physics. To study the decay of such nuclei, a novel detector aimed at detecting multiple α-particle events was designed and constructed. The detector comprises two layers of double-sided silicon strip detectors (DSSDs) and a cesium iodide scintillator array coupled with silicon photomultiplier arrays as light sensors, which have the advantages of small size, fast response, and large dynamic range. The DSSDs coupled with the cesium iodide crystal arrays are used to distinguish multiple α hits. The detector array has a compact and integrated design that can be adapted to different experimental conditions. The detector array was simulated using Geant4, and the excitation energy spectra of some α-clustering nuclei were reconstructed to demonstrate its performance. The simulation results show that the detector array has excellent angular and energy resolutions, enabling effective reconstruction of nuclear excited states from multiple α-particle events. This detector offers a new and powerful tool for nuclear physics experiments and has the potential to reveal interesting physical phenomena related to exotic nuclear structures and their decay mechanisms.
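The excitation-energy reconstruction from detected α particles can be illustrated with standard invariant-mass kinematics; this is textbook physics, not the collaboration's analysis code, and the input values are invented for the example (energies in MeV, c = 1, E* quoted relative to the n-α threshold).

```python
# Invariant-mass reconstruction of an excitation energy from multiple alpha particles.
import numpy as np

M_ALPHA = 3727.379   # alpha-particle rest mass in MeV/c^2

def excitation_energy(kinetic_energies, directions):
    """kinetic_energies: (n,) in MeV; directions: (n, 3) unit vectors of each alpha."""
    E = kinetic_energies + M_ALPHA                                # total energies
    p = np.sqrt(E ** 2 - M_ALPHA ** 2)[:, None] * directions      # momentum vectors
    m_inv = np.sqrt(E.sum() ** 2 - np.sum(p.sum(axis=0) ** 2))    # invariant mass of the system
    return m_inv - len(E) * M_ALPHA                               # E* above the n-alpha threshold

T = np.array([5.0, 6.0, 4.5])                                     # three alphas, MeV (hypothetical)
u = np.array([[1, 0, 0], [0, 1, 0], [0.6, 0.8, 0]], dtype=float)  # unit direction vectors
print(round(excitation_energy(T, u), 2))
```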
Traditional clustering algorithms often struggle to produce satisfactory results when dealing with datasets with uneven density. Additionally, they incur substantial computational costs when applied to high-dimensional data due to calculating similarity matrices. To alleviate these issues, we employ the KD-Tree to partition the dataset and compute the K-nearest neighbors (KNN) density for each point, thereby avoiding the computation of similarity matrices. Moreover, we apply the rules of voting elections, treating each data point as a voter and casting a vote for the point with the highest density among its KNN. By utilizing the vote counts of each point, we develop the strategy for classifying noise points and potential cluster centers, allowing the algorithm to identify clusters with uneven density and complex shapes. Additionally, we define the concept of “adhesive points” between two clusters to merge adjacent clusters that have similar densities. This process helps us identify the optimal number of clusters automatically. Experimental results indicate that our algorithm not only improves the efficiency of clustering but also increases its accuracy.
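A sketch of the voting idea: each point takes a KNN-based density (here the inverse mean neighbor distance, an assumption) and votes for the densest point among its K nearest neighbors; high vote counts mark candidate centers and low densities mark noise-like points, with illustrative thresholds.

```python
# KNN density and density-based voting to surface candidate cluster centers.
import numpy as np

def knn_density_votes(X, k=8):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    knn = np.argsort(D, axis=1)[:, :k]                           # K nearest neighbors of each point
    density = 1.0 / (np.take_along_axis(D, knn, axis=1).mean(axis=1) + 1e-12)
    votes = np.zeros(len(X), dtype=int)
    for i, nbrs in enumerate(knn):
        votes[nbrs[np.argmax(density[nbrs])]] += 1               # vote for the densest neighbor
    return density, votes

X = np.random.default_rng(11).normal(size=(80, 2))
density, votes = knn_density_votes(X)
print("candidate centers:", np.flatnonzero(votes > votes.mean() + votes.std()))
print("noise-like points:", int((density < np.percentile(density, 5)).sum()))
```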
Although many multi-view clustering (MVC) algorithms with acceptable performance have been presented, to the best of our knowledge, nearly all of them need to be fed with the correct number of clusters. In addition, these existing algorithms create only hard and fuzzy partitions for multi-view objects, which are often located in highly overlapping areas of the multi-view feature space. The adoption of hard and fuzzy partitions ignores the ambiguity and uncertainty in the assignment of objects, likely leading to performance degradation. To address these issues, we propose a novel sparse reconstructive multi-view evidential clustering algorithm (SRMVEC). Based on a sparse reconstructive procedure, SRMVEC learns a shared affinity matrix across views and maps multi-view objects to a 2-dimensional human-readable chart by calculating 2 newly defined mathematical metrics for each object. From this chart, users can detect the number of clusters and select several objects existing in the dataset as cluster centers. Then, SRMVEC derives a credal partition under the framework of evidence theory, improving the fault tolerance of clustering. Ablation studies show the benefits of adopting the sparse reconstructive procedure and evidence theory. Besides, SRMVEC delivers effectiveness on benchmark datasets by outperforming some state-of-the-art methods.
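A single-view sketch of what a sparse reconstructive affinity looks like: each object is expressed as a sparse combination of the other objects, and the absolute coefficients are symmetrized into an affinity matrix. SRMVEC's multi-view fusion and evidential steps are not reproduced, and the L1 solver and penalty are assumptions.

```python
# Sparse self-representation affinity via per-object Lasso regressions.
import numpy as np
from sklearn.linear_model import Lasso

def sparse_affinity(X, alpha=0.01):
    n = len(X)
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        lasso = Lasso(alpha=alpha, max_iter=5000)
        lasso.fit(X[others].T, X[i])              # reconstruct object i from the other objects
        C[i, others] = lasso.coef_
    return np.abs(C) + np.abs(C).T                # symmetric affinity matrix

X = np.random.default_rng(12).normal(size=(30, 10))
print(sparse_affinity(X).shape)   # (30, 30)
```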
In recent times, various power control and clustering approaches have been proposed to enhance the overall performance of cell-free massive multiple-input multiple-output (CF-mMIMO) networks. With the emergence of deep reinforcement learning (DRL), significant progress has been made in the field of network optimization, as DRL holds great promise for improving network performance and efficiency. In this work, we focus on the intricate challenge of joint cooperation clustering and downlink power control within CF-mMIMO networks. Leveraging the potent deep deterministic policy gradient (DDPG) algorithm, our objective is to maximize the proportional fairness (PF) of user rates, thereby aiming to achieve optimal network performance and resource utilization. Moreover, we harness the concept of a “divide and conquer” strategy, introducing two innovative methods termed alternating DDPG (A-DDPG) and hierarchical DDPG (H-DDPG). These approaches decompose the intricate joint optimization problem into more manageable sub-problems, thereby facilitating a more efficient resolution process. Our findings unequivocally showcase the superior efficacy of the proposed DDPG approach over the baseline schemes in both clustering and downlink power control. Furthermore, A-DDPG and H-DDPG achieve higher performance gains than DDPG with lower computational complexity.
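To make the optimization target concrete, proportional fairness is commonly taken as the sum of log user rates, which would serve as the reward signal for the DDPG agents. The single-access-point channel model below is a toy stand-in for the CF-mMIMO SINR expressions, with invented gains and powers.

```python
# Proportional-fairness objective over toy downlink rates.
import numpy as np

def proportional_fairness(rates_bps_hz):
    return float(np.sum(np.log(rates_bps_hz)))        # PF = sum of log rates

def toy_rates(powers, gains, noise=1e-3):
    # Each user's SINR: own received power over the others' interference plus noise.
    received = powers * gains
    interference = received.sum() - received
    return np.log2(1.0 + received / (interference + noise))

powers = np.array([0.6, 0.8, 1.0])    # downlink powers chosen by the agent (action)
gains = np.array([0.9, 0.5, 0.2])     # large-scale channel gains (state)
rates = toy_rates(powers, gains)
print(round(proportional_fairness(rates), 3))         # reward fed back to the DDPG agent
```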