Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters, based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
We study the structural and dynamical properties of the galaxy cluster A209 based on Chandra and XMM-Newton observations. We obtain detailed temperature, pressure, and entropy maps with the contour binning method and find a hot region in the northwest (NW) direction. The X-ray surface brightness residual map and the corresponding temperature profiles reveal a possible shock front in the NW direction and a cold front feature in the southeast (SE) direction. Combined with the galaxy luminosity density map, these features lead us to propose a weak merger scenario: a young subcluster passing from the SE toward the NW could together explain the optical subpeak, the intracluster medium temperature map, the X-ray surface brightness excess, and the offset of the X-ray peak.
Clustering a social network is the process of grouping social actors into clusters such that intra-cluster similarities among actors are higher than inter-cluster similarities. Clustering approaches such as k-medoids or hierarchical clustering use a distance function to measure the dissimilarities among actors. These distance functions need to satisfy various properties, including the triangle inequality (TI). However, in some cases the triangle inequality may be violated, degrading the quality of the resulting clusters. Through experiments, this paper shows how TI is violated when performing traditional clustering techniques (k-medoids, hierarchical, DENGRAPH, and spectral clustering) on social networks, and how these violations affect the quality of the resulting clusters.
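A triangle-inequality violation can be detected directly from a dissimilarity matrix. The sketch below is a hypothetical helper, not code from the paper; it lists every ordered triple for which d(i, k) > d(i, j) + d(j, k).

```python
from itertools import permutations


def triangle_violations(d, tol=1e-12):
    """Return ordered triples (i, j, k) with d[i][k] > d[i][j] + d[j][k]."""
    n = len(d)
    bad = []
    for i, j, k in permutations(range(n), 3):
        # Going directly from i to k is "longer" than detouring through j.
        if d[i][k] > d[i][j] + d[j][k] + tol:
            bad.append((i, j, k))
    return bad
```

For example, with d = [[0, 1, 3], [1, 0, 1], [3, 1, 0]] the call returns [(0, 1, 2), (2, 1, 0)]: actors 0 and 2 are 3 apart although each is only 1 away from actor 1, which is the kind of inconsistency that can mislead distance-based clustering.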
In recent years, many unknown protocols have continually emerged, posing severe challenges to network security and network management. Existing unknown-protocol recognition methods have weak feature extraction ability and cannot thoroughly mine the discriminating features of protocol data. To address this issue, we propose an unknown application-layer protocol recognition method based on deep clustering. Deep clustering, which combines a deep neural network with a clustering algorithm, automatically extracts features from the input and clusters the data based on the extracted features. Compared with traditional clustering methods, deep clustering achieves higher clustering accuracy. The proposed method uses network-in-network (NIN), channel attention, spatial attention, and Bidirectional Long Short-Term Memory (BLSTM) to construct an autoencoder that extracts the spatial-temporal features of the protocol data, and then applies an unsupervised clustering algorithm to these features to recognize unknown protocols. First, the method extracts application-layer protocol data from network traffic and transforms it into one-dimensional matrices. Second, the autoencoder is pretrained; it compresses the protocol data into a low-dimensional latent space, where an initial clustering is performed with K-Means. Finally, the clustering loss is calculated and the classification model is optimized according to it; the classification results are obtained once the model has been optimized. Compared with existing unknown-protocol recognition methods, the proposed method uses deep clustering to mine the key features of the protocol data and recognize unknown protocols accurately. Experimental results show that the proposed method effectively recognizes unknown protocols and outperforms other methods.
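The clustering-loss step can be sketched as follows, assuming it follows the Deep Embedded Clustering (DEC) formulation, which the abstract does not confirm: Student's t soft assignments of latent codes to the K-Means-initialized centers, a sharpened target distribution, and a KL-divergence loss. The autoencoder and the gradient updates are omitted; only the loss computation is shown.

```python
import math


def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q_ij of latent code z_i to cluster center mu_j."""
    q = []
    for zi in z:
        row = []
        for mu in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1 + d2 / alpha) ** (-(alpha + 1) / 2))
        s = sum(row)
        q.append([v / s for v in row])  # normalize over clusters
    return q


def target_distribution(q):
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    f = [sum(col) for col in zip(*q)]
    p = []
    for row in q:
        w = [v * v / fj for v, fj in zip(row, f)]
        s = sum(w)
        p.append([v / s for v in w])
    return p


def kl_loss(p, q):
    """Clustering loss KL(P || Q), summed over all points and clusters."""
    return sum(pi * math.log(pi / qi)
               for prow, qrow in zip(p, q)
               for pi, qi in zip(prow, qrow) if pi > 0)
```

Minimizing this loss pushes confident assignments to become more confident, which is how the initial K-Means partition is refined in the latent space.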
Open clusters (OCs) serve as invaluable tracers for investigating the properties and evolution of stars and galaxies. Despite recent advances in machine-learning clustering algorithms, accurately identifying such clusters remains challenging. We revisited the 3013 samples generated with a hybrid clustering algorithm combining FoF and pyUPMASK. We applied a multi-view clustering (MvC) ensemble method, which analyzes each member star of an OC from three perspectives (proper motion, spatial position, and a composite view) before integrating the clustering outcomes to deduce more reliable cluster memberships. Based on the MvC results, we further excluded cluster candidates with fewer than ten member stars and obtained 1256 OC candidates. After isochrone fitting and visual inspection, we identified 506 candidate OCs in the Milky Way. Of these, 493 have been reported previously, and 13 are high-confidence new candidate clusters.
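One way the three views could be integrated is a per-star majority vote, followed by the ten-member cut described above. The voting rule (`min_votes=2` of three views) and the function name are assumptions for illustration; the abstract does not specify how the MvC ensemble combines the views.

```python
from collections import Counter


def ensemble_members(view_labels, min_votes=2, min_size=10):
    """Combine per-view cluster labels into consensus memberships.

    view_labels: dict mapping a view name to a list with one cluster id per star
    (None marks a field star in that view). A star keeps cluster c only if at
    least `min_votes` views assign it to c; clusters with fewer than `min_size`
    consensus members are discarded.
    """
    n = len(next(iter(view_labels.values())))
    consensus = []
    for i in range(n):
        votes = Counter(labels[i] for labels in view_labels.values()
                        if labels[i] is not None)
        best = votes.most_common(1)
        consensus.append(best[0][0] if best and best[0][1] >= min_votes else None)
    sizes = Counter(c for c in consensus if c is not None)
    return [c if c is not None and sizes[c] >= min_size else None for c in consensus]
```

Requiring agreement between views trades a few borderline members for cleaner memberships, which is the motivation the abstract gives for the ensemble.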
The evolution of dislocation loops in austenitic steels irradiated with Fe^(+) ions is investigated by developing a cluster dynamics (CD) model and performing CD simulations. The CD predictions are compared with experimental results from the literature. The number density and average diameter of the dislocation loops obtained from the CD simulations agree well with transmission electron microscopy (TEM) observations of Fe^(+)-irradiated Solution Annealed 304, Cold Worked 316, and HR3 austenitic steels reported in the literature. The CD simulation results demonstrate that the diffusion of in-cascade interstitial clusters plays a major role in dislocation loop density and growth; in particular, for the HR3 austenitic steel, the CD model reproduces the effect of temperature on the density and size of the dislocation loops.
Developing highly active alloy catalysts that surpass the performance of platinum-group metals in the oxygen reduction reaction (ORR) is critical in electrocatalysis. Gold-based single-atom alloy (AuSAA) clusters are gaining recognition as promising alternatives because of their potential for high activity. However, enhancing the activity of AuSAA clusters remains challenging owing to limited insight into their actual active sites in alkaline environments. Herein, we studied a variety of Au_(54)M_(1) SAA cluster catalysts and revealed that an MO_(x)(OH)_(y) complex formed under operating conditions acts as the crucial active site for catalyzing the ORR in basic solution. The observed volcano plot indicates that the Au_(54)Co_(1), Au_(54)M_(1), and Au_(54)Ru_(1) clusters can be the optimal Au_(54)M_(1) SAA cluster catalysts for the ORR. Our findings offer new insights into the actual active sites of AuSAA cluster catalysts, which will inform rational catalyst design in experimental settings.
Traditional Fuzzy C-Means (FCM) and Possibilistic C-Means (PCM) clustering algorithms are data-driven: their objective-function minimization is based on the available numeric data. Recently, knowledge hints have been introduced to form knowledge-driven clustering algorithms, which reveal a data structure that accounts not only for the relationships among data but also for compatibility with the knowledge hints. However, these algorithms cannot determine the optimal number of clusters by themselves; they require the assistance of evaluation indices. Moreover, knowledge hints are usually used as part of the data structure (directly replacing some cluster centers), which severely limits the flexibility of the algorithm and can lead to knowledge misguidance. To solve this problem, this study designs a new knowledge-driven clustering algorithm called PCM clustering with High-density Points (HP-PCM), in which domain knowledge is represented in the form of so-called high-density points. First, a new data density calculation function is proposed, and the Density Knowledge Points Extraction (DKPE) method is established to filter high-density points from the dataset to form knowledge hints. Then, these hints are incorporated into the PCM objective function so that the clustering algorithm is guided by the high-density points to discover the natural data structure. Finally, the initial number of clusters is set larger than the true number, based on the number of knowledge hints, and HP-PCM automatically determines the final number of clusters during the clustering process through a cluster elimination mechanism. Experimental studies, including comparative analyses, highlight the effectiveness of the proposed algorithm, such as an increased clustering success rate, the ability to determine the optimal number of clusters, and faster convergence.
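The HP-PCM pipeline can be approximated with standard pieces: a density estimate, a greedy pick of well-separated high-density points as initial centers, plain PCM updates, and removal of centers that collapse onto each other. This is a hypothetical sketch: the Gaussian-kernel density, the separation radius `r`, and the merge tolerance stand in for the paper's density function, DKPE method, and cluster elimination mechanism, and here the hints only initialize PCM rather than entering its objective function.

```python
import math


def sq(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def density(points, h=1.0):
    """Gaussian-kernel density of every point (stand-in for the paper's density function)."""
    return [sum(math.exp(-sq(p, q) / (2 * h * h)) for q in points) for p in points]


def high_density_points(points, k, r, h=1.0):
    """Greedily keep up to k of the densest points that lie at least r apart."""
    d = density(points, h)
    hints = []
    for i in sorted(range(len(points)), key=lambda i: -d[i]):
        if all(sq(points[i], c) >= r * r for c in hints):
            hints.append(list(points[i]))
        if len(hints) == k:
            break
    return hints


def pcm(points, centers, eta=1.0, m=2.0, iters=50):
    """Plain PCM: typicality t = 1 / (1 + (d^2 / eta)^(1/(m-1))); centers are t^m-weighted means."""
    dim = len(points[0])
    for _ in range(iters):
        new = []
        for c in centers:
            w = [(1.0 / (1.0 + (sq(x, c) / eta) ** (1.0 / (m - 1)))) ** m for x in points]
            s = sum(w)
            new.append([sum(wi * x[j] for wi, x in zip(w, points)) / s for j in range(dim)])
        centers = new
    return centers


def eliminate_coincident(centers, tol=0.1):
    """Cluster elimination: drop centers that converged onto an already kept one."""
    kept = []
    for c in centers:
        if all(sq(c, k) > tol * tol for k in kept):
            kept.append(c)
    return kept
```

Starting with more hints than true clusters (for example, three hints on two well-separated blobs) lets the redundant centers converge together and be eliminated, leaving one center per blob.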
To improve the accuracy of Air Traffic Control (ATC) cybersecurity attack detection, this paper designs a new clustering-based detection method for ATC network security attacks. The feature set for ATC cybersecurity attacks is constructed by setting the feature states, adding recursive features, and determining feature criticality. The expected information and the entropy of the feature data are computed to obtain its information gain and reduce the interference of similar feature data. An autoencoder is introduced into the artificial intelligence (AI) algorithm to encode and decode the characteristics of ATC network security attack behavior, reducing the dimensionality of the attack behavior data. Based on this processing, an unsupervised learning algorithm for clustering detection of ATC network security attacks is designed. First, the distances between clusters of attack behavior features are determined, a clustering threshold is calculated, and the initial cluster centers are constructed; the mean of all feature objects in each cluster is then recalculated as the new cluster center. Second, the algorithm traverses all objects in each cluster of attack behavior feature data. Finally, cluster detection of ATC network security attack behavior is completed by computing the objective function. The experiments used three groups of attack behavior datasets as test objects, took the detection rate, false positive rate, and recall as evaluation metrics, and compared the method with three similar methods. The results show that the detection rate of this method is about 98%, the false positive rate is below 1%, and the recall is above 97%, indicating that the method improves the detection performance for security attacks on air traffic control networks.
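The information-gain step relies on the standard definitions H(Y) = -Σ p log2 p and IG(Y; X) = H(Y) - Σ_v P(X = v) H(Y | X = v). The sketch assumes discrete (already binned) feature values and one label per record; the paper's exact feature states and criticality scores are not reproduced, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += count / n * entropy(subset)
    return entropy(labels) - conditional
```

A feature that perfectly separates attack from normal records in a balanced set has IG = 1 bit, while a feature independent of the label has IG = 0, so low-gain features can be dropped before clustering.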
In this paper, we introduce a novel Multi-scale and Auto-tuned Semi-supervised Deep Subspace Clustering (MAS-DSC) algorithm to address the challenges of deep subspace clustering on high-dimensional real-world data, particularly in medical imaging. Traditional deep subspace clustering algorithms, which are mostly unsupervised, are limited in their ability to exploit the prior knowledge inherent in medical images. MAS-DSC incorporates a semi-supervised learning framework that uses a small amount of labeled data to guide the clustering process, thereby enhancing the discriminative power of the feature representations. In addition, a multi-scale feature extraction mechanism adapts to the complexity of medical imaging data, yielding more accurate clustering. To address the difficulty of hyperparameter selection in deep subspace clustering, we employ Bayesian optimization to adaptively tune the hyperparameters related to subspace clustering, prior-knowledge constraints, and model loss weights. Extensive experiments on standard clustering datasets, including ORL, Coil20, and Coil100, validate the effectiveness of MAS-DSC: with its multi-scale network structure and Bayesian hyperparameter optimization, it achieves excellent clustering results on these datasets. Furthermore, tests on a brain tumor dataset demonstrate the robustness of the algorithm and its ability to leverage prior knowledge for efficient feature extraction and improved clustering performance within a semi-supervised learning framework.
In clustering algorithms, the selection of neighbors significantly affects the quality of the final clustering results. Although various neighbor relationships exist, such as K-nearest neighbors, natural neighbors, and shared neighbors, most can only handle a single structural relationship, and their identification accuracy is low on datasets with multiple structures. When facing something complex, people instinctively divide it into several parts and handle each in turn; likewise, partitioning a dataset into more sub-graphs is a good approach to identifying complex structures. Taking inspiration from this, we propose a novel neighbor method, Shared Natural Neighbors (SNaN). To demonstrate its superiority, we propose a shared natural neighbors-based hierarchical clustering algorithm for discovering arbitrary-shaped clusters (HC-SNaN). Our algorithm excels at identifying both spherical and manifold clusters. Tested on synthetic and real-world datasets, HC-SNaN shows significant advantages over existing clustering algorithms, particularly on datasets containing clusters of arbitrary shape.
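For reference, the classic natural-neighbor (NaN) search grows the neighborhood size r until every point is some other point's r-nearest neighbor, or until the number of points without such a reverse neighbor stops changing; natural neighbors are then the mutual r-nearest neighbors. The sketch follows that search. The shared-natural-neighbor score (the size of the overlap of two points' natural-neighbor sets) is a hypothetical simplification, since the abstract does not define SNaN.

```python
import math


def knn_order(points):
    """For each point, indices of all other points sorted by distance."""
    return [sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
            for i in range(len(points))]


def natural_neighbors(points):
    """Natural-neighbor search; returns (list of neighbor sets, final r)."""
    order = knn_order(points)
    n = len(points)
    reverse = [0] * n  # how many points list each point among their r nearest
    r, prev_zero = 0, None
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse[order[i][r - 1]] += 1
        zero = sum(1 for c in reverse if c == 0)
        if zero == 0 or zero == prev_zero:
            break
        prev_zero = zero
    knn = [set(order[i][:r]) for i in range(n)]
    # Natural neighbors are mutual r-nearest neighbors.
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)], r


def shared_natural_neighbors(nan, i, j):
    """Hypothetical SNaN score: overlap of two points' natural-neighbor sets."""
    return len(nan[i] & nan[j])
```

On the 1D points 0, 1, 2, 10, 11, 12 the search stops at r = 2, each point's natural neighbors are the other two points of its group, and points from different groups share no natural neighbors. No k has to be chosen by hand, which is the appeal of natural neighbors for arbitrary-shaped clusters.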
Numerous wireless networks have emerged for short communication ranges where infrastructure-based networks may fail because of installation effort and cost. One of them is the sensor network with embedded sensors working as the primary nodes, termed Wireless Sensor Networks (WSNs), in which numerous sensors are connected to at least one Base Station (BS). These sensors gather information from the environment and transmit it to a BS or gathering location. WSNs face several challenges, including throughput, energy usage, and network lifetime. Different strategies have been applied to overcome these limitations, and clustering can be considered one of the best ways to address them. Consequently, it is crucial to analyze effective Cluster Head (CH) selection to maximize throughput, extend network lifetime, and minimize energy consumption. This paper proposes an Accelerated Particle Swarm Optimization (APSO) algorithm based on the Low Energy Adaptive Clustering Hierarchy (LEACH), Neighboring Based Energy Efficient Routing (NBEER), Cooperative Energy Efficient Routing (CEER), and Cooperative Relay Neighboring Based Energy Efficient Routing (CR-NBEER) techniques, and applies APSO to the implementation of the WSN. The simulation results demonstrate that the proposed approach uses less energy, with energy consumption ranging from 0.1441 to 0.013 for 5 CHs, 1.003 to 0.0521 for 10 CHs, and 0.1734 to 0.0911 for 15 CHs. The ratio of sent packets also increased in all three CH selection scenarios, rising from 659 to 1730. The number of dead nodes likewise dropped for the given combination, falling between 71 and 66. Based on these results, the network lifetime was judged to have increased. A hybrid with a few additional parameters could further improve the proposed APSO-based protocol, and the protocol could also be applied to underwater WSNs. The overall results have been evaluated and compared with existing approaches for sensor networks.
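The LEACH baseline that the APSO scheme builds on elects cluster heads with the standard threshold T(n) = p / (1 - p (r mod 1/p)) for nodes that have not served as CH in the last 1/p rounds (T(n) = 0 otherwise). The sketch shows only this LEACH rule; the APSO fitness function and the NBEER/CEER/CR-NBEER routing are not modeled, and the function names are illustrative.

```python
import random


def leach_threshold(p, rnd):
    """LEACH threshold T(n) for an eligible node in round `rnd` (p = desired CH fraction)."""
    period = round(1 / p)  # rounds per epoch
    return p / (1 - p * (rnd % period))


def elect_cluster_heads(node_ids, eligible, p, rnd, rng):
    """Each eligible node becomes a CH if a uniform draw falls below T(n); others never do."""
    t = leach_threshold(p, rnd)
    return [n for n in node_ids if n in eligible and rng.random() < t]
```

With p = 0.1, the threshold rises from 0.1 in round 0 to 1 in round 9, so every node still eligible in the last round of an epoch becomes a cluster head; an optimizer such as APSO replaces this purely random election with an energy-aware choice.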
Path-based clustering algorithms typically generate clusters by optimizing a criterion function. Most optimization methods used in clustering algorithms offer solutions that are only close to the global optimum. This study reaches the global optimum of the criterion function in a shorter time using the minimax distance, the Maximum Spanning Tree (MST), and meta-heuristic algorithms, including the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The Fast Path-based Clustering (FPC) algorithm proposed in this paper finds cluster centers correctly in most datasets and performs clustering quickly, using the MST, the minimax distance, and a new hybrid meta-heuristic algorithm within a few iterations. The algorithm can achieve the global optimum, and its main clustering process has a computational complexity of $O(k^2 \times n)$. However, owing to the complexity of the minimum distance algorithm, the total computational complexity is $O(n^2)$. Experimental results of FPC on synthetic datasets with arbitrary shapes demonstrate that the algorithm is robust to noise and outliers and can correctly identify clusters of varying sizes and numbers. In addition, FPC requires the number of clusters as its only parameter. A comparative analysis of FPC and other clustering algorithms in this domain indicates that FPC offers superior speed, stability, and performance.
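The minimax distance between two points is the smallest achievable value of the largest edge on any path between them, and it equals the largest edge on their path in a minimum spanning tree of the Euclidean distances (equivalently, a maximum spanning tree of similarities). The sketch computes all pairwise minimax distances in O(n^2) with Prim's algorithm and one tree traversal per source; it covers only this preprocessing, not FPC's GA/PSO search for cluster centers.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph, O(n^2); returns adjacency lists."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    adj = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            w = math.dist(points[u], points[parent[u]])
            adj[u].append((parent[u], w))
            adj[parent[u]].append((u, w))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj


def minimax_distances(points):
    """D[i][j] = largest edge on the MST path between i and j."""
    adj = minimum_spanning_tree(points)
    n = len(points)
    D = [[0.0] * n for _ in range(n)]
    for s in range(n):
        stack = [(s, -1, 0.0)]  # (node, previous node, largest edge so far)
        while stack:
            u, prev, mx = stack.pop()
            D[s][u] = mx
            for v, w in adj[u]:
                if v != prev:
                    stack.append((v, u, max(mx, w)))
    return D
```

For points at 0, 1, 2, 10, 11 on a line, the minimax distance from 0 to 2 is 1 (not 2), and from 0 to 11 it is 8, the gap between the two groups, which is why minimax distances favor elongated, path-connected clusters.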
We fit various color–magnitude diagrams (CMDs) of the high-latitude Galactic globular clusters NGC 5024 (M53), NGC 5053, NGC 5272 (M3), NGC 5466, and NGC 7099 (M30) with isochrones from the Dartmouth Stellar Evolution Database and the Bag of Stellar Tracks and Isochrones for α-enrichment [α/Fe] = +0.4. For the CMDs, we use datasets from the Hubble Space Telescope, Gaia, and other sources, covering at least 25 photometric filters for each cluster. For NGC 5024, NGC 5053, NGC 5272, NGC 5466, and NGC 7099, respectively, we obtain the following characteristics with their statistical uncertainties: metallicities [Fe/H] = −1.93 ± 0.02, −2.08 ± 0.03, −1.60 ± 0.02, −1.95 ± 0.02, and −2.07 ± 0.04 dex, with a systematic uncertainty of 0.1 dex; ages 13.00 ± 0.11, 12.70 ± 0.11, 11.63 ± 0.07, 12.15 ± 0.11, and 12.80 ± 0.17 Gyr, with a systematic uncertainty of 0.8 Gyr; distances (with systematic uncertainty added) 18.22 ± 0.06 ± 0.60, 16.99 ± 0.06 ± 0.56, 10.08 ± 0.04 ± 0.33, 15.59 ± 0.03 ± 0.51, and 8.29 ± 0.03 ± 0.27 kpc; reddenings E(B−V) = 0.023 ± 0.004, 0.017 ± 0.004, 0.023 ± 0.004, 0.023 ± 0.003, and 0.045 ± 0.002 mag, with a systematic uncertainty of 0.01 mag; and extinctions A_V = 0.08 ± 0.01, 0.06 ± 0.01, 0.08 ± 0.01, 0.08 ± 0.01, and 0.16 ± 0.01 mag, with a systematic uncertainty of 0.03 mag. These values suggest a total Galactic extinction of A_V = 0.08 mag through the entire Galactic dust layer toward extragalactic objects at the North Galactic Pole. The differences in horizontal-branch morphology among these clusters are explained by their different metallicities, ages, and mass-loss efficiencies, and by the loss of low-mass members during the evolution of the core-collapse cluster NGC 7099 and the loose clusters NGC 5053 and NGC 5466.
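The quoted distances and extinctions are linked by the standard distance-modulus relations μ0 = 5 log10(d / 10 pc) and (m − M)_V = μ0 + A_V. The helpers below only perform that arithmetic and do not reproduce the isochrone fitting; the function names are illustrative.

```python
import math


def distance_modulus(d_kpc):
    """True distance modulus mu_0 = 5 log10(d / 10 pc), with d given in kpc."""
    return 5 * math.log10(d_kpc * 1000 / 10)


def distance_kpc(mu0):
    """Inverse relation: d = 10^(mu_0 / 5 + 1) pc, returned in kpc."""
    return 10 ** (mu0 / 5 + 1) / 1000


def apparent_modulus(mu0, a_filter):
    """Apparent distance modulus in a filter: (m - M) = mu_0 + A_filter."""
    return mu0 + a_filter
```

For NGC 5024 at 18.22 kpc this gives μ0 ≈ 16.30 mag, and adding A_V = 0.08 mag gives an apparent V-band modulus of about 16.38 mag.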
Based on an ecological-theory analysis of the importance of professional cluster construction, and in light of changing social demand for talent, this paper explores the practice of building an environmental chemistry professional cluster at Pingdingshan University. The practice includes gradually improving teaching conditions and reforming the teaching mode; breaking through resource limitations; integrating the boundaries of colleges and departments; integrating multiple resources; innovating systems and mechanisms; reconstructing professional clusters; deconstructing professional connotations; and reorganizing curriculum systems. The aim is to better build an ecological chain network of education in application-oriented colleges and universities, achieve deep integration of industry and education, train future-oriented interdisciplinary applied talent for new engineering, and build characteristic professional clusters in application-oriented colleges.
Clusters greatly influence the thermophysical properties of near-critical gases. The cluster structures of supercritical fluids in general, and of carbon dioxide in particular, are important for developing advanced supercritical fluid technologies and analytics. This paper extends to near-critical densities the previously developed methods for extracting cluster properties from the NIST Online Electronic Database of thermophysical properties of fluids. This database contains hidden knowledge of the properties of cluster fractions in real gases. The linear chain clusters discovered earlier dominate at intermediate densities. Their properties can be extrapolated to high-density gases, opening the way to study large 3D clusters in the near-critical zone. The potential energy density of a gas, with the chain clusters' contribution removed, reflects only the characteristics of the 3D clusters. A series expansion of this quantity in the monomer fraction density reveals the properties of n-particle 3D clusters. The paper demonstrates a discrete series of 3D cluster particle numbers and gives estimates of the bond energies of these clusters.
Multi-view Subspace Clustering (MVSC) is an advanced clustering method designed to integrate diverse views to uncover a common subspace, improving the accuracy and robustness of clustering results. The low-rank prior is significant in MVSC because it captures the global data structure across views, improving performance. However, MVSC is sensitive to outliers because it relies on the Frobenius norm to measure errors. To address this, this paper proposes a Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization (LMVSC-Sparse) approach. Sparse regularization helps select the most relevant features or views for clustering while ignoring irrelevant or noisy ones, which yields a more efficient and effective representation of the data and improves clustering accuracy and robustness, especially in the presence of outliers or noisy data. By incorporating sparse regularization, LMVSC-Sparse effectively handles outlier sensitivity, a common challenge for traditional MVSC methods that rely solely on low-rank priors. The Alternating Direction Method of Multipliers (ADMM) is then employed to solve the proposed optimization problem. Comprehensive experiments demonstrate the efficiency and effectiveness of LMVSC-Sparse, offering a robust alternative to traditional MVSC methods.
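If the sparse regularizer is an ℓ1 penalty on an error matrix E under a self-representation constraint X = XZ + E (a common choice that the abstract does not confirm), the ADMM step for E reduces to elementwise soft-thresholding. The sketch shows only that proximal step; the low-rank Z-update and the multiplier updates are omitted, and `e_update` is a hypothetical name.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * ||X||_1, elementwise: sign(v) * max(|v| - lam, 0)."""
    return [[(abs(v) - lam) * (1 if v > 0 else -1) if abs(v) > lam else 0.0
             for v in row] for row in x]


def e_update(x, xz, y, lam, mu):
    """Hypothetical ADMM E-step for min lam*||E||_1 s.t. X = XZ + E:
    E = S_{lam/mu}(X - XZ + Y/mu), with Y the Lagrange multiplier and mu the penalty."""
    residual = [[a - b + c / mu for a, b, c in zip(ra, rb, rc)]
                for ra, rb, rc in zip(x, xz, y)]
    return soft_threshold(residual, lam / mu)
```

Entries of the residual smaller than lam/mu are set exactly to zero, so only large, outlier-like deviations are absorbed into E instead of distorting the representation Z, unlike a Frobenius-norm error term that spreads them over all entries.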
Clustering data with varying densities and complicated structures is important, yet many existing clustering algorithms struggle with this problem, because varying densities and complicated structures cause a single algorithm to perform poorly on different parts of the data. Assuming that denser parts of the data probably carry more information, we propose an algorithm that clusters starting from the high-density part: it begins with a tiny distance to find the highest density-connected partition and form the corresponding super cores, and then iteratively increases the distance with a global heuristic method to cluster parts with different densities. The mean silhouette coefficient indicates the clustering performance. A denoising function eliminates the influence of noise and outliers. Many challenging experiments indicate that the algorithm performs well on data with widely varying densities and extremely complex structures. It determines the optimal number of clusters automatically, requires no background knowledge, has easily tuned parameters, and is robust against noise and outliers.
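The mean silhouette coefficient used as the quality criterion can be computed directly: for each point, a is its mean distance to the other members of its own cluster, b is the smallest mean distance to another cluster, and s = (b − a) / max(a, b). This sketch assumes Euclidean distance and at least two clusters, and scores members of singleton clusters as 0 (the usual convention).

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (requires at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # singleton convention
            continue
        # The point itself contributes distance 0, hence the (len - 1) divisor.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in pts) / len(pts)
                for k, pts in clusters.items() if k != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters, and negative values indicate points that are closer to another cluster than to their own, so the criterion can compare partitions obtained at different distance thresholds.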
A novel fuzzy clustering model, the allied fuzzy c-means (AFCM) model, is proposed, combining the advantages of fuzzy c-means (FCM) and possibilistic c-means (PCM) clustering. PCM is sensitive to initialization and often generates coincident clusters. AFCM overcomes this shortcoming and is an extension of PCM. AFCM produces membership and typicality values simultaneously. Experimental results show that noisy data are handled well, coincident clusters are avoided, and clustering accuracy is improved.
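How memberships and typicalities differ can be sketched with the standard FCM and PCM update formulas plus a center update that weights points by a·u^m + b·t^η. That combined weighting follows the possibilistic fuzzy c-means (PFCM) style and is an assumption, not necessarily the AFCM update rule; the per-cluster scales `gammas` are also illustrative.

```python
import math


def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """FCM: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)); each point's memberships sum to 1."""
    u = []
    for x in points:
        d = [max(math.dist(x, c), eps) for c in centers]
        u.append([1.0 / sum((di / dk) ** (2 / (m - 1)) for dk in d) for di in d])
    return u


def pcm_typicalities(points, centers, gammas, eta=2.0):
    """PCM: t_ij = 1 / (1 + (d_ij^2 / gamma_j)^(1/(eta-1))); rows need not sum to 1."""
    return [[1.0 / (1.0 + (math.dist(x, c) ** 2 / g) ** (1 / (eta - 1)))
             for c, g in zip(centers, gammas)] for x in points]


def combined_centers(points, u, t, a=1.0, b=1.0, m=2.0, eta=2.0):
    """PFCM-style center update with weights a*u^m + b*t^eta (assumed, not the AFCM rule)."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [a * ui[j] ** m + b * ti[j] ** eta for ui, ti in zip(u, t)]
        s = sum(w)
        centers.append([sum(wi * x[d] for wi, x in zip(w, points)) / s for d in range(dim)])
    return centers
```

For a distant noise point, FCM still assigns a membership above 0.5 to the nearer cluster because memberships must sum to 1, while its PCM typicality is close to 0; keeping both values lets the typicality term stop noise from dragging the centers.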
Funding (cellular-data classification protocol, Janelia MouseLight): supported in part by NIH grants R01NS39600, U01MH114829, and RF1MH128693 (to GAA).
Funding (A209 study): supported by the National Natural Science Foundation of China (grant Nos. U2038104 and 11703014) and the Bureau of International Cooperation, Chinese Academy of Sciences (GJHZ1864).
Funding (unknown application-layer protocol recognition study): This work is supported by the National Key R&D Program of China (2017YFB0802900).
Funding: Supported by the National Key Research and Development Program of China (No. 2022YFF0711500), the National Natural Science Foundation of China (NSFC, Grant No. 12373097), the Basic and Applied Basic Research Foundation Project of Guangdong Province (No. 2024A1515011503), and the Guangzhou Science and Technology Funds (2023A03J0016).
Abstract: Open clusters (OCs) serve as invaluable tracers for investigating the properties and evolution of stars and galaxies. Despite recent advances in machine-learning clustering algorithms, accurately identifying such clusters remains challenging. We revisited 3013 samples generated with a hybrid clustering algorithm combining FoF and pyUPMASK. We applied a multi-view clustering (MvC) ensemble method, which analyzes each member star of an OC from three perspectives (proper motion, spatial position, and a composite view) and then integrates the clustering outcomes to infer more reliable cluster memberships. Based on the MvC results, we excluded cluster candidates with fewer than ten member stars, leaving 1256 OC candidates. After isochrone fitting and visual inspection, we identified 506 candidate OCs in the Milky Way: 493 were previously reported, and 13 are high-confidence new candidate clusters.
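The abstract does not say how the three per-view clusterings are integrated. One standard ensemble technique, used here purely as a stand-in, is a co-association consensus: count how often two stars share a label across views and link pairs that agree in a majority of views. The view labelings below are hypothetical.

```python
import numpy as np

def co_association(labelings):
    """Fraction of views in which each pair of stars shares a cluster label."""
    labelings = np.asarray(labelings)            # shape (n_views, n_stars)
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)

def consensus_labels(labelings, threshold=0.5):
    """Link stars whose co-association exceeds threshold; return component ids."""
    a = co_association(labelings)
    n = a.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]        # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if a[i, j] > threshold:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    mapping = {}                                 # relabel roots as 0, 1, 2, ...
    return np.array([mapping.setdefault(r, len(mapping)) for r in roots])

# Hypothetical labels from the proper-motion, spatial, and composite views.
views = [[0, 0, 0, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1]]
labels = consensus_labels(views)
```

Star 2 is grouped with stars 0 and 1 because two of the three views agree, even though the spatial view disagrees.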
Funding: Supported by the National Natural Science Foundation of China (No. U1967212), the Fundamental Research Funds for the Central Universities (No. 2021MS032), and the Nuclear Materials Innovation Foundation (No. WDZC-2023-AW-0305).
Abstract: We develop a cluster dynamics (CD) model and use CD simulations to investigate the evolution of dislocation loops in austenitic steels irradiated with Fe^(+) ions. The CD predictions are compared with experimental results from the literature: the number density and average diameter of the dislocation loops obtained from the simulations agree well with transmission electron microscopy (TEM) observations of Fe^(+)-irradiated solution-annealed 304, cold-worked 316, and HR3 austenitic steels. The simulation results show that the diffusion of in-cascade interstitial clusters plays a major role in dislocation loop density and growth; in particular, for HR3 austenitic steel, the CD model verifies the effect of temperature on the density and size of the dislocation loops.
Abstract: Developing highly active alloy catalysts that surpass platinum-group metals in the oxygen reduction reaction (ORR) is critical in electrocatalysis. Gold-based single-atom alloy (AuSAA) clusters are gaining recognition as promising alternatives owing to their potential for high activity. However, enhancing the activity of AuSAA clusters remains challenging because of limited insight into their actual active sites in alkaline environments. Herein, we studied a variety of Au_(54)M_(1) SAA cluster catalysts and revealed that an operando-formed MO_(x)(OH)_(y) complex acts as the crucial active site for catalyzing the ORR in basic solution. The observed volcano plot indicates that Au_(54)Co_(1), Au_(54)M_(1), and Au_(54)Ru_(1) clusters can be the optimal Au_(54)M_(1) SAA cluster catalysts for the ORR. Our findings offer new insights into the actual active sites of AuSAA cluster catalysts and can inform rational catalyst design in experiments.
Funding: Supported by the National Key Research and Development Program of China (No. 2022YFB3304400), the National Natural Science Foundation of China (Nos. 6230311, 62303111, 62076060, 61932007, and 62176083), and the Key Research and Development Program of Jiangsu Province of China (No. BE2022157).
Abstract: Traditional Fuzzy C-Means (FCM) and Possibilistic C-Means (PCM) clustering algorithms are data-driven: their objective-function minimization is based only on the available numeric data. Recently, knowledge hints have been introduced to form knowledge-driven clustering algorithms, which reveal a data structure that reflects not only the relationships among the data but also compatibility with the knowledge hints. However, these algorithms cannot determine the optimal number of clusters by themselves and require the assistance of evaluation indices. Moreover, knowledge hints are usually used as part of the data structure (directly replacing some cluster centers), which severely limits the flexibility of the algorithm and can lead to knowledge misguidance. To solve this problem, this study designs a new knowledge-driven clustering algorithm called PCM clustering with High-density Points (HP-PCM), in which domain knowledge is represented in the form of so-called high-density points. First, a new data density calculation function is proposed, and the Density Knowledge Points Extraction (DKPE) method is established to filter high-density points from the dataset to form knowledge hints. Then, these hints are incorporated into the PCM objective function so that the clustering algorithm is guided by high-density points to discover the natural data structure. Finally, the initial number of clusters is set, based on the number of knowledge hints, to be greater than the true number, and HP-PCM automatically determines the final number of clusters during the clustering process through a cluster elimination mechanism. Experimental studies, including comparative analyses, highlight the effectiveness of the proposed algorithm, including a higher clustering success rate, the ability to determine the optimal number of clusters, and faster convergence.
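As background for HP-PCM, the sketch below shows the standard PCM updates: typicality t_ij = 1 / (1 + (d_ij^2 / eta_j)^(1/(m-1))) and typicality-weighted centers. It starts from more centers than true clusters, as the abstract describes. The density function, DKPE, and the knowledge-hint term are not reproduced. `merge_coincident` is a hypothetical stand-in for cluster elimination that simply drops centers that collapsed onto an earlier one.

```python
import numpy as np

def pcm_typicality(x, centers, eta, m=2.0):
    """t_ij = 1 / (1 + (d_ij^2 / eta_j)^(1/(m-1))), shape (n_points, n_clusters)."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return 1.0 / (1.0 + (d2 / eta[None, :]) ** (1.0 / (m - 1.0)))

def pcm_centers(x, t, m=2.0):
    """Typicality-weighted means: v_j = sum_i t_ij^m x_i / sum_i t_ij^m."""
    w = t ** m
    return (w.T @ x) / w.sum(axis=0)[:, None]

def pcm(x, centers, eta, m=2.0, n_iter=100, tol=1e-6):
    for _ in range(n_iter):
        t = pcm_typicality(x, centers, eta, m)
        new = pcm_centers(x, t, m)
        if np.abs(new - centers).max() < tol:
            return new, t
        centers = new
    return centers, t

def merge_coincident(centers, tol=1e-2):
    """Hypothetical elimination step: keep only centers not already represented."""
    kept = []
    for c in centers:
        if all(np.linalg.norm(c - k) > tol for k in kept):
            kept.append(c)
    return np.array(kept)

x = np.array([[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [0, -0.1],
              [5, 5], [5.1, 5], [5, 5.1], [4.9, 5], [5, 4.9]])
init = np.array([[0.3, 0.2], [-0.2, 0.1], [5.2, 4.9]])   # three centers, two true clusters
centers, t = pcm(x, init, eta=np.ones(3))
kept = merge_coincident(centers)
```

Because PCM updates each center independently, the two centers started near the same blob converge to the same mode. This coincident-cluster behavior is what an elimination mechanism can exploit.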
Funding: Supported by the National Natural Science Foundation of China (U2133208, U20A20161; No. 62273244) and the Sichuan Science and Technology Program (No. 2022YFG0180).
Abstract: To improve the accuracy of Air Traffic Control (ATC) cybersecurity attack detection, this paper designs a new clustering-based detection method for ATC network security attacks. A feature set for ATC cybersecurity attacks is constructed by setting feature states, adding recursive features, and determining feature criticality. The expected information and entropy of the feature data are computed to determine the information gain of each feature and to reduce interference from similar feature data. An autoencoder is introduced into the AI (artificial intelligence) algorithm to encode and decode the characteristics of ATC network security attack behavior and reduce the dimensionality of the attack behavior data. On this basis, an unsupervised learning algorithm for clustering detection of ATC network security attacks is designed. First, the distances between clusters of attack behavior features are determined, the clustering threshold is calculated, and the initial cluster centers are constructed; the mean of all feature objects in each cluster is then recalculated as the new cluster center. Second, all objects in each cluster of attack behavior feature data are traversed. Finally, cluster detection of ATC network security attack behavior is completed by computing the objective function. The experiments used three groups of attack behavior datasets as test objects, took the detection rate, false detection rate, and recall rate as test indicators, and compared the method against three similar methods. The results show a detection rate of about 98%, a false positive rate below 1%, and a recall rate above 97%, indicating that the method can improve the detection of security attacks in ATC networks.
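The feature-screening step relies on entropy and information gain. For a discrete feature X and class label Y, IG(Y; X) = H(Y) - sum_v P(X = v) H(Y | X = v). Below is a minimal sketch of that formula. The feature values and labels are hypothetical toy data, not the paper's ATC feature set.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

labels = ["attack", "attack", "normal", "normal"]
informative = ["syn", "syn", "ack", "ack"]   # perfectly separates the classes
constant = ["x", "x", "x", "x"]              # carries no information
```

A feature that perfectly separates the classes recovers the full 1 bit of label entropy, while a constant feature gains nothing. Low-gain features are therefore candidates for removal.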
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62171203; in part by the Jiangsu Province “333 Project” High-Level Talent Cultivation Subsidized Project; in part by the Suzhou Key Supporting Subjects for Health Informatics under Grant SZFCXK202147; in part by the Changshu Science and Technology Program under Grants CS202015 and CS202246; and in part by the Changshu Key Laboratory of Medical Artificial Intelligence and Big Data under Grants CYZ202301 and CS202314.
Abstract: In this paper, we introduce a novel Multi-scale and Auto-tuned Semi-supervised Deep Subspace Clustering (MAS-DSC) algorithm, aimed at addressing the challenges of deep subspace clustering in high-dimensional real-world data, particularly in medical imaging. Traditional deep subspace clustering algorithms, which are mostly unsupervised, are limited in their ability to exploit the prior knowledge inherent in medical images. MAS-DSC incorporates a semi-supervised learning framework that uses a small amount of labeled data to guide the clustering process, thereby enhancing the discriminative power of the feature representations. Additionally, a multi-scale feature extraction mechanism is designed to adapt to the complexity of medical imaging data, resulting in more accurate clustering. To address the difficulty of hyperparameter selection in deep subspace clustering, this paper employs Bayesian optimization for adaptive tuning of the hyperparameters related to subspace clustering, prior-knowledge constraints, and model loss weights. Extensive experiments on standard clustering datasets, including ORL, Coil20, and Coil100, validate the effectiveness of MAS-DSC: with its multi-scale network structure and Bayesian hyperparameter optimization, it achieves excellent clustering results on these datasets. Furthermore, tests on a brain tumor dataset demonstrate the robustness of the algorithm and its ability to leverage prior knowledge for efficient feature extraction and improved clustering performance within a semi-supervised learning framework.
Funding: This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M202300502, KJQN201800539).
Abstract: In clustering algorithms, the selection of neighbors significantly affects the quality of the final clustering results. Although various neighbor relationships exist, such as K-nearest neighbors, natural neighbors, and shared neighbors, most of them can handle only a single kind of structure, and identification accuracy is low for datasets with multiple structures. When faced with something complex, people instinctively divide it into several parts and handle them one at a time; likewise, partitioning a dataset into more sub-graphs is a promising approach to identifying complex structures. Taking inspiration from this, we propose a novel neighbor relationship: Shared Natural Neighbors (SNaN). To demonstrate its advantages, we propose a shared-natural-neighbors-based hierarchical clustering algorithm for discovering arbitrarily shaped clusters (HC-SNaN). The algorithm identifies both spherical clusters and manifold clusters well. Tested on synthetic and real-world datasets, HC-SNaN shows significant advantages over existing clustering algorithms, particularly on datasets containing clusters of arbitrary shape.
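SNaN builds on natural neighbors, which need no user-chosen k. The neighborhood size r grows until every point is some other point's neighbor, or until the number of points without a reverse neighbor stops changing. Natural neighbors are then the mutual r-nearest neighbors. The sketch below implements that search and assumes distinct points. The pairwise score in `shared_natural_neighbors` (the count of common natural neighbors) is a hypothetical reading of "shared natural neighbors", not the paper's exact definition.

```python
import numpy as np

def natural_neighbors(x):
    """Natural-neighbor search: grow r until every point has a reverse neighbor
    or the number of points without one stops changing. Assumes distinct points."""
    n = len(x)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    order = np.argsort(d, axis=1)[:, 1:]          # neighbors by distance, self excluded
    reverse = np.zeros(n, dtype=int)
    prev, r = -1, 0
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse[order[i, r - 1]] += 1         # i's r-th nearest neighbor gains a reverse link
        orphans = int((reverse == 0).sum())
        if orphans == 0 or orphans == prev:
            break
        prev = orphans
    knn = [set(order[i, :r].tolist()) for i in range(n)]
    # natural neighbors are mutual r-nearest neighbors
    return r, [{j for j in knn[i] if i in knn[j]} for i in range(n)]

def shared_natural_neighbors(nan_sets, i, j):
    """Hypothetical pairwise score: how many natural neighbors i and j share."""
    return len(nan_sets[i] & nan_sets[j])

pts = np.array([[0.0], [1.0], [3.0], [10.0], [11.0], [13.0]])
r, nan = natural_neighbors(pts)
```

On these two well-separated groups the search stops at r = 2. Points in the same group share a natural neighbor, while points in different groups share none. A hierarchical algorithm could use this as a merge signal.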
Abstract: Numerous wireless networks have emerged for short communication ranges, where infrastructure-based networks may fail because of their installation effort and cost. One of them is the sensor network with embedded sensors working as the primary nodes, termed Wireless Sensor Networks (WSNs), in which numerous sensors are connected to at least one Base Station (BS). These sensors gather information from the environment and transmit it to a BS or gathering location. WSNs face several challenges, including throughput, energy usage, and network lifetime. Different strategies have been applied to overcome these restrictions, and clustering may be regarded as the best way to solve such issues. Consequently, it is crucial to analyze effective Cluster Head (CH) selection to maximize throughput, extend network lifetime, and minimize energy consumption. This paper proposes an Accelerated Particle Swarm Optimization (APSO) algorithm based on the Low Energy Adaptive Clustering Hierarchy (LEACH), Neighboring Based Energy Efficient Routing (NBEER), Cooperative Energy Efficient Routing (CEER), and Cooperative Relay Neighboring Based Energy Efficient Routing (CR-NBEER) techniques, and applies APSO to the implementation of the WSN. The simulation findings demonstrate that the suggested approach uses less energy, with energy consumption ranging from 0.1441 to 0.013 for 5 CHs, 1.003 to 0.0521 for 10 CHs, and 0.1734 to 0.0911 for 15 CHs. The sending-packets ratio also rose in all three CH selection scenarios, from 659 to 1730. The number of dead nodes likewise dropped for the given combination, falling between 71 and 66. Based on these results, the network lifetime was deemed to have increased. A hybrid with a few valuable parameters could further improve the proposed APSO-based protocol, which could also be applied to underwater WSNs. The overall results have been evaluated and compared with existing approaches for sensor networks.
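For reference, the classic LEACH baseline elects cluster heads at random using the threshold T(n) = p / (1 - p (r mod 1/p)). This applies only to nodes that have not been a CH in the last 1/p rounds; other nodes have T(n) = 0. The sketch below shows only that baseline rule. It does not implement the paper's APSO-based CH selection, and the node ids and parameter values are hypothetical.

```python
import random

def leach_threshold(p, r):
    """LEACH threshold T(n) = p / (1 - p * (r mod 1/p)) for an eligible node."""
    period = round(1 / p)                  # rounds per epoch, e.g. 20 for p = 0.05
    return p / (1 - p * (r % period))

def elect_cluster_heads(eligible, p, r, rng=random):
    """Each eligible node becomes a CH if its uniform draw falls below T(n)."""
    t = leach_threshold(p, r)
    return [node for node in eligible if rng.random() < t]

heads = elect_cluster_heads(range(100), p=0.05, r=3, rng=random.Random(42))
```

The threshold rises over an epoch until it reaches 1 in the last round, so every node that has not yet served becomes a CH. An optimizer such as APSO replaces this random draw with a fitness-driven choice, for example based on residual energy and distance.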
Abstract: Path-based clustering algorithms typically generate clusters by optimizing a criterion function. Most optimization methods in clustering algorithms offer solutions that are only close to the global optimum. This study reaches the global optimum of the criterion function in a shorter time using the minimax distance, a Maximum Spanning Tree (MST), and meta-heuristic algorithms, including a Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The Fast Path-based Clustering (FPC) algorithm proposed in this paper finds cluster centers correctly in most datasets and performs clustering quickly. FPC does so using the MST, the minimax distance, and a new hybrid meta-heuristic algorithm within a few rounds of iterations. The algorithm can achieve the global optimum, and its main clustering process has a computational complexity of O(k^2 × n). However, owing to the complexity of the minimum distance algorithm, the total computational complexity is O(n^2). Experimental results of FPC on synthetic datasets with arbitrary shapes demonstrate that the algorithm is resistant to noise and outliers and can correctly identify clusters of varying sizes and numbers. In addition, FPC requires the number of clusters as its only parameter. A comparative analysis of FPC and other clustering algorithms in this domain indicates that FPC exhibits superior speed, stability, and performance.
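The minimax distance between two points is the smallest achievable value of the largest edge on any path joining them. It equals the largest edge on the spanning-tree path between them, which is why a spanning tree appears in FPC. The sketch below computes all-pairs minimax distances in O(n^2). It builds a minimum spanning tree over Euclidean distances with Prim's algorithm, which plays the same role as a maximum spanning tree over similarities. The GA/PSO search for centers is not shown.

```python
import numpy as np

def minimax_distances(x):
    """All-pairs minimax path distances via the minimum spanning tree (O(n^2))."""
    n = len(x)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    # Prim's algorithm on the complete graph
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)
    parent = np.full(n, -1)
    best[0] = 0.0
    adj = [[] for _ in range(n)]
    for _ in range(n):
        u = int(np.argmin(np.where(in_tree, np.inf, best)))
        in_tree[u] = True
        if parent[u] >= 0:
            p = int(parent[u])
            adj[u].append((p, d[u, p]))
            adj[p].append((u, d[u, p]))
        closer = ~in_tree & (d[u] < best)
        best[closer] = d[u][closer]
        parent[closer] = u
    # walk the tree from every source, carrying the largest edge seen so far
    mm = np.zeros((n, n))
    for s in range(n):
        stack, seen = [(s, 0.0)], {s}
        while stack:
            v, m = stack.pop()
            mm[s, v] = m
            for w, wgt in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, max(m, wgt)))
    return mm

pts = np.array([[0.0], [1.0], [2.0], [10.0]])
mm = minimax_distances(pts)
```

Points 0 and 2 are linked through a chain of unit-length steps, so their minimax distance is 1 rather than the Euclidean 2. Reaching point 3 requires crossing the gap of 8. This property lets path-based methods follow elongated, arbitrarily shaped clusters.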
Funding: Financial support from the Russian Science Foundation (grant No. 20-72-10052).
Abstract: We fit various color–magnitude diagrams (CMDs) of the high-latitude Galactic globular clusters NGC 5024 (M53), NGC 5053, NGC 5272 (M3), NGC 5466, and NGC 7099 (M30) with isochrones from the Dartmouth Stellar Evolution Database and the Bag of Stellar Tracks and Isochrones for α-enrichment [α/Fe] = +0.4. For the CMDs, we use data sets from the Hubble Space Telescope, Gaia, and other sources, utilizing at least 25 photometric filters for each cluster. We obtain the following characteristics, with their statistical uncertainties, for NGC 5024, NGC 5053, NGC 5272, NGC 5466, and NGC 7099, respectively: metallicities [Fe/H] = -1.93 ± 0.02, -2.08 ± 0.03, -1.60 ± 0.02, -1.95 ± 0.02, and -2.07 ± 0.04 dex, with a systematic uncertainty of 0.1 dex; ages 13.00 ± 0.11, 12.70 ± 0.11, 11.63 ± 0.07, 12.15 ± 0.11, and 12.80 ± 0.17 Gyr, with a systematic uncertainty of 0.8 Gyr; distances (systematic uncertainty added) 18.22 ± 0.06 ± 0.60, 16.99 ± 0.06 ± 0.56, 10.08 ± 0.04 ± 0.33, 15.59 ± 0.03 ± 0.51, and 8.29 ± 0.03 ± 0.27 kpc; reddenings E(B-V) = 0.023 ± 0.004, 0.017 ± 0.004, 0.023 ± 0.004, 0.023 ± 0.003, and 0.045 ± 0.002 mag, with a systematic uncertainty of 0.01 mag; and extinctions A_V = 0.08 ± 0.01, 0.06 ± 0.01, 0.08 ± 0.01, 0.08 ± 0.01, and 0.16 ± 0.01 mag, with a systematic uncertainty of 0.03 mag. These extinctions suggest a total Galactic extinction of A_V = 0.08 mag through the whole Galactic dust layer toward extragalactic objects at the North Galactic Pole. The differences in horizontal branch morphology among these clusters are explained by their different metallicities, ages, and mass-loss efficiencies, and by the loss of low-mass members during the evolution of the core-collapse cluster NGC 7099 and the loose clusters NGC 5053 and NGC 5466.
Funding: Supported by the Education and Teaching Reform Research Project of Pingdingshan University (2021-JY55, 2020-JY05), the Key Scientific Research Project of Colleges and Universities in Henan Province (22B180011), the Project of Henan Science and Technology Department (232102320262), Ideological and Political Theories Teaching in Key Demonstration Courses at School Level in Pingdingshan College in 2022 (Comprehensive Experiment of Environmental Biology), and Ideological and Political Theories Teaching in Demonstration Courses at School Level in Pingdingshan College in 2023 (Ecological Engineering).
Abstract: Based on an ecological-theory analysis of the importance of professional cluster construction, and in view of the changing social demand for talent, this paper explores the practice of constructing the environmental chemistry professional cluster at Pingdingshan University. This practice includes gradually improving teaching conditions and reforming the teaching mode, breaking through resource limitations, integrating the boundaries between colleges and departments, integrating multiple resources, innovating systems and mechanisms, reconstructing professional clusters, deconstructing professional connotations, and reorganizing curriculum systems. The aim is to better build the ecological chain network of education in application-oriented colleges and universities, achieve deep integration of industry and education, train future-oriented interdisciplinary applied talent in new engineering, and build characteristic professional clusters in application-oriented colleges.
Abstract: Clusters greatly influence the thermophysical properties of near-critical gases. The cluster structures of supercritical fluids in general, and of carbon dioxide in particular, are important for developing advanced supercritical fluid technologies and analytics. This paper extends to near-critical densities the previously developed methods for extracting cluster properties from the NIST online electronic database of thermophysical properties of fluids. This database contains hidden knowledge of the properties of cluster fractions in real gases. The previously discovered linear chain clusters dominate at intermediate densities. Their properties can be extrapolated to high-density gases, opening the way to study large 3D clusters in the near-critical zone. The potential energy density of a gas, with the chain clusters' contribution removed, reflects only the characteristics of the 3D clusters. A series expansion of this quantity in terms of the Monomer Fraction density reveals the properties of n-particle 3D clusters. The paper demonstrates a discrete sequence of 3D cluster particle numbers and estimates the bond energies of these clusters.
Abstract: Multi-view Subspace Clustering (MVSC) is an advanced clustering method designed to integrate diverse views to uncover a common subspace, enhancing the accuracy and robustness of clustering results. The low-rank prior is significant in MVSC because it captures the global data structure across views and thereby improves performance. However, MVSC is sensitive to outliers because it relies on the Frobenius norm for error measurement. To address this, our paper proposes a Low-Rank Multi-view Subspace Clustering Based on Sparse Regularization (LMVSC-Sparse) approach. Sparse regularization helps select the most relevant features or views for clustering while ignoring irrelevant or noisy ones. This yields a more efficient and effective representation of the data and improves clustering accuracy and robustness, especially in the presence of outliers or noisy data. By incorporating sparse regularization, LMVSC-Sparse effectively handles outlier sensitivity, a common challenge for traditional MVSC methods that rely solely on low-rank priors. The Alternating Direction Method of Multipliers (ADMM) is then employed to solve the resulting optimization problem. Comprehensive experiments demonstrate the efficiency and effectiveness of LMVSC-Sparse, which offers a robust alternative to traditional MVSC methods.
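ADMM solvers for objectives that combine a low-rank (nuclear-norm) term with a sparse (L1) term usually reduce each subproblem to one of two proximal operators. Singular value thresholding shrinks singular values, and soft thresholding shrinks individual entries. The exact LMVSC-Sparse objective and update order are not given in the abstract. The sketch below shows only these two standard building blocks.

```python
import numpy as np

def soft_threshold(a, tau):
    """Proximal operator of tau * ||A||_1: shrink every entry toward zero by tau."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)

def singular_value_threshold(a, tau):
    """Proximal operator of tau * ||A||_* (nuclear norm): shrink singular values by tau."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt

sparse_part = soft_threshold(np.array([3.0, -0.5, -2.0]), 1.0)
low_rank_part = singular_value_threshold(np.diag([3.0, 1.0]), 1.0)
```

Soft thresholding zeroes small entries, which is how an L1 error term absorbs sparse outliers instead of spreading them across the representation as a Frobenius-norm term would. Singular value thresholding drops weak directions and lowers the rank.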
Funding: Supported by the National Key Research and Development Program of China (No. 2016YFB0201305), the National Science and Technology Major Project (No. 2013ZX0102-8001-001-001), and the National Natural Science Foundation of China (No. 91430218, 31327901, 61472395, 61272134, 61432018).
Abstract: Clustering data with varying densities and complicated structures is important, yet many existing clustering algorithms struggle with it, because varying densities and complicated structures make a single algorithm perform poorly on different parts of the data. Assuming that denser parts of the data probably carry more information, we propose an algorithm that clusters starting from the high-density part. It begins with a tiny distance to find the highest density-connected partition and form the corresponding super cores; the distance is then iteratively increased by a global heuristic method to cluster parts with different densities. The mean silhouette coefficient indicates clustering performance. A denoising function is implemented to eliminate the influence of noise and outliers. Many challenging experiments indicate that the algorithm performs well on data with widely varying densities and extremely complex structures. It determines the optimal number of clusters automatically, needs no background knowledge, is easy to tune, and is robust against noise and outliers.
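The mean silhouette coefficient used here as the quality indicator is s(i) = (b - a) / max(a, b). Here a is point i's mean distance to the other members of its cluster, and b is its smallest mean distance to any other cluster. The sketch below implements that definition, assuming at least two clusters. Points in singleton clusters score 0, the usual convention.

```python
import numpy as np

def mean_silhouette(x, labels):
    """Mean of s(i) = (b - a) / max(a, b); singleton-cluster points score 0."""
    x, labels = np.asarray(x, float), np.asarray(labels)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = d[i, same].sum() / (same.sum() - 1)   # excludes the zero self-distance
        b = min(d[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

score = mean_silhouette([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])
```

For these two tight, well-separated groups the score is about 0.90. A density-growing algorithm can compare this value across distance levels to choose a partition.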
Abstract: A novel fuzzy clustering model, the allied fuzzy c-means (AFCM) model, is proposed by combining the advantages of fuzzy c-means (FCM) and possibilistic c-means (PCM) clustering. PCM is sensitive to initialization and often generates coincident clusters. AFCM, an extension of PCM, overcomes this shortcoming and produces membership and typicality values simultaneously. Experimental results show that noisy data are handled well, coincident clusters are avoided, and clustering accuracy is improved.
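The abstract does not give AFCM's objective or how it weights the two terms. The sketch below only shows the two quantities it produces, computed for the same centers with the standard FCM and PCM formulas. FCM memberships compete across clusters and sum to 1 per point. PCM typicalities are absolute and can all be small for a distant noise point. The data and the scale parameter eta are hypothetical.

```python
import numpy as np

def memberships_and_typicalities(x, centers, eta, m=2.0, eps=1e-12):
    """FCM memberships u (rows sum to 1) and PCM typicalities t (each in (0, 1])."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
    # FCM: u_ij = 1 / sum_k (d_ij^2 / d_ik^2)^(1/(m-1))
    ratio = (d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))
    u = 1.0 / ratio.sum(axis=2)
    # PCM: t_ij = 1 / (1 + (d_ij^2 / eta_j)^(1/(m-1)))
    t = 1.0 / (1.0 + (d2 / eta[None, :]) ** (1.0 / (m - 1.0)))
    return u, t

x = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 100.0]])   # last point is a far outlier
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
u, t = memberships_and_typicalities(x, centers, eta=np.array([1.0, 1.0]))
```

The outlier receives FCM memberships of 0.5 and 0.5, as if it belonged equally to both clusters, but typicalities near zero. Typicality is the information that lets a combined model discount noise.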