At present, the proportion of new energy in the power grid is increasing, and the random fluctuations in power output increase the risk of cascading failures in the power grid. In this paper, we propose a method for identifying high-risk cascading failure scenarios in new energy power grids based on a deep embedding clustering (DEC) algorithm and apply it in a risk assessment of cascading failures in different operating scenarios for new energy power grids. First, considering the real-time operation status and system structure of new energy power grids, a scenario cascading failure risk indicator is established. Based on this indicator, the risk of cascading failure is calculated for the scenario set, the scenarios are clustered using the DEC algorithm, and the scenarios with the highest indicators are selected as the significant risk scenario set. The results of simulations with an example power grid show that our method can effectively identify scenarios with a high risk of cascading failures from a large number of scenarios.
This paper focuses on the unsupervised detection of the Higgs boson particle using the most informative features and variables that characterize the “Higgs machine learning challenge 2014” data set. In this paper, the unsupervised detection proceeds in four steps: (1) selection of the most informative features from the considered data; (2) definition of the number of clusters based on the elbow criterion, for which the experimental results showed that the optimal number of clusters grouping the considered data in an unsupervised manner is 2; (3) proposition of a new approach for hybridization of both hard and fuzzy clustering tuned with Ant Lion Optimization (ALO); (4) comparison with some existing metaheuristic optimizations such as Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). By employing a multi-angle analysis based on the cluster validation indices, the confusion matrix, the efficiency and purity rates, the average cost variation, the computational time and the Sammon mapping visualization, the results highlight the effectiveness of the improved Gustafson-Kessel algorithm optimized with ALO (ALOGK) and validate the proposed approach. Although the paper gives a complete clustering analysis, its novel contribution concerns only Steps (1) and (3) above. The first contribution lies in the method used in Step (1) to select the most informative features and variables. We used the t-Statistic technique to rank them. Afterwards, a feature mapping is applied using a Self-Organizing Map (SOM) to identify the level of correlation between them. Then, Particle Swarm Optimization (PSO), a metaheuristic optimization technique, is used to reduce the data set dimension. The second contribution of this work concerns the third step, where each of the clustering algorithms, namely K-means (KM), Global K-means (GlobalKM), Partitioning Around Medoids (PAM), Fuzzy C-means (FCM), Gustafson-Kessel (GK) and Gath-Geva (GG), is optimized and tuned with ALO.
Classification systems such as Slope Mass Rating (SMR) are currently used to undertake slope stability analysis. In the SMR classification system, data are allocated to certain classes based on linguistic and experience-based criteria. In order to eliminate linguistic criteria resulting from experience-based judgments and to account for uncertainties in determining the class boundaries defined by the SMR system, the system's classification results were corrected using two clustering algorithms, namely K-means and fuzzy c-means (FCM), for the ratings obtained via continuous and discrete functions. When the clustering algorithms were applied in the SMR classification system, no experience-based judgment was made in advance on the number of extracted classes; only after all steps of the clustering algorithms had been completed was a new classification scheme proposed for the SMR system under different failure modes, based on the ratings obtained via continuous and discrete functions. The results of this study showed that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system based on the ratings calculated via continuous and discrete functions.
Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects clustering results, and that comparing the results obtained with different distance functions can be used to detect the outliers of a cluster and to obtain information on the shape of clusters. In practice, it is suggested to apply the different distance functions separately, compare the clustering results, and pick out the “swing points”. Such points may reveal more information to data analysts.
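The idea of comparing clusterings under different distance functions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not name its three distance functions, so Euclidean, Manhattan and Chebyshev are assumed here, and the helper names (`assign`, `swing_points`) and the sample data are hypothetical. A "swing point" is taken to mean a point whose nearest center changes when the distance function changes.

```python
import math

# Three common distance functions (assumed; the abstract does not name its three).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def assign(points, centers, dist):
    """For each point, return the index of its nearest center under dist."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]

def swing_points(points, centers, dists):
    """Return the points whose nearest center differs between distance functions."""
    labelings = [assign(points, centers, d) for d in dists]
    return [p for i, p in enumerate(points)
            if len({labels[i] for labels in labelings}) > 1]

# Hypothetical data: two fixed centers and three points.
centers = [(0.0, 0.0), (5.0, 2.0)]
points = [(0.5, 0.5), (4.5, 2.0), (2.0, 2.0)]
swing = swing_points(points, centers, [euclidean, manhattan, chebyshev])
print(swing)
```

In this toy data the point (2.0, 2.0) is closer to the first center under the Euclidean and Chebyshev distances but closer to the second under the Manhattan distance, so it is the only one reported.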
Molecular dynamics (MD) simulation has become a powerful tool to investigate the structure-function relationship of proteins and other biological macromolecules at atomic resolution and biologically relevant timescales. MD simulations often produce massive datasets containing millions of snapshots describing proteins in motion. Therefore, clustering algorithms are in high demand for classifying these MD snapshots and gaining biological insights. There are two main categories of clustering algorithms, which group protein conformations into clusters based on the similarity of their shape (geometric clustering) or their kinetics (kinetic clustering). In this paper, we review a series of frequently used clustering algorithms applied in MD simulations, including divisive algorithms, agglomerative algorithms (single-linkage, complete-linkage, average-linkage, centroid-linkage and ward-linkage), center-based algorithms (K-Means, K-Medoids, K-Centers, and APM), density-based algorithms (neighbor-based, DBSCAN, density-peaks, and Robust-DB), and spectral-based algorithms (PCCA and PCCA+). In particular, differences between geometric and kinetic clustering metrics are discussed along with the performance of different clustering algorithms. We note that there is no one-size-fits-all algorithm for the classification of MD datasets. For a specific application, the right choice of clustering algorithm should be based on the purpose of clustering and the intrinsic properties of the MD conformational ensembles. Therefore, a main focus of our review is to describe the merits and limitations of each clustering algorithm. We expect that this review will help guide researchers in choosing appropriate clustering algorithms for their own MD datasets.
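As an illustration of one of the center-based methods listed above, the following is a minimal Python sketch of greedy K-Centers (farthest-first traversal). It is not taken from the review: plain Euclidean distance on 2-D points stands in for a structural metric such as RMSD between conformations, the first center is chosen arbitrarily as the first point, and the function name `k_centers` and the data are hypothetical.

```python
import math

def k_centers(points, k, dist=math.dist):
    """Greedy K-Centers: repeatedly promote the point farthest from all
    current centers to a new center. Returns (center_indices, labels),
    where each label is the index of the point's nearest center."""
    centers = [0]  # arbitrary first center (assumption)
    d_near = [dist(p, points[0]) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=d_near.__getitem__)
        centers.append(nxt)
        # Each point only needs its distance to the newest center updated.
        d_near = [min(d, dist(p, points[nxt])) for d, p in zip(d_near, points)]
    labels = [min(centers, key=lambda c: dist(p, points[c])) for p in points]
    return centers, labels

# Hypothetical "conformations" reduced to 2-D coordinates.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers, labels = k_centers(points, 3)
print(centers, labels)
```

The method needs only k passes over the data, which is one reason it is attractive for very large snapshot sets, although it tends to pick outlying points as centers.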
Mobile commerce (m-commerce) contributes to the growing popularity of electronic commerce (e-commerce), allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time. As demand for e-commerce increases tremendously, the pressure on delivery companies to organise their transportation plans so as to achieve both profit and customer satisfaction also increases. One important planning problem in this domain is the multi-vehicle profitable pickup and delivery problem (MVPPDP), where a selected set of pickup and delivery customers needs to be served within a certain allowed trip time. In this paper, we propose hybrid clustering algorithms with the greedy randomised adaptive search procedure (GRASP) to construct an initial solution for the MVPPDP. Our approaches first cluster the search space in order to reduce its dimensionality, then use GRASP to build routes for each cluster. We compared our results with state-of-the-art construction heuristics that have been used to construct initial solutions to this problem. Experimental results show that our proposed algorithms achieve excellent performance in terms of both solution quality and processing time.
Accurate perception of the performance degradation of a fuel cell is very important for detecting its health state. However, inconsistent operating conditions of fuel cell vehicles during testing result in errors in the data. In order to obtain a more credible degradation rate, this study proposes a novel method that classifies the experimental data collected under different working conditions into groups with similar operating conditions by using dimensionality reduction and clustering algorithms. First, the experimental data collected from fuel cell vehicles, which are high-dimensional, are projected into a three-dimensional feature vector space via principal component analysis (PCA). The dimension-reduced three-dimensional feature vectors are then input into a clustering algorithm, such as K-means or density-based spatial clustering of applications with noise (DBSCAN). According to the clustering results, the fuel cell voltage data with similar operating conditions can be classified. Finally, the selected voltage data can be used to precisely represent the true performance degradation of an on-board fuel cell stack. The results show that the voltage selected using the K-means algorithm declines the fastest, followed by that selected using the DBSCAN algorithm and finally the original data, which indicates that the performance of the fuel cell actually declines faster than the unprocessed data suggest. Early intervention can prolong its life to the greatest extent.
Compared with flat routing protocols, clustering is a fundamental performance improvement technique in wireless sensor networks, which can increase network scalability and lifetime. In this paper, we integrate the multi-hop technique with a backoff-based clustering algorithm to organize sensors. By using an adaptive backoff strategy, the algorithm not only achieves load balance among sensor nodes, but also achieves a fairly uniform cluster head distribution across the network. Simulation results demonstrate that our algorithm is more energy-efficient than classical ones. Our algorithm can also be easily extended to generate a hierarchy of cluster heads to obtain better network management and energy efficiency.
As a mainstream research direction in the field of image segmentation, medical image segmentation plays a key role in the quantification of lesions, three-dimensional reconstruction, region of interest extraction and so on. Compared with natural images, medical images have a variety of modes, and the emphasis of the information conveyed by images of different modes is quite different. Because manual segmentation of medical images by professional and experienced doctors alone is time-consuming and inefficient, a large number of automated medical image segmentation methods have been developed. However, until now, researchers have not developed a universal method for all types of medical image segmentation. This paper reviews the literature on segmentation techniques that have produced major breakthroughs in recent years. Among the large number of medical image segmentation methods, this paper mainly discusses two categories. One is the improved strategies based on traditional clustering methods. The other is the research progress on improved image segmentation network structure models based on U-Net. The reviewed results indicate that the performance of deep learning-based methods is significantly better than that of traditional methods. This paper discusses both the advantages and disadvantages of different algorithms and details how these methods can be used for the segmentation of lesions or other organs and tissues, as well as possible technical trends for future work.
Recently, the fundamental problem with Hybrid Mobile Ad-hoc Networks (H-MANETs) has been to find a suitable and secure way of balancing the load through Internet gateways. Moreover, the selection of the gateway and overload of the network result in packet loss and Delay (DL). For optimal performance, it is important to balance the load between different gateways. As a result, a stable load balancing procedure is implemented, which selects gateways based on Fuzzy Logic (FL) and increases the efficiency of the network. In this case, since gateways are selected based on the number of nodes, the Energy Consumption (EC) was high. This paper presents a novel Node Quality-based Clustering Algorithm (NQCA) based on Fuzzy-Genetic for Cluster Head and Gateway Selection (FGCHGS). This algorithm combines NQCA with the Improved Weighted Clustering Algorithm (IWCA). The NQCA algorithm divides the network into clusters based upon node priority, transmission range, and neighbour fidelity. In addition, simulation results are used to evaluate the effectiveness of the FGCHGS algorithm in terms of EC, packet loss rate (PLR), etc.
In recent decades, several optimization algorithms have been developed for selecting the most energy-efficient clusters in order to save power during transmission over a shorter distance while restricting interference with the Primary Users (PUs). The Cognitive Radio (CR) system is based on the Adaptive Swarm Distributed Intelligent based Clustering algorithm (ASDIC), which shows better spectrum sensing among groups of multiple users in terms of sensing error, power saving, and convergence time. In this research paper, the proposed ASDIC algorithm develops better energy-efficient distributed cluster-based sensing with the optimal number of clusters based on their connectivity. In this research, multiple random Secondary Users (SUs) and PUs are considered for implementation. The proposed ASDIC algorithm improved the convergence speed by combining multi-user clustered communication, compared to the existing optimization algorithms. Experimental results showed that the proposed ASDIC algorithm reduced the node power by 9.646% compared to the existing algorithms. Similarly, the ASDIC algorithm reduced the average node power of SUs by 24.23% compared to the existing algorithms. A higher probability of detection is obtained with the Signal-to-Noise Ratio (SNR) reduced to 2 dB. The proposed ASDIC delivers a low false alarm rate compared to other existing optimization algorithms in primary detection. Simulation results showed that the proposed ASDIC algorithm effectively solves multimodal optimization problems and maximizes network capacity.
Internet services and web-based applications play pivotal roles in various sensitive domains, encompassing e-commerce, e-learning, e-healthcare, and e-payment. However, safeguarding these services poses a significant challenge, as the need for robust security measures becomes increasingly imperative. This paper presents an innovative method based on differential analyses to detect abrupt changes in network traffic characteristics. The core concept revolves around identifying abrupt alterations in certain characteristics of the analyzed traffic, such as input/output volume, the number of TCP connections, or the number of DNS queries. Initially, the traffic is segmented into distinct sequences of slices, followed by quantifying specific characteristics for each slice. Subsequently, the distance between successive values of these measured characteristics is computed and clustered to detect sudden changes. To accomplish its objectives, the approach combines several techniques, including propositional logic, distance metrics (e.g., Kullback-Leibler Divergence), and clustering algorithms (e.g., K-means). When applied to two distinct datasets, the proposed approach demonstrates exceptional performance, achieving detection rates of up to 100%.
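The slice, distance, cluster pipeline described in this abstract can be sketched as follows. This is a simplified illustration, not the paper's implementation: the per-slice counts are invented, each slice is reduced to a three-bin distribution, the Kullback-Leibler divergence is smoothed with a small `eps` (an assumption), and a one-dimensional two-cluster K-means stands in for the clustering step. The propositional-logic component mentioned in the abstract is not modelled.

```python
import math

def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for discrete distributions; eps avoids division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def two_means_1d(values, iters=50):
    """1-D K-means with k=2; returns labels where 1 marks the 'high' cluster."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [int(abs(v - hi) < abs(v - lo)) for v in values]
        lows = [v for v, l in zip(values, labels) if l == 0]
        highs = [v for v, l in zip(values, labels) if l == 1]
        new_lo = sum(lows) / len(lows) if lows else lo
        new_hi = sum(highs) / len(highs) if highs else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return labels

# Hypothetical per-slice counts: [TCP connections, DNS queries, other packets].
slices = [
    [50, 30, 20],
    [52, 29, 19],
    [49, 31, 20],
    [5, 90, 5],   # sudden surge in DNS queries
    [6, 89, 5],
]
dists = [normalize(s) for s in slices]
# Distance between each slice and the one before it.
distances = [kl_divergence(dists[i + 1], dists[i]) for i in range(len(dists) - 1)]
labels = two_means_1d(distances)
# Slice i + 1 is flagged when its distance to slice i falls in the 'high' cluster.
change_points = [i + 1 for i, l in enumerate(labels) if l == 1]
print(change_points)
```

Clustering the distances, rather than thresholding them with a fixed constant, lets the boundary between "normal" and "abrupt" adapt to the traffic being analyzed; with only one regime present, the toy clustering above flags nothing.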
In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. This algorithm is easy to implement, requiring only a simple data structure that keeps some information from each iteration for use in the next iteration. Our experimental results demonstrate that our scheme can substantially improve the computational speed of the k-means algorithm, in terms of both the total number of distance calculations and the overall computation time.
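A minimal sketch of how information kept from one iteration might save distance calculations in the next is given below. The abstract does not specify what is stored, so the heuristic here is an assumption: each point remembers the distance to its assigned center, and if the updated center of the same cluster is no farther away, the point keeps its label and the remaining k - 1 distances are skipped. The function name `kmeans_cached` and the data are hypothetical, and the shortcut may yield a slightly different partition than plain Lloyd iterations.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans_cached(points, centers, max_iter=100):
    """Lloyd-style k-means that reuses the previous iteration's assignment.

    Returns (centers, labels, number_of_distance_calculations).
    """
    centers = [tuple(c) for c in centers]
    labels = [None] * len(points)
    cached = [math.inf] * len(points)  # distance to assigned center, last iteration
    n_dist = 0
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            if labels[i] is not None:
                d_same = dist(p, centers[labels[i]])
                n_dist += 1
                if d_same <= cached[i]:
                    # Assumed shortcut: the point did not move away from its
                    # own center, so keep the label without a full search.
                    cached[i] = d_same
                    continue
            ds = [dist(p, c) for c in centers]
            n_dist += len(centers)
            best = min(range(len(centers)), key=ds.__getitem__)
            if best != labels[i]:
                changed = True
            labels[i], cached[i] = best, ds[best]
        if not changed:
            break
        # Recompute each center as the mean of its members.
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = mean(members)
    return centers, labels, n_dist

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels, n_dist = kmeans_cached(points, [(0, 0), (10, 10)])
print(labels, n_dist)
```

The counter `n_dist` makes the saving visible: points that take the shortcut cost one distance calculation per iteration instead of k.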
In order to mine production and security information from security supervising data and to ensure security and safety in production and decision-making, a clustering analysis algorithm for security supervising data in coal mines based on a semantic description is studied. First, a hybrid semantic and numerical description method for security supervising data in coal mines is described. Secondly, similarity measurement methods for semantic and numerical data are given separately, and a weight-based hybrid similarity measurement method for security supervising data based on a semantic description in coal mines is presented. Thirdly, taking the hybrid similarity measurement method as the distance criterion and drawing on grid methodology, an improved grid-based CURE clustering algorithm is presented. Finally, simulation results on a security supervising data set from coal mines validate the efficiency of the algorithm.
Web application fingerprint recognition is an effective security technology designed to identify and classify web applications, thereby enhancing the detection of potential threats and attacks. Traditional fingerprint recognition methods, which rely on preannotated feature matching, face inherent limitations due to the ever-evolving nature and diverse landscape of web applications. In response to these challenges, this work proposes an innovative web application fingerprint recognition method founded on clustering techniques. The method involves extensive data collection from the Tranco List, employing adjusted feature selection built upon Wappalyzer and noise reduction through truncated SVD dimensionality reduction. The core of the methodology lies in the application of the unsupervised OPTICS clustering algorithm, eliminating the need for preannotated labels. By transforming web applications into feature vectors and leveraging clustering algorithms, our approach accurately categorizes diverse web applications, providing comprehensive and precise fingerprint recognition. The experimental results, which are obtained on a dataset featuring various web application types, affirm the efficacy of the method, demonstrating its ability to achieve high accuracy and broad coverage. This novel approach not only distinguishes between different web application types effectively but also demonstrates superiority in terms of classification accuracy and coverage, offering a robust solution to the challenges of web application fingerprint recognition.
Indoor positioning is a key technology in today’s intelligent environments, and it plays a crucial role in many application areas. This paper proposes an unscented Kalman filter (UKF) based on the maximum correntropy criterion (MCC) instead of the minimum mean square error (MMSE) criterion. This approach is applied to the loose coupling of the Inertial Navigation System (INS) and Ultra-Wideband (UWB). By introducing the maximum correntropy criterion, the MCCUKF algorithm dynamically adjusts the covariance matrices of the system noise and the measurement noise, thus enhancing its adaptability to diverse environmental localization requirements. In the presence of non-Gaussian noise, especially heavy-tailed noise, the MCCUKF exhibits superior accuracy and robustness compared to the traditional UKF. The method first generates an estimate of the predicted state and covariance matrix through the unscented transform (UT) and then recharacterizes the measurement information using a nonlinear regression method with the MCC as the cost function. Subsequently, the state and covariance matrices of the filter are updated by applying the unscented transform to the measurement equations. Moreover, to mitigate the influence of non-line-of-sight (NLOS) errors on positioning accuracy, this paper proposes a k-medoid clustering algorithm based on bisection k-means (Bikmeans). This algorithm preprocesses the UWB distance measurements to yield a more precise position estimate. Simulation results demonstrate that MCCUKF is robust to the uncertainty of UWB and realizes stable integration of INS and UWB systems.
The analysis of microstates in EEG signals is a crucial technique for understanding the spatiotemporal dynamics of brain electrical activity. Traditional methods such as Atomic Agglomerative Hierarchical Clustering (AAHC), K-means clustering, Principal Component Analysis (PCA), and Independent Component Analysis (ICA) are limited by a fixed number of microstate maps and insufficient capability in cross-task feature extraction. To address these limitations, this study introduces a Global Map Dissimilarity (GMD)-driven density canopy K-means clustering algorithm. This approach autonomously determines the optimal number of EEG microstate topographies and employs Gaussian kernel density estimation alongside the GMD index for dynamic modeling of EEG data. Using this algorithm, the study analyzes the Motor Imagery (MI) dataset from the GigaScience database, GigaDB. The findings reveal six distinct microstates during actual right-hand movement and five microstates across other task conditions, with microstate C showing superior performance in all task states. During imagined movement, microstate A was significantly enhanced. Comparison with existing algorithms indicates a significant improvement in clustering performance by the refined method, with an average Calinski-Harabasz Index (CHI) of 35517.29 and an average Davies-Bouldin Index (DBI) of 2.57. Furthermore, an information-theoretical analysis of the microstate sequences suggests that imagined movement exhibits higher complexity and disorder than actual movement. Using the extracted microstate sequence parameters as features, the improved algorithm achieved a classification accuracy of 98.41% in EEG signal categorization for motor imagery. An accuracy of 78.183% was achieved in a four-class motor imagery task on the BCI-IV-2a dataset. These results demonstrate the potential of the algorithm in microstate analysis, offering a more effective tool for a deeper understanding of the spatiotemporal features of EEG signals.
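For readers unfamiliar with the GMD index, the following Python sketch computes it in its commonly used form: both topographic maps are average-referenced, scaled by their global field power (GFP), and compared by root-mean-square difference. This is the textbook definition and only an assumption about how the paper applies it; the five-electrode map is invented for illustration.

```python
import math

def _mean(v):
    return sum(v) / len(v)

def gfp(v):
    """Global field power: spatial standard deviation across electrodes."""
    m = _mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

def gmd(u, v):
    """Global map dissimilarity between two maps with the same electrode order.

    0 means identical topography regardless of amplitude; 2 means the
    topographies are exact inversions of each other.
    """
    mu, mv = _mean(u), _mean(v)
    gu, gv = gfp(u), gfp(v)
    return math.sqrt(
        sum(((a - mu) / gu - (b - mv) / gv) ** 2 for a, b in zip(u, v)) / len(u)
    )

topo = [1.0, -2.0, 0.5, 3.0, -2.5]            # hypothetical 5-electrode map
same_shape = gmd(topo, [3 * x for x in topo])  # same topography, larger amplitude
inverted = gmd(topo, [-x for x in topo])       # polarity-reversed topography
print(same_shape, inverted)
```

Because the measure is independent of amplitude, it compares the shape of the scalp field only, which is what makes it usable as a distance when clustering microstate maps.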
The demands on conventional communication networks are increasing rapidly because of the exponential expansion of connected multimedia content. In light of the data-centric aspect of contemporary communication, the information-centric network (ICN) paradigm offers a promising solution by emphasizing content retrieval by name instead of location. If 5G networks are to meet the expected surge in data demand from expanded connectivity and Internet of Things (IoT) devices, then effective caching solutions will be required to maximize network throughput and minimize the use of resources. Hence, an ICN-based Cooperative Caching (ICN-CoC) technique is used to select a cache by considering cache position, content attractiveness, and rate prediction. The findings show that the suggested approach improves caching, achieving a Cache Hit Ratio (CHR) of 84.3%, an Average Hop Minimization Ratio (AHMR) of 89.5%, and a Mean Access Latency (MAL) of 0.4 s. Within this framework, improved caching strategies are suggested to handle the difficulty of effectively controlling data consumption in 5G networks. These improvements aim to make the network run more smoothly by enhancing content delivery, decreasing latency, and relieving congestion. By improving the capacity of 5G communication systems to manage the demands of modern data-centric applications, the research ultimately contributes to their advancement.
Data analysis and automatic processing is often interpreted as knowledge acquisition. In many cases it is necessary to classify data or find regularities in them. Results obtained in the search for regularities in intelligent data analysis applications are mostly represented with the help of IF-THEN rules. With the help of these rules, tasks such as prediction, classification and pattern recognition are solved. Using different approaches (clustering algorithms, neural network methods, fuzzy rule processing methods), we can extract rules that characterize the data in an understandable language. This allows interpreting the data, finding relationships in the data and extracting new rules that characterize them. Knowledge acquisition in this paper is defined as the process of extracting knowledge from numerical data in the form of rules. Extraction of rules in this context is based on the clustering methods K-means and fuzzy C-means. With the K-means clustering algorithm, rules are derived from trained neural networks. Fuzzy C-means is used in a fuzzy rule-based design method. The rule extraction methodology is demonstrated on samples from Fisher's Iris flower data set. The effectiveness of the extracted rules is evaluated. The clustering and rule extraction methodology can be widely used in evaluating and analyzing various economic and financial processes.
Funding: funded by the State Grid Limited Science and Technology Project of China, Grant Number SGSXDK00DJJS2200144.
文摘At present,the proportion of new energy in the power grid is increasing,and the random fluctuations in power output increase the risk of cascading failures in the power grid.In this paper,we propose a method for identifying high-risk scenarios of interlocking faults in new energy power grids based on a deep embedding clustering(DEC)algorithm and apply it in a risk assessment of cascading failures in different operating scenarios for new energy power grids.First,considering the real-time operation status and system structure of new energy power grids,the scenario cascading failure risk indicator is established.Based on this indicator,the risk of cascading failure is calculated for the scenario set,the scenarios are clustered based on the DEC algorithm,and the scenarios with the highest indicators are selected as the significant risk scenario set.The results of simulations with an example power grid show that our method can effectively identify scenarios with a high risk of cascading failures from a large number of scenarios.
文摘This paper focuses on the unsupervised detection of the Higgs boson particle using the most informative features and variables which characterize the“Higgs machine learning challenge 2014”data set.This unsupervised detection goes in this paper analysis through 4 steps:(1)selection of the most informative features from the considered data;(2)definition of the number of clusters based on the elbow criterion.The experimental results showed that the optimal number of clusters that group the considered data in an unsupervised manner corresponds to 2 clusters;(3)proposition of a new approach for hybridization of both hard and fuzzy clustering tuned with Ant Lion Optimization(ALO);(4)comparison with some existing metaheuristic optimizations such as Genetic Algorithm(GA)and Particle Swarm Optimization(PSO).By employing a multi-angle analysis based on the cluster validation indices,the confusion matrix,the efficiencies and purities rates,the average cost variation,the computational time and the Sammon mapping visualization,the results highlight the effectiveness of the improved Gustafson-Kessel algorithm optimized withALO(ALOGK)to validate the proposed approach.Even if the paper gives a complete clustering analysis,its novel contribution concerns only the Steps(1)and(3)considered above.The first contribution lies in the method used for Step(1)to select the most informative features and variables.We used the t-Statistic technique to rank them.Afterwards,a feature mapping is applied using Self-Organizing Map(SOM)to identify the level of correlation between them.Then,Particle Swarm Optimization(PSO),a metaheuristic optimization technique,is used to reduce the data set dimension.The second contribution of thiswork concern the third step,where each one of the clustering algorithms as K-means(KM),Global K-means(GlobalKM),Partitioning AroundMedoids(PAM),Fuzzy C-means(FCM),Gustafson-Kessel(GK)and Gath-Geva(GG)is optimized and tuned with ALO.
Abstract: Classification systems such as Slope Mass Rating (SMR) are currently used to undertake slope stability analysis. In the SMR classification system, data are allocated to certain classes based on linguistic and experience-based criteria. In order to eliminate linguistic criteria resulting from experience-based judgments and account for uncertainties in determining the class boundaries developed by the SMR system, the system classification results were corrected using two clustering algorithms, namely K-means and fuzzy c-means (FCM), for the ratings obtained via continuous and discrete functions. By applying clustering algorithms in the SMR classification system, no advance experience-based judgment was made on the number of extracted classes, and only after all steps of the clustering algorithms were completed was a new classification scheme proposed for the SMR system under different failure modes, based on the ratings obtained via continuous and discrete functions. The results of this study showed that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system based on the ratings calculated via continuous and discrete functions.
Abstract: Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects clustering results, and that comparing the results obtained with different distance functions can be used to detect the outliers of a cluster and to give shape information about clusters. In practice, it is suggested to use the different distance functions separately, compare the clustering results, and pick out the “swing points”. Such points may reveal more information to data analysts.
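A minimal sketch of the comparison suggested in the abstract above, assuming Euclidean, Manhattan, and Chebyshev as the three distance functions (the abstract does not name them). The helper names `nearest_center` and `swing_points` and the sample data are hypothetical. Each point is assigned to its nearest center under each metric, and points whose assignment depends on the metric are reported as swing points:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Assumed choice of three metrics; the abstract does not specify them.
METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}

def nearest_center(point, centers, metric):
    """Index of the center closest to `point` under `metric`."""
    return min(range(len(centers)), key=lambda i: metric(point, centers[i]))

def swing_points(points, centers):
    """Return (point, labels) pairs whose nearest center changes with the metric."""
    swings = []
    for p in points:
        labels = {name: nearest_center(p, centers, fn) for name, fn in METRICS.items()}
        if len(set(labels.values())) > 1:
            swings.append((p, labels))
    return swings

# Illustrative data only.
centers = [(0.0, 0.0), (7.0, 3.0)]
points = [(1.0, 1.0), (4.0, 0.0), (6.0, 3.0)]

for point, labels in swing_points(points, centers):
    print(point, labels)
```

With these illustrative centers, only the point (4, 0) is reported: it is closer to the first center under the Euclidean and Manhattan distances but closer to the second under the Chebyshev distance.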
Funding: supported by Shenzhen Science and Technology Innovation Committee (JCYJ20170413173837121), the Hong Kong Research Grant Council (HKUST C6009-15G, 14203915, 16302214, 16304215, 16318816, and AoE/P-705/16), King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) (OSR-2016-CRG5-3007), Guangzhou Science Technology and Innovation Commission (201704030116), and Innovation and Technology Commission (ITCPD/17-9 and ITC-CNERC14SC01).
Abstract: Molecular dynamics (MD) simulation has become a powerful tool to investigate the structure-function relationship of proteins and other biological macromolecules at atomic resolution and biologically relevant timescales. MD simulations often produce massive datasets containing millions of snapshots describing proteins in motion. Therefore, clustering algorithms have been in high demand to classify these MD snapshots and gain biological insights. There are two main categories of clustering algorithms, which aim to group protein conformations into clusters based on the similarity of their shape (geometric clustering) or kinetics (kinetic clustering). In this paper, we review a series of frequently used clustering algorithms applied in MD simulations, including divisive algorithms, agglomerative algorithms (single-linkage, complete-linkage, average-linkage, centroid-linkage and ward-linkage), center-based algorithms (K-Means, K-Medoids, K-Centers, and APM), density-based algorithms (neighbor-based, DBSCAN, density-peaks, and Robust-DB), and spectral-based algorithms (PCCA and PCCA+). In particular, differences between geometric and kinetic clustering metrics are discussed along with the performances of different clustering algorithms. We note that there is no one-size-fits-all algorithm for the classification of MD datasets. For a specific application, the right choice of clustering algorithm should be based on the purpose of clustering and the intrinsic properties of the MD conformational ensembles. Therefore, a main focus of our review is to describe the merits and limitations of each clustering algorithm. We expect that this review will help guide researchers in choosing appropriate clustering algorithms for their own MD datasets.
Funding: Deanship of Scientific Research, for funding and supporting this research through the initiative of DSR Graduate Students Research Support (GSR).
Abstract: Mobile commerce (m-commerce) contributes to increasing the popularity of electronic commerce (e-commerce), allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time. As demand for e-commerce increases tremendously, the pressure on delivery companies increases to organise their transportation plans to achieve profits and customer satisfaction. One important planning problem in this domain is the multi-vehicle profitable pickup and delivery problem (MVPPDP), where a selected set of pickup and delivery customers need to be served within a certain allowed trip time. In this paper, we propose hybrid clustering algorithms with the greedy randomised adaptive search procedure (GRASP) to construct an initial solution for the MVPPDP. Our approaches first cluster the search space in order to reduce its dimensionality, then use GRASP to build routes for each cluster. We compared our results with state-of-the-art construction heuristics that have been used to construct initial solutions to this problem. Experimental results show that our proposed algorithms achieve excellent performance in terms of both quality of solutions and processing time.
Funding: supported by the special key project of Chongqing technological innovation and application development (cstc2019jscx-zdztzxX0033), the national key R&D plan of the Ministry of Science and Technology (sub project) (2018YFB0105400), and the National Natural Science Foundation of China (21908142).
Abstract: Accurate perception of the performance degradation of a fuel cell is very important for detecting its health state. However, inconsistent operating conditions of fuel cell vehicles during testing result in errors in the data. In order to obtain a more credible degradation rate, this study proposes a novel method to classify the experimental data collected under different working conditions into similar operating conditions by using dimensionality reduction and clustering algorithms. First, the high-dimensional experimental data collected from fuel cell vehicles are projected into a three-dimensional feature vector space via principal component analysis (PCA). The dimension-reduced three-dimensional feature vectors are then input into a clustering algorithm, such as K-means or density-based spatial clustering of applications with noise (DBSCAN). According to the clustering results, the fuel cell voltage data with similar operating conditions can be classified. Finally, the selected voltage data can be used to precisely represent the true performance degradation of an on-board fuel cell stack. The results show that the voltage obtained using the K-means algorithm declines the fastest, followed by that of the DBSCAN algorithm and finally the original data, which indicates that the performance of the fuel cell actually declines faster than the original data suggest. Early intervention can prolong its life to the greatest extent.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60872018, 60721002, and 60875038, the National Basic Research 973 Program of China under Grant No. 2007CB310607, SRFDP Project under Grant No. 20070293001, the Science and Technology Support Foundation of Jiangsu Province under Grant Nos. BE2009142 and BE2010180, and the Scientific Research Foundation of Graduate School of Nanjing University under Grant No. 2011CL07.
Abstract: Compared with flat routing protocols, clustering is a fundamental performance improvement technique in wireless sensor networks, which can increase network scalability and lifetime. In this paper, we integrate the multi-hop technique with a backoff-based clustering algorithm to organize sensors. By using an adaptive backoff strategy, the algorithm not only realizes load balance among sensor nodes, but also achieves a fairly uniform cluster head distribution across the network. Simulation results demonstrate that our algorithm is more energy-efficient than classical ones. Our algorithm can also be easily extended to generate a hierarchy of cluster heads to obtain better network management and energy efficiency.
Funding: supported partly by the Open Project of State Key Laboratory of Millimeter Wave under Grant K202218 and partly by the Innovation and Entrepreneurship Training Program of College Students under Grants 202210700006Y and 202210700005Z.
Abstract: As a mainstream research direction in the field of image segmentation, medical image segmentation plays a key role in the quantification of lesions, three-dimensional reconstruction, region of interest extraction and so on. Compared with natural images, medical images have a variety of modes, and the information emphasized by images of different modes differs considerably. Because it is time-consuming and inefficient for medical images to be segmented manually by professional and experienced doctors, a large number of automated medical image segmentation methods have been developed. However, until now, researchers have not developed a universal method for all types of medical image segmentation. This paper reviews the literature on segmentation techniques that have produced major breakthroughs in recent years. Among the many medical image segmentation methods, this paper mainly discusses two categories. One is the improved strategies based on traditional clustering methods. The other is the research progress on improved image segmentation network models based on U-Net. The reviewed results indicate that the performance of deep learning-based methods is significantly better than that of traditional methods. This paper discusses both the advantages and disadvantages of different algorithms and details how these methods can be used for the segmentation of lesions or other organs and tissues, as well as possible technical trends for future work.
Abstract: Recently, the fundamental problem with Hybrid Mobile Ad-hoc Networks (H-MANETs) is to find a suitable and secure way of balancing the load through Internet gateways. Moreover, the selection of the gateway and overload of the network result in packet loss and Delay (DL). For optimal performance, it is important to balance the load between different gateways. As a result, a stable load balancing procedure is implemented, which selects gateways based on Fuzzy Logic (FL) and increases the efficiency of the network. In this case, since gateways are selected based on the number of nodes, the Energy Consumption (EC) was high. This paper presents a novel Node Quality-based Clustering Algorithm (NQCA) based on Fuzzy-Genetic for Cluster Head and Gateway Selection (FGCHGS). This algorithm combines NQCA with the Improved Weighted Clustering Algorithm (IWCA). The NQCA algorithm divides the network into clusters based upon node priority, transmission range, and neighbour fidelity. In addition, the simulation results evaluate the performance effectiveness of the FGCHGS algorithm in terms of EC, packet loss rate (PLR), etc.
Abstract: In recent decades, several optimization algorithms have been developed for selecting the most energy-efficient clusters in order to save power during transmission over a shorter distance while restricting Primary User (PU) interference. The Cognitive Radio (CR) system is based on the Adaptive Swarm Distributed Intelligent based Clustering algorithm (ASDIC), which shows better spectrum sensing among groups of multiple users in terms of sensing error, power saving, and convergence time. In this research paper, the proposed ASDIC algorithm develops better energy-efficient distributed cluster-based sensing with the optimal number of clusters based on their connectivity. In this research, multiple random Secondary Users (SUs) and PUs are considered for implementation. The proposed ASDIC algorithm improved the convergence speed by combining multi-user clustered communication, compared to the existing optimization algorithms. Experimental results showed that the proposed ASDIC algorithm reduced the node power by 9.646% compared to the existing algorithms. Similarly, the ASDIC algorithm reduced the SUs' average node power by 24.23% compared to the existing algorithms. The probability of detection is higher when reducing the Signal-to-Noise Ratio (SNR) to 2 dB. The proposed ASDIC delivers a low false alarm rate compared to other existing optimization algorithms in primary detection. Simulation results showed that the proposed ASDIC algorithm effectively solves multimodal optimization problems and maximizes network capacity.
Abstract: Internet services and web-based applications play pivotal roles in various sensitive domains, encompassing e-commerce, e-learning, e-healthcare, and e-payment. However, safeguarding these services poses a significant challenge, as the need for robust security measures becomes increasingly imperative. This paper presents an innovative method based on differential analyses to detect abrupt changes in network traffic characteristics. The core concept revolves around identifying abrupt alterations in certain characteristics of the analyzed traffic, such as input/output volume, the number of TCP connections, or DNS queries. Initially, the traffic is segmented into distinct sequences of slices, followed by quantifying specific characteristics for each slice. Subsequently, the distance between successive values of these measured characteristics is computed and clustered to detect sudden changes. To accomplish its objectives, the approach combines several techniques, including propositional logic, distance metrics (e.g., Kullback-Leibler divergence), and clustering algorithms (e.g., K-means). When applied to two distinct datasets, the proposed approach demonstrates exceptional performance, achieving detection rates of up to 100%.
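The slice, distance, and cluster steps described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each slice is assumed to be a histogram of counts per traffic category, the distance is the Kullback-Leibler divergence between successive slices, and the clustering step is a one-dimensional two-cluster K-means. The function names and sample counts are hypothetical.

```python
import math

def normalize(counts):
    """Turn raw per-category counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q, eps=1e-9):
    """Kullback-Leibler divergence D(p || q); eps guards against zero bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def two_means_1d(values, iters=20):
    """1-D K-means with k=2; returns True for values in the 'high' cluster."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all distances are identical: nothing stands out
            break
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return [abs(v - lo) > abs(v - hi) for v in values]

def detect_changes(slices):
    """Indices of slices that differ abruptly from their predecessor."""
    dists = [kl_divergence(normalize(a), normalize(b))
             for a, b in zip(slices, slices[1:])]
    if not dists:
        return []
    flags = two_means_1d(dists)
    return [i + 1 for i, flagged in enumerate(flags) if flagged]

# Illustrative per-slice counts: [TCP connections, UDP flows, DNS queries].
slices = [[80, 15, 5], [78, 17, 5], [82, 13, 5], [20, 10, 70], [22, 9, 69]]
print(detect_changes(slices))
```

On this sample, the only flagged slice is index 3, where DNS queries jump from 5% to 70% of the traffic. Note that a two-cluster split always separates the largest distances whenever the distances differ at all, so a practical detector would likely also require a minimum gap between the two cluster means before raising an alert.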
Abstract: In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. This algorithm is easy to implement, requiring only a simple data structure that keeps some information from each iteration for use in the next iteration. Our experimental results demonstrate that our scheme can improve the computational speed of the k-means algorithm in terms of both the total number of distance calculations and the overall time of computation.
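The abstract above does not spell out which information is kept between iterations, so the following is only one plausible reading, offered as a sketch. It assumes the cached data are each point's cluster label and its distance to the assigned center: if a point is no farther from its (moved) center than it was in the previous iteration, it keeps its label and the remaining k-1 distance calculations are skipped. This shortcut is a heuristic, and the name `enhanced_kmeans` and the sample data are hypothetical.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def enhanced_kmeans(points, init_centers, max_iter=100):
    """K-means that reuses each point's previous label and distance.

    Returns (centers, labels, number_of_distance_calculations).
    """
    centers = [tuple(c) for c in init_centers]
    k = len(centers)
    labels = [0] * len(points)
    cached = [-1.0] * len(points)  # -1 forces a full search on the first pass
    n_dist = 0
    for _ in range(max_iter):
        for i, p in enumerate(points):
            d_own = dist(p, centers[labels[i]])
            n_dist += 1
            if d_own <= cached[i]:
                # Point did not move away from its center: keep the label.
                cached[i] = d_own
                continue
            best, best_d = labels[i], d_own
            for j in range(k):
                if j == labels[i]:
                    continue
                d = dist(p, centers[j])
                n_dist += 1
                if d < best_d:
                    best, best_d = j, d
            labels[i], cached[i] = best, best_d
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels, n_dist

# Illustrative data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels, n_dist = enhanced_kmeans(points, [(0, 0), (10, 10)])
print(labels, n_dist)
```

For comparison, plain k-means computes k distances per point in every iteration; the counter returned here makes it possible to check how many of those calculations the cached distances avoid on a given data set.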
Funding: supported by National Natural Science Foundation of China (61304256), Zhejiang Provincial Natural Science Foundation of China (LQ13F030013), Project of the Education Department of Zhejiang Province (Y201327006), Young Researchers Foundation of Zhejiang Provincial Top Key Academic Discipline of Mechanical Engineering and Zhejiang Sci-Tech University Key Laboratory (ZSTUME01B15), New Century 151 Talent Project of Zhejiang Province, 521 Talent Project of Zhejiang Sci-Tech University, and Young and Middle-aged Talents Foundation of Zhejiang Provincial Top Key Academic Discipline of Mechanical Engineering.
Funding: The National Natural Science Foundation of China (No. 50674086), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20060290508), and the Postdoctoral Scientific Program of Jiangsu Province (No. 0701045B).
Abstract: In order to mine production and security information from security supervising data and to ensure security and safety in production and decision-making, a clustering analysis algorithm for security supervising data based on a semantic description in coal mines is studied. First, the hybrid semantic and numerical description method for security supervising data in coal mines is described. Second, the similarity measurement methods for semantic and numerical data are given separately, and a weight-based hybrid similarity measurement method for the security supervising data based on a semantic description in coal mines is presented. Third, taking the hybrid similarity measurement method as the distance criterion and drawing on a grid methodology, an improved grid-based CURE clustering algorithm is presented. Finally, the simulation results on a security supervising data set from coal mines validate the efficiency of the algorithm.
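A weight-based hybrid similarity of the kind described in the abstract above could look like the sketch below. The abstract does not give the individual measures, so both are assumptions chosen for illustration: Jaccard overlap for the semantic part and an inverse Euclidean distance for the numerical part. The record layout (`terms`, `values`) and the function names are hypothetical.

```python
import math

def semantic_similarity(terms_a, terms_b):
    """Jaccard overlap of two sets of semantic labels (assumed measure)."""
    if not terms_a and not terms_b:
        return 1.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)

def numerical_similarity(xs, ys):
    """Map the Euclidean distance into (0, 1] (assumed measure)."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))
    return 1.0 / (1.0 + d)

def hybrid_similarity(rec_a, rec_b, weight=0.5):
    """Weighted sum of the semantic and numerical similarities."""
    sem = semantic_similarity(rec_a["terms"], rec_b["terms"])
    num = numerical_similarity(rec_a["values"], rec_b["values"])
    return weight * sem + (1.0 - weight) * num

# Illustrative records only.
a = {"terms": {"gas", "alarm"}, "values": [1.0, 2.0]}
b = {"terms": {"gas", "fan"}, "values": [1.0, 2.0]}
print(hybrid_similarity(a, b))
```

To use such a score as a clustering distance criterion, one simple option is `1 - hybrid_similarity(a, b)`; the weight controls how much the semantic description counts relative to the numerical readings.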
Funding: supported in part by the National Science Foundation of China under Grants U22B2027, 62172297, 62102262, 61902276 and 62272311, Tianjin Intelligent Manufacturing Special Fund Project under Grant 20211097, the China Guangxi Science and Technology Plan Project (Guangxi Science and Technology Base and Talent Special Project) under Grant AD23026096 (Application Number 2022AC20001), Hainan Provincial Natural Science Foundation of China under Grant 622RC616, and CCF-Nsfocus Kunpeng Fund Project under Grant CCF-NSFOCUS202207.
Abstract: Web application fingerprint recognition is an effective security technology designed to identify and classify web applications, thereby enhancing the detection of potential threats and attacks. Traditional fingerprint recognition methods, which rely on preannotated feature matching, face inherent limitations due to the ever-evolving nature and diverse landscape of web applications. In response to these challenges, this work proposes an innovative web application fingerprint recognition method founded on clustering techniques. The method involves extensive data collection from the Tranco List, employing adjusted feature selection built upon Wappalyzer and noise reduction through truncated SVD dimensionality reduction. The core of the methodology lies in the application of the unsupervised OPTICS clustering algorithm, eliminating the need for preannotated labels. By transforming web applications into feature vectors and leveraging clustering algorithms, our approach accurately categorizes diverse web applications, providing comprehensive and precise fingerprint recognition. The experimental results, which are obtained on a dataset featuring various web application types, affirm the efficacy of the method, demonstrating its ability to achieve high accuracy and broad coverage. This novel approach not only distinguishes between different web application types effectively but also demonstrates superiority in terms of classification accuracy and coverage, offering a robust solution to the challenges of web application fingerprint recognition.
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 62273083 and 61803077 and the Natural Science Foundation of Hebei Province under Grant No. F2020501012.
Abstract: Indoor positioning is a key technology in today’s intelligent environments, and it plays a crucial role in many application areas. This paper proposes an unscented Kalman filter (UKF) based on the maximum correntropy criterion (MCC) instead of the minimum mean square error criterion (MMSE). This innovative approach is applied to the loose coupling of the Inertial Navigation System (INS) and Ultra-Wideband (UWB). By introducing the maximum correntropy criterion, the MCCUKF algorithm dynamically adjusts the covariance matrices of the system noise and the measurement noise, thus enhancing its adaptability to diverse environmental localization requirements. Particularly in the presence of non-Gaussian noise, especially heavy-tailed noise, the MCCUKF exhibits superior accuracy and robustness compared to the traditional UKF. The method initially generates an estimate of the predicted state and covariance matrix through the unscented transform (UT) and then recharacterizes the measurement information using a nonlinear regression method with the MCC as the cost. Subsequently, the state and covariance matrices of the filter are updated by employing the unscented transformation on the measurement equations. Moreover, to mitigate the influence of non-line-of-sight (NLOS) errors on positioning accuracy, this paper proposes a k-medoid clustering algorithm based on bisection k-means (Bikmeans). This algorithm preprocesses the UWB distance measurements to yield a more precise position estimation. Simulation results demonstrate that MCCUKF is robust to the uncertainty of UWB and realizes stable integration of INS and UWB systems.
Funding: funded by National Nature Science Foundation of China, Yunnan Fundamental Research Projects, Special Project of Guangdong Province in Key Fields of Ordinary Colleges and Universities, and Chaozhou Science and Technology Plan Project under Grant Numbers 82060329, 202201AT070108, 2023ZDZX2038 and 202201GY01.
Abstract: The analysis of microstates in EEG signals is a crucial technique for understanding the spatiotemporal dynamics of brain electrical activity. Traditional methods such as Atomic Agglomerative Hierarchical Clustering (AAHC), K-means clustering, Principal Component Analysis (PCA), and Independent Component Analysis (ICA) are limited by a fixed number of microstate maps and insufficient capability in cross-task feature extraction. Tackling these limitations, this study introduces a Global Map Dissimilarity (GMD)-driven density canopy K-means clustering algorithm. This innovative approach autonomously determines the optimal number of EEG microstate topographies and employs Gaussian kernel density estimation alongside the GMD index for dynamic modeling of EEG data. Utilizing this advanced algorithm, the study analyzes the Motor Imagery (MI) dataset from the GigaScience database, GigaDB. The findings reveal six distinct microstates during actual right-hand movement and five microstates across other task conditions, with microstate C showing superior performance in all task states. During imagined movement, microstate A was significantly enhanced. Comparison with existing algorithms indicates a significant improvement in clustering performance by the refined method, with an average Calinski-Harabasz Index (CHI) of 35517.29 and an average Davies-Bouldin Index (DBI) of 2.57. Furthermore, an information-theoretical analysis of the microstate sequences suggests that imagined movement exhibits higher complexity and disorder than actual movement. By utilizing the extracted microstate sequence parameters as features, the improved algorithm achieved a classification accuracy of 98.41% in EEG signal categorization for motor imagery. An accuracy of 78.183% was achieved in a four-class motor imagery task on the BCI-IV-2a dataset. These results demonstrate the potential of the advanced algorithm in microstate analysis, offering a more effective tool for a deeper understanding of the spatiotemporal features of EEG signals.
Funding: New Brunswick Innovation Foundation (NBIF), for the financial support of the global project.
Abstract: The demands on conventional communication networks are increasing rapidly because of the exponential expansion of connected multimedia content. In light of the data-centric aspect of contemporary communication, the information-centric network (ICN) paradigm offers hope for a solution by emphasizing content retrieval by name instead of location. If 5G networks are to meet the expected data demand surge from expanded connectivity and Internet of Things (IoT) devices, then effective caching solutions will be required to maximize network throughput and minimize the use of resources. Hence, an ICN-based Cooperative Caching (ICN-CoC) technique has been used to select a cache by considering cache position, content attractiveness, and rate prediction. The findings show that the suggested approach improves caching, with a Cache Hit Ratio (CHR) of 84.3%, an Average Hop Minimization Ratio (AHMR) of 89.5%, and a Mean Access Latency (MAL) of 0.4 s. Within a framework, the work suggests improved caching strategies to handle the difficulty of effectively controlling data consumption in 5G networks. These improvements aim to make the network run more smoothly by enhancing content delivery, decreasing latency, and relieving congestion. By improving the capacity of 5G communication systems to manage the demands of modern data-centric applications, the research ultimately aids their advancement.
Abstract: Data analysis and automatic processing are often interpreted as knowledge acquisition. In many cases it is necessary to classify data or find regularities in them. Results obtained in the search for regularities in intelligent data analysis applications are mostly represented with the help of IF-THEN rules. With the help of these rules the following tasks are solved: prediction, classification, pattern recognition and others. Using different approaches (clustering algorithms, neural network methods, fuzzy rule processing methods), we can extract rules that characterize the data in an understandable language. This allows interpreting the data, finding relationships in the data and extracting new rules that characterize them. Knowledge acquisition in this paper is defined as the process of extracting knowledge from numerical data in the form of rules. Extraction of rules in this context is based on the clustering methods K-means and fuzzy C-means. With the assistance of the K-means clustering algorithm, rules are derived from trained neural networks. Fuzzy C-means is used in a fuzzy rule-based design method. The rule extraction methodology is demonstrated on samples from Fisher's Iris flower data set. The effectiveness of the extracted rules is evaluated. The clustering and rule extraction methodology can be widely used in evaluating and analyzing various economic and financial processes.