To improve the accuracy of text clustering, fuzzy c-means clustering based on topic concept sub-space (TCS2FCM) is introduced for classifying texts. Five evaluation functions are combined to extract key phrases. Concept phrases, as well as the descriptions of the final clusters, are derived from the key phrases using WordNet. The initial centers and the membership matrix are the most important factors affecting clustering performance. Orthogonal concept topic sub-spaces are built from the topic concept phrases that represent the topics of the texts, and the initialization of the centers and the membership matrix depends on the concept vectors in these sub-spaces. The results show that, unlike the random initialization of traditional fuzzy c-means clustering, initialization tied to text content can improve clustering precision.
Partition-based clustering with weighted features is developed in the framework of shadowed sets. The objects in the core and boundary regions generated by shadowed-set-based clustering have different impacts on the prototype of each cluster. By integrating feature weights, a formula for weight calculation is introduced into the clustering algorithm. The selection of the weight exponent is crucial for good results, and the weights are updated iteratively with each partition of the clusters. A convergence analysis of the weighted algorithms is given, and feasible cluster validity indices from data mining applications are used. Experimental results on both synthetic and real-life numerical data with different feature weights demonstrate that the weighted algorithm outperforms the unweighted algorithms.
A novel fuzzy clustering model, the allied fuzzy c-means (AFCM) model, is proposed by combining the advantages of fuzzy c-means (FCM) and possibilistic c-means (PCM) clustering. PCM is sensitive to initialization and often generates coincident clusters. AFCM, an extension of PCM, overcomes this shortcoming. Membership and typicality values are produced simultaneously in AFCM. Experimental results show that noisy data are handled well, coincident clusters are avoided, and clustering accuracy is improved.
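The difference between FCM memberships and PCM typicalities mentioned in this abstract can be sketched with the textbook update rules. This is not the AFCM model itself, whose objective function the abstract does not give; the function names, the example distances, and the scale parameter `eta` are illustrative assumptions.

```python
def fcm_memberships(dists, m=2.0):
    """Textbook FCM memberships of one point, given its nonzero distances to all centers."""
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in dists) for di in dists]


def pcm_typicalities(dists, eta, m=2.0):
    """Textbook PCM typicalities; eta[i] is an assumed per-cluster scale parameter."""
    p = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (d * d / e) ** p) for d, e in zip(dists, eta)]


# Hypothetical outlier lying 10 units away from each of two cluster centers.
outlier = [10.0, 10.0]
u = fcm_memberships(outlier)               # constrained to sum to 1
t = pcm_typicalities(outlier, [1.0, 1.0])  # each value is computed independently
print(u, t)
```

Because FCM memberships must sum to 1, the outlier still receives a membership of 0.5 in each cluster, whereas its typicalities are small for both. The same independence between clusters is what allows plain PCM to settle on coincident clusters, which is the weakness the abstract says AFCM addresses.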
Traditional Fuzzy C-Means (FCM) and Possibilistic C-Means (PCM) clustering algorithms are data-driven, and their objective function minimization process is based on the available numeric data. Recently, knowledge hints have been introduced to form knowledge-driven clustering algorithms, which reveal a data structure that considers not only the relationships between data but also the compatibility with knowledge hints. However, these algorithms cannot produce the optimal number of clusters by themselves; they require the assistance of evaluation indices. Moreover, knowledge hints are usually used as part of the data structure (directly replacing some cluster centers), which severely limits the flexibility of the algorithm and can lead to knowledge misguidance. To solve this problem, this study designs a new knowledge-driven clustering algorithm called PCM clustering with High-density Points (HP-PCM), in which domain knowledge is represented in the form of so-called high-density points. First, a new data density calculation function is proposed. The Density Knowledge Points Extraction (DKPE) method is established to filter out high-density points from the dataset to form knowledge hints. Then, these hints are incorporated into the PCM objective function so that the clustering algorithm is guided by high-density points to discover the natural data structure. Finally, the initial number of clusters is set to be greater than the true one based on the number of knowledge hints, and the HP-PCM algorithm automatically determines the final number of clusters during the clustering process by means of a cluster elimination mechanism. Experimental studies, including comparative analyses, highlight the effectiveness of the proposed algorithm: an increased clustering success rate, the ability to determine the optimal number of clusters, and faster convergence.
Fuzzy C-Means (FCM) is an effective and widely used clustering algorithm, but it still has some problems: the number of clusters must be determined manually, the locally optimal solution is easily influenced by the random selection of initial cluster centers, and Euclidean distance performs poorly on complex high-dimensional data. To solve these problems, an improved FCM clustering algorithm based on density Canopy and Manifold learning (DM-FCM) is proposed. First, a density Canopy algorithm based on improved local density is proposed to automatically determine the number of clusters and the initial cluster centers, which improves the self-adaptability and stability of the algorithm. Then, considering that high-dimensional data often present a nonlinear structure, the manifold learning method is applied to construct a manifold spatial structure, which preserves the global geometric properties of complex high-dimensional data and improves the clustering effect of the algorithm on complex high-dimensional datasets. The Fowlkes-Mallows Index (FMI), the weighted average of homogeneity and completeness (V-measure), Adjusted Mutual Information (AMI), and the Adjusted Rand Index (ARI) are used as performance measures. The experimental results show that the manifold learning method is the superior distance measure, and that the algorithm improves clustering accuracy and performs well on both low-dimensional and complex high-dimensional data.
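Two of the performance measures named in this abstract, FMI and ARI, are both defined from counts of point pairs. The sketch below uses only the standard pair-counting definitions and the Python standard library. The function names are illustrative assumptions, the code is not taken from the DM-FCM paper, and degenerate labelings that make a denominator zero are not handled.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_true, labels_pred):
    """Count point pairs grouped together in both labelings, in the truth, and in the prediction."""
    both = sum(comb(n, 2) for n in Counter(zip(labels_true, labels_pred)).values())
    true_pairs = sum(comb(n, 2) for n in Counter(labels_true).values())
    pred_pairs = sum(comb(n, 2) for n in Counter(labels_pred).values())
    return both, true_pairs, pred_pairs, comb(len(labels_true), 2)


def fowlkes_mallows(labels_true, labels_pred):
    """Geometric mean of pairwise precision and recall."""
    both, true_pairs, pred_pairs, _ = pair_counts(labels_true, labels_pred)
    return both / sqrt(true_pairs * pred_pairs)


def adjusted_rand(labels_true, labels_pred):
    """Pair agreement corrected for the agreement expected by chance."""
    both, true_pairs, pred_pairs, total = pair_counts(labels_true, labels_pred)
    expected = true_pairs * pred_pairs / total
    max_index = 0.5 * (true_pairs + pred_pairs)
    return (both - expected) / (max_index - expected)


truth = [0, 0, 1, 1]
print(fowlkes_mallows(truth, [1, 1, 0, 0]), adjusted_rand(truth, [1, 1, 0, 0]))
```

Both measures depend only on which points are grouped together, so swapping the cluster labels, as in the usage line, does not change the score.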
Studying user electricity consumption behavior is crucial for understanding power usage patterns. However, traditional clustering methods fail to identify emerging types of electricity consumption behavior. To address this issue, this paper introduces a statistical analysis of clusters and evaluates a set of indicators for power usage patterns. The fuzzy C-means clustering algorithm is then used to analyze 6 months of electricity consumption data from 2017 covering energy storage equipment, agricultural drainage irrigation, port shore power, and electric vehicles. Finally, the proposed method is validated through experiments in which the Davies-Bouldin index and the profile coefficient are calculated and compared. The experiments showed that the optimal number of clusters is 4. This study demonstrates the potential of the fuzzy C-means clustering algorithm for identifying emerging types of electricity consumption behavior, which can help power system operators and policymakers make informed decisions and improve energy efficiency.
Based on Multi-Masking Empirical Mode Decomposition (MMEMD) and fuzzy c-means (FCM) clustering, a new method of wind turbine bearing fault diagnosis, FCM-MMEMD, is proposed that can identify faults accurately and promptly. First, FCM clustering is employed to classify the data into different clusters, which helps to estimate whether there is a fault and how many fault types there are. If fault signals exist, the fault vibration signals are then demodulated and decomposed into different frequency bands by MMEMD for further analysis. MMEMD is a novel method proposed to overcome the mode mixing defect of empirical mode decomposition (EMD), and it improves on masking empirical mode decomposition (MEMD). By adding multiple masking signals at different levels to the signals to be decomposed, it effectively keeps low-frequency components from mixing into high-frequency components in the sifting process and thereby suppresses mode mixing. It is easy to implement and suppresses mode mixing strongly. Finally, the fault type is determined from the Hilbert envelope. The results of simulated signal decomposition showed the high performance of MMEMD. Experiments on wind turbine bearing fault diagnosis proved the validity and high accuracy of the new method.
To solve the problem of poor anti-noise performance of the traditional fuzzy C-means (FCM) algorithm in image segmentation, a novel two-dimensional FCM clustering algorithm for image segmentation is proposed. In this method, image segmentation is converted into an optimization problem. A fitness function containing neighbor information is set up based on the gray information and the neighbor relations between pixels described by an improved two-dimensional histogram. By making use of the global searching ability of predator-prey particle swarm optimization, the optimal cluster centers are obtained by iterative optimization and the image segmentation is accomplished. The simulation results show that the segmentation accuracy ratio of the proposed method is above 99%. The proposed algorithm has strong anti-noise capability, high clustering accuracy, and a good segmentation effect, indicating that it is an effective algorithm for image segmentation.
The suppressed fuzzy c-means (S-FCM) clustering algorithm, which aims to combine the higher speed of the hard c-means clustering algorithm with the better classification performance of the fuzzy c-means clustering algorithm, has been studied by many researchers and applied in many fields. In the algorithm, selecting the suppression rate is a key step. In this paper, we give a method for selecting a fixed suppression rate from the structure of the data itself. The experimental results show that the proposed method is a suitable way to select the suppression rate in the suppressed fuzzy c-means clustering algorithm.
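The role of the suppression rate can be illustrated with the commonly cited form of the suppression step, applied to one point's FCM memberships after each membership update. This is a minimal sketch under that assumption. The function name is hypothetical, and the data-driven rule for choosing the rate that this abstract refers to is not given in the text and is not reproduced here.

```python
def suppress(memberships, alpha):
    """Apply the suppression step to one point's FCM memberships (assumed to sum to 1).

    alpha is the suppression rate in [0, 1]: alpha = 1 leaves the FCM memberships
    unchanged, alpha = 0 produces a hard, winner-takes-all assignment.
    """
    winner = max(range(len(memberships)), key=memberships.__getitem__)
    # Shrink every membership by the factor alpha ...
    scaled = [alpha * u for u in memberships]
    # ... and give the winner what the others gave up, so the row still sums to 1.
    scaled[winner] = 1.0 - alpha + alpha * memberships[winner]
    return scaled


print(suppress([0.6, 0.3, 0.1], 0.5))
```

With `alpha = 0.5` the winning membership rises from 0.6 to about 0.8 while the other two are halved, which shows how the rate moves the algorithm between hard and fuzzy behavior.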
Determining which watersheds have relatively similar hydrological properties is crucial for readily classifying them for management practices such as flood and soil erosion control. This study aimed to identify hydrologically homogeneous watersheds in western Iran using remote sensing data. To achieve this goal, remote sensing indices including SAVI, LAI, NDMI, NDVI, and snow cover were extracted from MODIS data over the period 2000 to 2015. Then, a fuzzy method was used to cluster the watersheds based on the extracted indices. A fuzzy c-means (FCM) algorithm made it possible to classify 38 watersheds into three homogeneous groups. The optimal number of clusters was determined through evaluation of the partition coefficient and the partition entropy function, together with trial and error. The results indicated that the three homogeneous regions identified by fuzzy c-means clustering and remote sensing products are consistent with the variations in topography and climate of the study area. The grouped watersheds therefore have similar hydrological properties and are likely to need similar management considerations and measures.
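The two validity indices named in this abstract have simple standard definitions over a fuzzy membership matrix. The sketch below follows those definitions; the function names and the toy matrices are illustrative assumptions and are not data from the watershed study.

```python
from math import log


def partition_coefficient(u):
    """Mean of squared memberships; u[k][i] is the membership of point k in cluster i."""
    return sum(x * x for row in u for x in row) / len(u)


def partition_entropy(u):
    """Mean membership entropy (natural log); zero memberships contribute nothing."""
    return -sum(x * log(x) for row in u for x in row if x > 0.0) / len(u)


crisp = [[1.0, 0.0], [0.0, 1.0]]   # every point fully assigned to one cluster
fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # maximally ambiguous two-cluster partition
print(partition_coefficient(crisp), partition_entropy(crisp))
print(partition_coefficient(fuzzy), partition_entropy(fuzzy))
```

A partition coefficient near 1 and a partition entropy near 0 indicate a crisp partition. In practice the indices are computed for several candidate cluster counts and compared, and both are known to drift with the number of clusters, so they are usually read alongside other evidence.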
Diabetic Retinopathy (DR) is a vision disease caused by the long-term prevalence of Diabetes Mellitus. It affects the retina of the eye and causes severe damage to vision. If not treated in time, it may lead to permanent vision loss in diabetic patients. Science currently offers no medication that cures Diabetic Retinopathy. However, if it is diagnosed at an early stage, it can be controlled and permanent vision loss can be avoided. Compared with the size of the diabetic population, experts able to diagnose Diabetic Retinopathy are few, particularly in local areas. Hence, automatic computer-aided diagnosis for DR detection is necessary. In this paper, we propose an unsupervised clustering technique that automatically clusters DR into one of its five development stages. The deep learning based unsupervised clustering improves itself with the help of fuzzy rough c-means clustering: cluster centers are updated by the fuzzy rough c-means clustering algorithm during the forward pass, and the deep learning model representations are updated by Stochastic Gradient Descent during the backward pass of training. The proposed method was implemented in Python, and the results were obtained on a DGX server with Tesla V100 GPU cards. An experiment on the publicly available Kaggle dataset shows an overall accuracy of 88.7%. The proposed model improves the accuracy of DR diagnosis compared with existing unsupervised algorithms such as k-means, FCM, auto-encoder, and FRCM with AlexNet.
Minimally Invasive Spine Surgery (MISS) was developed to treat disorders of the spine with less disruption to the muscles. Surgeons use CT images to monitor the volume of muscles after an operation in order to evaluate the progress of patient recovery. The first step in this task is to segment the muscle regions from other tissues and organs in CT images. However, manual segmentation of muscle regions is not only inaccurate but also time consuming. In this work, a Gray Space Map (GSM) is used in the fuzzy c-means clustering algorithm to segment muscle regions in CT images. GSM combines both spatial and intensity information of pixels. Experiments show that the proposed GSM-based fuzzy c-means clustering for muscle CT image segmentation yields very good results.
A novel example-based process for Automated Colorization of grayscale images using Texture Descriptors (ACTD), requiring no human intervention, is proposed. By analyzing a set of sample color images, coherent regions of homogeneous textures are extracted. A multi-channel filtering technique is used for texture-based image segmentation, combined with a modified Fuzzy C-means (FCM) clustering algorithm. This modified FCM clustering algorithm includes both the local spatial information from neighboring pixels and the spatial Euclidean distance to the cluster's center of gravity. For each area of interest, state-of-the-art texture descriptors are then computed and stored, along with the corresponding color information. These texture descriptors and the color information are used to colorize a grayscale image with similar textures. Given a grayscale image to be colorized, the segmentation and feature extraction processes are repeated. The texture descriptors are used to perform Content-Based Image Retrieval (CBIR). The colorization itself is performed by chroma replacement. This research has numerous applications, ranging from classic film restoration and enhancement to adding valuable information to medical and satellite imaging. It can also be used to enhance the detection of objects in X-ray images at airports.
Classifying data into meaningful groups is one of the fundamental ways of understanding and learning valuable information. High-quality clustering methods are necessary for the effective and efficient analysis of ever-increasing data. The Firefly Algorithm (FA) is a bio-inspired algorithm that has recently been used to solve clustering problems. In this paper, a Hybrid F-Firefly algorithm is developed by combining Fuzzy C-Means (FCM) with FA to improve clustering accuracy with a globally optimal solution. The Hybrid F-Firefly algorithm incorporates an FCM operator at the end of each iteration of FA. The proposed algorithm is designed to exploit the strengths of the existing algorithms and to enhance the original FA while addressing the shortcomings of FCM, such as trapping in local optima and sensitivity to initial seed points. In this research work, the Hybrid F-Firefly algorithm is implemented and experimentally tested on various performance measures with six benchmark datasets. The experimental results show that the Hybrid F-Firefly algorithm significantly improves the intra-cluster distance compared with existing algorithms such as K-means, FCM, and FA.
The premise and basis of load modeling are substation load composition inquiries and cluster analyses. However, the traditional kernel fuzzy C-means (KFCM) algorithm is limited by the manual selection of the number of clusters and by its convergence to locally optimal solutions. To overcome these limitations, an improved KFCM algorithm with adaptive selection of the optimal number of clusters is proposed in this paper. This algorithm optimizes KFCM by combining the powerful global search ability of the genetic algorithm with the robust local search ability of the simulated annealing algorithm. The improved KFCM algorithm adaptively determines the ideal number of clusters using the clustering evaluation index ratio. Compared with the traditional KFCM algorithm, the enhanced KFCM algorithm has robust clustering and comprehensive abilities, enabling efficient convergence to the globally optimal solution.
An advanced fuzzy C-means (FCM) algorithm was proposed for the efficient regional clustering of multi-node interconnected systems. Because each node has its own locational price and regional coherency, a modified similarity measure was considered in order to gather nodes with similar characteristics. The similarity measure needed to contain locational prices as well as regional coherency. To consider the two properties simultaneously, the distance measure of the fuzzy C-means algorithm had to be modified. A regional clustering algorithm for interconnected power systems was designed based on the modified fuzzy C-means algorithm. The proposed algorithm produces a proper classification for the interconnected power system, and the results are demonstrated on the IEEE 39-bus interconnected electricity system.
One of the earliest consequences of diabetes is Diabetic Retinopathy (DR), the main contributor to blindness worldwide. Recent studies have proposed that Exudates (EXs) are the hallmark of DR severity. The present study aims to accurately and automatically detect EXs that are difficult to detect in retinal images in the early stages. An improved Fusion of Histogram-Based Fuzzy C-Means Clustering (FHBFCM) with a New Weight Assignment Scheme (NWAS) and a set of four features selected from the pre-processing stages is proposed as the detection method. The DR features are used to train the optimal parameters of FHBFCM for detecting EXs through a stepwise enhancement method in the coarse segmentation stage. The histogram-based step finds the color intensity of each pixel and yields Red, Green, and Blue (RGB) color information. This RGB color information is used as the initial cluster centers for creating the appropriate regions and generating homogeneous regions with Fuzzy C-Means (FCM). Afterward, the best expression of NWAS is used for the fine detection stage. According to the experimental results, the proposed method successfully detects EXs on the retinal image datasets DiaretDB0 (Standard Diabetic Retinopathy Database Calibration level 0), DiaretDB1 (Standard Diabetic Retinopathy Database Calibration level 1), and STARE (Structured Analysis of the Retina), with accuracy values of 96.12%, 97.20%, and 93.22%, respectively. This study thus proposes a new approach for the early detection of EXs with competitive accuracy and the ability to outperform existing methods by improving detection quality and possibly reducing false-positive segmentations significantly.
Clustering is an unsupervised learning method used to organize raw data so that items with the same (or similar) characteristics fall into the same class and dissimilar items fall into different classes. The very rapid increase in the amount of data being produced today brings new challenges for the analysis and storage of this data. Recently, there has been growing interest in key areas such as real-time data mining, which reveals an urgent need to process very large data under strict performance constraints. The objective of this paper is to survey four algorithms used for data clustering, namely the K-Means algorithm, the FCM algorithm, the EM algorithm, and BIRCH, and to show their strengths and weaknesses. A further task is to compare the results obtained by applying each of these algorithms to the same data and to draw a conclusion from these results.
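For readers comparing the surveyed algorithms, the basic FCM iteration alternates a membership update and a center update. The following is a minimal standard-library sketch on 1-D data. The function name, the starting centers, and the toy data are illustrative assumptions; the sketch is not taken from any of the papers listed here.

```python
def fcm_1d(xs, centers, m=2.0, iters=50, tol=1e-9):
    """Plain fuzzy c-means on 1-D data, starting from the given centers."""
    centers = list(centers)
    p = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u[k][i] is the degree to which xs[k] belongs to cluster i.
        u = []
        for x in xs:
            d = [abs(x - v) for v in centers]
            if 0.0 in d:
                # A point lying exactly on a center belongs fully to it.
                hits = d.count(0.0)
                u.append([1.0 / hits if di == 0.0 else 0.0 for di in d])
            else:
                u.append([1.0 / sum((di / dj) ** p for dj in d) for di in d])
        # Center update: weighted mean of the data with weights u ** m.
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            new_centers.append(sum(wk * x for wk, x in zip(w, xs)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    # u holds the memberships computed in the last membership update.
    return centers, u


data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
final_centers, final_u = fcm_1d(data, [0.0, 6.0])
print(final_centers)
```

Unlike K-means, every point keeps a graded membership in every cluster, and the fuzzifier `m` controls how soft those memberships are. The result still depends on the starting centers, which is the sensitivity that several abstracts in this listing set out to reduce.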
This paper presents a fuzzy C-means clustering image segmentation algorithm based on particle swarm optimization. The method uses the strong search ability of particle swarm optimization to search for the cluster centers. Because searching for cluster centers according to density requires little computation, it can greatly improve the calculation speed of the fuzzy C-means algorithm. The experimental results show that this method markedly speeds up fuzzy clustering and can therefore achieve fast image segmentation.
Sequence analysis technology under big data provides unprecedented opportunities for modern life science. A novel gene coding sequence identification method is proposed in this paper. First, an improved short-time Fourier transform algorithm based on the Morlet wavelet is applied to extract the power spectrum of the DNA sequence. Then, a threshold determination method based on kernel fuzzy C-means clustering is used to combine the Signal to Noise Ratio (SNR) data of exons and introns into one sequence, classify the sequence into two types, and calculate the weighted sum of the two resulting SNR cluster centers to obtain the discrimination threshold. Finally, an exon interval endpoint identification algorithm based on the Takagi-Sugeno fuzzy identification model is presented: it trains the Takagi-Sugeno model, optimizes the model parameters with the Levenberg-Marquardt least squares method, completes the model, and determines the fuzzy rules. To verify the effectiveness of the proposed method, example tests are conducted on typical gene sequence sample data.
Funding (TCS2FCM study): The National Natural Science Foundation of China (No. 60672056); Open Fund of MOE-MS Key Laboratory of Multimedia Computing and Communication (No. 06120809).
Funding (shadowed-set weighted clustering study): Supported by the National Natural Science Foundation of China (61139002).
Funding (HP-PCM study): Supported by the National Key Research and Development Program of China (No. 2022YFB3304400); the National Natural Science Foundation of China (Nos. 6230311, 62303111, 62076060, 61932007, and 62176083); the Key Research and Development Program of Jiangsu Province of China (No. BE2022157).
Funding (DM-FCM study): The National Natural Science Foundation of China (No. 62262011); the Natural Science Foundation of Guangxi (No. 2021JJA170130).
Funding: Supported by the Science and Technology Project of State Grid Jiangxi Electric Power Corporation Limited, "Research on Key Technologies for Non-Intrusive Load Identification for Typical Power Industry Users in Jiangxi Province" (521852220004).
Abstract: Studying user electricity consumption behavior is crucial for understanding power usage patterns. However, traditional clustering methods fail to identify emerging types of electricity consumption behavior. To address this issue, this paper introduces a statistical analysis of clusters and evaluates a set of indicators for power usage patterns. The fuzzy C-means clustering algorithm is then used to analyze 6 months of electricity consumption data in 2017 from energy storage equipment, agricultural drainage irrigation, port shore power, and electric vehicles. Finally, the proposed method is validated through experiments, where the Davies-Bouldin index and profile coefficient are calculated and compared. Experiments showed that the optimal number of clusters is 4. This study demonstrates the potential of using a fuzzy C-means clustering algorithm in identifying emerging types of electricity consumption behavior, which can help power system operators and policymakers make informed decisions and improve energy efficiency.
Funding: Supported by National Key R&D Projects (Grant No. 2018YFB0905500); National Natural Science Foundation of China (Grant No. 51875498); Hebei Provincial Natural Science Foundation of China (Grant Nos. E2018203439, E2018203339, F2016203496); Key Scientific Research Projects Plan of Henan Higher Education Institutions (Grant No. 19B460001).
Abstract: Based on Multi-Masking Empirical Mode Decomposition (MMEMD) and fuzzy c-means (FCM) clustering, a new method of wind turbine bearing fault diagnosis, FCM-MMEMD, is proposed, which can determine the fault accurately and in a timely manner. First, FCM clustering is employed to classify the data into different clusters, which helps to estimate whether there is a fault and how many fault types there are. If fault signals exist, the fault vibration signals are then demodulated and decomposed into different frequency bands by MMEMD for further analysis. To overcome the mode mixing defect of empirical mode decomposition (EMD), a novel method called MMEMD is proposed. It is an improvement on masking empirical mode decomposition (MEMD). By adding multiple masking signals at different levels to the signals to be decomposed, it effectively restrains low-frequency components from mixing into high-frequency components in the sifting process and thereby suppresses mode mixing. It has the advantages of easy implementation and a strong ability to suppress mode mixing. Finally, the fault type is determined by the Hilbert envelope. The results of simulation signal decomposition showed the high performance of MMEMD. Experiments on wind turbine bearing fault diagnosis proved the validity and high accuracy of the new method.
Funding: Project (06JJ50110) supported by the Natural Science Foundation of Hunan Province, China.
Abstract: To solve the problem of poor anti-noise performance of the traditional fuzzy C-means (FCM) algorithm in image segmentation, a novel two-dimensional FCM clustering algorithm for image segmentation was proposed. In this method, image segmentation was converted into an optimization problem. The fitness function containing neighbor information was set up based on the gray information and the neighbor relations between the pixels described by the improved two-dimensional histogram. By making use of the global searching ability of predator-prey particle swarm optimization, the optimal cluster center could be obtained by iterative optimization, and the image segmentation could be accomplished. The simulation results show that the segmentation accuracy ratio of the proposed method is above 99%. The proposed algorithm has strong anti-noise capability, high clustering accuracy and a good segmentation effect, indicating that it is an effective algorithm for image segmentation.
Abstract: The suppressed fuzzy c-means (S-FCM) clustering algorithm, which is intended to combine the higher speed of the hard c-means clustering algorithm with the better classification performance of the fuzzy c-means clustering algorithm, has been studied by many researchers and applied in many fields. In this algorithm, selecting the suppressed rate is a key step. In this paper, we give a method to select the fixed suppressed rate from the structure of the data itself. The experimental results show that the proposed method is a suitable way to select the suppressed rate in the suppressed fuzzy c-means clustering algorithm.
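To make the role of the suppressed rate concrete, here is a small sketch of the commonly cited S-FCM suppression step applied to one point's membership row. The abstract does not state the rule, so the form used here (enlarge the winning membership, scale the others by the rate) and the names `suppress` and `alpha` are assumptions.

```python
def suppress(u_row, alpha):
    """Apply the suppression step to one point's FCM membership row.

    alpha is the suppressed rate in [0, 1]:
    alpha = 1 leaves the FCM memberships unchanged,
    alpha = 0 gives a hard (winner-takes-all) assignment.
    """
    winner = max(range(len(u_row)), key=lambda i: u_row[i])
    out = [alpha * u for u in u_row]                    # shrink every membership
    out[winner] = 1.0 - alpha + alpha * u_row[winner]   # reward the largest one
    return out


print(suppress([0.6, 0.3, 0.1], alpha=0.5))
```

The row still sums to 1 after suppression, so it can be fed straight back into the usual FCM center update.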
Abstract: Identifying watersheds with relatively similar hydrological properties is crucial for classifying them for management practices such as flood and soil erosion control. This study aimed to identify homogeneous hydrological watersheds using remote sensing data in western Iran. To achieve this goal, remote sensing indices including SAVI, LAI, NDMI, NDVI and snow cover were extracted from MODIS data over the period 2000 to 2015. Then, a fuzzy method was used to cluster the watersheds based on the extracted indices. A fuzzy c-means (FCM) algorithm classified 38 watersheds into three homogeneous groups. The optimal number of clusters was determined through evaluation of the partition coefficient, the partition entropy function, and trial and error. The results indicated that the three homogeneous regions identified by fuzzy c-means clustering and remote sensing products are consistent with the variations of topography and climate in the study area. Inherently, the grouped watersheds have similar hydrological properties and are likely to need similar management considerations and measures.
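The two validity measures named in this abstract have simple standard definitions, sketched below for a membership matrix stored as a list of rows (one row per object, one column per cluster). The function names and the toy matrices are illustrative assumptions and are not taken from the study.

```python
import math


def partition_coefficient(u):
    """Mean of the squared memberships; 1.0 for a crisp partition, 1/c for a uniform one."""
    return sum(x * x for row in u for x in row) / len(u)


def partition_entropy(u):
    """Mean membership entropy; 0.0 for a crisp partition, log(c) for a uniform one."""
    return -sum(x * math.log(x) for row in u for x in row if x > 0.0) / len(u)


crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
for name, u in (("crisp", crisp), ("fuzzy", fuzzy)):
    print(name, partition_coefficient(u), partition_entropy(u))
```

In practice the clustering is run for several candidate cluster counts, and a count with a high partition coefficient and low partition entropy is preferred, usually alongside domain judgment as in the abstract's trial-and-error step.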
Abstract: Diabetic Retinopathy (DR) is a vision disease caused by the long-term prevalence of Diabetes Mellitus. It affects the retina of the eye and causes severe damage to vision. If not treated in time, it may lead to permanent vision loss in diabetic patients. At present there is no medication to cure Diabetic Retinopathy. However, if diagnosed at an early stage, it can be controlled and permanent vision loss can be avoided. Compared to the size of the diabetic population, experts able to diagnose Diabetic Retinopathy are few, particularly in local areas. Hence, automatic computer-aided diagnosis for DR detection is necessary. In this paper, we propose an unsupervised clustering technique to automatically cluster DR into one of its five development stages. The deep learning based unsupervised clustering is made to improve itself with the help of fuzzy rough c-means clustering: cluster centers are updated by the fuzzy rough c-means clustering algorithm during the forward pass, and the deep learning model representations are updated by Stochastic Gradient Descent during the backward pass of training. The proposed method was implemented in Python, and the results were obtained on a DGX server with Tesla V100 GPU cards. An experiment on the publicly available Kaggle dataset shows an overall accuracy of 88.7%. The proposed model improves the accuracy of DR diagnosis compared to existing unsupervised algorithms such as k-means, FCM, auto-encoder, and FRCM with AlexNet.
Abstract: Minimally Invasive Spine Surgery (MISS) was developed to treat disorders of the spine with less disruption to the muscles. Surgeons use CT images to monitor the volume of muscles after operation in order to evaluate the progress of patient recovery. The first step in the task is to segment the muscle regions from other tissues and organs in CT images. However, manual segmentation of muscle regions is not only inaccurate but also time consuming. In this work, a Gray Space Map (GSM) is used in the fuzzy c-means clustering algorithm to segment muscle regions in CT images. GSM combines both spatial and intensity information of pixels. Experiments show that the proposed GSM-based fuzzy c-means clustering for muscle CT image segmentation yields very good results.
Abstract: A novel example-based process for Automated Colorization of grayscale images using Texture Descriptors (ACTD) without any human intervention is proposed. By analyzing a set of sample color images, coherent regions of homogeneous textures are extracted. A multi-channel filtering technique is used for texture-based image segmentation, combined with a modified Fuzzy C-means (FCM) clustering algorithm. This modified FCM clustering algorithm includes both the local spatial information from neighboring pixels and the spatial Euclidean distance to the cluster's center of gravity. For each area of interest, state-of-the-art texture descriptors are then computed and stored, along with the corresponding color information. These texture descriptors and the color information are used for colorization of a grayscale image with similar textures. Given a grayscale image to be colorized, the segmentation and feature extraction processes are repeated. The texture descriptors are used to perform Content-Based Image Retrieval (CBIR). The colorization process is performed by chroma replacement. This research finds numerous applications, ranging from classic film restoration and enhancement to adding valuable information to medical and satellite imaging. It can also be used to enhance the detection of objects in X-ray images at airports.
Abstract: Classifying data into meaningful groups is one of the fundamental ways of understanding and learning valuable information. High-quality clustering methods are necessary for the efficient analysis of ever-increasing data. The Firefly Algorithm (FA) is a bio-inspired algorithm that has recently been used to solve clustering problems. In this paper, the Hybrid F-Firefly algorithm is developed by combining Fuzzy C-Means (FCM) with FA to improve clustering accuracy with a global optimum solution. The Hybrid F-Firefly algorithm incorporates the FCM operator at the end of each iteration of the FA algorithm. The proposed algorithm is designed to exploit the strengths of the existing algorithms and to enhance the original FA algorithm while addressing the shortcomings of the FCM algorithm, such as trapping in local optima and sensitivity to initial seed points. In this research work, the Hybrid F-Firefly algorithm is implemented and experimentally tested on various performance measures using six different benchmark datasets. The experimental results show that the Hybrid F-Firefly algorithm significantly improves the intra-cluster distance when compared with existing algorithms such as K-means, FCM and FA.
Funding: Supported by the Planning Special Project of Guangdong Power Grid Co., Ltd.: "Study on load modeling based on total measurement and discrimination method suitable for system characteristic analysis and calculation during the implementation of target grid in Guangdong power grid" (0319002022030203JF00023).
Abstract: The premise and basis of load modeling are substation load composition inquiries and cluster analyses. However, the traditional kernel fuzzy C-means (KFCM) algorithm is limited by manual selection of the number of clusters and by its convergence to local optimal solutions. To overcome these limitations, an improved KFCM algorithm with adaptive optimal clustering number selection is proposed in this paper. This algorithm optimizes the KFCM algorithm by combining the powerful global search ability of the genetic algorithm with the robust local search ability of the simulated annealing algorithm. The improved KFCM algorithm adaptively determines the ideal number of clusters using the clustering evaluation index ratio. Compared with the traditional KFCM algorithm, the enhanced KFCM algorithm has robust clustering and comprehensive abilities, enabling efficient convergence to the global optimal solution.
Funding: Work supported by the Second Stage of Brain Korea 21 Projects; Work (2010-0020163) supported by the Priority Research Centers Program through the National Research Foundation (NRF) funded by the Ministry of Education, Science and Technology of Korea.
Abstract: An advanced fuzzy C-means (FCM) algorithm was proposed for the efficient regional clustering of multi-node interconnected systems. Because locational prices and regional coherencies vary across nodes and points, a modified similarity measure was considered in order to gather nodes having similar characteristics. The similarity measure needed to contain locational prices as well as regional coherency. To consider the two properties simultaneously, the distance measure of the fuzzy C-means algorithm had to be modified. A regional clustering algorithm for interconnected power systems was designed based on the modified fuzzy C-means algorithm. The proposed algorithm produces a proper classification for the interconnected power system, and the results are demonstrated on the example of the IEEE 39-bus interconnected electricity system.
Funding: This research project was financially supported by Mahasarakham University, Thailand.
Abstract: One of the earliest indications of diabetic complications is Diabetic Retinopathy (DR), the main contributor to blindness worldwide. Recent studies have proposed that Exudates (EXs) are the hallmark of DR severity. The present study aims to accurately and automatically detect EXs, which are difficult to detect in retinal images in the early stages. An improved Fusion of Histogram-Based Fuzzy C-Means Clustering (FHBFCM) with a New Weight Assignment Scheme (NWAS) and a set of four selected features from the pre-processing stages is proposed to evolve the detection method. The features of DR train the optimal parameter of FHBFCM for detecting EXs through a stepwise enhancement method in the coarse segmentation stage. The histogram-based method is applied to find the color intensity of each pixel and to obtain Red, Green, and Blue (RGB) color information. This RGB color information is used as the initial cluster centers for creating the appropriate region and generating homogeneous regions by Fuzzy C-Means (FCM). Afterward, the best expression of NWAS is used for the fine detection stage. According to the experimental results, the proposed method successfully detects EXs on the retinal image datasets DiaretDB0 (Standard Diabetic Retinopathy Database Calibration level 0), DiaretDB1 (Standard Diabetic Retinopathy Database Calibration level 1), and STARE (Structured Analysis of the Retina) with accuracy values of 96.12%, 97.20%, and 93.22%, respectively. As a result, this study proposes a new approach for the early detection of EXs with competitive accuracy and the ability to outperform existing methods by improving the detection quality and perhaps significantly reducing false positives in segmentation.
Abstract: Clustering is an unsupervised learning method used to organize raw data in such a way that items with the same (similar) characteristics are found in the same class and those that are dissimilar are found in different classes. The very rapid increase in the amount of data being produced brings new challenges in the analysis and storage of this data. Recently, there has been growing interest in key areas such as real-time data mining, which reveals an urgent need to process very large data under strict performance constraints. The objective of this paper is to survey four algorithms used for data clustering, namely the K-Means algorithm, the FCM algorithm, the EM algorithm and BIRCH, and to show their strengths and weaknesses. Another task is to compare the results obtained by applying each of these algorithms to the same data and to draw a conclusion based on these results.
Abstract: This paper presents a fuzzy C-means clustering image segmentation algorithm based on particle swarm optimization. The method uses the strong search ability of particle swarm optimization to search for the clustering centers. Because searching for the clustering centers according to density requires only a small amount of calculation, it can greatly improve the calculation speed of the fuzzy C-means algorithm. The experimental results show that this method noticeably improves the speed of fuzzy clustering and can therefore achieve fast image segmentation.
Abstract: Sequence analysis technology under big data provides unprecedented opportunities for modern life science. A novel gene coding sequence identification method is proposed in this paper. First, an improved short-time Fourier transform algorithm based on the Morlet wavelet is applied to extract the power spectrum of the DNA sequence. Then, a threshold determination method based on kernel fuzzy C-means clustering is used to combine the Signal to Noise Ratio (SNR) data of exons and introns into one sequence, classify the sequence into two types, and calculate the weighted sum of the two SNR clustering centers obtained, which gives the discrimination threshold. Finally, an exon interval endpoint identification algorithm based on the Takagi-Sugeno fuzzy identification model is presented to train the Takagi-Sugeno model, optimize the model parameters with the Levenberg-Marquardt least squares method, complete the model and determine the fuzzy rules. To verify the effectiveness of the proposed method, example tests are conducted on typical gene sequence sample data.