A fast leukocyte image scanning method based on max-min distance clustering is proposed. Leukocytes make up a low proportion of human peripheral blood and are unevenly distributed. As a result, a large share of the captured images will contain no leukocytes if the blood smear is scanned directly along an ordinary zigzag route with a high-power (100×) objective. Because the low-power (10×) objective has a larger field of view, the captured low-power blood smear images can be used to locate leukocytes. All of the located positions make up a specific route, and if the blood smear is scanned along this route with the high-power objective, almost all of the captured images will contain leukocytes. The number of captured images is still large, and some leukocytes may be captured redundantly two or more times. A leukocyte clustering method based on max-min distance clustering is therefore developed to reduce both the total number of captured images and the number of redundantly captured leukocytes. This method noticeably improves scanning efficiency. The experimental results show that the proposed method can shorten scanning time from 8.0-14.0 min to 2.5-4.0 min while extracting 110 non-redundant individual high-power leukocyte images.
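The max-min distance clustering step named above can be illustrated with the textbook form of the algorithm. What follows is a minimal sketch, not the authors' implementation. It assumes leukocyte positions are 2-D coordinates, and it introduces `theta`, a hypothetical ratio that controls when a new cluster centre is opened. The real method presumably also takes the high-power field of view into account, and that constraint is not modelled here.

```python
import math

def max_min_distance_clustering(points, theta=0.5):
    """Textbook max-min distance clustering on 2-D points.

    theta (hypothetical parameter): a new centre is opened when a point's
    distance to its nearest existing centre exceeds theta times the
    distance between the first two centres.
    """
    if not points:
        return [], []
    centres = [points[0]]  # first centre: here simply the first point
    # second centre: the point farthest from the first centre
    far = max(points, key=lambda p: math.dist(p, centres[0]))
    if math.dist(far, centres[0]) == 0:
        return centres, [0] * len(points)
    centres.append(far)
    threshold = theta * math.dist(centres[0], centres[1])
    while True:
        # distance from every point to its nearest current centre
        nearest = [min(math.dist(p, c) for c in centres) for p in points]
        idx = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[idx] <= threshold:
            break  # every point is close enough to some centre
        centres.append(points[idx])
    # final assignment of each point to its nearest centre
    labels = [min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
              for p in points]
    return centres, labels
```

In this application, each resulting cluster would correspond to one high-power capture that covers several nearby leukocytes.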
A fault diagnosis model based on a fuzzy support vector machine (FSVM) combined with fuzzy clustering (FC) is proposed. The FC algorithm generates fuzzy memberships by taking into account the relationship between each sample point and the non-self classes. Within this algorithm, sample weights based on a distribution density function of the data points are introduced, together with a genetic algorithm (GA), to enhance the performance of FC. A multi-class FSVM with a radial basis function kernel is then built using the directed acyclic graph algorithm, and its penalty factor and kernel parameter are optimized by GA. Finally, the model is applied to multi-class fault diagnosis of rolling element bearings. The results show that the model performs well in identifying both fault types and fault degrees. In the noisy case, performance comparisons with an SVM and a distance-based FSVM demonstrate the model's robustness to noise and its ability to generalize.
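The core idea of an FSVM is that each training sample carries a membership that scales its penalty. The sketch below computes memberships for the distance-based FSVM that the abstract uses as a baseline: a sample's membership shrinks with its distance from its own class mean. It does not reproduce the paper's FC/GA-based memberships, and `delta` is an assumed small constant that avoids division by zero.

```python
import math

def distance_memberships(X, y, delta=1e-3):
    """Distance-based fuzzy memberships (baseline FSVM, not the FC-based ones).

    membership = 1 - d / (r + delta), where d is the sample's distance to its
    class mean and r is the largest such distance within the class.
    """
    s = [0.0] * len(X)
    for label in set(y):
        idx = [i for i, t in enumerate(y) if t == label]
        mean = [sum(X[i][k] for i in idx) / len(idx) for k in range(len(X[0]))]
        dists = {i: math.dist(X[i], mean) for i in idx}
        r = max(dists.values())
        for i in idx:
            s[i] = 1.0 - dists[i] / (r + delta)
    return s
```

Such memberships could, for example, be passed to scikit-learn's `SVC.fit(X, y, sample_weight=s)`, which rescales the penalty C per sample. That pairing is an illustrative choice, not the paper's setup.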
Turbopump condition monitoring is an important means of ensuring the safety of liquid rocket engines (LREs). Because fault samples are scarce, a monitoring system cannot be trained on all possible condition patterns. It is therefore important to use novelty detection methods that distinguish abnormal or unknown patterns from the normal pattern. The one-class support vector machine (OCSVM) is commonly used for novelty detection, but it does not scale well to large sample sets. To model the normal pattern of the turbopump with an OCSVM and thereby monitor its condition, a monitoring method that integrates OCSVM with incremental clustering is presented. In this method, incremental clustering performs sample reduction by extracting representative vectors from a large training set. The representative vectors are intended to be distributed uniformly over the object region and to cover it. Training an OCSVM on these representative vectors yields a novelty detector. The method was applied to the turbopump's historical test data. The incremental clustering algorithm extracted 91 representative points from more than 36 000 training vectors. The OCSVM detector trained on these 91 points recognized spikes in vibration signals caused by different abnormal events, such as vane shedding, rub-impact and sensor faults. Unlike classical recognition methods, this monitoring method needs no fault samples during training. The method solves the problem of learning from large sample sets and offers an alternative approach to condition monitoring of LRE turbopumps.
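The sample-reduction step can be pictured with a single-pass "leader" clustering rule. The abstract does not specify the incremental clustering algorithm, so this is an assumed stand-in. A vector is kept as a new representative only if it lies farther than a hypothetical `radius` from every representative kept so far, which tends to spread representatives over the occupied region.

```python
import math

def incremental_representatives(vectors, radius):
    """Single-pass (leader-style) incremental clustering for sample reduction.

    A vector becomes a new representative only if it is farther than
    `radius` (a hypothetical tuning parameter) from all existing ones.
    """
    reps = []
    for v in vectors:
        if all(math.dist(v, r) > radius for r in reps):
            reps.append(v)
    return reps
```

The returned representatives would then serve as the OCSVM training set, for example with scikit-learn's `OneClassSVM`. The choice of library is an assumption.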
The traditional grey incidence degree (GID) is mainly based on distance analysis methods: it is measured by the displacement difference between corresponding points of the sequences. When some data in the sequences are missing (that is, the sequences have inconsistent lengths), the only options are to truncate the longer sequences or to pad the shorter ones, which introduces some uncertainty. To solve this problem, a novel GID based on the multidimensional dynamic time warping distance (MDDTW distance-GID) is proposed by introducing the three-dimensional grey incidence degree (3D-GID). On this basis, the corresponding grey incidence clustering method (MDDTW distance-GIC) is constructed. It has a simpler computation process and can also be applied directly to incidence comparison between uncertain multidimensional sequences. The experiment shows that MDDTW distance-GIC is more accurate when dealing with uncertain sequences. Compared with the traditional GIC method, its precision increases by nearly 30%.
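The multidimensional DTW distance that replaces point-by-point displacement can be sketched as follows. This is the common "dependent" form, in which the local cost is the Euclidean distance between whole multidimensional points. The sketch shows only the distance. How the paper maps it to an incidence degree is not reproduced.

```python
import math

def mddtw(a, b):
    """Dependent multidimensional DTW distance.

    a, b: lists of equal-dimension tuples; the two sequences may differ in
    length, so no truncation or padding is needed.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # local Euclidean cost
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

For example, a 3-point sequence and a 4-point sequence with a repeated sample have DTW distance 0, whereas a point-wise comparison would first require deleting or filling data.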
Developing an efficient algorithm for large-scale classification problems is a challenging topic in many machine learning applications. In this paper, a support vector machine (SVM) algorithm based on hierarchical clustering and fixed-layer local learning (HCFLL) is proposed to address this problem. First, HCFLL hierarchically clusters a given dataset into a modified clustering feature tree, drawing on the ideas of both unsupervised and supervised clustering. It then trains an SVM locally on each labeled subtree at a fixed layer of the tree. The experimental results show that, compared with popular existing algorithms such as the core vector machine and the decision tree support vector machine, HCFLL significantly improves training and testing speed with comparable testing accuracy.
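A clustering feature tree is built from compact per-cluster summaries. The sketch below shows the standard BIRCH-style clustering feature (N, LS, SS), whose additivity is what makes hierarchical construction cheap. The paper's tree is a modified version, so the modification and the supervised splitting are not shown.

```python
from dataclasses import dataclass
import math

@dataclass
class ClusteringFeature:
    """BIRCH-style clustering feature (N, LS, SS) of a set of d-dim points."""
    n: int
    ls: list   # linear sum of the points, per dimension
    ss: float  # sum of squared norms of the points

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), sum(x * x for x in p))

    def merge(self, other):
        # additivity: merging two subclusters only adds their summaries
        return ClusteringFeature(self.n + other.n,
                                 [a + b for a, b in zip(self.ls, other.ls)],
                                 self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # root-mean-square distance of the member points from the centroid
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))
```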
Seismic waveform clustering is a useful technique for lithologic identification and reservoir characterization. Current seismic waveform clustering algorithms are predominantly based on a fixed time window, which suits layers of stable thickness. When a layer exhibits variable thickness in the seismic response, a fixed time window cannot provide comprehensive geologic information for the target interval. We therefore propose a novel waveform clustering workflow based on a variable time window to enable broader applications. The dynamic time warping (DTW) distance is first introduced to measure the similarity between seismic waveforms of different lengths. We develop a DTW distance-based clustering algorithm to extract centroids, and then assign every seismic trace to a class according to its DTW distances from the centroids. To greatly reduce the computational complexity in seismic data applications, we propose a superpixel-based seismic data thinning approach. By combining the DTW distance-based clustering and seismic data thinning algorithms, we further propose an integrated workflow that can be applied to practical seismic data. We evaluated its performance by applying the proposed workflow to synthetic seismograms and seismic survey data. On synthetic seismograms, the proposed workflow detects boundaries of different lithologies or lithologic associations with variable thickness better than the traditional waveform clustering method. Results from a practical application show that the planar map of seismic waveform clustering obtained by the proposed workflow correlates well with the geological characteristics of wells in terms of reservoir thickness.
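The assignment step described above (labeling each trace by its DTW distance to the centroids) can be sketched for 1-D traces of different lengths. Averaging waveforms under DTW is not straightforward, so this sketch assumes a DTW medoid as the "centroid" of a class. That choice is a stand-in, and the paper's centroid extraction may differ.

```python
def dtw(a, b):
    """DTW distance between two 1-D sequences of possibly different lengths."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = abs(a[i - 1] - b[j - 1]) + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def assign(traces, centroids):
    """Label each trace with the index of its DTW-nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: dtw(t, centroids[k])) for t in traces]

def medoid(traces):
    """Assumed centroid: the trace with the smallest summed DTW distance to the others."""
    return min(traces, key=lambda t: sum(dtw(t, u) for u in traces))
```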
Sustainable electricity production is essential for low-carbon green growth in South Korea. Wind power generation, as a renewable energy source, has been growing rapidly around the world, and the potential of wind energy is practically unlimited. However, its intermittency and volatility make wind energy difficult to harvest effectively and wind power difficult to integrate into the current electric power grid. To cope with this, much work has been done on wind speed and power forecasting. In this paper, a support vector regression (SVR) model using fuzzy C-means (FCM) clustering is proposed for wind speed forecasting. The paper describes the design of this FCM-based SVR, whose purpose is to increase prediction accuracy. The proposed model was compared with an ordinary SVR model using balanced and unbalanced test data, and multi-step-ahead forecasting results were also compared. The kernel parameters of the SVR are determined adaptively to improve forecasting accuracy. An illustrative example is given using a real-world wind farm dataset. The experimental results show that the proposed method provides better wind power forecasts.
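The FCM component alternates between two updates: memberships computed from distances to the centres, and centres computed as membership-weighted means. The sketch below is the standard Bezdek form with fuzzifier `m`. It starts from given initial centres and runs a fixed number of iterations, both of which are simplifying assumptions. How the paper couples the clusters to the SVR is not shown.

```python
import math

def fcm(points, centres, m=2.0, iters=20, eps=1e-12):
    """Standard fuzzy C-means from given initial centres (fixed iteration count)."""
    c = [list(v) for v in centres]
    for _ in range(iters):
        # membership update: u[i][k] = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        u = []
        for p in points:
            d = [max(math.dist(p, ck), eps) for ck in c]  # eps avoids division by zero
            u.append([1.0 / sum((d[k] / dj) ** (2 / (m - 1)) for dj in d)
                      for k in range(len(c))])
        # centre update: mean of the points weighted by u^m
        for k in range(len(c)):
            w = [u[i][k] ** m for i in range(len(points))]
            c[k] = [sum(w[i] * points[i][t] for i in range(len(points))) / sum(w)
                    for t in range(len(points[0]))]
    return c, u
```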
Support vector clustering (SVC) is a kernel-based unsupervised clustering method. Its main drawback is the high computational cost of obtaining the adjacency matrix that describes the connectivity of each pair of points. Building on the proximity graph model [3], the Euclidean distance in Hilbert space is calculated with a Gaussian kernel. This distance serves as the edge weight for generating a minimum spanning tree (MST) with Kruskal's algorithm. The cost of connectivity estimation is then lowered by checking linkages only between the edges that form the main stem of the MST. A newly defined non-compatibility degree supports edge selection during linkage estimation. The new approach is analyzed experimentally. The results show that the revised algorithm outperforms the proximity graph model. It runs faster, produces better clustering, and suppresses noise strongly, which makes SVC scalable to large datasets.
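The first two steps described above can be sketched together: computing the Gaussian-kernel distance in Hilbert space, then building an MST from it with Kruskal's algorithm. The kernel width `q` is an assumed parameter. The main-stem selection and the non-compatibility degree are not reproduced.

```python
import math

def kernel_distance(x, y, q=1.0):
    """Hilbert-space distance induced by a Gaussian kernel.

    ||phi(x) - phi(y)||^2 = K(x,x) + K(y,y) - 2K(x,y) = 2 - 2 exp(-q ||x - y||^2)
    """
    k = math.exp(-q * math.dist(x, y) ** 2)
    return math.sqrt(2.0 - 2.0 * k)

def kruskal_mst(points, q=1.0):
    """MST edges (i, j, weight) over all point pairs, via Kruskal with union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((kernel_distance(points[i], points[j], q), i, j)
                   for i in range(len(points)) for j in range(i + 1, len(points)))
    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two different components
            parent[ri] = rj
            mst.append((i, j, w))
    return mst
```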
A new algorithm named kernel bisecting k-means and sample removal (KBK-SR) is proposed as a sampling preprocessing step that makes support vector machine (SVM) training more efficient. The algorithm tends to produce balanced clusters of similar sizes quickly in the kernel feature space, which makes it efficient and effective for reducing the number of training samples. Theoretical analysis and experimental results on three real-world UCI benchmark datasets both show that the algorithm needs very little sampling time, dramatically accelerates SVM sampling and training, and maintains high test accuracy.
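One bisection step of kernel bisecting k-means amounts to kernel 2-means. Each point is compared with the two cluster means in feature space using only kernel evaluations. The sketch assumes an RBF kernel with width `gamma` and a caller-supplied initial split `init`, both of which are illustrative choices. The sample-removal stage is not shown.

```python
import math

def rbf(x, y, gamma=0.5):
    return math.exp(-gamma * math.dist(x, y) ** 2)

def kernel_bisect(points, init, gamma=0.5, iters=10):
    """Split points into two clusters with kernel k-means (k = 2).

    ||phi(x) - mean_C||^2 = K(x,x) - 2/|C| sum_y K(x,y) + 1/|C|^2 sum_{y,z} K(y,z),
    with K(x,x) = 1 for the RBF kernel. `init` is an initial list of 0/1 labels.
    """
    labels = list(init)
    for _ in range(iters):
        clusters = [[p for p, l in zip(points, labels) if l == c] for c in (0, 1)]
        if not all(clusters):
            break  # one side is empty; nothing left to split
        # the last term does not depend on x, so compute it once per cluster
        self_term = [sum(rbf(y, z, gamma) for y in C for z in C) / len(C) ** 2
                     for C in clusters]
        new = []
        for x in points:
            dists = [1.0 - 2.0 * sum(rbf(x, y, gamma) for y in C) / len(C) + self_term[c]
                     for c, C in enumerate(clusters)]
            new.append(0 if dists[0] <= dists[1] else 1)
        if new == labels:
            break  # converged
        labels = new
    return labels
```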
Most clustering algorithms describe the similarity of objects with a predefined distance function. Three distance functions widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated, and both theoretical analysis and detailed experimental results are given. The results show that the choice of distance function greatly affects the clustering results. Comparing the results obtained with different distance functions can reveal outliers of a cluster and give shape information about the clusters. In practice, different distance functions should be applied separately, their clustering results compared, and the "swing points" picked out, because such points may reveal more information to data analysts.
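The "swing point" idea can be made concrete as follows: assign each point to its nearest centre under several distance functions, and flag the points whose assignment changes. The abstract does not name the three distance functions studied, so Euclidean, Manhattan and Chebyshev are assumed here for illustration.

```python
import math

# Three common distance functions (assumed; not necessarily the ones studied)
def euclidean(p, q):
    return math.dist(p, q)

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def nearest(p, centres, dist):
    return min(range(len(centres)), key=lambda k: dist(p, centres[k]))

def swing_points(points, centres, dists=(euclidean, manhattan, chebyshev)):
    """Indices of points whose nearest centre differs between distance functions."""
    return [i for i, p in enumerate(points)
            if len({nearest(p, centres, d) for d in dists}) > 1]
```

For example, with centres (0, 0) and (2.5, 1), the point (1.5, 0) goes to the second centre under Euclidean and Chebyshev distance but to the first under Manhattan distance. It is therefore a swing point.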
In a vehicular ad hoc network (VANET), massive amounts of data must be transmitted at large scale within short time durations. At the same time, vehicles move at high velocity, which leads to frequent disconnections. Both characteristics make data communication in VANETs unreliable. Vehicle clustering algorithms group vehicles into clusters and are employed in VANETs to enhance network scalability and connection reliability. Clustering is considered one possible solution for achieving effective interaction in VANETs. However, one difficulty is keeping the number of clusters small as the number of transmitting nodes increases. This article introduces an Evolutionary Hide Objects Game Optimization based Distance Aware Clustering (EHOGO-DAC) scheme for VANETs. The main aim of the EHOGO-DAC technique is to partition the VANET into distinct clusters by grouping vehicles. The technique is based on the HOGO algorithm, which is inspired by old games in which searching agents try to find hidden objects in a given space. EHOGO-DAC derives a fitness function for the clustering process that includes the total number of clusters and the Euclidean distance. EHOGO-DAC was assessed experimentally under several different settings, and the comparison showed better results than recent approaches.
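A fitness function that combines the number of clusters with Euclidean distance can take the form sketched below. The exact formula and weights in EHOGO-DAC are not given in the abstract, so the function, its normalization and the weights `w_count` and `w_dist` are all hypothetical. A metaheuristic such as HOGO would search over cluster-head choices to minimize a value like this.

```python
import math

def clustering_fitness(vehicles, heads, w_count=0.5, w_dist=0.5):
    """Hypothetical fitness for distance-aware vehicle clustering (lower is better).

    vehicles: list of (x, y) positions; heads: indices of the vehicles chosen
    as cluster heads (non-empty). Each vehicle joins its nearest head.
    w_count and w_dist are illustrative weights, not values from the paper.
    """
    total = 0.0
    for v in vehicles:
        total += min(math.dist(v, vehicles[h]) for h in heads)
    mean_dist = total / len(vehicles)              # compactness term
    cluster_ratio = len(heads) / len(vehicles)     # fewer clusters -> smaller term
    return w_count * cluster_ratio + w_dist * mean_dist
```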
A method is proposed that uses input-output clustering to reduce the number of samples in large datasets. The method first clusters the output data into groups and then clusters the input data according to these output groups. A set of prototypes is then selected from the clustered input data, and the inessential data can ultimately be discarded. Because only the prototypes are used, the method can reduce the effect of outliers. The method is applied to reduce datasets in regression problems. Two standard synthetic datasets and three standard real-world datasets are used for evaluation. The root-mean-square errors of support vector regression models trained on the original datasets are compared with those of models trained on the corresponding instance-reduced datasets. In the experiments, the proposed method gives good results in reducing and reconstructing both the synthetic and the real-world datasets. The number of instances in the synthetic datasets is reduced by 25%-69%. The reduction rates for the real-world datasets of automobile miles per gallon and the 1990 California census are 46% and 57%, respectively. The reduction rate of 96% for the electrocardiogram (ECG) dataset is very good, owing to the redundant and periodic nature of ECG signals. For all datasets, the regression results are similar to those obtained with the corresponding original datasets. The method therefore retains good regression performance while needing only a fraction of the data for training.
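The two-stage idea (group by output first, then cluster the inputs inside each output group and keep prototypes) can be sketched in simplified form. The abstract does not specify the clustering algorithms, so this sketch substitutes simpler steps. Outputs are grouped into equal-width bins, inputs within a bin are grouped by a single-pass leader rule with a hypothetical `radius`, and each group is replaced by its mean input and mean output.

```python
import math

def reduce_instances(X, y, n_bins=3, radius=1.0):
    """Simplified input-output clustering for instance reduction (assumed steps)."""
    lo, hi = min(y), max(y)
    width = (hi - lo) / n_bins or 1.0  # guard against constant outputs
    groups = {}
    for x, t in zip(X, y):
        b = min(int((t - lo) / width), n_bins - 1)  # output group index
        groups.setdefault(b, []).append((x, t))
    prototypes = []
    for b in sorted(groups):
        clusters = []  # each cluster: [leader input, list of (x, t) members]
        for x, t in groups[b]:
            for c in clusters:
                if math.dist(x, c[0]) <= radius:
                    c[1].append((x, t))
                    break
            else:  # no leader close enough: start a new input cluster
                clusters.append([x, [(x, t)]])
        for _, members in clusters:
            n = len(members)
            mean_x = tuple(sum(m[0][d] for m in members) / n
                           for d in range(len(members[0][0])))
            prototypes.append((mean_x, sum(m[1] for m in members) / n))
    return prototypes
```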
Big data clustering plays an important role in data processing for wireless sensor networks. However, existing methods suffer from problems such as poor clustering performance and low Jaccard coefficients. This paper proposes a novel big data clustering optimization method for wireless sensor networks based on intuitionistic fuzzy set distance and particle swarm optimization. The method combines principal component analysis with information entropy-based dimensionality reduction to process big data and reduce the time required for clustering. A new distance measure for intuitionistic fuzzy sets is defined. It considers membership and non-membership information and also how hesitancy is allocated to membership and non-membership, thereby introducing hesitancy indirectly into the intuitionistic fuzzy set distance. An intuitionistic fuzzy kernel clustering algorithm is used to cluster the big data, and particle swarm optimization is introduced to optimize this clustering method. The optimized algorithm then produces the clustering results for wireless sensor network big data. In simulations, the proposed method achieves good clustering performance compared with other state-of-the-art clustering methods.
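For reference, a standard IFS distance that already includes hesitancy is the normalized Euclidean distance of Szmidt and Kacprzyk, which uses all three parameters μ, ν and π = 1 - μ - ν. The paper defines its own measure, which allocates hesitancy to membership and non-membership. The sketch below shows only the standard measure as a baseline.

```python
import math

def ifs_distance(a, b):
    """Normalized Euclidean IFS distance with hesitancy (Szmidt-Kacprzyk form).

    a, b: lists of (mu, nu) pairs over the same universe, mu + nu <= 1;
    hesitancy is pi = 1 - mu - nu.
    """
    total = 0.0
    for (ma, na), (mb, nb) in zip(a, b):
        pa, pb = 1 - ma - na, 1 - mb - nb
        total += (ma - mb) ** 2 + (na - nb) ** 2 + (pa - pb) ** 2
    return math.sqrt(total / (2 * len(a)))
```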
When recognizing a larger number of modulation classes, existing support vector machine (SVM) methods suffer from high computational complexity and a low recognition rate at low signal-to-noise ratio (SNR). In this paper, the characteristic parameters of the signal are extracted and optimized with a clustering algorithm, and the SVM is trained with a grading algorithm to speed up convergence. This improves recognition performance at low SNR and realizes modulation recognition based on the constellation diagram of the modulation scheme. Simulation results show that, at low SNR, this algorithm raises the average recognition rate by more than 30% compared with methods that use either a clustering algorithm or an SVM alone. The average recognition rate reaches 90% at an SNR of 5 dB. The method is also easy to implement, so it has broad application prospects in modulation recognition.
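Clustering a received constellation is one way to obtain constellation-based features. The sketch runs plain k-means on complex baseband (I/Q) samples. It assumes the initial centres are the ideal points of a candidate constellation. The refined centres could then serve as features, but the paper's exact features and its grading-based SVM training are not reproduced.

```python
def kmeans_iq(samples, centres, iters=10):
    """Plain k-means on complex I/Q samples from given initial centres.

    samples: list of complex numbers; centres: initial complex centres
    (assumed here to be ideal constellation points).
    """
    c = list(centres)
    for _ in range(iters):
        groups = [[] for _ in c]
        for s in samples:
            k = min(range(len(c)), key=lambda j: abs(s - c[j]))  # nearest centre
            groups[k].append(s)
        # keep a centre unchanged if no sample was assigned to it
        c = [sum(g) / len(g) if g else c[k] for k, g in enumerate(groups)]
    return c
```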
An intuitionistic fuzzy set (IFS) is a set of 2-tuples, each characterized by a membership degree and a non-membership degree. The generalized form of an IFS is the interval-valued intuitionistic fuzzy set (IVIFS), whose components are intervals rather than exact numbers. IFSs and IVIFSs have proved very useful for describing vagueness and uncertainty, but little attention seems to have been paid to the clustering analysis of IFSs and IVIFSs. An intuitionistic fuzzy hierarchical algorithm is introduced for clustering IFSs. It is based on the traditional hierarchical clustering procedure, the intuitionistic fuzzy aggregation operator, and the basic distance measures between IFSs: the Hamming distance, the normalized Hamming distance, the weighted Hamming distance, the Euclidean distance, the normalized Euclidean distance, and the weighted Euclidean distance. The algorithm is then extended to clustering IVIFSs. Finally, the algorithm and its extension are applied to the classification of building materials and of enterprises, respectively.
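The procedure described above can be sketched as agglomerative clustering. At each step, the two clusters whose aggregated IFS "centres" are closest are merged. The sketch uses one of the listed measures, the normalized Hamming distance in its (μ, ν) form. For aggregation it assumes the equal-weight intuitionistic fuzzy weighted averaging (IFWA) operator, μ = 1 - Π(1 - μ_i)^w and ν = Π ν_i^w. Both choices are illustrative rather than confirmed details of the paper.

```python
def ham(a, b):
    """Normalized Hamming distance between IFSs given as lists of (mu, nu)."""
    return sum(abs(ma - mb) + abs(na - nb)
               for (ma, na), (mb, nb) in zip(a, b)) / (2 * len(a))

def ifwa(sets):
    """Equal-weight IFWA aggregation, applied element by element."""
    w = 1.0 / len(sets)
    out = []
    for pairs in zip(*sets):
        mu_prod, nu_prod = 1.0, 1.0
        for mu, nu in pairs:
            mu_prod *= (1 - mu) ** w
            nu_prod *= nu ** w
        out.append((1 - mu_prod, nu_prod))
    return out

def if_hierarchical(sets, n_clusters):
    """Merge the two clusters with the closest aggregated centres until n_clusters remain."""
    clusters = [[i] for i in range(len(sets))]
    while len(clusters) > n_clusters:
        centres = [ifwa([sets[i] for i in c]) for c in clusters]
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: ham(centres[ij[0]], centres[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```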
An advanced fuzzy C-means (FCM) algorithm is proposed for efficient regional clustering of multi-node interconnected systems. Each node and point has its own locational price and regional coherency, so a modified similarity measure is used to group nodes with similar characteristics. The similarity measure must account for both locational prices and regional coherency. To consider the two properties simultaneously, the distance measure of the FCM algorithm is modified. A regional clustering algorithm for interconnected power systems is designed based on this modified FCM algorithm. The proposed algorithm produces an appropriate classification of the interconnected power system, as demonstrated on the IEEE 39-bus interconnected electricity system.
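A distance that considers locational price and regional coherency simultaneously could be a weighted blend of the two normalized differences. The form below, including the weight `alpha` and the scales `price_scale` and `coh_scale`, is a hypothetical sketch. The paper's modified FCM distance is not given in the abstract.

```python
import math

def combined_distance(node_a, node_b, alpha=0.5, price_scale=1.0, coh_scale=1.0):
    """Hypothetical blended distance between two power-system nodes.

    Each node is (locational_price, coherency_vector). Each part is divided
    by an assumed scale and the squared parts are blended with weight alpha.
    """
    price_a, coh_a = node_a
    price_b, coh_b = node_b
    d_price = abs(price_a - price_b) / price_scale
    d_coh = math.dist(coh_a, coh_b) / coh_scale
    return math.sqrt(alpha * d_price ** 2 + (1 - alpha) * d_coh ** 2)
```

In an FCM loop, this function would take the place of the Euclidean distance in the membership update.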
Because sonar image data have manifold characteristics, a sonar image detection method based on a two-phase manifold partner clustering algorithm is proposed. First, K-means block clustering based on Euclidean distance is proposed to reduce the dataset. Based on the relationship between the clustering model and the data structure, the mean value, standard deviation and minimum gray value are used as three features. A K-means clustering algorithm based on manifold distance is then applied to the reduced dataset to improve detection efficiency. In this algorithm, the line segment length on the manifold is analyzed, and a new power-function line segment length is proposed to decrease computational complexity. To compute the manifold distance quickly, a new all-source shortest path algorithm is proposed as an efficient preprocessing step. On this basis, the spatial features of the image blocks are added to the three features to obtain the final precise partner clustering algorithm. Comparison with other typical clustering algorithms shows that the proposed algorithm achieves good detection results. Experiments on different real sonar images show that it also adapts better.
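Manifold distance combines two pieces: a power-function segment length and an all-pairs shortest path. This sketch assumes the commonly used segment length ρ^‖x−y‖ − 1, whereas the paper proposes its own variant. It uses Floyd-Warshall as a generic all-pairs shortest path, not the paper's faster algorithm. Long jumps are penalized exponentially, so paths through densely sampled regions become shorter than direct jumps.

```python
import math

def manifold_distances(points, rho=2.0):
    """All-pairs manifold distance with an assumed segment length rho**||x-y|| - 1.

    Shortest paths over the complete graph are computed with Floyd-Warshall.
    """
    n = len(points)
    d = [[rho ** math.dist(points[i], points[j]) - 1 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

For three collinear points 0, 1, 2 with ρ = 2, the direct segment 0→2 costs 3 while the path through 1 costs 2, so the manifold distance is 2.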
A novel support vector machine (SVM) ensemble approach using clustering analysis is proposed. First, the positive and negative training examples are each clustered with the subtractive clustering algorithm. Representative examples are then chosen from each cluster to construct the SVM components. Finally, the outputs of the individual classifiers are fused by majority voting to obtain the final decision. The proposed method is compared with other popular ensemble approaches, such as Bagging, AdaBoost and k-fold cross validation, on synthetic and UCI datasets. The experimental results show that the method achieves higher classification accuracy, because clustering analysis brings information about the example distribution into ensemble construction. The results further indicate that the method needs much smaller training subsets than Bagging and AdaBoost to obtain satisfactory classification accuracy.
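The subtractive clustering step can be sketched in simplified form. Every point gets a density "potential". The highest-potential point becomes a centre, and potentials near it are then reduced. The sketch uses the usual α = 4/r_a² and r_b = 1.5 r_a. It replaces the full accept/reject rule with a single `stop_ratio` threshold, which is a simplification, and `ra` is an assumed radius.

```python
import math

def subtractive_clustering(points, ra=1.0, stop_ratio=0.15):
    """Simplified subtractive clustering (Chiu-style).

    alpha = 4 / ra**2 for the potentials, beta = 4 / (1.5 * ra)**2 for the
    revision; stops when the best remaining potential falls below
    stop_ratio times the first centre's potential.
    """
    alpha, beta = 4 / ra ** 2, 4 / (1.5 * ra) ** 2
    pot = [sum(math.exp(-alpha * math.dist(p, q) ** 2) for q in points) for p in points]
    centres, first = [], None
    while True:
        k = max(range(len(points)), key=lambda i: pot[i])
        if first is None:
            first = pot[k]
        elif pot[k] < stop_ratio * first:
            break
        centres.append(points[k])
        pk = pot[k]
        # subtract the new centre's influence from every potential
        pot = [pot[i] - pk * math.exp(-beta * math.dist(points[i], points[k]) ** 2)
               for i in range(len(points))]
    return centres
```

In the ensemble, this would run separately on the positive and negative examples, and representatives near the centres would seed each SVM component.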
This paper describes an improved fuzzy c-means clustering algorithm for remotely sensed data. Compared with the conventional algorithm, it reduces the degree of fuzziness of the resulting classification and thereby increases classification accuracy. This is achieved by using a separate covariance matrix for each class rather than a single global one. Empirical results from a fuzzy classification of suburban land cover in Edinburgh confirmed the improved performance of the new algorithm, particularly when fuzziness is also accommodated in the assumed reference data.
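The change described above can be illustrated by computing FCM memberships with a class-specific Mahalanobis distance instead of a single global metric. The sketch is limited to 2-D so that the 2×2 covariance inverse can be written out directly. The means and covariances are taken as given, which is an assumption, and their estimation inside the FCM loop is not shown.

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance in 2-D with a class-specific 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)

def memberships(x, means, covs, m=2.0, eps=1e-12):
    """FCM memberships of x when each class k uses its own covariance covs[k].

    u_k = 1 / sum_j (d_k^2 / d_j^2)^(1/(m-1)), d^2 = class-specific Mahalanobis distance.
    """
    d2 = [max(mahalanobis_sq(x, mu, s), eps) for mu, s in zip(means, covs)]
    return [1.0 / sum((d2[k] / dj) ** (1 / (m - 1)) for dj in d2) for k in range(len(d2))]
```

With a global metric, a point midway between two class means would get memberships of 0.5 each. With class-specific covariances, the broader class claims more of it. In the example tested here, the split is 0.8 to 0.2.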
This paper studies an improved K-means algorithm and establishes indicators based on the RFM (recency, frequency, monetary) model to classify customers. Experimental results show that the new algorithm has good convergence and stability, and that it clusters better than the FKP algorithm used alone. Finally, the paper studies the application of clustering to customer segmentation in a mobile communication enterprise. It discusses the basic theory, customer segmentation methods and steps, and a customer segmentation model based on the psychology of consumption behavior. The segmentation model is successfully applied to marketing decision support.
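Building the RFM indicators is the step that precedes the K-means clustering. The sketch computes Recency (days since the last purchase), Frequency (number of purchases) and Monetary value (total spend) per customer, then min-max scales each column so that no single indicator dominates the distance. The transaction format is an assumption, and the improved K-means itself is not shown.

```python
from datetime import date

def rfm_table(transactions, today):
    """Per-customer (Recency, Frequency, Monetary), min-max scaled to [0, 1].

    transactions: list of (customer_id, date, amount) tuples (assumed format).
    """
    raw = {}
    for cid, day, amount in transactions:
        last, freq, money = raw.get(cid, (None, 0, 0.0))
        last = day if last is None or day > last else last
        raw[cid] = (last, freq + 1, money + amount)
    rows = {cid: ((today - last).days, f, m) for cid, (last, f, m) in raw.items()}

    def scale(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

    ids = sorted(rows)
    cols = [scale([rows[c][k] for c in ids]) for k in range(3)]
    return {c: tuple(col[i] for col in cols) for i, c in enumerate(ids)}
```

The scaled triples can be fed directly to K-means. Note that a lower recency value means a more recent customer.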
Funding (leukocyte image fast scanning method): Supported by the 863 National Plan Foundation of China under Grant No. 2007AA01Z333 and the Special Grand National Project of China under Grant No. 2009ZX02204-008.
Funding (FSVM-based fault diagnosis model): Supported by the joint fund of the National Natural Science Foundation of China and the Civil Aviation Administration Foundation of China (No. U1233201).
Funding (turbopump condition monitoring method): Supported by the National Natural Science Foundation of China (Grant No. 50675219) and the Hu'nan Provincial Science Committee Excellent Youth Foundation of China (Grant No. 08JJ1008).
Funding (MDDTW distance-based grey incidence clustering): Supported by the National Natural Science Foundation of China (61533020, 61309014) and the Natural Science Foundation Project of CQ CSTC (cstc2017jcyjAX0408).
Funding (HCFLL-based SVM algorithm): Supported by the National Natural Science Foundation of China (No. 61070033) and the Fundamental Research Funds for the Central Universities, China (No. 2012ZM0061).
Funding: Supported by the National Science and Technology Major Project (No. 2017ZX05001-003).
Abstract: Seismic waveform clustering is a useful technique for lithologic identification and reservoir characterization. Current seismic waveform clustering algorithms are predominantly based on a fixed time window, which is applicable only to layers of stable thickness. When a layer exhibits variable thickness in the seismic response, a fixed time window cannot provide comprehensive geologic information for the target interval. Therefore, we propose a novel waveform clustering workflow based on a variable time window to enable broader applications. The dynamic time warping (DTW) distance is first introduced to effectively measure the similarity between seismic waveforms of different lengths. We develop a DTW distance-based clustering algorithm to extract centroids, and then determine the class of every seismic trace according to its DTW distances from the centroids. To greatly reduce the computational complexity of seismic data applications, we propose a superpixel-based seismic data thinning approach. We further propose an integrated workflow that can be applied to practical seismic data by incorporating the DTW distance-based clustering and seismic data thinning algorithms. We evaluated its performance by applying the proposed workflow to synthetic seismograms and seismic survey data. Compared with the traditional waveform clustering method, the synthetic seismogram results demonstrate the enhanced capability of the proposed workflow to detect boundaries of different lithologies or lithologic associations with variable thickness. Results from a practical application show that the planar map of seismic waveform clustering obtained by the proposed workflow correlates well with the geological characteristics of wells in terms of reservoir thickness.
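The DTW-based clustering step can be sketched as a k-medoids loop. Assumptions not taken from the abstract: one-dimensional traces with the absolute difference as the local cost, medoids (actual traces) in place of averaged centroids because averaging waveforms of different lengths is not straightforward, and farthest-first initialization. The superpixel thinning step is not shown, and all function names are hypothetical.

```python
from typing import Sequence

def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]

def assign(traces, medoids):
    """Label each trace with the index of its DTW-nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dtw(t, traces[medoids[c]]))
            for t in traces]

def dtw_kmedoids(traces, k, iters=20):
    # Farthest-first initialization (assumption; deterministic for illustration).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(len(traces)),
                           key=lambda i: min(dtw(traces[i], traces[m]) for m in medoids)))
    for _ in range(iters):
        labels = assign(traces, medoids)
        new = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # The new medoid minimizes the summed DTW distance to its cluster;
            # an empty cluster keeps its previous medoid.
            new.append(min(members,
                           key=lambda i: sum(dtw(traces[i], traces[j]) for j in members))
                       if members else old)
        if new == medoids:
            break
        medoids = new
    return medoids, assign(traces, medoids)
```

Traces of different lengths (for example 3, 4, and 6 samples) are compared without resampling, which is the property the variable time window relies on.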
Abstract: A sustainable production of electricity is essential for low-carbon green growth in South Korea. Wind power generation, as a renewable energy source, has been growing rapidly around the world, and wind energy undoubtedly has unlimited potential. However, because of its intermittency and volatility, wind energy is difficult to harvest effectively and to integrate into the current electric power grid. To cope with this, much work has been done on wind speed and power forecasting. In this paper, a support vector regression (SVR) model using fuzzy C-means (FCM) is proposed for wind speed forecasting, and the design of this FCM-based SVR for increasing prediction accuracy is described. The proposed model was compared with an ordinary SVR model using balanced and unbalanced test data, and multi-step-ahead forecasting results were also compared. The kernel parameters of the SVR are determined adaptively to improve forecasting accuracy. An illustrative example is given using a real-world wind farm dataset. The experimental results show that the proposed method provides better forecasts of wind power.
Funding: The National High Technology Research and Development Program of China (No. 863511930009)
Abstract: Support vector clustering (SVC) is a kernel-based unsupervised clustering method. Its main drawback is the high computational complexity of obtaining the adjacency matrix that describes the connectivity of each pair of points. Based on the proximity graph model [3], the Euclidean distance in Hilbert space is calculated using a Gaussian kernel and serves as the criterion for generating a minimum spanning tree (MST) with Kruskal's algorithm. The cost of connectivity estimation is then reduced by checking only the linkages along the edges that form the main stem of the MST, for which a newly defined non-compatibility degree supports edge selection during linkage estimation. This new approach is analyzed experimentally. The results show that the revised algorithm outperforms the proximity graph model, with faster speed, improved clustering quality, and strong noise suppression, which makes SVC scalable to large data sets.
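The first stage described above (Gaussian-kernel distances in feature space, then Kruskal's algorithm) can be sketched as follows. It relies on the identity ‖φ(x) − φ(y)‖² = k(x,x) + k(y,y) − 2k(x,y) = 2 − 2exp(−q‖x − y‖²) for the Gaussian kernel. The width parameter `q` and the helper names are assumptions, and the main-stem linkage check with the non-compatibility degree is not reproduced.

```python
import math
from itertools import combinations

def kernel_distance(x, y, q=1.0):
    """Feature-space distance for the Gaussian kernel k(x, y) = exp(-q * ||x - y||^2)."""
    k = math.exp(-q * math.dist(x, y) ** 2)
    return math.sqrt(max(0.0, 2.0 - 2.0 * k))  # clamp tiny negative rounding errors

def kruskal_mst(points, q=1.0):
    """Return MST edges (i, j, weight) using kernel distances as edge weights."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((kernel_distance(points[i], points[j], q), i, j)
                   for i, j in combinations(range(len(points)), 2))
    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different components: keep it
            parent[ri] = rj
            mst.append((i, j, w))
            if len(mst) == len(points) - 1:
                break
    return mst
```

Because the Gaussian feature-space distance is a monotone function of the input-space Euclidean distance, the MST has the same edges as the Euclidean MST; the kernel changes the edge weights, which saturate at √2 for distant pairs.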
Funding: National Natural Science Foundation of China (No. 60975083); Key Grant Project, Ministry of Education, China (No. 104145)
Abstract: A new algorithm named kernel bisecting k-means and sample removal (KBK-SR) is proposed as a sampling preprocessing step for support vector machine (SVM) training to improve efficiency. The proposed algorithm tends to quickly produce balanced clusters of similar sizes in the kernel feature space, which makes it efficient and effective for reducing training samples. Theoretical analysis and experimental results on three real-world UCI benchmark datasets both show that, with very short sampling time, the proposed algorithm dramatically accelerates SVM sampling and training while maintaining high test accuracy.
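A sketch of one bisection step of kernel k-means, the building block that a kernel bisecting scheme would apply repeatedly (for example, to the largest remaining cluster). Distances to a cluster mean in feature space are computed from the Gram matrix alone via the kernel trick. The Gaussian kernel, the farthest-pair initialization, and all names are assumptions; the balancing and sample-removal rules of KBK-SR are not reproduced.

```python
import math

def gaussian_gram(points, q=1.0):
    """Gram matrix K[i][j] = exp(-q * ||x_i - x_j||^2)."""
    return [[math.exp(-q * math.dist(p, r) ** 2) for r in points] for p in points]

def dist2_to_centroid(i, members, K):
    """||phi(x_i) - mean of phi over members||^2, using kernel values only."""
    n = len(members)
    cross = sum(K[i][j] for j in members)
    within = sum(K[j][l] for j in members for l in members)
    return K[i][i] - 2.0 * cross / n + within / n ** 2

def kernel_bisect(idx, K, iters=20):
    """Split the index set idx into two clusters with kernel 2-means."""
    a = idx[0]
    # Seed the second half with the point farthest from a in feature space.
    b = max(idx, key=lambda j: K[a][a] + K[j][j] - 2.0 * K[a][j])
    halves = [[a], [b]]
    for _ in range(iters):
        new = [[], []]
        for i in idx:
            d0, d1 = (dist2_to_centroid(i, h, K) for h in halves)
            new[0 if d0 <= d1 else 1].append(i)
        if not new[0] or not new[1] or new == halves:
            break  # converged, or a split would leave one side empty
        halves = new
    return halves
```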
Abstract: Most clustering algorithms describe the similarity of objects with a predefined distance function. Three distance functions widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated, and both theoretical analysis and detailed experimental results are given. The results show that the choice of distance function greatly affects clustering results, and that comparing the results obtained with different distance functions can be used to detect the outliers of a cluster and to reveal the shape of clusters. In practice, it is suggested to run the clustering separately with different distance functions, compare the results, and pick out the "swing points", which may reveal more information to data analysts.
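The abstract does not name the three distance functions. Assuming the Euclidean, Manhattan, and Chebyshev distances for illustration, the sketch below flags a point as a "swing point" when its nearest-centroid assignment changes with the metric. `swing_points` and the other names are hypothetical.

```python
import math

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}

def nearest(point, centroids, metric):
    """Index of the centroid closest to point under the given metric."""
    return min(range(len(centroids)), key=lambda c: metric(point, centroids[c]))

def swing_points(points, centroids):
    """Return (point, {metric: label}) for points whose label depends on the metric."""
    swings = []
    for p in points:
        labels = {name: nearest(p, centroids, m) for name, m in METRICS.items()}
        if len(set(labels.values())) > 1:
            swings.append((p, labels))
    return swings
```

With centroids (0, 0) and (5, 2), the point (2, 2) is closer to (0, 0) under the Euclidean and Chebyshev distances but closer to (5, 2) under the Manhattan distance, so it is reported as a swing point.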
Funding: This work was supported by the Ulsan City & Electronics and Telecommunications Research Institute (ETRI) grant funded by Ulsan City [22AS1600, the development of intelligentization technology for the main industry for manufacturing innovation and Human-mobile-space autonomous collaboration intelligence technology development in industrial sites].
Abstract: In a vehicular ad hoc network (VANET), massive quantities of data must be transmitted on a large scale within short time durations. At the same time, vehicles move at high velocity, which leads to more frequent disconnections. Both characteristics make data communication in VANETs unreliable. Vehicle clustering algorithms, which group vehicles, are employed in VANETs to enhance network scalability and connection reliability, and clustering is considered one of the possible solutions for attaining effective interaction in VANETs. However, one difficulty is keeping the number of clusters low as the number of transmitting nodes increases. This article introduces an Evolutionary Hide Objects Game Optimization based Distance Aware Clustering (EHOGO-DAC) scheme for VANETs. The major intention of the EHOGO-DAC technique is to partition the VANET into distinct clusters by grouping vehicles. The EHOGO-DAC technique is mainly based on the HOGO algorithm, which is inspired by old games in which a searching agent tries to identify hidden objects in a given space. The EHOGO-DAC technique derives a fitness function for the clustering process that includes the total number of clusters and the Euclidean distance. The experimental assessment of the EHOGO-DAC technique was carried out under distinct aspects, and the comparison showed that EHOGO-DAC achieves better outcomes than recent approaches.
Funding: Supported by the Chiang Mai University Research Fund under contract number T-M5744
Abstract: A method that uses input-output clustering to reduce the number of samples in large data sets is proposed. The method clusters the output data into groups and then clusters the input data in accordance with the groups of output data. A set of prototypes is then selected from the clustered input data, and the inessential data can ultimately be discarded from the data set. Because only the prototypes are used, the method can reduce the effect of outliers. The method is applied to reduce data sets in regression problems. Two standard synthetic data sets and three standard real-world data sets are used for evaluation. The root-mean-square errors of support vector regression models trained with the original data sets and with the corresponding instance-reduced data sets are compared. In the experiments, the proposed method gives good results for the reduction and reconstruction of the standard synthetic and real-world data sets. The number of instances in the synthetic data sets is reduced by 25%-69%. The reduction rates for the real-world data sets of automobile miles per gallon and the 1990 California census are 46% and 57%, respectively. The reduction rate of 96% for the electrocardiogram (ECG) data set is very good because of the redundant and periodic nature of ECG signals. For all of the data sets, the regression results are similar to those obtained with the corresponding original data sets. Therefore, the regression performance of the proposed method is good while only a fraction of the data is needed in the training process.
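The input-output clustering procedure can be sketched as follows: cluster the outputs, then cluster the inputs within each output group, and keep the real sample nearest to each input-cluster center as a prototype. Plain k-means with farthest-first initialization, the "nearest real sample" prototype rule, and the parameters `k_out` and `k_in` are assumptions; the paper's exact prototype-selection rule is not given in the abstract. Inputs are expected as tuples.

```python
import math

def kmeans(points, k, iters=50):
    """Plain k-means on tuples with farthest-first initialization (assumption)."""
    centers = [points[0]]
    while len(centers) < min(k, len(points)):
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(sum(v) / len(members) for v in zip(*members))
                       if members else centers[c])
        if new == centers:
            break
        centers = new
    return centers, labels

def reduce_instances(X, y, k_out, k_in):
    """Return sorted indices of the prototype samples kept from (X, y)."""
    _, out_labels = kmeans([(v,) for v in y], k_out)  # 1) cluster the outputs
    keep = []
    for g in sorted(set(out_labels)):
        idx = [i for i, lab in enumerate(out_labels) if lab == g]
        centers, in_labels = kmeans([X[i] for i in idx], k_in)  # 2) cluster inputs per group
        for c, center in enumerate(centers):
            members = [i for i, lab in zip(idx, in_labels) if lab == c]
            if members:  # 3) keep the real sample nearest to each input center
                keep.append(min(members, key=lambda i: math.dist(X[i], center)))
    return sorted(keep)
```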
Funding: 2021 Scientific Research Funding Project of the Liaoning Provincial Education Department (Research and implementation of university scientific research information platform serving the transformation of achievements).
Abstract: Big data clustering plays an important role in data processing for wireless sensor networks. However, existing methods suffer from problems such as poor clustering performance and a low Jaccard coefficient. This paper proposes a novel big data clustering optimization method for wireless sensor networks based on intuitionistic fuzzy set distance and particle swarm optimization. The method combines principal component analysis and information-entropy-based dimensionality reduction to process the big data and reduce the time required for clustering. A new distance measure for intuitionistic fuzzy sets is defined that considers not only membership and non-membership information but also the allocation of hesitancy to membership and non-membership, thereby indirectly introducing hesitancy into the intuitionistic fuzzy set distance. An intuitionistic fuzzy kernel clustering algorithm is used to cluster the big data, and particle swarm optimization is introduced to optimize this clustering algorithm. The optimized algorithm is then used to perform the wireless sensor network big data clustering. Simulation results show that the proposed method achieves good clustering performance compared with other state-of-the-art clustering methods.
Funding: Supported in part by the National Natural Science Foundation of China under Grant Nos. 61871129 and 61301179, and by the Projects of Science and Technology Plan of Guangdong Province under Grant No. 2014A010101284
Abstract: When existing support vector machine (SVM) methods are applied to multi-class recognition problems, they suffer from high computational complexity and a low recognition rate at low SNR. In this paper, the characteristic parameters of the signal are extracted and optimized with a clustering algorithm, and the SVM is trained with a grading algorithm to speed up convergence, improve recognition performance at low SNR, and realize signal modulation recognition based on the constellation diagram of the modulation scheme. Simulation results show that, at low SNR, the average recognition rate of this algorithm is more than 30% higher than that of methods using either a clustering algorithm or an SVM alone. The average recognition rate reaches 90% at an SNR of 5 dB, and the method is easy to implement, so it has broad application prospects in modulation recognition.
Funding: Supported by the National Natural Science Foundation of China (70571087) and the National Science Fund for Distinguished Young Scholars of China (70625005)
Abstract: An intuitionistic fuzzy set (IFS) is a set of 2-tuple arguments, each of which is characterized by a membership degree and a non-membership degree. The generalized form of an IFS is the interval-valued intuitionistic fuzzy set (IVIFS), whose components are intervals rather than exact numbers. IFSs and IVIFSs have been found to be very useful for describing vagueness and uncertainty. However, little attention seems to have been paid to the clustering analysis of IFSs and IVIFSs. An intuitionistic fuzzy hierarchical algorithm is introduced for clustering IFSs, based on the traditional hierarchical clustering procedure, the intuitionistic fuzzy aggregation operator, and the basic distance measures between IFSs: the Hamming distance, the normalized Hamming distance, the weighted Hamming distance, the Euclidean distance, the normalized Euclidean distance, and the weighted Euclidean distance. The algorithm is then extended to cluster IVIFSs. Finally, the algorithm and its extended form are applied to the classification of building materials and enterprises, respectively.
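Distances between IFSs are commonly computed over the triple (μ, ν, π), where π = 1 − μ − ν is the hesitancy degree. The sketch below assumes that convention, with per-element weights summing to 1: uniform weights give the normalized Hamming and Euclidean distances, and multiplying the normalized Hamming distance by n gives the unnormalized one. Whether the paper uses exactly this form is an assumption, and the function names are hypothetical.

```python
import math

def _triples(s):
    """Expand (mu, nu) pairs into (mu, nu, pi) with hesitancy pi = 1 - mu - nu."""
    return [(mu, nu, 1.0 - mu - nu) for mu, nu in s]

def ifs_hamming(a, b, weights=None):
    """Weighted Hamming distance between two IFSs; uniform weights -> normalized."""
    n = len(a)
    w = weights or [1.0 / n] * n
    return 0.5 * sum(wi * sum(abs(x - y) for x, y in zip(ta, tb))
                     for wi, ta, tb in zip(w, _triples(a), _triples(b)))

def ifs_euclidean(a, b, weights=None):
    """Weighted Euclidean distance between two IFSs; uniform weights -> normalized."""
    n = len(a)
    w = weights or [1.0 / n] * n
    return math.sqrt(0.5 * sum(wi * sum((x - y) ** 2 for x, y in zip(ta, tb))
                               for wi, ta, tb in zip(w, _triples(a), _triples(b))))
```

Under this convention, a fully-member element (1, 0) and a fully-non-member element (0, 1) are at distance 1, the maximum.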
Funding: Work supported by the Second Stage of Brain Korea 21 Projects; work (2010-0020163) supported by the Priority Research Centers Program through the National Research Foundation (NRF) funded by the Ministry of Education, Science and Technology of Korea
Abstract: An advanced fuzzy C-means (FCM) algorithm was proposed for efficient regional clustering of multi-node interconnected systems. Because each node and point has its own locational price and regional coherency, a modified similarity measure was considered to group nodes with similar characteristics. The similarity measure had to incorporate both locational prices and regional coherency, and to consider the two properties simultaneously, the distance measure of the fuzzy C-means algorithm had to be modified. A regional clustering algorithm for interconnected power systems was designed based on the modified fuzzy C-means algorithm. The proposed algorithm produces a proper classification of the interconnected power system, as demonstrated on the IEEE 39-bus interconnected electricity system.
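For reference, a baseline fuzzy C-means loop is sketched below with the distance passed in as a parameter, which is where a modified measure combining locational price and regional coherency would plug in. That modified measure is not described in the abstract and is not reproduced; the sketch uses the Euclidean distance, fuzzifier m = 2, and hypothetical names.

```python
import math

def fcm(points, centers, m=2.0, iters=100, tol=1e-9, dist=math.dist):
    """Standard fuzzy C-means; `dist` is the slot for a modified distance measure."""
    centers = [tuple(c) for c in centers]
    u = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        u = []
        for x in points:
            d = [dist(x, c) for c in centers]
            if 0.0 in d:
                # The point sits on a center: crisp membership, split evenly on ties.
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in d) for di in d]
            u.append(row)
        # Center update: mean of the points weighted by u_ik^m.
        new = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            tw = sum(w)
            new.append(tuple(sum(wk * x[j] for wk, x in zip(w, points)) / tw
                             for j in range(len(points[0]))))
        shift = max(math.dist(c, nc) for c, nc in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers, u
```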
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 41306086), the Technology Innovation Talent Special Foundation of Harbin (Grant No. 2014RFQXJ105), and the Fundamental Research Funds for the Central Universities (Grant Nos. HEUCFR1121, HEUCF100606)
Abstract: Based on the manifold characteristics of sonar image data, a sonar image detection method using a two-phase manifold partner clustering algorithm is proposed. First, K-means block clustering based on the Euclidean distance is used to reduce the data set, with the mean value, standard deviation, and minimum gray value taken as three features based on the relationship between the clustering model and the data structure. Then a K-means clustering algorithm based on the manifold distance is used to cluster the reduced data set again to improve detection efficiency. In this manifold-distance K-means algorithm, the line segment length on the manifold is analyzed, and a new power-function line segment length is proposed to decrease the computational complexity. To calculate the manifold distance quickly, a new efficient all-source shortest path algorithm is proposed as a preprocessing step. On this basis, the spatial feature of the image block is added to the three features to obtain the final precise partner clustering algorithm. Comparison with other typical clustering algorithms demonstrates that the proposed algorithm achieves good detection results, and experiments on different real sonar images show that it has better adaptability.
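A sketch of the manifold distance idea: each direct segment between two points gets the power-function length ρ^d − 1 (with ρ > 1 and d the Euclidean distance), and the manifold distance is the shortest path over all such segments, computed here with Floyd-Warshall as the all-source shortest-path step. The ρ^d − 1 form is borrowed from the manifold-distance clustering literature and is an assumption; the abstract's new power function and its faster shortest-path algorithm are not reproduced.

```python
import math

def manifold_distances(points, rho=2.0):
    """All-pairs manifold distances with segment length rho**d - 1 (rho > 1)."""
    n = len(points)
    # Direct segment lengths: long jumps are penalized exponentially.
    D = [[rho ** math.dist(p, r) - 1.0 for r in points] for p in points]
    # Floyd-Warshall: allow paths through each intermediate point k in turn.
    for k in range(n):
        for i in range(n):
            via = D[i][k]
            for j in range(n):
                if via + D[k][j] < D[i][j]:
                    D[i][j] = via + D[k][j]
    return D
```

For four evenly spaced collinear points, the direct segment from the first to the last has length 2³ − 1 = 7, while the path through the intermediate points has length 3, so the manifold distance favors hops through densely sampled regions.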
Funding: The National Natural Science Foundation of China (No. 60472072); the Specialized Research Foundation for the Doctoral Program of Higher Education of China (No. 20040699034).
Abstract: A novel support vector machine (SVM) ensemble approach using clustering analysis is proposed. First, the positive and negative training examples are each clustered with the subtractive clustering algorithm. Representative examples are then chosen from each cluster to construct the SVM components. Finally, the outputs of the individual classifiers are fused by majority voting to obtain the final decision. The performance of the proposed method is compared with that of other popular ensemble approaches, such as Bagging, AdaBoost, and k-fold cross validation, on synthetic and UCI datasets. The experimental results show that our method achieves higher classification accuracy because the distribution of the examples is taken into account during ensemble construction through clustering analysis. They further indicate that our method needs much smaller training subsets than Bagging and AdaBoost to obtain satisfactory classification accuracy.
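The two clustering-related steps can be sketched as follows: subtractive clustering (in Chiu's formulation) to pick representative centers from the examples of one class, and majority voting to fuse component predictions. The radius `ra`, the ratio r_b = 1.5·r_a, the simplified stopping rule `eps`, and the function names are assumptions; SVM training of the components is omitted.

```python
import math
from collections import Counter

def subtractive_clustering(points, ra=1.0, eps=0.15):
    """Pick cluster centers by repeatedly taking the highest-potential point."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2  # squash radius r_b = 1.5 * r_a (assumed default)
    # Potential of each point: density of its neighborhood.
    pot = [sum(math.exp(-alpha * math.dist(p, q) ** 2) for q in points) for p in points]
    first = max(pot)
    centers = []
    while True:
        k = max(range(len(points)), key=pot.__getitem__)
        if pot[k] < eps * first:  # simplified stopping rule (assumption)
            break
        centers.append(points[k])
        pk = pot[k]
        # Subtract the new center's influence; its own potential drops to zero.
        pot = [pi - pk * math.exp(-beta * math.dist(p, points[k]) ** 2)
               for pi, p in zip(pot, points)]
    return centers

def majority_vote(predictions):
    """Fuse component labels; Counter breaks ties by first occurrence."""
    return Counter(predictions).most_common(1)[0][0]
```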
Abstract: This paper describes an improved algorithm for fuzzy c-means clustering of remotely sensed data that decreases the degree of fuzziness of the resulting classification compared with a conventional algorithm; that is, it increases classification accuracy. This is achieved by incorporating covariance matrices at the level of individual classes rather than assuming a single global one. Empirical results from a fuzzy classification of suburban land cover in Edinburgh confirmed the improved performance of the new algorithm, in particular when fuzziness is also accommodated in the assumed reference data.
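One plausible way to write the class-level covariance idea is shown below, assuming a fuzzy covariance matrix per class in the style of Gustafson-Kessel clustering (without its volume normalization). The abstract does not give the exact formulas, so this is a sketch: x_k are the N pixels, v_i the c class centers, u_ik the memberships, and m the fuzzifier. Replacing every C_i by one shared global matrix recovers a conventional single-covariance FCM, and C_i = I gives the Euclidean case.

$$
d_{ik}^{2} = (\mathbf{x}_k - \mathbf{v}_i)^{\top}\,\mathbf{C}_i^{-1}\,(\mathbf{x}_k - \mathbf{v}_i),
\qquad
\mathbf{C}_i = \frac{\sum_{k=1}^{N} u_{ik}^{m}\,(\mathbf{x}_k - \mathbf{v}_i)(\mathbf{x}_k - \mathbf{v}_i)^{\top}}{\sum_{k=1}^{N} u_{ik}^{m}}
$$

$$
u_{ik} = \left[\sum_{j=1}^{c}\left(\frac{d_{ik}}{d_{jk}}\right)^{2/(m-1)}\right]^{-1},
\qquad
\mathbf{v}_i = \frac{\sum_{k=1}^{N} u_{ik}^{m}\,\mathbf{x}_k}{\sum_{k=1}^{N} u_{ik}^{m}}
$$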
Abstract: This paper studies an improved K-means algorithm and establishes indicators to classify customers according to the RFM model. Experimental results show that the new algorithm has good convergence and stability, and that it clusters better than the FKP algorithm used alone. Finally, the paper studies the application of clustering to customer segmentation in a mobile communication enterprise. It discusses the basic theory, customer segmentation methods and steps, and a customer segmentation model based on consumption behavior psychology; the segmentation model is successfully applied to the marketing decision support process.
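RFM stands for Recency, Frequency, and Monetary value. Below is a sketch of turning raw transactions into scaled RFM indicators that a K-means variant could then cluster. The transaction format, the min-max scaling, and the inversion of recency (so that 1 means "most recent") are assumptions, the function names are hypothetical, and the paper's improved K-means is not reproduced.

```python
from datetime import date

def rfm_table(transactions, today):
    """Map customer_id -> (recency_days, frequency, monetary) from (id, date, amount)."""
    last, freq, money = {}, {}, {}
    for cid, d, amt in transactions:
        last[cid] = max(last.get(cid, d), d)
        freq[cid] = freq.get(cid, 0) + 1
        money[cid] = money.get(cid, 0.0) + amt
    return {cid: ((today - last[cid]).days, freq[cid], money[cid]) for cid in last}

def normalize(rfm):
    """Min-max scale each indicator to [0, 1]; recency is inverted so 1 is best."""
    cols = list(zip(*rfm.values()))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]

    def scale(v, j):
        span = hi[j] - lo[j]
        s = (v - lo[j]) / span if span else 0.0
        return 1.0 - s if j == 0 else s

    return {cid: tuple(scale(v, j) for j, v in enumerate(vals))
            for cid, vals in rfm.items()}
```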