Sequence analysis technology under big data provides unprecedented opportunities for modern life science. This paper proposes a novel method for identifying gene coding sequences. First, an improved short-time Fourier transform algorithm based on the Morlet wavelet is applied to extract the power spectrum of the DNA sequence. Then, a threshold determination method based on kernel fuzzy C-means clustering combines the signal-to-noise ratio (SNR) data of exons and introns into one sequence, classifies the sequence into two types, and calculates the weighted sum of the two resulting SNR clustering centers to derive the discrimination threshold. Finally, an exon interval endpoint identification algorithm based on the Takagi-Sugeno fuzzy identification model is presented: it trains the Takagi-Sugeno model, optimizes the model parameters with the Levenberg-Marquardt least-squares method, completes the model, and determines the fuzzy rules. To verify the effectiveness of the proposed method, tests are conducted on typical gene sequence sample data.
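The thresholding step can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: plain fuzzy C-means stands in for the kernel variant, the threshold is taken to be exactly the weighted sum of the two centers, the weight `weight_low` is a hypothetical parameter, and the SNR values are made up for illustration.

```python
def two_cluster_fcm_1d(values, m=2.0, iters=100, tol=1e-9):
    """Plain 1-D fuzzy C-means with two clusters (NOT the kernel variant)."""
    centers = [min(values), max(values)]  # deterministic start at the extremes
    for _ in range(iters):
        num = [0.0, 0.0]
        den = [0.0, 0.0]
        for x in values:
            # Distances to both centers, floored to avoid division by zero.
            d = [max(abs(x - c), 1e-12) for c in centers]
            for i in range(2):
                # Standard FCM membership: 1 / sum_j (d_i / d_j)^(2 / (m - 1)).
                u = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(2))
                w = u ** m
                num[i] += w * x
                den[i] += w
        new_centers = [num[i] / den[i] for i in range(2)]
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return sorted(centers)


def snr_threshold(centers, weight_low=0.5):
    """Weighted sum of the two cluster centers (weights are an assumption)."""
    low, high = centers
    return weight_low * low + (1.0 - weight_low) * high


# Hypothetical SNR values, merged into one sequence as the abstract describes.
intron_snr = [0.8, 1.1, 0.9, 1.3]
exon_snr = [3.9, 4.4, 5.0, 4.1]
centers = two_cluster_fcm_1d(intron_snr + exon_snr)
threshold = snr_threshold(centers)
print(centers, threshold)
```

A window whose SNR exceeds `threshold` would then be treated as a candidate exon region. How the weights are actually chosen is not stated in the abstract.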
The premise and basis of load modeling are substation load composition inquiries and cluster analyses. However, the traditional kernel fuzzy C-means (KFCM) algorithm is limited by the manual selection of the number of clusters and by its convergence to locally optimal solutions. To overcome these limitations, an improved KFCM algorithm with adaptive selection of the optimal number of clusters is proposed in this paper. The algorithm optimizes KFCM by combining the powerful global search ability of the genetic algorithm with the robust local search ability of the simulated annealing algorithm. The improved KFCM algorithm adaptively determines the ideal number of clusters using the clustering evaluation index ratio. Compared with the traditional KFCM algorithm, the enhanced KFCM algorithm has robust clustering and comprehensive abilities, enabling efficient convergence to the globally optimal solution.
The fuzzy C-means clustering algorithm (FCM) is extended to the fuzzy kernel C-means clustering algorithm (FKCM) in order to perform effective cluster analysis on diverse structures, such as non-hyperspherical data, noisy data, data with a mixture of heterogeneous cluster prototypes, and asymmetric data. Based on the Mercer kernel, the FKCM clustering algorithm is derived by combining the FCM algorithm with the kernel method. Experimental results on synthetic and real data show that, in contrast to the FCM algorithm, the FKCM clustering algorithm is general-purpose and can effectively analyze datasets with varied structures in an unsupervised manner. Kernel-based clustering can be expected to become an important research direction in fuzzy clustering analysis.
A fault identification method is proposed to handle the relationships among the fault features of large rotating machinery, which are extremely complicated and nonlinear. This paper studies a rotor test rig, the clustering of its data sets, and fault pattern recognition. The method first maps the data from the original space to a high-dimensional kernel space, so that data that are highly nonlinear in the low-dimensional space become linearly separable in the kernel space. This highlights the differences among the features of the data set. Fuzzy C-means (FCM) clustering is then carried out in the kernel space, and each data point is assigned to the nearest class by computing its distance to the clustering centers. Finally, a test set is used to evaluate the results. The convergence rate and clustering accuracy are better than those of traditional FCM. The study shows that the method improves the accuracy of pattern recognition for rotating machinery.
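The procedure described above (kernel mapping, FCM updates, then assignment to the nearest class) can be sketched as follows. This is one widely used kernel FCM formulation with a Gaussian kernel and prototypes kept in the input space, not necessarily the formulation used in the paper. The kernel width `sigma`, the fuzzifier `m`, the initial centers, and the sample points are all assumptions made for illustration.

```python
import math


def gaussian_kernel(x, v, sigma):
    """Gaussian (RBF) kernel between two points given as sequences."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kfcm(points, init_centers, sigma=2.0, m=2.0, iters=100, tol=1e-8):
    """Kernel fuzzy C-means with input-space prototypes (a sketch)."""
    centers = [list(c) for c in init_centers]
    n_clusters = len(centers)
    memberships = []
    for _ in range(iters):
        # Membership update: for a Gaussian kernel the kernel-induced
        # squared distance is proportional to 1 - K(x, v).
        memberships = []
        for x in points:
            dist = [max(1.0 - gaussian_kernel(x, v, sigma), 1e-12) for v in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            memberships.append([w / total for w in inv])
        # Center update: average of the points weighted by u^m * K(x, v).
        new_centers = []
        for i in range(n_clusters):
            num = [0.0] * len(points[0])
            den = 0.0
            for x, u in zip(points, memberships):
                w = (u[i] ** m) * gaussian_kernel(x, centers[i], sigma)
                den += w
                for k, xk in enumerate(x):
                    num[k] += w * xk
            new_centers.append([value / den for value in num])
        shift = max(
            abs(a - b)
            for old, new in zip(centers, new_centers)
            for a, b in zip(old, new)
        )
        centers = new_centers
        if shift < tol:
            break
    # Hard assignment: each point goes to the class with the largest membership.
    labels = [max(range(n_clusters), key=lambda i: u[i]) for u in memberships]
    return centers, memberships, labels


# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, memberships, labels = kfcm(points, init_centers=[(0.0, 0.0), (5.0, 5.0)])
print(labels)
```

Because the initial centers are supplied by the caller, the sketch inherits the sensitivity to initialization that several of the abstracts on this page set out to address.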
A novel fuzzy clustering model using kernel methods is proposed, called the kernel modified possibilistic c-means (KMPCM) model. The proposed model extends the modified possibilistic c-means (MPCM) algorithm with kernel methods. Unlike the MPCM and fuzzy c-means (FCM) models, which are based on Euclidean distance, the proposed model is based on kernel-induced distance. Furthermore, with kernel methods the input data can be mapped implicitly into a high-dimensional feature space in which nonlinear patterns appear linear. No explicit computation in the high-dimensional feature space is needed, because the kernel function evaluates the required inner products directly. Numerical experiments show that KMPCM outperforms FCM and MPCM.
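The claim that no calculation in the feature space is needed can be checked on a small example. The sketch below uses a degree-2 polynomial kernel on 2-D inputs, chosen only because its feature map can be written out explicitly; the abstract does not say which kernel KMPCM uses. The function names and the sample points are hypothetical.

```python
import math


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: K(x, y) = (x . y)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def poly2_feature_map(x):
    """Explicit feature map phi with phi(x) . phi(y) == poly2_kernel(x, y)."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


def kernel_distance_sq(x, v, kernel):
    """Kernel-induced squared distance: K(x, x) - 2 K(x, v) + K(v, v)."""
    return kernel(x, x) - 2.0 * kernel(x, v) + kernel(v, v)


def explicit_distance_sq(x, v):
    """Squared Euclidean distance computed in the feature space itself."""
    px, pv = poly2_feature_map(x), poly2_feature_map(v)
    return sum((a - b) ** 2 for a, b in zip(px, pv))


x, v = (1.0, 2.0), (3.0, -1.0)
implicit = kernel_distance_sq(x, v, poly2_kernel)
explicit = explicit_distance_sq(x, v)
print(implicit, explicit)
```

Both routes give the same squared distance, but the kernel route never builds the feature vectors. That matters for kernels such as the Gaussian kernel, whose feature space is infinite-dimensional.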
We propose a novel clustering algorithm using fast global kernel fuzzy c-means-F (FGKFCM-F), where F refers to the kernelized feature space. This algorithm proceeds incrementally to derive a near-optimal solution by solving all intermediate problems using kernel-based fuzzy c-means-F (KFCM-F) as a local search procedure. Owing to its incremental nature and the nonlinear properties inherited from KFCM-F, the algorithm overcomes the two shortcomings of fuzzy c-means (FCM): sensitivity to initialization and inability to handle nonlinearly separable data. An accelerating scheme is developed to reduce the computational complexity without significantly affecting the solution quality. Experiments are carried out to test the proposed algorithm on a nonlinear artificial dataset and a real-world dataset of speech signals for consonant/vowel segmentation. Simulation results demonstrate the effectiveness of the proposed algorithm in improving clustering performance on both types of datasets.
A Recommender System (RS) is a crucial part of many firms, particularly those involved in e-commerce. In a conventional RS, a user may offer only a single rating for an item, which is insufficient to capture consumer preferences. Nowadays, businesses in industries such as e-learning and tourism let customers rate a product on a variety of criteria in order to understand their preferences. The collaborative filtering (CF) algorithm based on the AutoEncoder (AE) has been found effective at identifying items of interest to a user. However, the cost of these computations increases nonlinearly as the number of items and users grows. To overcome these issues, a novel two-phase technique is proposed that combines an expanded stacked autoencoder (ESAE) with Kernel Fuzzy C-Means Clustering (KFCM). In the first, offline phase, the sparse multicriteria rating matrix is smoothed into a complete matrix by predicting the users' ratings with the ESAE approach, and users are clustered using the KFCM approach. In the second, online phase, the top-N recommendation prediction is made by the ESAE approach using only the most similar users from multiple clusters. The ESAE_KFCM model thereby raises the prediction accuracy to 98.2% in top-N recommendation while minimizing recommendation generation time. Experiments on the Yahoo! Movies (YM) movie dataset and the TripAdvisor (TA) travel dataset confirmed that the ESAE_KFCM model consistently outperforms conventional RS algorithms on a variety of assessment measures.
The fuzzy c-means (FCM) clustering algorithm is sensitive to noise points and outliers. The possibilistic fuzzy c-means (PFCM) clustering algorithm overcomes this problem well, but it has problems of its own: it is still sensitive to the initial clustering centers, and its clustering results are poor when the noisy datasets under test are highly unbalanced. An improved kernel possibilistic fuzzy c-means algorithm based on invasive weed optimization (IWO-KPFCM) is proposed in this paper. The algorithm first uses the invasive weed optimization (IWO) algorithm to search for an optimal solution to serve as the initial clustering centers, and introduces the kernel method to map the input data from the sample space into a high-dimensional feature space. Then, the sample variance is introduced into the objective function to measure the compactness of the data. Finally, the improved algorithm is used to cluster the data. Simulation results on University of California, Irvine (UCI) data sets and artificial data sets show that the proposed algorithm has stronger noise resistance, higher clustering accuracy, and faster convergence than the PFCM algorithm.
Funding for the improved KFCM load-modeling study: Supported by the Planning Special Project of Guangdong Power Grid Co., Ltd.: "Study on load modeling based on total measurement and discrimination method suitable for system characteristic analysis and calculation during the implementation of target grid in Guangdong power grid" (0319002022030203JF00023).
Funding for the rotating-machinery fault identification study: Supported by the National Natural Science Foundation of China (51675253).
Funding for the KMPCM study: Project supported by the 15th Plan for National Defence Preventive Research Project (Grant No. 413030201).
Funding for the FGKFCM-F study: Project supported by the National Research Foundation (NRF) of Korea (Nos. 2013009458 and 2013068127).