Abstract: To enable clustering in a lower-dimensional space, a new feature selection method for clustering is proposed. The method has three steps, all carried out in a wrapper framework. First, all original features are ranked by importance, using an evaluation function E(f) that scores the importance of a feature. Second, the set of important features is selected sequentially. Finally, possibly redundant features are removed from the important feature subset. Because features are selected sequentially, there is no need to search the large space of feature subsets, which improves efficiency. Experimental results show that the method finds the set of features important for clustering and discards unimportant features and features that may hinder the clustering task.
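The three steps can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not define E(f), the acceptance rule, or the redundancy test, so the importance values, the `score` callback (a hypothetical wrapper criterion such as a clustering-quality index, higher is better), and the toy data are all assumptions.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical wrapper criterion: maps a feature subset to a quality score.
Score = Callable[[Sequence[str]], float]


def rank_features(importance: Dict[str, float]) -> List[str]:
    """Step 1: order features from most to least important by E(f)."""
    return sorted(importance, key=lambda f: importance[f], reverse=True)


def select_sequentially(ranked: List[str], score: Score) -> List[str]:
    """Step 2: walk the ranking once; keep a feature unless it lowers the score."""
    selected: List[str] = []
    best = float("-inf")
    for f in ranked:
        candidate = selected + [f]
        s = score(candidate)
        if s >= best:  # assumed acceptance rule
            selected, best = candidate, s
    return selected


def remove_redundant(selected: List[str], score: Score) -> List[str]:
    """Step 3: drop features whose removal does not lower the score.

    Least important features are tried first, so ties keep the better-ranked one.
    """
    kept = list(selected)
    for f in reversed(selected):
        if len(kept) == 1:
            break
        trial = [g for g in kept if g != f]
        if score(trial) >= score(kept):
            kept = trial
    return kept


# Toy data (assumed): "a_copy" duplicates the information in "a"; "noise" hurts.
GROUP = {"a": "g1", "a_copy": "g1", "b": "g2"}


def toy_score(subset: Sequence[str]) -> float:
    covered = {GROUP[f] for f in subset if f in GROUP}
    return len(covered) - (0.5 if "noise" in subset else 0.0)


importance = {"a": 0.9, "a_copy": 0.85, "b": 0.8, "noise": 0.1}
ranked = rank_features(importance)
selected = select_sequentially(ranked, toy_score)
result = remove_redundant(selected, toy_score)
```

Each feature is scored a constant number of times, so the number of wrapper evaluations grows linearly with the number of features rather than with the number of subsets.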
Funding: Supported by the National Natural Science Foundation of China (No. 60771068) and the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2007F248).
Abstract: Tensor representation helps reduce overfitting in vector-based learning algorithms for pattern recognition. This is mainly because the structural information of objects in pattern analysis is a reasonable constraint that reduces the number of unknown parameters used to model a classifier. In this paper, we generalize the vector-based learning algorithm TWin Support Vector Machine (TWSVM) to the tensor-based method TWin Support Tensor Machine (TWSTM), which accepts general tensors as input. To examine the effectiveness of TWSTM, we apply it to the detection of microcalcification clusters (MCs). In the tensor subspace domain, MC detection is formulated as a supervised learning and classification problem, and TWSTM is used as the classifier that decides whether MCs are present. A large number of experiments were carried out to evaluate and compare the performance of the proposed MC detection algorithm. Compared with TWSVM, the tensor version reduces overfitting.
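The parameter reduction can be made concrete with a small worked comparison. The abstract does not give the TWSTM decision function, so the rank-one form below, which is common in support tensor machines for second-order inputs, is an assumption used only for illustration; the symbols u, v, m, and n are not from the source.

```latex
\begin{align*}
  \text{vector model:} \quad
    & f(X) = \mathbf{w}^{\top}\operatorname{vec}(X) + b,
    && \mathbf{w} \in \mathbb{R}^{mn}
    && (mn \text{ weights}) \\
  \text{rank-one tensor model:} \quad
    & f(X) = \mathbf{u}^{\top} X \mathbf{v} + b,
    && \mathbf{u} \in \mathbb{R}^{m},\ \mathbf{v} \in \mathbb{R}^{n}
    && (m + n \text{ weights}) \\
  \text{example, } m = n = 32: \quad
    & mn = 1024 \quad \text{versus} \quad m + n = 64
\end{align*}
```

The tensor model is the vector model with the weight vector constrained to the Kronecker product of v and u, which is how the structure of the input acts as a constraint on the classifier.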
Funding: Supported by the National Natural Science Foundation of China (No. 30570485) and the Shanghai "Chen Guang" Project (No. 09CG69).
Abstract: A classification method based on a probabilistic neural network (PNN) with supervised learning is presented for electroencephalogram (EEG) pattern recognition in brain-computer interfaces (BCI). The method feeds the recognition rate on the training samples back into the learning of the network parameters. Learning vector quantization is used to group the training samples, and a genetic algorithm (GA) is used to train the network's smoothing parameters and the hidden central vectors that determine the hidden neurons. On the standard dataset I(a) of BCI Competition 2003, the method outperforms the other classification methods compared: its classification accuracy reaches 93.8%, more than 5 percentage points above the best result of the competition (88.7%). This technique provides an effective approach to EEG classification in practical BCI systems.
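The decision rule of a basic PNN can be sketched as below. This is an illustrative sketch only: it uses one shared smoothing parameter `sigma` and fixed hidden centers, whereas the abstract describes learning these with LVQ and a GA, which is not reproduced here. The class labels and toy vectors are assumptions.

```python
import math
from typing import Dict, List, Sequence


def pnn_predict(x: Sequence[float],
                centers: Dict[str, List[Sequence[float]]],
                sigma: float) -> str:
    """Return the class whose averaged Gaussian kernel response to x is largest."""
    scores: Dict[str, float] = {}
    for label, vectors in centers.items():
        total = 0.0
        for c in vectors:
            # Squared Euclidean distance between the input and a hidden center.
            d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            total += math.exp(-d2 / (2.0 * sigma ** 2))
        # Summation layer: average kernel response for this class.
        scores[label] = total / len(vectors)
    return max(scores, key=lambda k: scores[k])


# Toy hidden centers for two hypothetical classes.
centers = {
    "left": [(0.0, 0.0), (0.0, 1.0)],
    "right": [(3.0, 3.0), (4.0, 3.0)],
}
pred_a = pnn_predict((0.2, 0.4), centers, sigma=0.5)
pred_b = pnn_predict((3.5, 3.1), centers, sigma=0.5)
```

A smaller `sigma` makes each center respond only to very close inputs, while a larger one smooths the class densities, which is why the smoothing parameter is worth tuning.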
Abstract: Objective: To explore semi-supervised learning (SSL) for long-tail endoscopic image classification with limited annotations. Method: We studied semi-supervised long-tail endoscopic image classification on HyperKvasir, the largest public gastrointestinal dataset, with 23 diverse classes. The SSL algorithm FixMatch, which is based on consistency regularization and pseudo-labeling, was applied. After splitting the data into training and test sets at a ratio of 4:1, we sampled 20%, 50%, and 100% of the training data as labeled data to test classification with limited annotations. Results: Classification performance was evaluated with micro-average and macro-average metrics, with the Matthews correlation coefficient (MCC) as the overall measure. SSL improved classification performance, with MCC increasing from 0.8761 to 0.8850, from 0.8983 to 0.8994, and from 0.9075 to 0.9095 at labeled-data ratios of 20%, 50%, and 100%, respectively. At a 20% ratio, SSL improved both micro-average and macro-average performance; at 50% and 100%, SSL improved micro-average performance but hurt macro-average performance. By analyzing the confusion matrix and the labeling bias in each class, we found that the pseudo-label-based SSL algorithm exacerbated the classifier's preference for the head classes, improving performance on the head classes and degrading it on the tail classes. Conclusion: SSL can improve performance in semi-supervised long-tail endoscopic image classification, especially when labeled data are extremely limited, which may benefit the building of assisted diagnosis systems for low-volume hospitals. However, the pseudo-labeling strategy may amplify the effect of class imbalance, which hurts classification performance on the tail classes.
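The difference between micro-average, macro-average, and MCC on imbalanced data can be seen in a small example. The sketch below uses recall as the averaged metric, which is an assumption (the abstract does not say which metrics were averaged), and the two-class confusion matrix is invented toy data, not a result from the study.

```python
import math
from typing import List

Matrix = List[List[int]]  # cm[i][j]: samples of true class i predicted as class j


def micro_recall(cm: Matrix) -> float:
    """Pool all samples: correct predictions over total (equals accuracy)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def macro_recall(cm: Matrix) -> float:
    """Average the per-class recalls, so every class counts equally."""
    recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    return sum(recalls) / len(recalls)


def multiclass_mcc(cm: Matrix) -> float:
    """Matthews correlation coefficient generalized to K classes."""
    k = len(cm)
    s = sum(sum(row) for row in cm)                          # all samples
    c = sum(cm[i][i] for i in range(k))                      # correct samples
    t = [sum(cm[i]) for i in range(k)]                       # true count per class
    p = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                    (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0


# Toy long-tail case: 90 head-class samples, 10 tail-class samples,
# and a classifier that sends most tail samples to the head class.
cm = [[90, 0],
      [8, 2]]
micro = micro_recall(cm)
macro = macro_recall(cm)
mcc = multiclass_mcc(cm)
```

Here the micro-average stays high because the head class dominates the pooled count, while the macro-average and MCC drop sharply, which is the pattern to look for when pseudo-labeling favors head classes.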
Funding: Project (20121101004) supported by the Major Science and Technology Program of Shanxi Province, China; Project (20130321004-01) supported by the Key Technologies R&D Program of Shanxi Province, China; Project (2013M530896) supported by the Postdoctoral Science Foundation of China; Project (2014021022-6) supported by the Shanxi Provincial Science Foundation for Youths, China; Project (80010302010053) supported by the Shanxi Characteristic Discipline Fund, China.
Abstract: Ensemble learning is a widely studied topic. Traditional ensemble techniques seek better results from labeled data and base classifiers, and they fail to address the ensemble task when only unlabeled data are available. A label propagation based ensemble (LPBE) approach is proposed to further combine base classification results with unlabeled data. First, a graph is constructed with the unlabeled data as vertices, and the edge weights are calculated with a correntropy function. The average prediction results obtained from the base classifiers are then propagated under a regularization framework and adaptively enhanced over the graph. The approach is further extended to the case where a small amount of labeled data is available. The proposed algorithms are evaluated on several UCI benchmark data sets. Simulation results show that they achieve satisfactory performance compared with existing ensemble methods.
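The idea of smoothing averaged base-classifier scores over a graph of unlabeled points can be sketched as follows. This is not the LPBE algorithm: a Gaussian kernel stands in for the correntropy-based weight, the update is the standard regularized label-spreading iteration, and the adaptive enhancement step is omitted. The parameters `sigma`, `alpha`, and `iters` and the toy data are assumptions.

```python
import math
from typing import List, Sequence


def gaussian_weight(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Edge weight from a Gaussian kernel (assumed stand-in for correntropy)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def propagate(points: List[Sequence[float]],
              init_scores: List[List[float]],
              sigma: float = 1.0,
              alpha: float = 0.5,
              iters: int = 50) -> List[List[float]]:
    """Iterate F <- alpha * S * F + (1 - alpha) * Y on a fully connected graph."""
    n = len(points)
    n_classes = len(init_scores[0])
    w = [[0.0 if i == j else gaussian_weight(points[i], points[j], sigma)
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Symmetric normalization S = D^(-1/2) W D^(-1/2).
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f = [row[:] for row in init_scores]
    for _ in range(iters):
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n))
              + (1.0 - alpha) * init_scores[i][c]
              for c in range(n_classes)] for i in range(n)]
    return f


# Two well-separated clusters; the third and sixth points start with
# averaged base predictions that disagree with their neighbors.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
avg_pred = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6],
            [0.1, 0.9], [0.2, 0.8], [0.6, 0.4]]
scores = propagate(points, avg_pred)
labels = [row.index(max(row)) for row in scores]
```

In this toy run the two inconsistent points are pulled toward the label of their cluster, which is the effect the graph regularization is meant to have on the combined predictions.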
Funding: Supported by the National Natural Science Foundation of China (No. 40872193).
Abstract: An extended self-organizing map (SOM) for supervised classification is proposed in this paper. Unlike traditional SOMs, the model has an input layer, a Kohonen layer, and an output layer. The number of neurons in the input layer depends on the dimensionality of the input patterns. The number of neurons in the output layer equals the number of desired classes. The number of neurons in the Kohonen layer may range from a few to several thousand, depending on the complexity of the classification problem and the required classification precision. Each training sample is expressed as a pair of vectors: an input vector and a class codebook vector. When a training sample is presented to the model, Kohonen's competitive learning rule selects the winning neuron in the Kohonen layer. The weights connecting all input-layer neurons to the winning neuron and its neighbors in the Kohonen layer are then moved closer to the input vector, and the weights connecting all Kohonen-layer neurons within a certain diameter of the winning neuron to all output-layer neurons are moved closer to the class codebook vector. If the number of training samples is sufficiently large and training runs for enough epochs, the model can serve as a supervised classifier. The model has been tentatively applied to the supervised classification of multispectral remotely sensed data. The author compared the performance of the extended SOM and BPN in remotely sensed data classification. The investigation shows that the extended SOM is feasible for supervised classification.
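The update described above can be sketched for a small one-dimensional Kohonen layer. This is a simplified illustration, not the author's implementation: the layer topology, the single shared neighborhood `radius`, the fixed learning rate `lr`, the initial weights, and the toy samples are all assumptions.

```python
from typing import List, Sequence


def find_winner(x: Sequence[float], w_in: List[List[float]]) -> int:
    """Competitive step: index of the Kohonen neuron closest to the input."""
    dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in w_in]
    return dists.index(min(dists))


def train_step(x: Sequence[float], codebook: Sequence[float],
               w_in: List[List[float]], w_out: List[List[float]],
               lr: float = 0.3, radius: int = 1) -> int:
    """Move the winner and its neighbors toward the input and class codebook vectors."""
    winner = find_winner(x, w_in)
    for k in range(len(w_in)):
        if abs(k - winner) <= radius:
            # Input-to-Kohonen weights move toward the input vector.
            w_in[k] = [wi + lr * (xi - wi) for xi, wi in zip(x, w_in[k])]
            # Kohonen-to-output weights move toward the class codebook vector.
            w_out[k] = [wo + lr * (ti - wo) for ti, wo in zip(codebook, w_out[k])]
    return winner


def classify(x: Sequence[float], w_in: List[List[float]],
             w_out: List[List[float]]) -> int:
    """Read the class from the output weights of the winning neuron."""
    out = w_out[find_winner(x, w_in)]
    return out.index(max(out))


# Four Kohonen neurons, two input features, two classes (toy setup).
w_in = [[0.0, 0.0], [0.3, 0.3], [0.7, 0.7], [1.0, 1.0]]
w_out = [[0.5, 0.5] for _ in range(4)]
samples = [([0.1, 0.0], [1.0, 0.0]),   # class 0 codebook vector
           ([0.9, 1.0], [0.0, 1.0])]   # class 1 codebook vector
for _ in range(20):
    for x, codebook in samples:
        train_step(x, codebook, w_in, w_out)

class_a = classify([0.0, 0.1], w_in, w_out)
class_b = classify([1.0, 0.9], w_in, w_out)
```

In practice the learning rate and neighborhood usually shrink over the epochs; they are held constant here to keep the sketch short.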
Abstract: The proliferation of forums and blogs brings both challenges and opportunities for processing large amounts of information. The information shared on various topics often contains opinionated words, which are qualitative in nature. These qualitative words need statistical computation to convert them into useful quantitative data, and because the data express opinions, they must be processed carefully. Opinion-bearing words differ in the significance of the meaning they convey. To turn the linguistic meaning of words into data and to enhance opinion mining analysis, we propose a novel weighting scheme, referred to as inferred word weighting (IWW). IWW is computed from the significance of the word in the document (SWD) and the significance of the word in the expression (SWE). The proposed weighting methods give an analytic view and assign more appropriate weights to words than existing methods. In addition to the new weighting methods, the effect of including stop-words on text classification performance is examined. Stop-words are generally removed in text processing. When stop-words are included with both the proposed and the existing weighting methods, two facts are observed: (1) classification performance is enhanced; (2) the difference in outcome between including and excluding stop-words is smaller for the proposed methods and larger for existing methods. The inferences drawn from these observations are discussed. Experimental results on benchmark data sets show the potential improvement in classification accuracy.