Abstract: This paper aims to broaden the application of the Stochastic Configuration Network (SCN) to the semi-supervised domain by exploiting the unlabeled data that are common in daily life. Doing so can enhance the classification accuracy of decentralized SCN algorithms while effectively protecting user privacy. To this end, we propose a decentralized semi-supervised learning algorithm for SCN, called DMT-SCN, which introduces teacher and student models and combines them with the idea of consistency regularization to improve the response speed of model iterations. To reduce the possible negative impact of unlabeled data on the model, we deliberately change the way noise is added to the unlabeled data. Simulation results show that the algorithm can effectively utilize unlabeled data to improve the classification accuracy of SCN training and is robust under different ground simulation environments.
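The teacher and student pairing described above can be illustrated with a minimal sketch. It assumes a mean-teacher style scheme, in which the teacher's weights follow an exponential moving average of the student's weights and a consistency term penalizes disagreement between the two models on the same unlabeled sample. The abstract does not confirm that DMT-SCN works exactly this way, and the names `ema_update`, `consistency_loss`, and `alpha` are hypothetical.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight toward the matching student weight.

    Assumption: weights are flat lists of floats; alpha is the EMA decay.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between two class-probability vectors."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# One illustrative step: the teacher drifts slowly toward the student.
teacher = ema_update([1.0, 0.0], [0.0, 1.0], alpha=0.5)
penalty = consistency_loss([0.9, 0.1], [0.6, 0.4])
```

A large `alpha` keeps the teacher stable across iterations, which is the usual reason for averaging rather than copying the student's weights.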
Abstract: Conventional methods struggle to recognize radar emitters when labeled data are rare. To solve this problem, an optimized cooperative semi-supervised learning method for radar emitter recognition based on a small amount of labeled data is developed. First, a small amount of labeled data is randomly sampled using the bootstrap method, and the loss functions of three common deep learning networks are improved by combining the uniform distribution with the cross-entropy function to reduce the overconfidence of softmax classification. Subsequently, the dataset obtained after sampling is used to train the three improved networks and build the initial model. In addition, the unlabeled data are preliminarily screened through dynamic time warping (DTW) and then fed to the previously trained initial model for judgment. If the judgments of two or more networks agree, the unlabeled data are labeled and added to the labeled dataset. Lastly, the three network models are trained on the enlarged labeled dataset, and the final model is built. The simulation results show that the semi-supervised learning method adopted in this paper can exploit a small amount of labeled data and essentially match the recognition accuracy obtained with labeled data.
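Two steps of the procedure above lend themselves to a short sketch: mixing the one-hot target with a uniform distribution inside the cross-entropy, and accepting a pseudo-label only when at least two of the three networks agree. The exact loss used in the paper is not given in the abstract, so the smoothing form below (standard label smoothing with weight `eps`) is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def smoothed_cross_entropy(probs, target, eps=0.1):
    """Cross-entropy against a one-hot target mixed with a uniform distribution.

    Assumes every entry of `probs` is strictly positive.
    """
    k = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        # Target mass: (1 - eps) on the true class plus eps / k on every class.
        q = (1.0 - eps) * (1.0 if i == target else 0.0) + eps / k
        loss -= q * math.log(p)
    return loss


def agreed_label(votes, min_votes=2):
    """Return the label chosen by at least `min_votes` networks, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_votes else None


# Two of three networks agree, so the sample would join the labeled set.
accepted = agreed_label(["radar_A", "radar_A", "radar_B"])
# All three disagree, so the sample stays unlabeled.
rejected = agreed_label(["radar_A", "radar_B", "radar_C"])
```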
Funding: Supported by the National Program on Key Basic Research Project (No. 2013CB329502), the National Natural Science Foundation of China (No. 61202212), the Special Research Project of the Educational Department of Shaanxi Province of China (No. 15JK1038), and the Key Research Project of Baoji University of Arts and Sciences (No. ZK16047).
Abstract: In recent years, the multimedia annotation problem has attracted significant research attention in the multimedia and computer vision communities, especially automatic image annotation, whose purpose is to provide an efficient and effective search environment in which users can query their images more easily. In this paper, a probabilistic latent semantic analysis (PLSA) model based on semi-supervised learning is presented for automatic image annotation. Since it is often hard to obtain or create labeled images in large quantities while unlabeled ones are easier to collect, a transductive support vector machine (TSVM) is exploited to enhance the quality of the training image data. Moreover, image features with different magnitudes lead to different annotation performance. To this end, a Gaussian normalization method is used to normalize the different features extracted from effective image regions segmented by the normalized cuts algorithm, so as to preserve the intrinsic content of the images as completely as possible. Finally, a PLSA model with asymmetric modalities is constructed based on the expectation maximization (EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model significantly improves on the performance of traditional PLSA for automatic image annotation.
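The feature normalization step can be sketched as follows. The abstract does not define its Gaussian normalization, so this assumes the common form that centers each feature on its mean and divides by a multiple `k` of its standard deviation (with `k = 3`, most values of a roughly Gaussian feature fall in [-1, 1]). The function name and the default `k` are hypothetical.

```python
import math


def gaussian_normalize(values, k=3.0):
    """Center one feature on its mean and scale by k standard deviations."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / (k * std) for v in values]


# Features on very different scales end up in a comparable range.
color_feature = gaussian_normalize([10.0, 200.0, 90.0, 120.0])
texture_feature = gaussian_normalize([0.01, 0.04, 0.02, 0.03])
```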
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 30525030, 60701015, and 60736029.
Abstract: The Laplacian support vector machine (LapSVM), a manifold-based semi-supervised learning algorithm, is introduced to brain-computer interfaces (BCI) to raise classification precision and reduce the subjects' training complexity. The data are collected from three subjects in a three-task mental imagery experiment. LapSVM and the transductive SVM (TSVM) are trained with a few labeled samples and a large number of unlabeled samples. The results confirm that LapSVM classifies much better than TSVM.
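The manifold idea behind LapSVM can be illustrated with the graph Laplacian. The sketch below builds L = D - W from a symmetric similarity matrix `w` with a zero diagonal and evaluates the smoothness term fᵀLf, which is small when connected samples receive similar decision values. It shows only the regularizer, not a full LapSVM solver, and the function names are hypothetical.

```python
def graph_laplacian(w):
    """Return L = D - W for a symmetric weight matrix with a zero diagonal."""
    n = len(w)
    return [
        [(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
        for i in range(n)
    ]


def smoothness(f, lap):
    """Compute f^T L f, i.e. half the sum over i, j of w_ij * (f_i - f_j)^2."""
    n = len(f)
    return sum(f[i] * lap[i][j] * f[j] for i in range(n) for j in range(n))


# Two connected samples: the penalty grows with the gap in their outputs.
weights = [[0.0, 1.0], [1.0, 0.0]]
lap = graph_laplacian(weights)
penalty = smoothness([1.0, 3.0], lap)
```

Because unlabeled samples also appear in the graph, this term lets them shape the decision function even though they contribute nothing to the labeled loss.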
Abstract: In this paper, we propose a new semi-supervised multi-manifold learning method, called semi-supervised sparse multi-manifold embedding (S3MME), for dimensionality reduction of hyperspectral image data. S3MME exploits both labeled and unlabeled data to adaptively find the neighbors of each sample on the same manifold by using an optimization program based on sparse representation, and naturally gives relative importance to the labeled samples through a graph-based methodology. It then tries to extract discriminative features on each manifold such that data points on the same manifold become closer. The effectiveness of the proposed multi-manifold learning algorithm is demonstrated and compared through experiments on real hyperspectral images.
Funding: Supported by the Science and Technology Major Project of Fujian Province of China (Grant No. 2022HZ028018) and the National Natural Science Foundation of China (Grant No. 51907030).
Abstract: The state of health (SOH) and remaining useful life (RUL) of lithium-ion batteries are crucial for health management and diagnosis. However, most data-driven estimation methods rely heavily on scarce labeled data, while traditional transfer learning has difficulty handling domain shifts across battery types. This paper proposes an enhanced vision transformer integrated with semi-supervised transfer learning for SOH and RUL estimation of lithium-ion batteries. A depth-wise separable convolutional vision transformer is developed to extract local aging details with depth-wise convolutions and to establish global dependencies between aging information using multi-head attention. Maximum mean discrepancy is employed to initially reduce the distribution difference between the source and target domains, providing a superior starting point for fine-tuning the target domain model. Subsequently, the abundant aging data of the same type as the target battery are labeled through semi-supervised learning, compensating for the source model's limitations in capturing target battery aging characteristics. Consistency regularization incorporates the cross-entropy between predictions with and without adversarial perturbations into the gradient backpropagation of the overall model. In particular, across experimental groups 13–15 for different types of batteries, the root mean square error of SOH estimation was less than 0.66%, and the mean relative error of RUL estimation was 3.86%. By leveraging extensive unlabeled aging data, the proposed method achieves accurate estimation of SOH and RUL.
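The maximum mean discrepancy mentioned above measures how far apart two sample sets are in a kernel feature space. The sketch below computes the biased estimate of the squared MMD with a Gaussian kernel. The kernel choice, the bandwidth parameter `gamma`, and the function names are assumptions, since the abstract does not specify them.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between sample sets xs and ys."""
    def mean_kernel(a, b):
        return sum(rbf(u, v, gamma) for u in a for v in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical source and target features: similar sets give a small value.
source = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]]
target_near = [[0.1, 0.0], [0.0, 0.2], [0.2, 0.1]]
target_far = [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
near_gap = mmd2(source, target_near)
far_gap = mmd2(source, target_far)
```

Minimizing such a term during training pulls the source and target feature distributions together before the target model is fine-tuned.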
Funding: Partially funded by the National Natural Science Foundation of China (Grant No. 61272447), the National Entrepreneurship & Innovation Demonstration Base of China (Grant No. C700011), and the Key Research & Development Project of Sichuan Province of China (Grant No. 2018G20100).
Abstract: The limited amount of labeled sample data in advanced security threat detection seriously restricts research in the field. Learning sample labels from labeled and unlabeled data has received much research attention, and various general-purpose labeling methods have been proposed. However, labeling malicious communication samples associated with advanced threats faces two practical challenges: the difficulty of extracting effective features in advance and the complexity of the actual sample types. To address these problems, we propose a sample labeling method for malicious communication based on a semi-supervised deep neural network. The method supports continuous learning and optimization of the feature representation while labeling samples, and it can handle uncertain samples that fall outside the sample types of interest. According to the experimental results, the proposed deep neural network automatically learns an effective feature representation, and the validity of its features is close to or even higher than that of features extracted using expert knowledge. Furthermore, the proposed method achieves a labeling accuracy of 97.64% to 98.50%, which is more accurate than the train-then-detect, kNN, and LPA methods under every labeled-sample proportion. Insufficient labeled samples are a problem in many network attack detection scenarios, and this work can serve as a reference for sample labeling tasks in similar real-world scenarios.
Funding: Supported by the National Natural Science Foundation of China (No. 60875004), the Natural Science Foundation of Jiangsu Province of China (No. BK2009184), and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 07KJB520133).
Abstract: Considering the limitations of Linear Discriminant Analysis (LDA) and Marginal Fisher Analysis (MFA), a novel discriminant analysis called Local Correlation Discriminant Analysis (LCDA) is proposed in this paper. The main idea behind LCDA is to use a more robust similarity measure, the correlation metric, to measure the local similarity between image data, which results in better classification performance. In addition, to further improve the discriminant power of LCDA, we extend it to the semi-supervised case, which can make use of both labeled and unlabeled data to perform discriminant analysis. Extensive experimental results on the ORL and AR face databases demonstrate that the proposed LCDA and its semi-supervised version are superior to Principal Component Analysis (PCA), LDA, CEA, and MFA.
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 61101122), the National High Technology Research and Development Program of China (Grant No. 2012AA120802), and the National Science and Technology Major Project of the Ministry of Science and Technology of China (Grant No. 2012ZX03004-003).
Abstract: For indoor location estimation based on received signal strength (RSS) in wireless local area networks (WLAN), a large number of RSS samples should be collected in the offline phase to reduce the influence of noise on positioning accuracy. Collecting training data with position information is therefore time consuming, which becomes the bottleneck of WLAN indoor localization. In this paper, traditional semi-supervised learning methods based on the k-NN and ε-NN graphs for reducing the offline collection workload are analyzed, and the results show that the k-NN and ε-NN graphs are sensitive to data noise, which limits the performance of semi-supervised WLAN indoor localization systems. To address this problem, this paper proposes an l1-graph-algorithm-based semi-supervised learning (LG-SSL) indoor localization method in which the graph is built by an l1-norm algorithm. The system first uses LG-SSL and the labeled data to label the unlabeled data and build the radio map in the offline training phase, and then uses LG-SSL to estimate the user's location in the online phase. Extensive experimental results show that, benefiting from the noise robustness and sparsity of the l1-graph, LG-SSL exhibits superior performance by effectively reducing the collection workload in the offline phase and improving localization accuracy in the online phase.
Funding: National Natural Science Foundation of China (Nos. 61262006, 61462011, 61202089); the Major Applied Basic Research Program of Guizhou Province Project, China (No. JZ20142001); the Science and Technology Foundation of Guizhou Province Project, China (No. LH20147636); the National Research Foundation for the Doctoral Program of Higher Education of China (No. 20125201120006); the Graduate Innovated Foundations of Guizhou University Project, China (No. 2015012).
Abstract: To discover personalized document structure that takes user preferences into account, user preferences were captured by a limited number of instance-level constraints and given as interested and uninterested key terms. A semi-supervised document clustering approach based on the latent Dirichlet allocation (LDA) model, named pLDA, was developed and guided by the user-provided key terms. A generalized Polya urn (GPU) model was proposed to integrate the user preferences into the document clustering process. A Gibbs sampler was investigated to infer the structure of the document collection. Experiments on real datasets were conducted to explore the performance of pLDA. The results demonstrate that the pLDA approach is effective.
Funding: National Natural Science Foundation of China (Nos. 61072090, 60874113).
Abstract: Semi-supervised dimensionality reduction is an important research area for data classification. A new linear dimensionality reduction approach, global inference preserving projection (GIPP), was proposed to perform classification in the semi-supervised case. GIPP provided a global structure that utilized the underlying discriminative knowledge of unlabeled samples. It used a path-based dissimilarity measurement to infer class label information for unlabeled samples and transformed the discriminant algorithm into a generalized eigenequation problem. Experimental results demonstrate the effectiveness of the proposed approach.
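A path-based dissimilarity can be sketched as below. The abstract does not give GIPP's definition, so this assumes the widely used minimax form: the dissimilarity between two samples is the smallest achievable value of the largest edge along any connecting path, so that samples linked through a chain of close neighbors count as similar. The function name is hypothetical, and the triple loop is a Floyd-Warshall style recurrence.

```python
def minimax_path_dissimilarity(dist):
    """All-pairs minimax path cost from a symmetric pairwise distance matrix."""
    n = len(dist)
    d = [row[:] for row in dist]  # Work on a copy of the direct distances.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Going through k costs the larger of the two sub-path costs.
                via_k = max(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


# Samples 0 and 2 are far apart directly but linked through sample 1.
direct = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
path_based = minimax_path_dissimilarity(direct)
```

Under this measure an unlabeled sample inherits a small dissimilarity to any labeled sample it can reach through a dense region, which is one way label information could be inferred for it.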
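Abstract: Tri-training uses unlabeled data for classification and can effectively improve the generalization ability of a classifier, but it easily mislabels unlabeled data, which introduces training noise. A Tri-training algorithm based on density peaks clustering (Tri-training with density peaks clustering, DPC-TT) is proposed. Density peaks clustering uses cluster centers and local density to select the samples that best represent the spatial structure of the data. DPC-TT applies density peaks clustering to obtain the cluster centers of the training data and the local density of each sample. Samples within the cutoff distance of a cluster center are regarded as representing the spatial structure well and are marked as core data. Updating the classifiers with the core data reduces training noise during iteration and thereby improves classifier performance. Experimental results show that DPC-TT achieves better classification performance than the standard Tri-training algorithm and its improved variants.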
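The quantities used by density peaks clustering, and the core-data selection described above, can be sketched as follows. The sketch assumes the standard cutoff-kernel definition of local density (the number of other samples closer than the cutoff distance `d_c`) and takes the cluster centers as given indices rather than selecting them automatically. The function names are hypothetical, and this is not the authors' DPC-TT implementation.

```python
import math


def density_peaks(points, d_c):
    """Return local density rho and distance-to-denser-point delta per sample."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A sample with no denser neighbor gets its largest distance instead.
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta


def core_samples(points, centers, d_c):
    """Indices of samples lying within the cutoff distance of any center."""
    return [
        i for i, p in enumerate(points)
        if any(math.dist(p, points[c]) <= d_c for c in centers)
    ]


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
rho, delta = density_peaks(points, d_c=0.5)
core = core_samples(points, centers=[0], d_c=0.5)
```

Samples with both high `rho` and high `delta` are the usual candidates for cluster centers. The isolated fourth point has high `delta` but zero density, so it is excluded from the core data used to update the classifiers.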