With the rapid development of Internet of Things (IoT) technology, IoT systems have been widely applied in healthcare, transportation, home, and other fields. However, with the continuous expansion of the scale and increasing complexity of IoT systems, their stability and security issues have become increasingly prominent. Thus, it is crucial to detect anomalies in the IoT time series collected from various sensors. Recently, deep learning models have been leveraged for IoT anomaly detection. However, owing to the challenges associated with data labeling, most IoT anomaly detection methods resort to unsupervised learning techniques. Nevertheless, the absence of accurate abnormal information in unsupervised learning methods limits their performance. To address these problems, we propose AS-GCN-MTM, an adaptive structural Graph Convolutional Network (GCN)-based framework using a mean-teacher mechanism for anomaly identification. It performs better than unsupervised methods while using only a small amount of labeled data. Mean Teachers is an effective semi-supervised learning method that utilizes unlabeled data for training to improve the generalization ability and performance of the model. However, the dependencies between data are often unknown in time series data. To solve this problem, we designed a graph structure adaptive learning layer based on neural networks, which can automatically learn the graph structure from time series data. It not only better captures the relationships between nodes but also enhances the model's performance by augmenting key data. Experiments demonstrate that our method improves on the baseline model with the highest F1 value by 10.4%, 36.1%, and 5.6%, respectively, on three real datasets with a 10% data labeling rate.
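The mean-teacher mechanism mentioned above can be illustrated with a minimal sketch. This is not the AS-GCN-MTM implementation: the function names `ema_update` and `consistency_loss`, the flat weight lists, and the smoothing factor `alpha` are assumptions made only for illustration. It shows the general idea that the teacher's weights track an exponential moving average of the student's weights, and that a consistency term penalizes disagreement between the two models on unlabeled inputs.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(teacher_pred, student_pred):
    """Mean squared difference between teacher and student predictions."""
    n = len(teacher_pred)
    return sum((t - s) ** 2 for t, s in zip(teacher_pred, student_pred)) / n


# Hypothetical two-parameter models, used only to exercise the update rule.
teacher = [0.0, 1.0]
student = [1.0, 3.0]
teacher = ema_update(teacher, student, alpha=0.9)
```

In a full training loop, the student would be updated by gradient descent on a supervised loss plus the consistency term, and `ema_update` would be called after each student step.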
In this paper, we introduce a novel Multi-scale and Auto-tuned Semi-supervised Deep Subspace Clustering (MAS-DSC) algorithm, aimed at addressing the challenges of deep subspace clustering in high-dimensional real-world data, particularly in the field of medical imaging. Traditional deep subspace clustering algorithms, which are mostly unsupervised, are limited in their ability to effectively utilize the inherent prior knowledge in medical images. Our MAS-DSC algorithm incorporates a semi-supervised learning framework that uses a small amount of labeled data to guide the clustering process, thereby enhancing the discriminative power of the feature representations. Additionally, the multi-scale feature extraction mechanism is designed to adapt to the complexity of medical imaging data, resulting in more accurate clustering performance. To address the difficulty of hyperparameter selection in deep subspace clustering, this paper employs a Bayesian optimization algorithm for adaptive tuning of hyperparameters related to subspace clustering, prior knowledge constraints, and model loss weights. Extensive experiments on standard clustering datasets, including ORL, Coil20, and Coil100, validate the effectiveness of the MAS-DSC algorithm. The results show that with its multi-scale network structure and Bayesian hyperparameter optimization, MAS-DSC achieves excellent clustering results on these datasets. Furthermore, tests on a brain tumor dataset demonstrate the robustness of the algorithm and its ability to leverage prior knowledge for efficient feature extraction and enhanced clustering performance within a semi-supervised learning framework.
Active learning in semi-supervised classification involves introducing additional labels for unlabelled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels. "Model Change" active learning quantifies the resulting change incurred in the classifier by introducing the additional label(s). We pair this idea with graph-based semi-supervised learning (SSL) methods that use the spectrum of the graph Laplacian matrix, which can be truncated to avoid prohibitively large computational and storage costs. We consider a family of convex loss functions for which the acquisition function can be efficiently approximated using the Laplace approximation of the posterior distribution. We show a variety of multiclass examples that illustrate improved performance over the prior state of the art.
The aim of this paper is to broaden the application of the Stochastic Configuration Network (SCN) in the semi-supervised domain by utilizing the unlabeled data that is common in daily life. Doing so can enhance the classification accuracy of decentralized SCN algorithms while effectively protecting user privacy. To this end, we propose a decentralized semi-supervised learning algorithm for SCN, called DMT-SCN, which introduces teacher and student models by combining the idea of consistency regularization to improve the response speed of model iterations. To reduce the possible negative impact of unsupervised data on the model, we purposely change the way noise is added to the unlabeled data. Simulation results show that the algorithm can effectively utilize unlabeled data to improve the classification accuracy of SCN training and is robust under different ground simulation environments.
Timely inspection of defects on the surfaces of wind turbine blades can effectively prevent unpredictable accidents. To this end, this study proposes a semi-supervised object-detection network based on You Only Look Once version 4 (YOLOv4). A semi-supervised structure comprising a generative adversarial network (GAN) was designed to overcome the difficulty in obtaining sufficient samples and sample labeling. In the GAN, the generator is realized by an encoder-decoder network, where the backbone of the encoder is YOLOv4 and the decoder comprises inverse convolutional layers. Partial features from the generator are passed to the defect detection network. Deploying several unlabeled images can significantly improve the generalization and recognition capabilities of defect-detection models. The small-scale object detection capacity of the network can be improved by enhancing essential features in the feature map by adding the concurrent spatial and channel squeeze and excitation (scSE) attention module to the three parts of the YOLOv4 network. A balancing improvement was made to the loss function of YOLOv4 to overcome the imbalance among defect categories. The results for both the single- and multi-category defect datasets show that the improved model can make good use of the features of the unlabeled images. The accuracy of wind turbine blade defect detection also has a significant advantage over classical object detection algorithms, including Faster R-CNN and DETR.
Intrusion detection involves identifying unauthorized network activity and recognizing whether the data constitute an abnormal network transmission. Recent research has focused on using semi-supervised learning mechanisms to identify abnormal network traffic in order to deal with labeled and unlabeled data in industry. However, training on and classifying network traffic in real time pose challenges, as they can lead to the degradation of the overall dataset and to difficulties in preventing attacks. Additionally, existing semi-supervised learning research may not analyze experimental results comprehensively. This paper proposes XA-GANomaly, a novel technique for explainable adaptive semi-supervised learning that uses GANomaly, an image anomaly detection model, and dynamically trains on small subsets to address these issues. First, this research introduces a deep neural network (DNN)-based GANomaly for semi-supervised learning. Second, this paper presents the proposed adaptive algorithm for the DNN-based GANomaly, which is validated with four subsets of the adaptive dataset. Finally, this study demonstrates a monitoring system that incorporates three explainable techniques (Shapley additive explanations, reconstruction error visualization, and t-distributed stochastic neighbor embedding) to respond effectively to attacks on traffic data at each of the feature engineering, semi-supervised learning, and adaptive learning stages. Compared to other single-class classification techniques, the proposed DNN-based GANomaly achieves higher scores on the Network Security Laboratory-Knowledge Discovery in Databases (NSL-KDD) and UNSW-NB15 datasets, by 13% and 8% in F1 score and by 4.17% and 11.51% in accuracy, respectively. Furthermore, experiments with the proposed adaptive learning reveal mostly improved results over the initial values. An analysis and monitoring system based on the combination of the three explainable methodologies is also described. Thus, the proposed method has the potential to be applied in practical industry settings, and future research will explore handling unbalanced real-time datasets in various scenarios.
Radio frequency fingerprinting (RFF) is a remarkable lightweight authentication scheme to support rapid and scalable identification in Internet of Things (IoT) systems. Deep learning (DL) is a critical enabler of RFF identification by leveraging the hardware-level features. However, traditional supervised learning methods require a huge number of labeled training samples. Therefore, how to establish a high-performance supervised learning model with few labels in practical applications is still challenging. To address this issue, in this paper we propose a novel RFF semi-supervised learning (RFFSSL) model which can obtain better performance with few meta labels. Specifically, the proposed RFFSSL model is constituted by a teacher-student network, in which the student network learns from the pseudo labels predicted by the teacher. Then, the output of the student model is exploited to improve the performance of the teacher on the labeled data. Furthermore, a comprehensive evaluation of the accuracy is conducted. We obtain about 50 GB of raw signal data from real long-term evolution (LTE) mobile phones, which is used to evaluate various models. Experimental results demonstrate that the proposed RFFSSL scheme can achieve up to 97% experimental testing accuracy in a noisy environment with only 10% labeled samples when the number of training samples equals 2700.
In the upcoming large-scale Internet of Things (IoT), it is increasingly challenging to defend against malicious traffic, due to the heterogeneity of IoT devices and the diversity of IoT communication protocols. In this paper, we propose a semi-supervised learning-based approach to detect malicious traffic at the access side. It overcomes the resource-bottleneck problem of traditional malicious traffic defenders, which are deployed at the victim side, and is also free of labeled traffic data in model training. Specifically, we design a coarse-grained behavior model of IoT devices by self-supervised learning with unlabeled traffic data. Then, we fine-tune this model to improve its accuracy in malicious traffic detection by adopting a transfer learning method using a small amount of labeled data. Experimental results show that our method can achieve an accuracy of 99.52% and an F1-score of 99.52% with only 1% of the labeled training data on the CICDDoS2019 dataset. Moreover, our method outperforms the state-of-the-art supervised learning-based methods in terms of accuracy, precision, recall, and F1-score with 1% of the training data.
Through semi-supervised learning and knowledge inheritance, a novel Takagi-Sugeno-Kang (TSK) fuzzy system framework is proposed for epilepsy data classification in this study. The new method is based on the maximum mean discrepancy (MMD) method and the TSK fuzzy system, as a basic model for the classification of epilepsy data. First, for medical data, the interpretability of TSK fuzzy systems can ensure that the prediction results are traceable and safe. Second, in view of the deviation in the data distribution between the real source domain and the target domain, MMD is used to measure the distance between different data distributions. The objective function is constructed according to the MMD distance, and the distribution distance of different datasets is minimized to find the similar characteristics of different datasets. We introduce semi-supervised learning to further explore the relationship between data. Based on the MMD method, a semi-supervised learning (SSL)-MMD method is constructed by using pseudo-tags to realize the data distribution alignment of the same category. In addition, the idea of knowledge dissemination is used to learn pseudo-tags as additional data features. Finally, for epilepsy classification, the cross-domain TSK fuzzy system uses the cross-entropy function as the objective function and adopts the back-propagation strategy to optimize the parameters. The experimental results show that the new method can process complex epilepsy data and identify whether patients have epilepsy.
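As a rough illustration of how MMD measures the distance between two data distributions, the sketch below computes the standard biased estimate of squared MMD with a Gaussian kernel. The kernel choice, the bandwidth `sigma`, the function names, and the toy samples are assumptions for illustration only; the abstract does not specify which kernel or estimator the authors use.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, sigma) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy one-dimensional "domains" (hypothetical data).
source = [[0.0], [0.1], [0.2]]
target_near = [[0.05], [0.15], [0.25]]
target_far = [[3.0], [3.1], [3.2]]

near = mmd_squared(source, target_near)
far = mmd_squared(source, target_far)
```

A domain-alignment objective of the kind described would add a term like `mmd_squared(source_features, target_features)` to the training loss so that minimizing it pulls the two feature distributions together.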
Malaria is a lethal disease responsible for thousands of deaths worldwide every year. Manual methods of malaria diagnosis are time-consuming and require a great deal of human expertise and effort. Computer-based automated diagnosis of diseases is progressively becoming popular. Although deep learning models show high performance in the medical field, they demand a large volume of data for training, which is hard to acquire for medical problems. Similarly, labeling of medical images can be done only with the help of medical experts. Several recent studies have utilized deep learning models to develop efficient malaria diagnostic systems, which showed promising results. However, the most common problem with these models is that they need a large amount of data for training. This paper presents a computer-aided malaria diagnosis system that combines a semi-supervised generative adversarial network and transfer learning. The proposed model is trained in a semi-supervised manner and requires less training data than conventional deep learning models. The performance of the proposed model is evaluated on a publicly available dataset of blood smear images (with malaria-infected and normal classes), on which it achieves a classification accuracy of 96.6%.
Sentiment classification is a useful tool to classify reviews about sentiments and attitudes towards a product or service. Existing studies heavily rely on sentiment classification methods that require fully annotated inputs. However, there is limited labelled text available, making the process of acquiring fully annotated input costly and labour-intensive. Lately, semi-supervised methods have emerged, as they require only partially labelled input but perform comparably to supervised methods. Nevertheless, some works reported that the performance of the semi-supervised model degraded after adding unlabelled instances into training. The literature also shows that not all unlabelled instances are equally useful; thus, identifying the informative unlabelled instances is beneficial in training a semi-supervised model. To achieve this, an informative score is proposed and incorporated into semi-supervised sentiment classification. The evaluation is performed on a semi-supervised method without an informative score and with an informative score. By using the informative score in the instance selection strategy to identify informative unlabelled instances, semi-supervised models perform better than models that do not incorporate informative scores into their training. Although the semi-supervised models incorporating an informative score are not able to surpass the supervised models, the results are still promising: the difference in performance is small, at 2% to 5%, while the number of labelled instances used is greatly reduced from 100% to 40%. The best result of the proposed instance selection strategy is achieved when incorporating an informative score with a baseline confidence score at a 0.5:0.5 ratio using only 40% labelled data.
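The 0.5:0.5 combination of an informative score with a confidence score can be sketched as a weighted ranking over unlabelled candidates. The abstract does not define how the informative score is computed, so the score values, the dictionary keys, and the function name `select_instances` below are hypothetical placeholders; only the weighted combination and top-k selection are shown.

```python
def select_instances(candidates, k, w_info=0.5, w_conf=0.5):
    """Rank unlabelled candidates by a weighted sum of two scores and keep the top k."""
    ranked = sorted(
        candidates,
        key=lambda c: w_info * c["informative"] + w_conf * c["confidence"],
        reverse=True,
    )
    return [c["text"] for c in ranked[:k]]


# Hypothetical candidates with made-up scores in [0, 1].
candidates = [
    {"text": "review_a", "informative": 0.9, "confidence": 0.5},
    {"text": "review_b", "informative": 0.2, "confidence": 0.95},
    {"text": "review_c", "informative": 0.6, "confidence": 0.7},
]

selected = select_instances(candidates, k=2)
```

With these made-up numbers, the combined scores are 0.7, 0.575, and 0.65, so a purely confidence-based ranking would have picked `review_b` first, whereas the combined ranking does not select it.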
Label propagation is an essential semi-supervised learning method based on graphs, which has a broad spectrum of applications in pattern recognition and data mining. This paper proposes a quantum semi-supervised classifier based on label propagation. Considering the difficulty of graph construction, we develop a variational quantum label propagation (VQLP) method. In this method, a locally parameterized quantum circuit is created to reduce the parameters required in the optimization. Furthermore, we design a quantum semi-supervised binary classifier based on hybrid Bell and Z bases measurement, which has a shallower circuit depth and is more suitable for implementation on near-term quantum devices. We demonstrate the performance of the quantum semi-supervised classifier on the Iris data set, and the simulation results show that the quantum semi-supervised classifier has higher classification accuracy than the swap test classifier. This work opens a new path to quantum machine learning based on graphs.
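For readers unfamiliar with the underlying idea, the following is a minimal sketch of classical graph-based label propagation, not the variational quantum method (VQLP) proposed in the abstract. The function name, the clamping scheme, and the four-node chain graph are assumptions chosen for illustration.

```python
def propagate_labels(weights, labels, n_classes, n_iter=100):
    """Classical label propagation on a weighted graph.

    weights: symmetric adjacency matrix (list of lists).
    labels:  class index for labeled nodes, None for unlabeled nodes.
    """
    n = len(weights)
    # One-hot scores for labeled nodes, uniform scores for unlabeled nodes.
    scores = [
        [1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
        if labels[i] is not None
        else [1.0 / n_classes] * n_classes
        for i in range(n)
    ]
    for _ in range(n_iter):
        new_scores = []
        for i in range(n):
            if labels[i] is not None:
                new_scores.append(scores[i])  # clamp known labels
                continue
            total = sum(weights[i])
            # Weighted average of the neighbors' class scores.
            new_scores.append(
                [sum(weights[i][j] * scores[j][c] for j in range(n)) / total
                 for c in range(n_classes)]
            )
        scores = new_scores
    return [max(range(n_classes), key=lambda c: row[c]) for row in scores]


# Chain graph 0 - 1 - 2 - 3 with only the two end nodes labeled.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
predicted = propagate_labels(weights, labels=[0, None, None, 1], n_classes=2)
```

Each unlabeled node ends up with the label of the labeled node it is closer to on the graph, which is the behavior label propagation relies on.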
Clustering is a crucial method for deciphering data structure and producing new information. Due to its significance in revealing fundamental connections between the human brain and events, it is essential to utilize clustering for cognitive research. Dealing with noisy data caused by inaccurate synthesis from several sources or misleading data production processes is one of the most intriguing clustering difficulties. Noisy data can lead to incorrect object recognition and inference. This research proposes a novel clustering approach, named Picture-Neutrosophic Trusted Safe Semi-Supervised Fuzzy Clustering (PNTS3FCM), to solve the clustering problem with noisy data using neutral and refusal degrees in the definition of the Picture Fuzzy Set (PFS) and Neutrosophic Set (NS). Our contribution is to propose a new optimization model with four essential components: clustering, outlier removal, safe semi-supervised fuzzy clustering, and partitioning with labeled and unlabeled data. The effectiveness and flexibility of the proposed technique are estimated and compared with the state-of-the-art methods, standard Picture fuzzy clustering (FC-PFS) and Confidence-weighted safe semi-supervised clustering (CS3FCM), on benchmark UCI datasets. The experimental results show that our method performs better than the compared methods on at least 10 of 15 datasets in terms of clustering quality and computational time.
Recent state-of-the-art semi-supervised learning (SSL) methods usually use data augmentations as core components. Such methods, however, are limited to simple transformations such as the augmentations under the instance's naive representations or the augmentations under the instance's semantic representations. To tackle this problem, we offer a unique insight into data augmentations and propose a novel data-augmentation-based semi-supervised learning method, called Attentive Neighborhood Feature Augmentation (ANFA). The motivation of our method lies in the observation that the relationship between the given feature and its neighborhood may contribute to constructing more reliable transformations for the data, and further help the classifier distinguish the ambiguous features from the low-density regions. Specifically, we first project the labeled and unlabeled data points into an embedding space and then construct a neighbor graph that serves as a similarity measure based on the similar representations in the embedding space. Then, we employ an attention mechanism to transform the target features into augmented ones based on the neighbor graph. Finally, we formulate a novel semi-supervised loss by encouraging the predictions of the interpolations of augmented features to be consistent with the corresponding interpolations of the predictions of the target features. We carried out experiments on the SVHN and CIFAR-10 benchmark datasets, and the experimental results demonstrate that our method outperforms the state-of-the-art methods when the number of labeled examples is limited.
Clustering analysis is one of the main concerns in data mining. A common approach to the clustering process is to bring together points that are close to each other and separate points that are far from each other. Therefore, measuring the distance between sample points is crucial to the effectiveness of clustering. Filtering features by label information and measuring the distance between samples by these features is a common supervised learning method to reconstruct the distance metric. However, in many application scenarios, it is very expensive to obtain a large number of labeled samples. In this paper, to solve the clustering problem in scenarios with few supervised samples and high data dimensionality, a novel semi-supervised clustering algorithm is proposed by designing an improved prototype network that attempts to reconstruct the distance metric in the sample space with a small amount of pairwise supervised information, such as Must-Link and Cannot-Link, and then cluster the data in the new metric space. The core idea is to make similar samples closer and dissimilar samples further apart through embedding mapping. Extensive experiments on both real-world and synthetic datasets show the effectiveness of this algorithm. Average clustering metrics on various datasets improved by 8% compared to the comparison algorithm.
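The idea of pulling Must-Link pairs together and pushing Cannot-Link pairs apart in an embedding space can be sketched with a generic contrastive-style loss. This is a swapped-in illustration, not the improved prototype network from the abstract: the function name `pairwise_constraint_loss`, the `margin` parameter, and the toy embeddings are all assumptions.

```python
import math


def pairwise_constraint_loss(embeddings, must_link, cannot_link, margin=2.0):
    """Contrastive-style loss over pairwise constraints.

    Must-Link pairs are penalized by their squared distance; Cannot-Link
    pairs are penalized only when they are closer than `margin`.
    """
    def dist(a, b):
        return math.sqrt(
            sum((x - y) ** 2 for x, y in zip(embeddings[a], embeddings[b]))
        )

    pull = sum(dist(a, b) ** 2 for a, b in must_link)
    push = sum(max(0.0, margin - dist(a, b)) ** 2 for a, b in cannot_link)
    return pull + push


# Hypothetical two-dimensional embeddings for three samples.
embeddings = {"a": [0.0, 0.0], "b": [0.0, 1.0], "c": [3.0, 4.0]}
loss = pairwise_constraint_loss(
    embeddings, must_link=[("a", "b")], cannot_link=[("a", "c")]
)
```

Minimizing such a loss with respect to the embedding function reshapes the metric space so that a standard clustering algorithm can then be run on the embedded points.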
At the early stages of deep-water oil exploration and development, fewer and more widely spaced wells are drilled than in onshore oilfields. Supervised least squares support vector machine algorithms are used to predict the reservoir parameters, but the prediction accuracy is low. We combined the least squares support vector machine (LSSVM) algorithm with semi-supervised learning and established a semi-supervised regression model, which we call the semi-supervised least squares support vector machine (SLSSVM) model. Iterative matrix inversion is also introduced to improve the training ability and training time of the model. We use the UCI data to test the generalization of semi-supervised and supervised LSSVM models. The test results suggest that the generalization performance of the LSSVM model greatly improves, and that the improvement is larger as the number of training samples decreases. Moreover, for small-sample models, the SLSSVM method has higher precision than the semi-supervised K-nearest neighbor (SKNN) method. The new semi-supervised LSSVM algorithm was used to predict the distribution of porosity and sandstone in the Jingzhou study area.
Semi-supervised discriminant analysis (SDA), which uses a combination of multiple embedding graphs, and kernel SDA (KSDA) are adopted in supervised speech emotion recognition. When the emotional factors of speech signal samples are preprocessed, different categories of features, including pitch, zero-cross rate, energy, duration, formant, and Mel frequency cepstrum coefficient (MFCC), as well as their statistical parameters, are extracted from the utterances of samples. In the dimensionality reduction stage, before the feature vectors are sent into classifiers, parameter-optimized SDA and KSDA are performed to reduce dimensionality. Experiments on the Berlin speech emotion database show that SDA for supervised speech emotion recognition outperforms some other state-of-the-art dimensionality reduction methods based on spectral graph learning, such as linear discriminant analysis (LDA), locality preserving projections (LPP), and marginal Fisher analysis (MFA), when multi-class support vector machine (SVM) classifiers are used. Additionally, KSDA can achieve better recognition performance based on kernelized data mapping compared with the above methods, including SDA.
Deep Learning (DL) is such a powerful tool that we have seen tremendous success in areas such as Computer Vision, Speech Recognition, and Natural Language Processing. Since Automated Modulation Classification (AMC) is an important part of Cognitive Radio Networks, we explore the potential of DL in solving the signal modulation recognition problem. It cannot be overlooked that DL models are complex, which makes them prone to over-fitting. DL models require a large amount of training data to combat over-fitting, but adding high-quality labels to training data manually is not always cheap or accessible, especially in real-time systems, which may encounter data unlike anything in the dataset. Semi-supervised learning is a way to exploit unlabeled data effectively to reduce over-fitting in DL. In this paper, we extend Generative Adversarial Networks (GANs) to semi-supervised learning and show that this method can be used to create a more data-efficient classifier.
A method is proposed to resolve the typical problem of air combat situation assessment. Taking the one-to-one air combat as an example and on the basis of air combat data recorded by the air combat maneuvering instrument, the problem of air combat situation assessment is equivalent to the situation classification problem of air combat data. The fuzzy C-means clustering algorithm is proposed to cluster the selected air combat sample data and the situation classification of the data is determined by the data correlation analysis in combination with the clustering results and the pilots' description of the air combat process. On the basis of semi-supervised naive Bayes classifier, an improved algorithm is proposed based on data classification confidence, through which the situation classification of air combat data is carried out. The simulation results show that the improved algorithm can assess the air combat situation effectively and the improvement of the algorithm can promote the classification performance without significantly affecting the efficiency of the classifier.
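The fuzzy C-means clustering step mentioned above can be illustrated with a minimal one-dimensional sketch of the standard algorithm. The data, the initial centers, and the fuzzifier `m = 2` are assumptions for illustration and are unrelated to the air combat data used in the study.

```python
def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Standard fuzzy C-means on one-dimensional data.

    Returns the final centers and the memberships computed in the last
    iteration (one row per point, one column per cluster).
    """
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in points:
            # Guard against division by zero when a point sits on a center.
            dists = [max(abs(x - c), 1e-12) for c in centers]
            row = [
                1.0 / sum((d_i / d_k) ** (2.0 / (m - 1.0)) for d_k in dists)
                for d_i in dists
            ]
            memberships.append(row)
        # Each center is the membership-weighted mean of all points.
        centers = [
            sum((u[j] ** m) * x for u, x in zip(memberships, points))
            / sum(u[j] ** m for u in memberships)
            for j in range(len(centers))
        ]
    return centers, memberships


# Hypothetical data with two well-separated groups.
points = [0.0, 0.2, 0.4, 9.6, 9.8, 10.0]
centers, memberships = fuzzy_c_means(points, centers=[1.0, 8.0])
```

Unlike hard clustering, every point keeps a graded membership in each cluster, which is what allows the clustering results to be combined with other evidence when assigning a final class.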
A Machine Learning (ML)-based Intrusion Detection and Prevention System (IDPS) requires a large amount of labeled, up-to-date training data to effectively detect intrusions and generalize well to novel attacks. However, the labeling of data is costly and becomes infeasible when dealing with big data, such as those generated by Internet of Things applications. To this effect, building an ML model that learns from non-labeled or partially labeled data is of critical importance. This paper proposes a Semi-supervised Multi-Layered Clustering (SMLC) model for the detection and prevention of network intrusion. SMLC has the capability to learn from partially labeled data while achieving a detection performance comparable to that of supervised ML-based IDPS. The performance of SMLC is compared with that of a well-known semi-supervised model (tri-training) and of supervised ensemble ML models, namely Random Forest, Bagging, and AdaboostM1, on two benchmark network-intrusion datasets, NSL and Kyoto 2006+. Experimental results show that SMLC is superior to tri-training, providing a comparable detection accuracy with 20% fewer labeled instances of training data. Furthermore, our results demonstrate that our scheme has a detection accuracy comparable to that of the supervised ensemble models.
Funding (AS-GCN-MTM study): This research is partially supported by the National Natural Science Foundation of China under Grant No. 62376043 and the Science and Technology Program of Sichuan Province under Grant Nos. 2020JDRC0067, 2023JDRC0087, and 24NSFTD0025.
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62171203; in part by the Jiangsu Province “333 Project” High-Level Talent Cultivation Subsidized Project; in part by the Suzhou Key Supporting Subjects for Health Informatics under Grant SZFCXK202147; in part by the Changshu Science and Technology Program under Grants CS202015 and CS202246; and in part by the Changshu Key Laboratory of Medical Artificial Intelligence and Big Data under Grants CYZ202301 and CS202314.
Abstract: In this paper, we introduce a novel Multi-scale and Auto-tuned Semi-supervised Deep Subspace Clustering (MAS-DSC) algorithm, aimed at addressing the challenges of deep subspace clustering in high-dimensional real-world data, particularly in the field of medical imaging. Traditional deep subspace clustering algorithms, which are mostly unsupervised, are limited in their ability to effectively utilize the inherent prior knowledge in medical images. Our MAS-DSC algorithm incorporates a semi-supervised learning framework that uses a small amount of labeled data to guide the clustering process, thereby enhancing the discriminative power of the feature representations. Additionally, the multi-scale feature extraction mechanism is designed to adapt to the complexity of medical imaging data, resulting in more accurate clustering performance. To address the difficulty of hyperparameter selection in deep subspace clustering, this paper employs a Bayesian optimization algorithm for adaptive tuning of hyperparameters related to subspace clustering, prior knowledge constraints, and model loss weights. Extensive experiments on standard clustering datasets, including ORL, Coil20, and Coil100, validate the effectiveness of the MAS-DSC algorithm. The results show that with its multi-scale network structure and Bayesian hyperparameter optimization, MAS-DSC achieves excellent clustering results on these datasets. Furthermore, tests on a brain tumor dataset demonstrate the robustness of the algorithm and its ability to leverage prior knowledge for efficient feature extraction and enhanced clustering performance within a semi-supervised learning framework.
Funding: Supported by the DOD National Defense Science and Engineering Graduate (NDSEG) Research Fellowship and by the NGA under Contract No. HM04762110003.
Abstract: Active learning in semi-supervised classification involves introducing additional labels for unlabelled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels. "Model Change" active learning quantifies the resulting change incurred in the classifier by introducing the additional label(s). We pair this idea with graph-based semi-supervised learning (SSL) methods that use the spectrum of the graph Laplacian matrix, which can be truncated to avoid prohibitively large computational and storage costs. We consider a family of convex loss functions for which the acquisition function can be efficiently approximated using the Laplace approximation of the posterior distribution. We show a variety of multiclass examples that illustrate improved performance over the prior state of the art.
Abstract: The aim of this paper is to broaden the application of the Stochastic Configuration Network (SCN) in the semi-supervised domain by utilizing the unlabeled data that is common in daily life. Doing so can enhance the classification accuracy of decentralized SCN algorithms while effectively protecting user privacy. To this end, we propose a decentralized semi-supervised learning algorithm for SCN, called DMT-SCN, which introduces teacher and student models by combining the idea of consistency regularization to improve the response speed of model iterations. In order to reduce the possible negative impact of unlabeled data on the model, we purposely change the way noise is added to the unlabeled data. Simulation results show that the algorithm can effectively utilize unlabeled data to improve the classification accuracy of SCN training and is robust under different ground simulation environments.
Funding: Supported in part by the National Natural Science Foundation of China under grants 62202044 and 62372039; the Scientific and Technological Innovation Foundation of Foshan under grant BK22BF009; the Excellent Youth Team Project for the Central Universities under grant FRF-EYIT-23-01; the Fundamental Research Funds for the Central Universities under grants 06500103 and 06500078; the Guangdong Basic and Applied Basic Research Foundation under grant 2022A1515240044; and the Beijing Natural Science Foundation under grant 4232040.
Abstract: Timely inspection of defects on the surfaces of wind turbine blades can effectively prevent unpredictable accidents. To this end, this study proposes a semi-supervised object-detection network based on You Only Look Once version 4 (YOLOv4). A semi-supervised structure comprising a generative adversarial network (GAN) was designed to overcome the difficulty of obtaining and labeling sufficient samples. In the GAN, the generator is realized by an encoder-decoder network, where the backbone of the encoder is YOLOv4 and the decoder comprises inverse convolutional layers. Partial features from the generator are passed to the defect detection network. Deploying several unlabeled images can significantly improve the generalization and recognition capabilities of defect-detection models. The small-scale object detection capacity of the network can be improved by enhancing essential features in the feature map, which is done by adding the concurrent spatial and channel squeeze and excitation (scSE) attention module to the three parts of the YOLOv4 network. A balancing improvement was made to the loss function of YOLOv4 to overcome the imbalance among defect categories. The results for both the single- and multi-category defect datasets show that the improved model can make good use of the features of the unlabeled images. The accuracy of wind turbine blade defect detection also has a significant advantage over classical object detection algorithms, including Faster R-CNN and DETR.
Funding: Supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0008703, The Competency Development Program for Industry Specialist).
Abstract: Intrusion detection involves identifying unauthorized network activity and recognizing whether the data constitute an abnormal network transmission. Recent research has focused on using semi-supervised learning mechanisms to identify abnormal network traffic to deal with labeled and unlabeled data in the industry. However, real-time training and classifying of network traffic pose challenges, as they can lead to the degradation of the overall dataset and difficulties in preventing attacks. Additionally, existing semi-supervised learning research might need to analyze the experimental results more comprehensively. This paper proposes XA-GANomaly, a novel technique for explainable adaptive semi-supervised learning using GANomaly, an image anomaly detection model, that dynamically trains small subsets to address these issues. First, this research introduces a deep neural network (DNN)-based GANomaly for semi-supervised learning. Second, this paper presents the proposed adaptive algorithm for the DNN-based GANomaly, which is validated with four subsets of the adaptive dataset. Finally, this study demonstrates a monitoring system that incorporates three explainable techniques (Shapley additive explanations, reconstruction error visualization, and t-distributed stochastic neighbor embedding) to respond effectively to attacks on traffic data at each stage: feature engineering, semi-supervised learning, and adaptive learning. Compared to other single-class classification techniques, the proposed DNN-based GANomaly achieves higher scores on the Network Security Laboratory-Knowledge Discovery in Databases and UNSW-NB15 datasets, by 13% and 8% in F1 score and by 4.17% and 11.51% in accuracy, respectively. Furthermore, experiments on the proposed adaptive learning reveal mostly improved results over the initial values. An analysis and monitoring system based on the combination of the three explainable methodologies is also described. Thus, the proposed method has the potential to be applied in practical industry, and future research will explore handling unbalanced real-time datasets in various scenarios.
Funding: Supported by the Innovation Talents Promotion Program of Shaanxi Province, China (No. 2021TD08).
Abstract: Radio frequency fingerprinting (RFF) is a remarkable lightweight authentication scheme to support rapid and scalable identification in Internet of Things (IoT) systems. Deep learning (DL) is a critical enabler of RFF identification by leveraging the hardware-level features. However, traditional supervised learning methods require a huge number of labeled training samples. Therefore, how to establish a high-performance supervised learning model with few labels under practical application is still challenging. To address this issue, we propose a novel RFF semi-supervised learning (RFFSSL) model which can obtain better performance with few meta labels. Specifically, the proposed RFFSSL model is constituted by a teacher-student network, in which the student network learns from the pseudo labels predicted by the teacher. Then, the output of the student model is exploited to improve the performance of the teacher on the labeled data. Furthermore, a comprehensive evaluation of the accuracy is conducted. We obtain about 50 GB of raw signal data from real long-term evolution (LTE) mobile phones, which is used to evaluate various models. Experimental results demonstrate that the proposed RFFSSL scheme can achieve up to 97% experimental testing accuracy in a noisy environment with only 10% labeled samples when the number of training samples equals 2700.
Funding: Supported in part by the National Key R&D Program of China under Grant 2018YFA0701601, in part by the National Natural Science Foundation of China (Grant Nos. U22A2002, 61941104, 62201605), and in part by the Tsinghua University-China Mobile Communications Group Co., Ltd. Joint Institute.
Abstract: In the upcoming large-scale Internet of Things (IoT), it is increasingly challenging to defend against malicious traffic, due to the heterogeneity of IoT devices and the diversity of IoT communication protocols. In this paper, we propose a semi-supervised learning-based approach to detect malicious traffic at the access side. It overcomes the resource-bottleneck problem of traditional malicious traffic defenders, which are deployed at the victim side, and is also free of labeled traffic data in model training. Specifically, we design a coarse-grained behavior model of IoT devices by self-supervised learning with unlabeled traffic data. Then, we fine-tune this model to improve its accuracy in malicious traffic detection by adopting a transfer learning method using a small amount of labeled data. Experimental results show that our method can achieve an accuracy of 99.52% and an F1-score of 99.52% with only 1% of the labeled training data based on the CICDDoS2019 dataset. Moreover, our method outperforms the state-of-the-art supervised learning-based methods in terms of accuracy, precision, recall, and F1-score with 1% of the training data.
Funding: Supported by the Fifth Key Project of Jiangsu Vocational Education Teaching Reform Research under Grant ZZZ13 and in part by the Science and Technology Project of Changzhou City under Grant CE20215032.
Abstract: Through semi-supervised learning and knowledge inheritance, a novel Takagi-Sugeno-Kang (TSK) fuzzy system framework is proposed for epilepsy data classification in this study. The new method uses the maximum mean discrepancy (MMD) method and the TSK fuzzy system as a basic model for the classification of epilepsy data. First, for medical data, the interpretability of TSK fuzzy systems can ensure that the prediction results are traceable and safe. Second, in view of the deviation in the data distribution between the real source domain and the target domain, MMD is used to measure the distance between different data distributions. The objective function is constructed according to the MMD distance, and the distribution distance of different datasets is minimized to find the similar characteristics of different datasets. We introduce semi-supervised learning to further explore the relationship between data. Based on the MMD method, a semi-supervised learning (SSL)-MMD method is constructed by using pseudo-labels to realize the data distribution alignment of the same category. In addition, the idea of knowledge dissemination is used to learn pseudo-labels as additional data features. Finally, for epilepsy classification, the cross-domain TSK fuzzy system uses the cross-entropy function as the objective function and adopts the back-propagation strategy to optimize the parameters. The experimental results show that the new method can process complex epilepsy data and identify whether patients have epilepsy.
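To make the role of MMD as a distance between source-domain and target-domain distributions concrete, the following is a minimal sketch of the standard (biased) squared-MMD estimate with a Gaussian kernel. The abstract does not state which kernel or estimator the authors use, so the kernel choice, the bandwidth `gamma`, and the function names are assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors given as tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD: E[k(s,s')] + E[k(t,t')] - 2*E[k(s,t)]."""
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(p, q, gamma) for p in xs for q in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Hypothetical one-dimensional samples: a target close to the source
# yields a smaller MMD than a target far from it.
source = [(0.0,), (1.0,)]
near_target = [(0.1,), (1.1,)]
far_target = [(5.0,), (6.0,)]
```

Minimizing such a quantity with respect to a learned feature mapping pulls the two distributions together, which is the sense in which an MMD-based objective aligns datasets.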
Funding: The publication of this article is funded by the Qatar National Library.
Abstract: Malaria is a lethal disease responsible for thousands of deaths worldwide every year. Manual methods of malaria diagnosis are time-consuming and require a great deal of human expertise and effort. Computer-based automated diagnosis of diseases is progressively becoming popular. Although deep learning models show high performance in the medical field, they demand a large volume of training data, which is hard to acquire for medical problems. Similarly, labeling of medical images can be done only with the help of medical experts. Several recent studies have utilized deep learning models to develop efficient malaria diagnostic systems, which showed promising results. However, the most common problem with these models is that they need a large amount of data for training. This paper presents a computer-aided malaria diagnosis system that combines a semi-supervised generative adversarial network and transfer learning. The proposed model is trained in a semi-supervised manner and requires less training data than conventional deep learning models. The performance of the proposed model is evaluated on a publicly available dataset of blood smear images (with malaria-infected and normal classes), on which it achieved a classification accuracy of 96.6%.
Funding: This research is supported by the Fundamental Research Grant Scheme (FRGS), Ministry of Education Malaysia (MOE), under project code FRGS/1/2018/ICT02/USM/02/9, titled "Automated Big Data Annotation for Training Semi-Supervised Deep Learning Model in Sentiment Classification".
Abstract: Sentiment classification is a useful tool to classify reviews about sentiments and attitudes towards a product or service. Existing studies rely heavily on sentiment classification methods that require fully annotated inputs. However, there is limited labelled text available, making the acquisition of fully annotated input costly and labour-intensive. Lately, semi-supervised methods have emerged, as they require only partially labelled input but perform comparably to supervised methods. Nevertheless, some works reported that the performance of the semi-supervised model degraded after adding unlabelled instances into training. The literature also shows that not all unlabelled instances are equally useful; thus, identifying the informative unlabelled instances is beneficial in training a semi-supervised model. To achieve this, an informative score is proposed and incorporated into semi-supervised sentiment classification. The evaluation is performed on a semi-supervised method without an informative score and with an informative score. By using the informative score in the instance selection strategy to identify informative unlabelled instances, semi-supervised models perform better than models that do not incorporate informative scores into their training. Although the semi-supervised models incorporating an informative score do not surpass the supervised models, the results are still promising: the difference in performance is small, at 2% to 5%, while the proportion of labelled instances used is greatly reduced from 100% to 40%. The best result of the proposed instance selection strategy is achieved when combining an informative score with a baseline confidence score at a 0.5:0.5 ratio using only 40% labelled data.
Funding: Project supported by the Open Fund of Advanced Cryptography and System Security Key Laboratory of Sichuan Province (Grant No. SKLACSS-202108), the National Natural Science Foundation of China (Grant No. U162271070), and the Scientific Research Fund of Zaozhuang University (Grant No. 102061901).
Abstract: Label propagation is an essential semi-supervised learning method based on graphs, which has a broad spectrum of applications in pattern recognition and data mining. This paper proposes a quantum semi-supervised classifier based on label propagation. Considering the difficulty of graph construction, we develop a variational quantum label propagation (VQLP) method. In this method, a locally parameterized quantum circuit is created to reduce the parameters required in the optimization. Furthermore, we design a quantum semi-supervised binary classifier based on hybrid Bell and Z bases measurement, which has a shallower circuit depth and is more suitable for implementation on near-term quantum devices. We demonstrate the performance of the quantum semi-supervised classifier on the Iris data set, and the simulation results show that the quantum semi-supervised classifier has higher classification accuracy than the swap test classifier. This work opens a new path to quantum machine learning based on graphs.
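For readers unfamiliar with the graph-based method this work builds on, here is a minimal sketch of classical label propagation on a small weighted graph. It is not the variational quantum method (VQLP) proposed in the abstract; the function name, the clamping of labeled nodes, and the fixed iteration count are illustrative assumptions, and every unlabeled node is assumed to have at least one neighbor.

```python
def label_propagation(weights, labels, n_classes, n_iter=100):
    """Classical label propagation.

    weights: symmetric adjacency matrix (list of lists of non-negative floats).
    labels:  class index for labeled nodes, None for unlabeled nodes.
    Returns the predicted class index for every node.
    """
    n = len(weights)
    # Labeled nodes start one-hot; unlabeled nodes start with a uniform distribution.
    scores = [
        [1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
        if labels[i] is not None
        else [1.0 / n_classes] * n_classes
        for i in range(n)
    ]
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                updated.append(scores[i])  # clamp known labels
                continue
            degree = sum(weights[i])
            # Unlabeled node: weighted average of its neighbors' score vectors.
            updated.append([
                sum(weights[i][j] * scores[j][c] for j in range(n)) / degree
                for c in range(n_classes)
            ])
        scores = updated
    return [max(range(n_classes), key=lambda c: row[c]) for row in scores]


# Hypothetical chain graph 0 - 1 - 2 - 3 with only the end nodes labeled.
chain = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
predicted = label_propagation(chain, [0, None, None, 1], n_classes=2)
```

Each unlabeled node ends up with the class of the labeled node it is closest to along the chain.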
Funding: This research is funded by the Graduate University of Science and Technology under grant number GUST.STS.DT2020-TT01.
Abstract: Clustering is a crucial method for deciphering data structure and producing new information. Due to its significance in revealing fundamental connections between the human brain and events, it is essential to utilize clustering for cognitive research. Dealing with noisy data caused by inaccurate synthesis from several sources or misleading data production processes is one of the most intriguing clustering difficulties. Noisy data can lead to incorrect object recognition and inference. This research proposes a novel clustering approach, named Picture-Neutrosophic Trusted Safe Semi-Supervised Fuzzy Clustering (PNTS3FCM), to solve the clustering problem with noisy data using the neutral and refusal degrees in the definitions of the Picture Fuzzy Set (PFS) and the Neutrosophic Set (NS). Our contribution is a new optimization model with four essential components: clustering, outlier removal, safe semi-supervised fuzzy clustering, and partitioning with labeled and unlabeled data. The effectiveness and flexibility of the proposed technique are estimated and compared with state-of-the-art methods, standard Picture fuzzy clustering (FC-PFS) and Confidence-weighted safe semi-supervised clustering (CS3FCM), on benchmark UCI datasets. The experimental results show that our method is better than the compared methods on at least 10 of 15 datasets in terms of clustering quality and computational time.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62072127, 62002076, 61906049); the Natural Science Foundation of Guangdong Province (Nos. 2023A1515011774, 2020A1515010423); Project 6142111180404 supported by CNKLSTISS; the Science and Technology Program of Guangzhou, China (No. 202002030131); the Guangdong Basic and Applied Basic Research Fund Joint Fund Youth Fund (No. 2019A1515110213); the Open Fund Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF-IPIC202101); and the Scientific Research Project for Guangzhou University (No. RP2022003).
Abstract: Recent state-of-the-art semi-supervised learning (SSL) methods usually use data augmentations as core components. Such methods, however, are limited to simple transformations such as augmentations under the instance’s naive representations or augmentations under the instance’s semantic representations. To tackle this problem, we offer a unique insight into data augmentations and propose a novel data-augmentation-based semi-supervised learning method, called Attentive Neighborhood Feature Augmentation (ANFA). The motivation of our method lies in the observation that the relationship between a given feature and its neighborhood may contribute to constructing more reliable transformations for the data, further helping the classifier to distinguish ambiguous features in low-density regions. Specifically, we first project the labeled and unlabeled data points into an embedding space and then construct a neighbor graph that serves as a similarity measure based on the similar representations in the embedding space. Then, we employ an attention mechanism to transform the target features into augmented ones based on the neighbor graph. Finally, we formulate a novel semi-supervised loss by encouraging the predictions of the interpolations of augmented features to be consistent with the corresponding interpolations of the predictions of the target features. We carried out experiments on the SVHN and CIFAR-10 benchmark datasets, and the experimental results demonstrate that our method outperforms the state-of-the-art methods when the number of labeled examples is limited.
Abstract: Clustering analysis is one of the main concerns in data mining. A common approach to the clustering process is to bring together points that are close to each other and separate points that are far from each other. Therefore, measuring the distance between sample points is crucial to the effectiveness of clustering. Filtering features by label information and measuring the distance between samples by these features is a common supervised learning method to reconstruct the distance metric. However, in many application scenarios, it is very expensive to obtain a large number of labeled samples. In this paper, to solve the clustering problem in scenarios with few supervised samples and high data dimensionality, a novel semi-supervised clustering algorithm is proposed. It uses an improved prototype network that attempts to reconstruct the distance metric in the sample space with a small amount of pairwise supervised information, such as Must-Link and Cannot-Link constraints, and then clusters the data in the new metric space. The core idea is to make similar samples closer and dissimilar samples further apart through embedding mapping. Extensive experiments on both real-world and synthetic datasets show the effectiveness of this algorithm. Average clustering metrics on various datasets improved by 8% compared to the comparison algorithm.
Funding: Supported by the "12th Five Year Plan" National Science and Technology Major Special Subject: Well Logging Data and Seismic Data Fusion Technology Research (No. 2011ZX05023-005-006).
Abstract: At the early stages of deep-water oil exploration and development, fewer and more widely spaced wells are drilled than in onshore oilfields. Supervised least squares support vector machine algorithms are used to predict the reservoir parameters, but the prediction accuracy is low. We combined the least squares support vector machine (LSSVM) algorithm with semi-supervised learning and established a semi-supervised regression model, which we call the semi-supervised least squares support vector machine (SLSSVM) model. Iterative matrix inversion is also introduced to improve the training ability and training time of the model. We use the UCI data to test the generalization of a semi-supervised and a supervised LSSVM model. The test results suggest that the generalization performance of the LSSVM model greatly improves and that with decreasing training samples the generalization performance is better. Moreover, for small-sample models, the SLSSVM method has higher precision than the semi-supervised K-nearest neighbor (SKNN) method. The new semi-supervised LSSVM algorithm was used to predict the distribution of porosity and sandstone in the Jingzhou study area.
Funding: The National Natural Science Foundation of China (No. 61231002, 61273266) and the Ph.D. Programs Foundation of Ministry of Education of China (No. 20110092130004).
Abstract: Semi-supervised discriminant analysis (SDA), which uses a combination of multiple embedding graphs, and kernel SDA (KSDA) are adopted in supervised speech emotion recognition. When the emotional factors of speech signal samples are preprocessed, different categories of features, including pitch, zero-crossing rate, energy, duration, formant, and Mel frequency cepstrum coefficient (MFCC), as well as their statistical parameters, are extracted from the utterances of samples. In the dimensionality reduction stage, before the feature vectors are sent into classifiers, parameter-optimized SDA and KSDA are performed to reduce dimensionality. Experiments on the Berlin speech emotion database show that SDA for supervised speech emotion recognition outperforms some other state-of-the-art dimensionality reduction methods based on spectral graph learning, such as linear discriminant analysis (LDA), locality preserving projections (LPP), and marginal Fisher analysis (MFA), when multi-class support vector machine (SVM) classifiers are used. Additionally, KSDA can achieve better recognition performance based on kernelized data mapping compared with the above methods, including SDA.
Funding: This work is supported by the National Natural Science Foundation of China (Nos. 61771154, 61603239, 61772454, 6171101570).
Abstract: Deep Learning (DL) is such a powerful tool that we have seen tremendous success in areas such as Computer Vision, Speech Recognition, and Natural Language Processing. Since Automated Modulation Classification (AMC) is an important part of Cognitive Radio Networks, we explore the potential of DL in solving the signal modulation recognition problem. It cannot be overlooked that DL models are complex, which makes them prone to over-fitting. A DL model requires a large amount of training data to combat over-fitting, but adding high-quality labels to training data manually is not always cheap or accessible, especially in a real-time system, which may encounter unprecedented data. Semi-supervised learning is a way to exploit unlabeled data effectively to reduce over-fitting in DL. In this paper, we extend Generative Adversarial Networks (GANs) to semi-supervised learning and show that this method can be used to create a more data-efficient classifier.
Funding: Supported by the Aviation Science Foundation of China (20152096019).
Abstract: A method is proposed to resolve the typical problem of air combat situation assessment. Taking one-to-one air combat as an example, and on the basis of air combat data recorded by the air combat maneuvering instrument, the problem of air combat situation assessment is treated as equivalent to a situation classification problem on air combat data. The fuzzy C-means clustering algorithm is used to cluster the selected air combat sample data, and the situation classification of the data is determined by data correlation analysis in combination with the clustering results and the pilots' description of the air combat process. On the basis of the semi-supervised naive Bayes classifier, an improved algorithm is proposed based on data classification confidence, through which the situation classification of air combat data is carried out. The simulation results show that the improved algorithm can assess the air combat situation effectively and that the improvement can promote classification performance without significantly affecting the efficiency of the classifier.
Abstract: A Machine Learning (ML)-based Intrusion Detection and Prevention System (IDPS) requires a large amount of labeled, up-to-date training data to effectively detect intrusions and generalize well to novel attacks. However, the labeling of data is costly and becomes infeasible when dealing with big data, such as those generated by Internet of Things applications. To this effect, building an ML model that learns from non-labeled or partially labeled data is of critical importance. This paper proposes a Semi-supervised Multi-Layered Clustering (SMLC) model for the detection and prevention of network intrusion. SMLC has the capability to learn from partially labeled data while achieving a detection performance comparable to that of supervised ML-based IDPS. The performance of SMLC is compared with that of a well-known semi-supervised model (tri-training) and of supervised ensemble ML models, namely Random Forest, Bagging, and AdaboostM1, on two benchmark network-intrusion datasets, NSL and Kyoto 2006+. Experimental results show that SMLC is superior to tri-training, providing a comparable detection accuracy with 20% fewer labeled instances of training data. Furthermore, our results demonstrate that our scheme has a detection accuracy comparable to that of the supervised ensemble models.