8 articles found
Combining supervised classifiers with unlabeled data
1
Authors: 刘雪艳, 张雪英, 李凤莲, 黄丽霞. Journal of Central South University (SCIE, EI, CAS, CSCD), 2016, No. 5, pp. 1176-1182 (7 pages)
Ensemble learning is a widely studied topic. Traditional ensemble techniques seek better results by combining base classifiers with labeled data, and they fail to address the ensemble task where only unlabeled data are available. A label propagation based ensemble (LPBE) approach is proposed to further combine base classification results with unlabeled data. First, a graph is constructed by taking unlabeled data as vertices, and the weights in the graph are calculated by the correntropy function. Average prediction results are obtained from the base classifiers, then propagated under a regularization framework and adaptively enhanced over the graph. The proposed approach is further enriched when a small amount of labeled data is available. The proposed algorithms are evaluated on several UCI benchmark data sets. Simulation results show that the proposed algorithms achieve satisfactory performance compared with existing ensemble methods.
Keywords: correntropy; unlabeled data; regularization framework; ensemble learning
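The propagation step described in this abstract can be illustrated with a minimal sketch. It is not the LPBE algorithm itself: it assumes a Gaussian kernel as a stand-in for the correntropy weights and uses the standard regularized update F <- alpha * S * F + (1 - alpha) * F0, where F0 holds the averaged base-classifier predictions. The function names and the parameters `alpha`, `sigma`, and `iters` are hypothetical.

```python
import math


def gaussian_weights(points, sigma=1.0):
    """Pairwise Gaussian-kernel weights (assumed stand-in for correntropy)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def propagate_predictions(points, avg_pred, alpha=0.8, sigma=1.0, iters=50):
    """Smooth averaged class scores over the graph and return hard labels."""
    n, c = len(points), len(avg_pred[0])
    w = gaussian_weights(points, sigma)
    deg = [sum(row) for row in w]
    # Symmetric normalization S = D^(-1/2) W D^(-1/2).
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    f = [row[:] for row in avg_pred]
    for _ in range(iters):
        # Mix neighbor scores with the original averaged predictions.
        f = [[alpha * sum(s[i][j] * f[j][k] for j in range(n))
              + (1 - alpha) * avg_pred[i][k] for k in range(c)]
             for i in range(n)]
    return [max(range(c), key=lambda k: row[k]) for row in f]
```

In this sketch, a point whose averaged prediction disagrees weakly with its close neighbors is pulled toward the label of its cluster, which is the intuition behind propagating base-classifier outputs over a graph of unlabeled data.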
Exploiting Unlabeled Data for Neural Grammatical Error Detection (cited by: 3)
2
Authors: Zhuo-Ran Liu, Yang Liu. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2017, No. 4, pp. 758-767 (10 pages)
Identifying and correcting grammatical errors in text written by non-native writers has received increasing attention in recent years. Although a number of annotated corpora have been established to facilitate data-driven grammatical error detection and correction approaches, they are still limited in terms of quantity and coverage because human annotation is labor-intensive, time-consuming, and expensive. In this work, we propose to utilize unlabeled data to train neural network based grammatical error detection models. The basic idea is to cast error detection as a binary classification problem and derive positive and negative training examples from unlabeled data. We introduce an attention-based neural network to capture long-distance dependencies that influence the word being detected. Experiments show that the proposed approach significantly outperforms SVM and convolutional networks with a fixed-size context window.
Keywords: unlabeled data; grammatical error detection; neural network
Regularized canonical correlation analysis with unlabeled data (cited by: 1)
3
Authors: Xi-chuan ZHOU, Hai-bin SHEN. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2009, No. 4, pp. 504-511 (8 pages)
In standard canonical correlation analysis (CCA), the data from definite datasets are used to estimate their canonical correlation. In real applications, for example in bilingual text retrieval, there may be a large portion of data for which we do not know which set it belongs to. This part of the data is called unlabeled data, while the rest, from definite datasets, is called labeled data. We propose a novel method called regularized canonical correlation analysis (RCCA), which makes use of both labeled and unlabeled samples. Specifically, we learn to approximate canonical correlation as if all data were labeled. Then, we describe a generalization of RCCA for the multi-set situation. Experiments on four real-world datasets, Yeast, Cloud, Iris, and Haberman, demonstrate that, by incorporating the unlabeled data points, the accuracy of correlation coefficients can be improved by over 30%.
Keywords: canonical correlation analysis (CCA); regularization; unlabeled data; generalized canonical correlation analysis (GCCA)
Efficient poisoning attacks and defenses for unlabeled data in DDoS prediction of intelligent transportation systems (cited by: 1)
4
Authors: Zhong Li, Xianke Wu, Changjun Jiang. Security and Safety, 2022, No. 1, pp. 145-165 (21 pages)
Nowadays, large numbers of smart sensors (e.g., road-side cameras) that communicate with nearby base stations could launch distributed denial of service (DDoS) attack storms in intelligent transportation systems. DDoS attacks disable the services provided by base stations. In this paper, considering uneven communication traffic flows and privacy preservation, we give a hidden Markov model-based prediction model that uses the multi-step characteristic of DDoS within a federated learning framework to predict whether DDoS attacks will happen on base stations in the future. However, in federated learning, we need to consider the problem of poisoning attacks by malicious participants. Without security protection, poisoning attacks can paralyze intelligent transportation systems. Traditional poisoning attacks mainly apply to classification models with labeled data. In this paper, we propose a reinforcement learning-based poisoning method specifically for poisoning the prediction model with unlabeled data. Besides, previous related defense strategies rely on validation datasets with labeled data on the server. However, this is unrealistic, since the local training datasets are not uploaded to the server for privacy reasons, and our datasets are also unlabeled. Furthermore, we give a validation dataset-free defense strategy based on Dempster-Shafer (D-S) evidence theory that avoids anomalous aggregation to obtain a robust global model for precise DDoS prediction. In our experiments, we simulate 3000 points in combination with the DARPA2000 dataset to carry out evaluations. The results indicate that our poisoning method can successfully poison the global prediction model with unlabeled data in a short time. Meanwhile, we compare our proposed defense algorithm with three popularly used defense algorithms. The results show that our defense method has a high accuracy rate in excluding poisoners and can obtain a high attack prediction probability.
Keywords: poisoning attacks; defenses; multi-step DDoS prediction; unlabeled data; intelligent transportation systems
Iterative Semi-Supervised Learning Using Softmax Probability (cited by: 1)
5
Authors: Heewon Chung, Jinseok Lee. Computers, Materials & Continua (SCIE, EI), 2022, No. 9, pp. 5607-5628 (22 pages)
For classification problems in practice, one of the challenging issues is obtaining enough labeled data for training. Moreover, even if such labeled data has been sufficiently accumulated, most datasets exhibit a long-tailed distribution with heavy class imbalance, which biases the model towards the majority class. To alleviate such class imbalance, semi-supervised learning methods using additional unlabeled data have been considered. However, as a matter of course, the accuracy is much lower than that from supervised learning. In this study, under the assumption that additional unlabeled data is available, we propose iterative semi-supervised learning algorithms, which iteratively correct the labeling of the extra unlabeled data based on softmax probabilities. The results show that the proposed algorithms provide accuracy as high as that from supervised learning. To validate the proposed algorithms, we tested two scenarios: with a balanced unlabeled dataset and with an imbalanced unlabeled dataset. Under both scenarios, our proposed semi-supervised learning algorithms provided higher accuracy than previous state-of-the-art methods. Code is available at https://github.com/HeewonChung92/iterative-semi-learning.
Keywords: semi-supervised learning; class imbalance; iterative learning; unlabeled data
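The idea of iteratively correcting pseudo labels from softmax probabilities can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a nearest-centroid classifier stands in for the neural network, softmax is taken over negative squared distances, and the names `iterative_pseudo_label`, `threshold`, and `rounds` are hypothetical. Each class is assumed to have at least one labeled example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def centroids(xs, ys, n_classes):
    """Per-class mean vectors (assumes every class has at least one example)."""
    dim = len(xs[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for x, y in zip(xs, ys):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += x[d]
    return [[v / counts[c] for v in sums[c]] for c in range(n_classes)]


def predict_proba(cents, x):
    """Softmax over negative squared distances to each class centroid."""
    return softmax([-sum((a - b) ** 2 for a, b in zip(x, c)) for c in cents])


def iterative_pseudo_label(labeled_x, labeled_y, unlabeled_x, n_classes,
                           threshold=0.9, rounds=5):
    """Return {unlabeled index: pseudo label} for confidently labeled points."""
    pseudo = {}
    for _ in range(rounds):
        # Retrain on the labeled data plus the current pseudo-labeled points.
        xs = labeled_x + [unlabeled_x[i] for i in pseudo]
        ys = labeled_y + [pseudo[i] for i in pseudo]
        cents = centroids(xs, ys, n_classes)
        # Relabel every unlabeled point, so earlier pseudo labels can be corrected.
        new_pseudo = {}
        for i, x in enumerate(unlabeled_x):
            p = predict_proba(cents, x)
            best = max(range(n_classes), key=lambda k: p[k])
            if p[best] >= threshold:
                new_pseudo[i] = best
        if new_pseudo == pseudo:
            break  # pseudo labels have stabilized
        pseudo = new_pseudo
    return pseudo
```

Points whose top softmax probability stays below the threshold are left unlabeled, which keeps ambiguous samples out of the training set in this sketch.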
Research and Implementation of Unsupervised Clustering-Based Intrusion Detection
6
Authors: Luo Min, Zhang Huan-guo, Wang Li-na (School of Computer, Wuhan University, Wuhan 430072, Hubei, China). Wuhan University Journal of Natural Sciences (CAS), 2003, No. 03A, pp. 803-807 (5 pages)
An unsupervised clustering-based intrusion detection algorithm is discussed in this paper. The basic idea of the algorithm is to produce clusters by comparing the distances between instances in unlabeled training data sets. With the clustered data instances, anomalous clusters can be identified by the normal cluster ratio, and the identified clusters can be used in real data detection. The benefit of the algorithm is that it does not need labeled training data sets. Experiments on the KDD99 data sets show that this approach can efficiently detect unknown intrusions in real network connections.
Keywords: intrusion detection; data mining; unsupervised clustering; unlabeled data
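One way to picture distance-based clustering followed by ratio-based labeling is the sketch below. It is an assumption-laden illustration rather than the paper's algorithm: it uses single-pass fixed-width clustering with Euclidean distance, treats any cluster holding at least `normal_ratio` of the training points as normal, and all function and parameter names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fixed_width_clusters(points, width):
    """Assign each point to the nearest center within `width`, else start a cluster."""
    centers, members = [], []
    for p in points:
        best, best_d = None, width
        for idx, c in enumerate(centers):
            d = dist(p, c)
            if d <= best_d:
                best, best_d = idx, d
        if best is None:
            centers.append(list(p))
            members.append([p])
        else:
            members[best].append(p)
    return centers, members


def label_clusters(members, normal_ratio=0.2):
    """Large clusters are assumed normal; small ones are flagged as anomalous."""
    total = sum(len(m) for m in members)
    return ["normal" if len(m) / total >= normal_ratio else "anomaly"
            for m in members]


def detect(x, centers, labels, width):
    """Classify a new instance by its nearest cluster center."""
    idx = min(range(len(centers)), key=lambda i: dist(x, centers[i]))
    if dist(x, centers[idx]) > width:
        return "anomaly"  # far from every known cluster
    return labels[idx]
```

The sketch relies on the usual assumption behind such methods: normal traffic dominates the training data, so sparse clusters are more likely to be intrusions.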
Geostatistical semi-supervised learning for spatial prediction
7
Authors: Francky Fouedjio, Hassan Talebi. Artificial Intelligence in Geosciences, 2022, No. 1, pp. 162-178 (17 pages)
Geoscientists are increasingly tasked with spatially predicting a target variable in the presence of auxiliary information using supervised machine learning algorithms. Typically, the target variable is observed at a few sampling locations due to the relatively time-consuming and costly process of obtaining measurements. In contrast, auxiliary variables are often exhaustively observed within the region under study through the increasing development of remote sensing platforms and sensor networks. Supervised machine learning methods do not fully leverage this large amount of auxiliary spatial data. Indeed, in these methods, the training dataset includes only labeled data locations (where both target and auxiliary variables were measured). At the same time, unlabeled data locations (where auxiliary variables were measured but not the target variable) are not considered during the model training phase. Consequently, only a limited amount of auxiliary spatial data is utilized during the model training stage. As an alternative to supervised learning, semi-supervised learning, which learns from labeled as well as unlabeled data, can be used to address this problem. However, conventional semi-supervised learning techniques do not account for the specificities of spatial data. This paper introduces a spatial semi-supervised learning framework where geostatistics and machine learning are combined to harness a large amount of unlabeled spatial data in combination with a typically smaller set of labeled spatial data. The main idea consists of leveraging the target variable’s spatial autocorrelation to generate pseudo labels at unlabeled data points that are geographically close to labeled data points. This is achieved through geostatistical conditional simulation, where an ensemble of pseudo labels is generated to account for the uncertainty in the pseudo labeling process. The observed labels are augmented by this ensemble of pseudo labels to create an ensemble of pseudo training datasets. A supervised machine learning model is then trained on each pseudo training dataset, followed by an aggregation of trained models. The proposed geostatistical semi-supervised learning method is applied to synthetic and real-world spatial datasets. Its predictive performance is compared with some classical supervised and semi-supervised machine learning methods. It appears that it can effectively leverage a large amount of unlabeled spatial data to improve the target variable’s spatial prediction.
Keywords: labeled spatial data; unlabeled spatial data; spatial autocorrelation; pseudo labeling; spatial prediction
Learning to select pseudo labels: a semi-supervised method for named entity recognition (cited by: 2)
8
Authors: Zhen-zhen LI, Da-wei FENG, Dong-sheng LI, Xi-cheng LU. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2020, No. 6, pp. 903-916 (14 pages)
Deep learning models have achieved state-of-the-art performance in named entity recognition (NER); the good performance, however, relies heavily on substantial amounts of labeled data. In some specific areas such as medical, financial, and military domains, labeled data is very scarce, while unlabeled data is readily available. Previous studies have used unlabeled data to enrich word representations, but a large amount of entity information in unlabeled data is neglected, which may be beneficial to the NER task. In this study, we propose a semi-supervised method for NER tasks, which learns to create high-quality labeled data by applying a pre-trained module to filter out erroneous pseudo labels. Pseudo labels are automatically generated for unlabeled data and used as if they were true labels. Our semi-supervised framework includes three steps: constructing an optimal single neural model for a specific NER task, learning a module that evaluates pseudo labels, and creating new labeled data and improving the NER model iteratively. Experimental results on two English NER tasks and one Chinese clinical NER task demonstrate that our method further improves the performance of the best single neural model. Even when we use only pre-trained static word embeddings and do not rely on any external knowledge, our method achieves performance comparable to state-of-the-art models on the CoNLL-2003 and OntoNotes 5.0 English NER tasks.
Keywords: named entity recognition; unlabeled data; deep learning; semi-supervised method