Funding: Supported by the DOD National Defense Science and Engineering Graduate (NDSEG) Research Fellowship and by the NGA under Contract No. HM04762110003.
Abstract: Active learning in semi-supervised classification involves introducing additional labels for unlabelled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels. "Model Change" active learning quantifies the resulting change incurred in the classifier by introducing the additional label(s). We pair this idea with graph-based semi-supervised learning (SSL) methods that use the spectrum of the graph Laplacian matrix, which can be truncated to avoid prohibitively large computational and storage costs. We consider a family of convex loss functions for which the acquisition function can be efficiently approximated using the Laplace approximation of the posterior distribution. We show a variety of multiclass examples that illustrate improved performance over the prior state of the art.
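The Python sketch below illustrates the ingredients named in this abstract under simplifying assumptions, and is not the authors' implementation. It uses the squared loss (a convex loss for which the Laplace approximation of the posterior is exact), an unnormalized Laplacian built from a Gaussian similarity on a small dense point set, and a hypothetical prior precision of λ_i + τ² on the coefficients of the k retained eigenvectors. The score for node j is the norm of the change in those coefficients if j received the label that minimizes that change; for one-hot targets this is the currently predicted class, and the change follows from a rank-one posterior update. All function names and the parameters `sigma`, `gamma`, and `tau` are illustrative assumptions.

```python
import numpy as np


def truncated_spectrum(X, k, sigma=1.0):
    """Build a Gaussian-weighted graph on the rows of X and return the k
    smallest eigenpairs of its unnormalized Laplacian L = D - W."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    evals, evecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return evals[:k], evecs[:, :k]


def posterior(evals, V, labeled, Y, gamma=0.1, tau=0.1):
    """Gaussian posterior over spectral coefficients under squared loss.
    Prior precision is diag(evals + tau^2); Y holds one-hot rows for `labeled`."""
    Vl = V[labeled]
    precision = np.diag(evals + tau ** 2) + Vl.T @ Vl / gamma ** 2
    C = np.linalg.inv(precision)
    M = C @ Vl.T @ Y / gamma ** 2
    return M, C


def model_change_scores(V, M, C, gamma=0.1):
    """For every node j, the Frobenius norm of the coefficient update caused by
    labeling j with its currently predicted class (rank-one posterior update)."""
    U = V @ M                                    # current outputs, n x classes
    CV = V @ C                                   # row j is (C v_j)^T since C is symmetric
    denom = gamma ** 2 + np.sum(CV * V, axis=1)  # gamma^2 + v_j^T C v_j
    E = np.zeros_like(U)
    E[np.arange(len(U)), U.argmax(axis=1)] = 1.0  # one-hot of predicted class
    resid = np.linalg.norm(E - U, axis=1)
    return np.linalg.norm(CV, axis=1) * resid / denom


# Toy usage: three 2-D blobs, one labeled point per class.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((20, 2)) for c in centers])
truth = np.repeat([0, 1, 2], 20)
labeled = [0, 20, 40]
Y = np.eye(3)[truth[labeled]]

evals, V = truncated_spectrum(X, k=10)
M, C = posterior(evals, V, labeled, Y)
scores = model_change_scores(V, M, C)
scores[labeled] = -np.inf       # never re-query a labeled node
query = int(np.argmax(scores))  # next node to send for labeling
```

Because the retained eigenvectors are orthonormal, the coefficient-space norm equals the norm of the change in the node outputs, so only a k-by-k posterior is ever inverted. For large graphs, the dense `np.linalg.eigh` call would be replaced by a sparse partial eigensolver such as `scipy.sparse.linalg.eigsh`.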
Funding: Supported by the National High Technology Research and Development Programme (No. 2007AA12Z227) and the National Natural Science Foundation of China (No. 40701146).
Abstract: Nonlinearity, fuzziness, and scarce labeled data have rarely been considered in traditional remote sensing image classification. This paper proposes a semi-supervised kernel fuzzy C-means (SSKFCM) algorithm to overcome these shortcomings. SSKFCM is obtained by introducing a kernel method and a semi-supervised learning technique into the standard fuzzy C-means (FCM) algorithm. A set of multispectral images from the Beijing-1 microsatellite is classified with several algorithms: FCM, kernel FCM (KFCM), semi-supervised FCM (SSFCM), and SSKFCM. The classification results are evaluated with the corresponding indexes. The results indicate that SSKFCM significantly improves the classification accuracy of remote sensing images compared with the other algorithms.
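As a rough illustration of how a kernel and labels can be combined with FCM, here is a minimal Python sketch of kernel fuzzy C-means with a Gaussian kernel, K(x, v) = exp(-||x - v||² / σ²), which replaces the Euclidean distance of standard FCM with the kernel-induced distance 1 - K. Supervision is added here by clamping the memberships of labeled samples to one-hot vectors and seeding the prototypes from them. That choice is an assumption for illustration; the paper's SSKFCM may incorporate labels differently, and `ss_kernel_fcm` with its parameters is hypothetical.

```python
import numpy as np


def ss_kernel_fcm(X, c, labels=None, m=2.0, sigma=1.0, iters=100, tol=1e-6, seed=0):
    """Gaussian-kernel fuzzy C-means; rows listed in `labels` ({row: class})
    keep one-hot memberships (hypothetical way of adding supervision)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = labels or {}
    idx = np.array(list(labels.keys()), dtype=int)
    cls = np.array(list(labels.values()), dtype=int)

    def clamp(U):
        # Overwrite labeled columns with one-hot memberships.
        if idx.size:
            U[:, idx] = 0.0
            U[cls, idx] = 1.0
        return U

    # Initial prototypes: random rows, replaced by labeled means where available.
    V = X[rng.choice(n, c, replace=False)].astype(float)
    for j in range(c):
        rows = idx[cls == j]
        if rows.size:
            V[j] = X[rows].mean(axis=0)

    U = np.zeros((c, n))
    for _ in range(iters):
        # Kernel values K[i, k] = exp(-||x_k - v_i||^2 / sigma^2), shape c x n.
        K = np.exp(-np.sum((X[None, :, :] - V[:, None, :]) ** 2, axis=-1) / sigma ** 2)
        # Membership update using the kernel-induced distance 1 - K.
        inv = np.maximum(1.0 - K, 1e-12) ** (-1.0 / (m - 1.0))
        U_new = clamp(inv / inv.sum(axis=0))
        # Prototype update: kernel- and membership-weighted mean of the data.
        w = (U_new ** m) * K
        V = (w @ X) / w.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V


# Toy usage: two 2-D clusters with one labeled sample each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.4, size=(30, 2)) for loc in ([0.0, 0.0], [3.0, 3.0])])
U, V = ss_kernel_fcm(X, c=2, labels={0: 0, 30: 1}, sigma=2.0)
pred = U.argmax(axis=0)
```

Clamping also fixes which cluster index corresponds to which class, something unsupervised FCM and KFCM cannot guarantee on their own.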
Funding: Project supported by the National Key Research and Development Program of China (No. 2016YFB0201305) and the National Natural Science Foundation of China (No. 61872376).
Abstract: Deep learning models have achieved state-of-the-art performance in named entity recognition (NER); this performance, however, relies heavily on substantial amounts of labeled data. In some specific areas, such as the medical, financial, and military domains, labeled data is very scarce, while unlabeled data is readily available. Previous studies have used unlabeled data to enrich word representations, but they neglect the large amount of entity information in unlabeled data, which may benefit the NER task. In this study, we propose a semi-supervised method for NER tasks that learns to create high-quality labeled data by applying a pre-trained module to filter out erroneous pseudo labels. Pseudo labels are automatically generated for unlabeled data and used as if they were true labels. Our semi-supervised framework includes three steps: constructing an optimal single neural model for a specific NER task, learning a module that evaluates pseudo labels, and iteratively creating new labeled data and improving the NER model. Experimental results on two English NER tasks and one Chinese clinical NER task demonstrate that our method further improves the performance of the best single neural model. Even when using only pre-trained static word embeddings and no external knowledge, our method achieves performance comparable to state-of-the-art models on the CoNLL-2003 and OntoNotes 5.0 English NER tasks.
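The iterative part of this framework can be sketched in Python as generic self-training with filtered pseudo labels. To stay self-contained, the sketch makes two loudly labeled substitutions: a nearest-centroid classifier on feature vectors stands in for the neural NER model, and a hypothetical confidence filter (`accept`, defaulting to a distance-margin threshold) stands in for the learned pseudo-label evaluation module. All function names and thresholds are illustrative assumptions, not the paper's method.

```python
import numpy as np


def fit_centroids(X, y, n_classes):
    """Toy stand-in for training the tagger: one mean vector per class."""
    return np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])


def predict(centroids, X):
    """Predicted class and a confidence margin (second-nearest minus nearest
    centroid distance) for each row of X."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)
    return d.argmin(axis=1), d_sorted[:, 1] - d_sorted[:, 0]


def self_train(X_lab, y_lab, X_unl, n_classes, accept=lambda margin: margin > 1.0, rounds=5):
    """Iterative pseudo-labeling: label the unlabeled pool with the current model,
    keep only pseudo labels that `accept` (a stand-in for the learned evaluation
    module) passes, treat them as true labels, and retrain."""
    X_train, y_train, pool = X_lab, y_lab, X_unl
    for _ in range(rounds):
        if len(pool) == 0:
            break
        model = fit_centroids(X_train, y_train, n_classes)
        pseudo, margin = predict(model, pool)
        keep = accept(margin)
        if not keep.any():
            break
        X_train = np.vstack([X_train, pool[keep]])
        y_train = np.concatenate([y_train, pseudo[keep]])
        pool = pool[~keep]
    return fit_centroids(X_train, y_train, n_classes), len(y_train)


# Toy usage: two classes, one labeled example each, 100 unlabeled examples.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [5.0, 5.0]])
truth = np.repeat([0, 1], 50)
X_lab = means + 0.1 * rng.standard_normal((2, 2))
y_lab = np.array([0, 1])
X_unl = means[truth] + rng.standard_normal((100, 2))
model, n_train = self_train(X_lab, y_lab, X_unl, n_classes=2)
pred, _ = predict(model, X_unl)
acc = (pred == truth).mean()
```

Passing the filter in as a function keeps the loop independent of how pseudo labels are judged, so a trained evaluator could replace the margin threshold without changing the loop itself.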