56 articles found
Research on classification method of high myopic maculopathy based on retinal fundus images and optimized ALFA-Mix active learning algorithm (Cited by: 1)
1
Authors: Shao-Jun Zhu, Hao-Dong Zhan, Mao-Nian Wu, Bo Zheng, Bang-Quan Liu, Shao-Chong Zhang, Wei-Hua Yang 《International Journal of Ophthalmology(English edition)》 SCIE CAS 2023, Issue 7, pp. 995-1004 (10 pages)
AIM: To conduct a classification study of high myopic maculopathy (HMM) using limited datasets, including tessellated fundus, diffuse chorioretinal atrophy, patchy chorioretinal atrophy, and macular atrophy, and minimize annotation costs, and to optimize the ALFA-Mix active learning algorithm and apply it to HMM classification. METHODS: The optimized ALFA-Mix algorithm (ALFA-Mix+) was compared with five algorithms, including ALFA-Mix. Four models, including ResNet18, were established. Each algorithm was combined with four models for experiments on the HMM dataset. Each experiment consisted of 20 active learning rounds, with 100 images selected per round. The algorithm was evaluated by comparing the number of rounds in which ALFA-Mix+ outperformed other algorithms. Finally, this study employed six models, including EfficientFormer, to classify HMM. The best-performing model among these models was selected as the baseline model and combined with the ALFA-Mix+ algorithm to achieve satisfactory classification results with a small dataset. RESULTS: ALFA-Mix+ outperforms other algorithms with an average superiority of 16.6, 14.75, 16.8, and 16.7 rounds in terms of accuracy, sensitivity, specificity, and Kappa value, respectively. This study conducted experiments on classifying HMM using several advanced deep learning models with a complete training set of 4252 images. The EfficientFormer achieved the best results with an accuracy, sensitivity, specificity, and Kappa value of 0.8821, 0.8334, 0.9693, and 0.8339, respectively. Therefore, by combining ALFA-Mix+ with EfficientFormer, this study achieved results with an accuracy, sensitivity, specificity, and Kappa value of 0.8964, 0.8643, 0.9721, and 0.8537, respectively. CONCLUSION: The ALFA-Mix+ algorithm reduces the required samples without compromising accuracy. Compared to other algorithms, ALFA-Mix+ outperforms in more rounds of experiments. It effectively selects valuable samples compared to other algorithms. In HMM classification, combining ALFA-Mix+ with EfficientFormer enhances model performance, further demonstrating the effectiveness of ALFA-Mix+.
Keywords: high myopic maculopathy; deep learning; active learning; image classification; ALFA-Mix algorithm
Active learning accelerated Monte-Carlo simulation based on the modified K-nearest neighbors algorithm and its application to reliability estimations
2
Authors: Zhifeng Xu, Jiyin Cao, Gang Zhang, Xuyong Chen, Yushun Wu 《Defence Technology(防务技术)》 SCIE EI CAS CSCD 2023, Issue 10, pp. 306-313 (8 pages)
This paper proposes an active learning accelerated Monte-Carlo simulation method based on the modified K-nearest neighbors algorithm. The core idea of the proposed method is to judge whether or not the output of a random input point can be postulated through a classifier implemented through the modified K-nearest neighbors algorithm. Compared to other active learning methods resorting to experimental designs, the proposed method is characterized by employing Monte-Carlo simulation for sampling inputs and saving a large portion of the actual evaluations of outputs through an accurate classification, which is applicable for most structural reliability estimation problems. Moreover, the validity, efficiency, and accuracy of the proposed method are demonstrated numerically. In addition, the optimal value of K that maximizes the computational efficiency is studied. Finally, the proposed method is applied to the reliability estimation of the carbon fiber reinforced silicon carbide composite specimens subjected to random displacements, which further validates its practicability.
Keywords: active learning; Monte-Carlo simulation; K-nearest neighbors; reliability estimation; classification
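The core idea in the abstract above (postulate an output from nearby evaluated points, and call the real model only when the neighbours disagree) can be illustrated with a rough sketch. This is not the paper's modified K-nearest neighbors algorithm: it uses a plain unanimous-vote rule, and the limit-state function `g`, the standard-normal inputs, and the parameters `k` and `n_init` are all hypothetical choices made for illustration.

```python
import math
import random


def g(x):
    # Hypothetical limit-state function (assumption): failure when g(x) <= 0.
    return 3.0 - (x[0] + x[1])


def knn_accelerated_mc(n_samples, k=5, n_init=50, seed=0):
    """Estimate a failure probability while skipping evaluations of g
    whenever the k nearest already-evaluated points all share one label."""
    rng = random.Random(seed)
    evaluated = []   # (point, failed) pairs obtained from true evaluations
    failures = 0
    true_calls = 0
    for i in range(n_samples):
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        label = None
        if i >= n_init:
            nearest = sorted(evaluated, key=lambda p: math.dist(p[0], x))[:k]
            votes = {failed for _, failed in nearest}
            if len(votes) == 1:      # unanimous neighbours: postulate the label
                label = votes.pop()
        if label is None:            # ambiguous or warm-up: evaluate g for real
            label = g(x) <= 0.0
            true_calls += 1
            evaluated.append((x, label))
        failures += label
    return failures / n_samples, true_calls


pf, calls = knn_accelerated_mc(2000)
print(f"estimated failure probability: {pf:.4f}, true evaluations: {calls}")
```

Only the ambiguous points are added to the evaluated set, so the stored points tend to concentrate near the failure boundary, which is where a classifier needs them most.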
Active Learning Strategies for Textual Dataset-Automatic Labelling
3
Authors: Sher Muhammad Daudpota, Saif Hassan, Yazeed Alkhurayyif, Abdullah Saleh Alqahtani, Muhammad Haris Aziz 《Computers, Materials & Continua》 SCIE EI 2023, Issue 8, pp. 1409-1422 (14 pages)
The Internet revolution has resulted in abundant data from various sources, including social media, traditional media, etcetera. Although the availability of data is no longer an issue, data labelling for exploiting it in supervised machine learning is still an expensive process and involves tedious human efforts. The overall purpose of this study is to propose a strategy to automatically label the unlabeled textual data with the support of active learning in combination with deep learning. More specifically, this study assesses the performance of different active learning strategies in automatic labelling of the textual dataset at sentence and document levels. To achieve this objective, different experiments have been performed on the publicly available dataset. In the first set of experiments, we randomly choose a subset of instances from the training dataset and train a deep neural network to assess performance on the test set. In the second set of experiments, we replace the random selection with different active learning strategies to choose a subset of the training dataset to train the same model and reassess its performance on the test set. The experimental results suggest that different active learning strategies yield a performance improvement of 7% on document level datasets and 3% on sentence level datasets for auto labelling.
Keywords: active learning; automatic labelling; textual datasets
Enhancing Semantic Segmentation through Reinforced Active Learning: Combating Dataset Imbalances and Bolstering Annotation Efficiency
4
Authors: Dong Han, Huong Pham, Samuel Cheng 《Journal of Electronic & Information Systems》 2023, Issue 2, pp. 45-60 (16 pages)
This research addresses the challenges of training large semantic segmentation models for image analysis, focusing on expediting the annotation process and mitigating imbalanced datasets. In the context of imbalanced datasets, biases related to age and gender in clinical contexts and skewed representation in natural images can affect model performance. Strategies to mitigate these biases are explored to enhance efficiency and accuracy in semantic segmentation analysis. An in-depth exploration of various reinforced active learning methodologies for image segmentation is conducted, optimizing precision and efficiency across diverse domains. The proposed framework integrates Dueling Deep Q-Networks (DQN), Prioritized Experience Replay, Noisy Networks, and Emphasizing Recent Experience. Extensive experimentation and evaluation of diverse datasets reveal both improvements and limitations associated with various approaches in terms of overall accuracy and efficiency. This research contributes to the expansion of reinforced active learning methodologies for image segmentation, paving the way for more sophisticated and precise segmentation algorithms across diverse domains. The findings emphasize the need for a careful balance between exploration and exploitation strategies in reinforcement learning for effective image segmentation.
Keywords: semantic segmentation; active learning; reinforcement learning
Analyzing Cross-domain Transportation Big Data of New York City with Semi-supervised and Active Learning (Cited by: 4)
5
Authors: Huiyu Sun, Suzanne McIntosh 《Computers, Materials & Continua》 SCIE EI 2018, Issue 10, pp. 1-9 (9 pages)
The majority of big data analytics applied to transportation datasets suffer from being too domain-specific, that is, they draw conclusions for a dataset based on analytics on the same dataset. As a result, models trained on one domain (e.g., taxi data) apply badly to a different domain (e.g., Uber data). To achieve accurate analyses on a new domain, substantial amounts of data must be available, which limits practical applications. To remedy this, we propose to use semi-supervised and active learning of big data to accomplish the domain adaptation task: selectively choosing a small number of datapoints from a new domain while achieving comparable performance to using all the datapoints. We choose the New York City (NYC) transportation data of taxi and Uber as our dataset, simulating different domains with 90% as the source data domain for training and the remaining 10% as the target data domain for evaluation. We propose semi-supervised and active learning strategies and apply them to the source domain for selecting datapoints. Experimental results show that our adaptation achieves a performance comparable to using all datapoints while using only a fraction of them, substantially reducing the amount of data required. Our approach has two major advantages: it can make accurate analytics and predictions when big datasets are not available, and even if big datasets are available, our approach chooses the most informative datapoints out of the dataset, making the process much more efficient without having to process huge amounts of data.
Keywords: big data; taxi and Uber; domain adaptation; active learning; semi-supervised learning
MII: A Novel Text Classification Model Combining Deep Active Learning with BERT (Cited by: 4)
6
Authors: Anman Zhang, Bohan Li, Wenhuan Wang, Shuo Wan, Weitong Chen 《Computers, Materials & Continua》 SCIE EI 2020, Issue 6, pp. 1499-1514 (16 pages)
Active learning has been widely utilized to reduce the labeling cost of supervised learning. By selecting specific instances to train the model, the performance of the model was improved within limited steps. However, little work has paid attention to the effectiveness of active learning on it. In this paper, we propose a deep active learning model with bidirectional encoder representations from transformers (BERT) for text classification. BERT takes advantage of the self-attention mechanism to integrate contextual information, which is beneficial to accelerate the convergence of training. As for the process of active learning, we design an instance selection strategy based on posterior probabilities: Margin, Intra-correlation and Inter-correlation (MII). Selected instances are characterized by small margin, low intra-cohesion and high inter-cohesion. We conduct extensive experiments and analytics with our methods. The effect of the learner is compared, while the effect of the sampling strategy and text classification is assessed on three real datasets. The results show that our method outperforms the baselines in terms of accuracy.
Keywords: active learning; instance selection; deep neural network; text classification
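The "small margin" part of the selection strategy described above can be illustrated with generic margin sampling over posterior probabilities. This sketch covers only that one ingredient; the intra-correlation and inter-correlation terms of MII are not modelled, and the function names and toy probabilities are assumptions for illustration.

```python
def margin(probs):
    """Gap between the two largest posterior probabilities of one instance."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_smallest_margin(pool_probs, batch_size):
    """Return indices of the batch_size instances the model is least sure about."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:batch_size]


# Toy posteriors for four unlabeled instances over three classes (made up).
pool = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
    [0.70, 0.20, 0.10],
]
picked = select_smallest_margin(pool, 2)
print(picked)
```

A small margin means the top two classes are nearly tied, so a label for that instance is likely to move the decision boundary more than a label for a confidently classified one.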
Adversarial Active Learning for Named Entity Recognition in Cybersecurity (Cited by: 2)
7
Authors: Tao Li, Yongjin Hu, Ankang Ju, Zhuoran Hu 《Computers, Materials & Continua》 SCIE EI 2021, Issue 1, pp. 407-420 (14 pages)
Owing to the continuous barrage of cyber threats, there is a massive amount of cyber threat intelligence. However, a great deal of cyber threat intelligence comes from textual sources. For analysis of cyber threat intelligence, many security analysts rely on cumbersome and time-consuming manual efforts. The cybersecurity knowledge graph plays a significant role in automatic analysis of cyber threat intelligence. As the foundation for constructing a cybersecurity knowledge graph, named entity recognition (NER) is required for identifying critical threat-related elements from textual cyber threat intelligence. Recently, deep neural network-based models have attained very good results in NER. However, the performance of these models relies heavily on the amount of labeled data. Since labeled data in cybersecurity is scarce, in this paper, we propose an adversarial active learning framework to effectively select the informative samples for further annotation. In addition, leveraging the long short-term memory (LSTM) network and the bidirectional LSTM (BiLSTM) network, we propose a novel NER model by introducing a dynamic attention mechanism into the BiLSTM-LSTM encoder-decoder. With the selected informative samples annotated, the proposed NER model is retrained. As a result, the performance of the NER model is incrementally enhanced with low labeling cost. Experimental results show the effectiveness of the proposed method.
Keywords: adversarial learning; active learning; named entity recognition; dynamic attention mechanism
Active Learning Improves Nursing Student Clinical Performance in an Academic Institution in Macao (Cited by: 1)
8
Authors: Cindy Sin U Leong, Lynn B. Clutter 《Chinese Nursing Research》 CAS 2015, Issue 3, pp. 108-115 (8 pages)
Objective: To assess the outcome of the application of active learning during practicum among nursing students using clinical assessment and evaluation scores as a measurement. Methods: Nursing students were instructed on the basics of active learning prior to the initiation of their clinical experience. The participants were divided into 5 groups of nursing students (n = 56) across three levels (years 2-4) in a public academic institute of a bachelor degree program in Macao. Final clinical evaluation was averaged and compared between groups with and without intervention. Results: These nursing students were given higher appraisals in verbal and written comments than previous students without intervention. The groups with the intervention achieved higher clinical assessment and evaluation scores on average than comparable groups without the active learning intervention. One group of sophomore nursing students (year 2) did not receive evaluations as high as the other groups, receiving an average score of above 80. Conclusions: Nursing students must engage in active learning to demonstrate that they are willing to gain knowledge of theory, nursing skills and communication skills during the clinical practicum.
Keywords: active learning; clinical competence; nursing students
Implementing physically active learning:Future directions for research,policy,and practice
9
Authors: Andy Daly-Smith, Thomas Quarmby, Victoria S.J. Archbold, Ash C. Routen, Jade L. Morris, Catherine Gammon, John B. Bartholomew, Geir Kare Resaland, Bryn Llewellyn, Richard Allman, Henry Dorling 《Journal of Sport and Health Science》 SCIE 2020, Issue 1, pp. 41-49, F0003 (10 pages)
Purpose: To identify co-produced multi-stakeholder perspectives important for successful widespread physically active learning (PAL) adoption and implementation. Methods: A total of 35 stakeholders (policymakers, n = 9; commercial education sector, n = 8; teachers, n = 3; researchers, n = 15) attended a design thinking PAL workshop. Participants formed 5 multi-disciplinary groups with at least 1 representative from each stakeholder group. Each group, facilitated by a researcher, undertook 2 tasks: (1) using Post-it Notes, the following question was answered: within the school day, what are the opportunities for learning combined with movement? and (2) structured as a washing-line task, the following question was answered: how can we establish PAL as the norm? All discussions were audio-recorded and transcribed. Inductive analyses were conducted by 4 authors. After the analyses were complete, the main themes and subthemes were assigned to 4 predetermined categories: (1) PAL design and implementation, (2) priorities for practice, (3) priorities for policy, and (4) priorities for research. Results: The following were the main themes for PAL implementation: opportunities for PAL within the school day, delivery environments, learning approaches, and the intensity of PAL. The main themes for the priorities for practice included teacher confidence and competence, resources to support delivery, and community of practice. The main themes for the priorities for policy included self-governance, the Office for Standards in Education, Children's Services and Skills, policy investment in initial teacher training, and curriculum reform. The main themes for the research priorities included establishing a strong evidence base, school-based PAL implementation, and a whole-systems approach. Conclusion: The present study is the first to identify PAL implementation factors using a combined multi-stakeholder perspective. To achieve wider PAL adoption and implementation, future interventions should be evidence based and address implementation factors at the classroom level (e.g., approaches and delivery environments), school level (e.g., communities of practice), and policy level (e.g., initial teacher training).
Keywords: physical activity; physically active learning; policy; school
Mining potential social relationship with active learning in LBSN
10
Authors: 王海平, Zhang Hong, Wang Yong, Bing Jia 《High Technology Letters》 EI CAS 2017, Issue 2, pp. 198-202 (5 pages)
Rapid development of location-based social networks (LBSN) makes it more convenient for researchers to carry out studies related to social networks. Mining potential social relationships in LBSN is the most important one. Traditionally, researchers use the topological relation of a social network or telecommunication network to mine potential social relationships. But the effect is unsatisfactory, as the network cannot provide complete information of the topological relation. In this work, a new model called PSRMAL is proposed for mining potential social relationships with LBSN. With the model, better performance is obtained and guaranteed, and experiments verify the effectiveness.
Keywords: data preprocessing; feature fusion; active learning
A Simple yet Effective Framework for Active Learning to Rank
11
Authors: Qingzhong Wang, Haifang Li, Haoyi Xiong, Wen Wang, Jiang Bian, Yu Lu, Shuaiqiang Wang, Zhicong Cheng, Dejing Dou, Dawei Yin 《Machine Intelligence Research》 EI CSCD 2024, Issue 1, pp. 169-183 (15 pages)
While China has become the largest online market in the world with approximately 1 billion internet users, Baidu runs the world's largest Chinese search engine, serving hundreds of millions of daily active users and responding to billions of queries per day. To handle the diverse query requests from users at web scale, Baidu has made tremendous efforts in understanding users' queries, retrieving relevant content from a pool of trillions of webpages, and ranking the most relevant webpages at the top of the results. Among the components used in Baidu search, learning to rank (LTR) plays a critical role, and we need to label an extremely large number of queries together with relevant webpages in a timely manner to train and update the online LTR models. To reduce the costs and time consumption of query/webpage labelling, we study the problem of active learning to rank (active LTR), which selects unlabeled queries for annotation and training in this work. Specifically, we first investigate the criterion Ranking entropy (RE), characterizing the entropy of relevant webpages under a query produced by a sequence of online LTR models updated by different checkpoints, using a query-by-committee (QBC) method. Then, we explore a new criterion, namely prediction variances (PV), that measures the variance of prediction results for all relevant webpages under a query. Our empirical studies find that RE may favor low-frequency queries from the pool for labelling while PV prioritizes high-frequency queries more. Finally, we combine these two complementary criteria as the sample selection strategies for active learning. Extensive experiments with comparisons to baseline algorithms show that the proposed approach could train LTR models to achieve higher discounted cumulative gain (i.e., the relative improvement DCG4 = 1.38%) with the same budgeted labelling efforts.
Keywords: search; information retrieval; learning to rank; active learning; query by committee
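The abstract above names two per-query criteria computed from a committee of model checkpoints but does not give their formulas. The sketch below shows two generic committee-disagreement scores in the same spirit; both definitions are assumptions for illustration and should not be read as the paper's RE or PV. `committee_scores[m][d]` is taken to be the relevance score that checkpoint `m` assigns to document `d` of one query.

```python
import math
from statistics import pvariance


def committee_entropy(committee_scores):
    """Entropy of the committee-averaged softmax over a query's documents
    (an assumed stand-in for an entropy-style criterion)."""
    n_models = len(committee_scores)
    n_docs = len(committee_scores[0])
    avg = [0.0] * n_docs
    for scores in committee_scores:
        peak = max(scores)                      # subtract max for stability
        exps = [math.exp(s - peak) for s in scores]
        z = sum(exps)
        for j, e in enumerate(exps):
            avg[j] += e / z / n_models
    return -sum(p * math.log(p) for p in avg if p > 0.0)


def committee_variance(committee_scores):
    """Mean over documents of the score variance across committee members
    (an assumed stand-in for a variance-style criterion)."""
    variances = [pvariance(col) for col in zip(*committee_scores)]
    return sum(variances) / len(variances)


agree = [[1.0, 0.0], [1.0, 0.0]]       # two checkpoints, same ranking
disagree = [[1.0, 0.0], [0.0, 1.0]]    # two checkpoints, opposite rankings
print(committee_entropy(agree), committee_entropy(disagree))
print(committee_variance(agree), committee_variance(disagree))
```

Queries on which the checkpoints disagree score higher on both measures, so either score, or a weighted combination of the two, could be used to rank unlabeled queries for annotation.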
Active Machine Learning for Chemical Engineers: A Bright Future Lies Ahead! (Cited by: 1)
12
Authors: Yannick Ureel, Maarten R. Dobbelaere, Yi Ouyang, Kevin De Ras, Maarten K. Sabbe, Guy B. Marin, Kevin M. Van Geem 《Engineering》 SCIE EI CAS CSCD 2023, Issue 8, pp. 23-30 (8 pages)
By combining machine learning with the design of experiments, thereby achieving so-called active machine learning, more efficient and cheaper research can be conducted. Machine learning algorithms are more flexible and are better than traditional design of experiment algorithms at investigating processes spanning all length scales of chemical engineering. While active machine learning algorithms are maturing, their applications are falling behind. In this article, three types of challenges presented by active machine learning (namely, convincing the experimental researcher, the flexibility of data creation, and the robustness of active machine learning algorithms) are identified, and ways to overcome them are discussed. A bright future lies ahead for active machine learning in chemical engineering, thanks to increasing automation and more efficient algorithms that can drive novel discoveries.
Keywords: active machine learning; active learning; Bayesian optimization; chemical engineering; design of experiments
Phase prediction for high-entropy alloys using generative adversarial network and active learning based on small datasets (Cited by: 1)
13
Authors: CHEN Cun, ZHOU HengRu, LONG WeiMin, WANG Gang, REN JingLi 《Science China(Technological Sciences)》 SCIE EI CAS CSCD 2023, Issue 12, pp. 3615-3627 (13 pages)
In this paper, a new machine learning (ML) model combining conditional generative adversarial networks (CGANs) and active learning (AL) is proposed to predict the body-centered cubic (BCC) phase, face-centered cubic (FCC) phase, and BCC+FCC phase of high-entropy alloys (HEAs). Considering the lack of data, CGANs are introduced for data augmentation, and AL can achieve high prediction accuracy under a small sample size owing to its special sample selection strategy. Therefore, we propose an ML framework combining CGAN and AL to predict the phase of HEAs. The arithmetic optimization algorithm (AOA) is introduced to improve the artificial neural network (ANN). AOA can overcome the problem of falling into the locally optimal solution for the ANN and reduce the number of training iterations. The AOA-optimized ANN model trained by the AL sample selection strategy achieved high prediction accuracy on the test set. To improve the performance and interpretability of the model, domain knowledge is incorporated into the feature selection. Additionally, considering that the proposed method can alleviate the problem caused by the shortage of experimental data, it can be applied to predictions based on small datasets in other fields.
Keywords: high-entropy alloys; phase prediction; machine learning; conditional generative adversarial networks; active learning
Distributed Active Partial Label Learning
14
Authors: Zhen Xu, Weibin Chen 《Intelligent Automation & Soft Computing》 SCIE 2023, Issue 9, pp. 2627-2650 (24 pages)
Active learning (AL) trains a high-precision predictor model from small numbers of labeled data by iteratively annotating the most valuable data sample from an unlabeled data pool with a class label throughout the learning process. However, most current AL methods start with the premise that the labels queried at AL rounds must be free of ambiguity, which may be unrealistic in some real-world applications where only a set of candidate labels can be obtained for selected data. Besides, most of the existing AL algorithms only consider the case of centralized processing, which necessitates gathering together all the unlabeled data in one fusion center for selection. Considering that data are collected/stored at different nodes over a network in many real-world scenarios, distributed processing is chosen here. In this paper, the issue of distributed classification of partially labeled (PL) data obtained by a fully decentralized AL method is focused on, and a distributed active partial label learning (dAPLL) algorithm is proposed. Our proposed algorithm is composed of a fully decentralized sample selection strategy and a distributed partial label learning (PLL) algorithm. During the sample selection process, both the uncertainty and representativeness of the data are measured based on the global cluster centers obtained by a distributed clustering method, and the valuable samples are chosen in turn. Meanwhile, using the disambiguation-free strategy, a series of binary classification problems can be constructed, and the corresponding cost-sensitive classifiers can be cooperatively trained in a distributed manner. The experiment results conducted on several datasets demonstrate that the performance of the dAPLL algorithm is comparable to that of the corresponding centralized method and is superior to the existing active PLL (APLL) method in different parameter configurations. Besides, our proposed algorithm outperforms several current PLL methods using the random selection strategy, especially when only small amounts of data are selected to be assigned with the candidate labels.
Keywords: active learning; partial label learning; distributed processing; disambiguation-free strategy
A Novel Active Learning Method Using SVM for Text Classification (Cited by: 17)
15
Authors: Mohamed Goudjil, Mouloud Koudil, Mouldi Bedda, Noureddine Ghoggali 《International Journal of Automation and Computing》 EI CSCD 2018, Issue 3, pp. 290-298 (9 pages)
Support vector machines (SVMs) are a popular class of supervised learning algorithms, and are particularly applicable to large and high-dimensional classification problems. Like most machine learning methods for data classification and information retrieval, they require manually labeled data samples in the training stage. However, manual labeling is a time-consuming and error-prone task. One possible solution to this issue is to exploit the large number of unlabeled samples that are easily accessible via the internet. This paper presents a novel active learning method for text categorization. The main objective of active learning is to reduce the labeling effort, without compromising the accuracy of classification, by intelligently selecting which samples should be labeled. The proposed method selects a batch of informative samples using the posterior probabilities provided by a set of multi-class SVM classifiers, and these samples are then manually labeled by an expert. Experimental results indicate that the proposed active learning method significantly reduces the labeling effort, while simultaneously enhancing the classification accuracy.
Keywords: text categorization; active learning; support vector machine (SVM); pool-based active learning; pairwise coupling
Combining Committee-Based Semi-Supervised Learning and Active Learning (Cited by: 6)
16
Authors: Mohamed Farouk Abdel Hady, Friedhelm Schwenker 《Journal of Computer Science & Technology》 SCIE EI CSCD 2010, Issue 4, pp. 681-698 (18 pages)
Many data mining applications have a large amount of data, but labeling data is usually difficult, expensive, or time consuming, as it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data in the training process. Co-Training is a popular semi-supervised learning algorithm that assumes that each example is represented by multiple sets of features (views) and that these views are sufficient for learning and independent given the class. However, these assumptions are strong and are not satisfied in many real-world domains. In this paper, a single-view variant of Co-Training, called Co-Training by Committee (CoBC), is proposed, in which an ensemble of diverse classifiers is used instead of redundant and independent views. We introduce a new labeling confidence measure for unlabeled examples based on estimating the local accuracy of the committee members on its neighborhood. Then we introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning. The random subspace method is applied on both C4.5 decision trees and 1-nearest neighbor classifiers to construct the diverse ensembles used for semi-supervised learning and active learning. Experiments show that these two combinations can outperform other non-committee-based ones.
Keywords: data mining; classification; active learning; Co-Training; semi-supervised learning; ensemble learning; random subspace method; decision tree; nearest neighbor classifier
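The active learning half of the algorithms above relies on query by committee (QBC). A common way to score committee disagreement is vote entropy, sketched below. This is the generic textbook measure, not the paper's QBC-then-CoBC or QBC-with-CoBC procedure, and the threshold classifiers and pool values are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the label distribution voted by the committee for one example."""
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in Counter(votes).values())


def query_by_committee(committee, pool, n_queries):
    """Return indices of the pool examples with the highest vote entropy.
    Ties are broken by the lower index so the result is deterministic."""
    scored = [(vote_entropy([clf(x) for clf in committee]), i)
              for i, x in enumerate(pool)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:n_queries]]


# Hypothetical committee: three threshold classifiers on a 1-D feature.
committee = [lambda x: x > 0.3, lambda x: x > 0.5, lambda x: x > 0.7]
pool = [0.1, 0.4, 0.6, 0.9]
chosen = query_by_committee(committee, pool, 2)
print(chosen)
```

Examples on which all members agree get zero entropy and are never queried first, which is why a diverse ensemble matters for this kind of selection.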
Multi-label active learning by model guided distribution matching 被引量:4
17
作者 Nengneng GAO Sheng-Jun HUANG Songcan CHEN 《Frontiers of Computer Science》 SCIE EI CSCD 2016年第5期845-855,共11页
Multi-label learning is an effective framework for learning with objects that have multiple semantic labels, and has been successfully applied into many real-world tasks, In contrast with traditional single-label lear... Multi-label learning is an effective framework for learning with objects that have multiple semantic labels, and has been successfully applied into many real-world tasks, In contrast with traditional single-label learning, the cost of la- beling a multi-label example is rather high, thus it becomes an important task to train an effective multi-label learning model with as few labeled examples as possible. Active learning, which actively selects the most valuable data to query their labels, is the most important approach to reduce labeling cost. In this paper, we propose a novel approach MADM for batch mode multi-label active learning. On one hand, MADM exploits representativeness and diversity in both the feature and label space by matching the distribution between labeled and unlabeled data. On the other hand, it tends to query predicted positive instances, which are expected to be more informative than negative ones. Experiments on benchmark datasets demonstrate that the proposed approach can reduce the labeling cost significantly. 展开更多
Keywords: multi-label learning; batch mode; active learning; distribution matching
Interactive labelling of a multivariate dataset for supervised machine learning using linked visualisations, clustering, and active learning (Cited by: 3)
18
Authors: Mohammad Chegini, Jürgen Bernard, Philip Berger, Alexei Sourin, Keith Andrews, Tobias Schreck. Visual Informatics (EI), 2019, Issue 1, pp. 9-17 (9 pages)
Supervised machine learning techniques require labelled multivariate training datasets. Many approaches address the issue of unlabelled datasets by tightly coupling machine learning algorithms with interactive visualisations. Using appropriate techniques, analysts can play an active role in a highly interactive and iterative machine learning process to label the dataset and create meaningful partitions. While this principle has been implemented for unsupervised, semi-supervised, or supervised machine learning tasks individually, the combination of all three methodologies remains challenging. In this paper, a visual analytics approach is presented, combining a variety of machine learning capabilities with four linked visualisation views, all integrated within the mVis (multivariate Visualiser) system. The available palette of techniques allows an analyst to perform exploratory data analysis on a multivariate dataset and divide it into meaningful labelled partitions, from which a classifier can be built. In the workflow, the analyst can label interesting patterns or outliers in a semi-supervised process supported by active learning. Once a dataset has been interactively labelled, the analyst can continue the workflow with supervised machine learning to assess to what degree the subsequent classifier has effectively learned the concepts expressed in the labelled training dataset. Using a novel technique called automatic dimension selection, interactions the analyst had with dimensions of the multivariate dataset are used to steer the machine learning algorithms. A real-world football dataset is used to show the utility of mVis for a series of analysis and labelling tasks, from initial labelling, through iterations of data exploration, clustering, classification, and active learning to refine the named partitions, to finally producing a high-quality labelled training dataset suitable for training a classifier. The tool empowers the analyst with interactive visualisations including scatterplots, parallel coordinates, similarity maps for records, and a new similarity map for partitions.
Keywords: labelling; clustering; classification; active learning; multivariate data; visualisation
A genetic algorithm based entity resolution approach with active learning (Cited by: 1)
19
Authors: Chenchen SUN, Derong SHEN, Yue KOU, Tiezheng NIE, Ge YU. Frontiers of Computer Science (SCIE, EI, CSCD), 2017, Issue 1, pp. 147-159 (13 pages)
Entity resolution is a key aspect of data quality and data integration: it identifies which records in the data sources correspond to the same real-world entity. Many existing approaches require manually designed match rules to solve the problem, which demands domain knowledge and is time-consuming. We propose a novel genetic algorithm based entity resolution approach via active learning. It is able to learn effective match rules by logically combining several different attributes' comparisons with proper thresholds. We use active learning to reduce the amount of manually labeled data and speed up the learning process. The extensive evaluation shows that the proposed approach outperforms the state-of-the-art entity resolution approaches in accuracy.
Keywords: entity resolution; genetic algorithm; active learning; data quality; data integration
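The abstract above describes match rules that logically combine attribute comparisons with thresholds. As an illustration only, the sketch below shows one way such a rule could be represented and scored, so that a genetic algorithm would have something to evolve and rank. The rule encoding (an OR of AND-clauses), the string similarity from `difflib`, and the use of F1 as the fitness are all assumptions; the abstract does not specify any of them.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (an assumed comparator)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rule_matches(rule, rec1, rec2):
    """Apply a match rule to a pair of records.

    rule is a list of clauses combined with OR; each clause is a list of
    (attribute, threshold) pairs combined with AND. This encoding is
    hypothetical, chosen to mirror "attribute comparisons with thresholds".
    """
    return any(
        all(similarity(rec1[attr], rec2[attr]) >= threshold
            for attr, threshold in clause)
        for clause in rule
    )


def fitness(rule, labeled_pairs):
    """F1 score of a rule on labeled pairs [(rec1, rec2, is_match), ...]."""
    tp = fp = fn = 0
    for rec1, rec2, is_match in labeled_pairs:
        predicted = rule_matches(rule, rec1, rec2)
        if predicted and is_match:
            tp += 1
        elif predicted:
            fp += 1
        elif is_match:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A rule such as `[[("name", 0.9), ("city", 0.9)]]` would declare two records a match only when both the name and city comparisons clear their thresholds. In an active learning loop, the labeled pairs passed to `fitness` would be the ones a human has been asked to label so far.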
A Unified Active Learning Framework for Biomedical Relation Extraction (Cited by: 1)
20
Authors: 张宏涛, 黄民烈, 朱小燕. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2012, Issue 6, pp. 1302-1313 (12 pages)
Supervised machine learning methods have been employed with great success in the task of biomedical relation extraction. However, existing methods are not practical enough, since manual construction of large training data is very expensive. Therefore, active learning is urgently needed for designing practical relation extraction methods with little human effort. In this paper, we describe a unified active learning framework. In particular, our framework systematically addresses some practical issues in the active learning process, including a strategy for selecting informative data, a data diversity selection algorithm, an active feature acquisition method, and an informative feature selection algorithm, in order to meet the challenges posed by the immense amount of complex and diverse biomedical text. The framework is evaluated on protein-protein interaction (PPI) extraction and is shown to achieve promising results with a significant reduction in editorial effort and labeling time.
Keywords: biomedical relation extraction; active learning; unified framework