Machine learning combined with density functional theory (DFT) enables rapid exploration of catalyst descriptor spaces, such as adsorption energy, and thereby supports fast and effective catalyst screening. However, models for predicting adsorption energies on oxides are still lacking, owing to the complexity of the elemental species and the ambiguous coordination environment. This work proposes an active learning workflow (LeNN) founded on local electronic transfer features (e) and the principle of coordinate rotation invariance. By accurately characterizing the electron transfer to adsorption-site atoms and their surrounding geometric structures, LeNN mitigates abrupt feature changes due to different element types and clarifies coordination environments. As a result, it enables the prediction of *H adsorption energy on binary oxide surfaces with a mean absolute error (MAE) below 0.18 eV. Moreover, we incorporate local coverage (θ_l) and leverage a neural network ensemble to establish an active learning workflow, attaining a prediction MAE below 0.2 eV for 5419 multi-*H adsorption structures. These findings validate the universality and capability of the proposed features in predicting *H adsorption energy on binary oxide surfaces.
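The abstract above does not say how the network ensemble drives the active learning loop. A common choice, assumed here purely for illustration, is to query the candidates on which the ensemble members disagree most. The function name and the energy values below are hypothetical and are not taken from the paper.

```python
from statistics import pstdev


def rank_by_ensemble_disagreement(predictions):
    """Rank candidates from most to least uncertain.

    predictions[m][i] is the adsorption energy (eV) predicted by ensemble
    member m for candidate structure i. Uncertainty is taken to be the
    standard deviation across members (an assumption, not the paper's rule).
    """
    n_candidates = len(predictions[0])
    spreads = [
        pstdev([member[i] for member in predictions])
        for i in range(n_candidates)
    ]
    return sorted(range(n_candidates), key=lambda i: spreads[i], reverse=True)


# Three hypothetical ensemble members scoring four candidate structures.
preds = [
    [-0.30, 0.10, 0.52, -0.91],
    [-0.28, 0.45, 0.50, -0.90],
    [-0.33, -0.20, 0.55, -0.92],
]
order = rank_by_ensemble_disagreement(preds)
```

In this sketch the candidate at the front of `order` would be sent to DFT for labeling, and the ensemble would then be retrained on the enlarged set.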
The effectiveness of facial expression recognition (FER) algorithms hinges on the model's quality and the availability of a substantial amount of labeled expression data. However, labeling large datasets demands significant human, time, and financial resources. Although active learning methods have mitigated the dependency on extensive labeled data, a cold-start problem persists in small to medium-sized expression recognition datasets. This issue arises because the initial labeled data often fails to represent the full spectrum of facial expression characteristics. This paper introduces an active learning approach that integrates uncertainty estimation, aiming to improve the precision of facial expression recognition regardless of dataset scale. The method is divided into two primary phases. First, the model undergoes self-supervised pre-training using contrastive learning and uncertainty estimation to bolster its feature extraction capabilities. Second, the model is fine-tuned using the prior knowledge obtained from the pre-training phase to significantly improve recognition accuracy. In the pre-training phase, the model employs contrastive learning to extract fundamental feature representations from the complete unlabeled dataset. These features are then weighted through a self-attention mechanism with rank regularization. Subsequently, data from the low-weighted set is relabeled to further refine the model's feature extraction ability. The pre-trained model is then used in active learning to select and label information-rich samples more efficiently. Experimental results demonstrate that the proposed method significantly outperforms existing approaches, improving recognition accuracy by 5.09% and 3.82% over the best existing active learning methods, Margin and Least Confidence, respectively, and by 1.61% over the conventional segmented active learning method.
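The two baselines named above, Margin and Least Confidence, are standard uncertainty-sampling rules. The following minimal sketch shows how each scores a pool of unlabeled samples. The function names and the probability values are illustrative assumptions, not the paper's implementation.

```python
def least_confidence(probs):
    """1 minus the top class probability; higher means more uncertain."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two largest class probabilities; smaller means more uncertain."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select(pool_probs, k, strategy):
    """Return the indices of the k samples the chosen strategy would query."""
    if strategy == "least_confidence":
        key = lambda i: -least_confidence(pool_probs[i])
    elif strategy == "margin":
        key = lambda i: margin(pool_probs[i])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(range(len(pool_probs)), key=key)[:k]


# Hypothetical softmax outputs for three unlabeled images.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # low top probability
    [0.50, 0.49, 0.01],  # two classes nearly tied
]
picked_lc = select(pool, 1, "least_confidence")
picked_margin = select(pool, 1, "margin")
```

The toy pool is chosen so that the two rules disagree: Least Confidence prefers the sample with the lowest top probability, while Margin prefers the sample whose two leading classes are nearly tied.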
Graph learning, when used as a semi-supervised learning (SSL) method, performs well for classification tasks with a low label rate. We provide a graph-based batch active learning pipeline for pixel/patch neighborhood multi- or hyperspectral image segmentation. Our batch active learning approach selects a collection of unlabeled pixels that satisfy a graph local maximum constraint for the active learning acquisition function, which determines the relative importance of each pixel to the classification. This work builds on recent advances in the design of novel active learning acquisition functions (e.g., the Model Change approach in arXiv:2110.07739) while adding important further developments, including patch-neighborhood image analysis and batch active learning methods, to further increase the accuracy and greatly increase the computational efficiency of these methods. In addition to improving accuracy, our approach can greatly reduce the number of labeled pixels needed to reach the accuracy obtained with randomly selected labeled pixels.
Active learning in semi-supervised classification involves introducing additional labels for unlabeled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels. "Model Change" active learning quantifies the change incurred in the classifier by introducing the additional label(s). We pair this idea with graph-based semi-supervised learning (SSL) methods that use the spectrum of the graph Laplacian matrix, which can be truncated to avoid prohibitively large computational and storage costs. We consider a family of convex loss functions for which the acquisition function can be efficiently approximated using the Laplace approximation of the posterior distribution. We show a variety of multiclass examples that illustrate improved performance over the prior state of the art.
The sampling of training data is a bottleneck in the development of artificial intelligence (AI) models, owing either to the huge amounts of data to be processed or to the difficulty of accessing data in industrial practice. Active learning (AL) approaches are useful in such a context since they maximize the performance of the trained model while minimizing the number of training samples. Such smart sampling methodologies iteratively select the points that should be labeled and added to the training set based on their informativeness and pertinence. Query rules are defined to judge the relevance of a data instance. In this paper, we propose an AL methodology based on a physics-based query rule. Given some industrial objectives from the physical process in which the AI model is involved, the physics-based AL approach iteratively converges to the data instances fulfilling those objectives while sampling training points. The trained surrogate model is therefore accurate where the data instances of industrial interest lie, and coarse everywhere else, where the data instances are of no interest in the industrial context studied.
AIM: To conduct a classification study of high myopic maculopathy (HMM) using limited datasets, including tessellated fundus, diffuse chorioretinal atrophy, patchy chorioretinal atrophy, and macular atrophy, while minimizing annotation costs; and to optimize the ALFA-Mix active learning algorithm and apply it to HMM classification. METHODS: The optimized ALFA-Mix algorithm (ALFA-Mix+) was compared with five algorithms, including ALFA-Mix. Four models, including ResNet18, were established. Each algorithm was combined with the four models for experiments on the HMM dataset. Each experiment consisted of 20 active learning rounds, with 100 images selected per round. The algorithm was evaluated by comparing the number of rounds in which ALFA-Mix+ outperformed other algorithms. Finally, this study employed six models, including EfficientFormer, to classify HMM. The best-performing model among these was selected as the baseline model and combined with the ALFA-Mix+ algorithm to achieve satisfactory classification results with a small dataset. RESULTS: ALFA-Mix+ outperforms other algorithms with an average superiority of 16.6, 14.75, 16.8, and 16.7 rounds in terms of accuracy, sensitivity, specificity, and Kappa value, respectively. This study conducted experiments on classifying HMM using several advanced deep learning models with a complete training set of 4252 images. EfficientFormer achieved the best results, with an accuracy, sensitivity, specificity, and Kappa value of 0.8821, 0.8334, 0.9693, and 0.8339, respectively. By combining ALFA-Mix+ with EfficientFormer, this study achieved an accuracy, sensitivity, specificity, and Kappa value of 0.8964, 0.8643, 0.9721, and 0.8537, respectively. CONCLUSION: The ALFA-Mix+ algorithm reduces the required samples without compromising accuracy. ALFA-Mix+ outperforms the other algorithms in more rounds of experiments and selects valuable samples more effectively. In HMM classification, combining ALFA-Mix+ with EfficientFormer enhances model performance, further demonstrating the effectiveness of ALFA-Mix+.
Combining machine learning with the design of experiments, so-called active machine learning, allows research to be conducted more efficiently and cheaply. Machine learning algorithms are more flexible than traditional design-of-experiments algorithms and are better at investigating processes spanning all length scales of chemical engineering. While active machine learning algorithms are maturing, their applications are falling behind. In this article, three types of challenges presented by active machine learning are identified (convincing the experimental researcher, the flexibility of data creation, and the robustness of active machine learning algorithms), and ways to overcome them are discussed. A bright future lies ahead for active machine learning in chemical engineering, thanks to increasing automation and more efficient algorithms that can drive novel discoveries.
Personalized education provides an open learning environment that draws on advanced technologies to establish a paradigm shift toward active and dynamic teaching and learning patterns. E-learning has various established approaches to the creation and sequencing of content-based, single-learner, and self-paced learning objects. However, there is little understanding of how to create sequences of learning activities that involve groups of learners interacting within a structured set of collaborative environments. In this paper, we present an approach for learning activity sequencing based on ontology and activity graphs in a personalized education system. The modeling and management of learning activities and learners are described, and an algorithm is proposed to realize learning activity sequencing and dynamic updating of the learner ontology.
This paper proposes an active-learning-accelerated Monte Carlo simulation method based on a modified K-nearest neighbors algorithm. The core idea of the proposed method is to judge, through a classifier implemented with the modified K-nearest neighbors algorithm, whether or not the output of a random input point can be postulated. Compared with other active learning methods that resort to experimental designs, the proposed method employs Monte Carlo simulation for sampling inputs and saves a large portion of the actual output evaluations through accurate classification, which makes it applicable to most structural reliability estimation problems. The validity, efficiency, and accuracy of the proposed method are demonstrated numerically. In addition, the optimal value of K that maximizes the computational efficiency is studied. Finally, the proposed method is applied to the reliability estimation of carbon fiber reinforced silicon carbide composite specimens subjected to random displacements, which further validates its practicability.
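The idea of skipping real evaluations whenever a classifier can postulate the outcome can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe the paper's modified K-nearest neighbors rule, so a plain unanimous vote with a distance guard stands in for it, and the limit-state function `g`, the `radius` parameter, and all other names are hypothetical.

```python
import math
import random


def g(x):
    """Hypothetical limit-state function; failure is defined as g(x) < 0."""
    return 3.0 - x[0] - x[1]


def knn_gated_monte_carlo(n_samples, k=5, radius=0.5, seed=0):
    """Estimate the failure probability while skipping evaluations of g.

    A sample's outcome is postulated only when its k nearest evaluated
    points all share one label and all lie within `radius`; otherwise g
    is actually evaluated and the sample joins the evaluated set.
    """
    rng = random.Random(seed)
    evaluated = []  # (point, was_failure) pairs from real evaluations of g
    failures = 0
    for _ in range(n_samples):
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        nearest = sorted(
            (math.dist(x, point), was_failure) for point, was_failure in evaluated
        )[:k]
        labels = {was_failure for _, was_failure in nearest}
        trusted = len(nearest) == k and len(labels) == 1 and nearest[-1][0] <= radius
        if trusted:
            failed = nearest[0][1]  # postulated from the neighbors
        else:
            failed = g(x) < 0.0  # real evaluation
            evaluated.append((x, failed))
        failures += failed
    return failures / n_samples, len(evaluated)


pf_estimate, n_evaluations = knn_gated_monte_carlo(2000)
```

The distance guard is included so that samples in sparsely explored regions, where failures tend to occur, are evaluated rather than guessed. The accuracy of the resulting estimate depends on `k` and `radius`, which are not tuned in this sketch.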
The Internet revolution has resulted in abundant data from various sources, including social media and traditional media. Although the availability of data is no longer an issue, labeling data for use in supervised machine learning is still an expensive process that involves tedious human effort. The overall purpose of this study is to propose a strategy for automatically labeling unlabeled textual data with the support of active learning in combination with deep learning. More specifically, this study assesses the performance of different active learning strategies in the automatic labeling of textual datasets at the sentence and document levels. To achieve this objective, different experiments have been performed on a publicly available dataset. In the first set of experiments, we randomly choose a subset of instances from the training dataset and train a deep neural network to assess performance on the test set. In the second set of experiments, we replace the random selection with different active learning strategies to choose a subset of the training dataset, train the same model, and reassess its performance on the test set. The experimental results suggest that different active learning strategies yield performance improvements of 7% on document-level datasets and 3% on sentence-level datasets for automatic labeling.
Active learning (AL) trains a high-precision predictor model from a small amount of labeled data by iteratively annotating the most valuable data sample from an unlabeled data pool with a class label throughout the learning process. However, most current AL methods start from the premise that the labels queried in AL rounds must be free of ambiguity, which may be unrealistic in some real-world applications where only a set of candidate labels can be obtained for the selected data. Moreover, most existing AL algorithms consider only centralized processing, which requires gathering all the unlabeled data in one fusion center for selection. Because data are collected and stored at different nodes over a network in many real-world scenarios, distributed processing is chosen here. This paper focuses on the distributed classification of partially labeled (PL) data obtained by a fully decentralized AL method and proposes a distributed active partial label learning (dAPLL) algorithm. The proposed algorithm is composed of a fully decentralized sample selection strategy and a distributed partial label learning (PLL) algorithm. During sample selection, both the uncertainty and the representativeness of the data are measured based on the global cluster centers obtained by a distributed clustering method, and the valuable samples are chosen in turn. Meanwhile, using the disambiguation-free strategy, a series of binary classification problems can be constructed, and the corresponding cost-sensitive classifiers can be cooperatively trained in a distributed manner. Experimental results on several datasets demonstrate that the performance of the dAPLL algorithm is comparable to that of the corresponding centralized method and is superior to the existing active PLL (APLL) method in different parameter configurations. In addition, the proposed algorithm outperforms several current PLL methods that use the random selection strategy, especially when only small amounts of data are selected to be assigned candidate labels.
Background: With the development of information technology, there is a significant increase in the number of network traffic logs mixed with various types of cyberattacks. Traditional intrusion detection systems (IDSs) are limited in detecting new, inconstant patterns and in identifying malicious traffic traces in real time. Therefore, there is an urgent need to implement more effective intrusion detection technologies to protect computer security. Methods: In this study, we designed a hybrid IDS by combining our incremental learning model (KAN-SOINN) and active learning to learn new log patterns and detect various network anomalies in real time. Conclusions: Experimental results on the NSL-KDD dataset showed that KAN-SOINN can be continuously improved and can effectively detect malicious logs. Meanwhile, comparative experiments showed that using a hybrid query strategy in active learning can improve the model's learning efficiency.
In this study, the author investigates advanced machine learning models under two methodologies to determine the most effective way to predict heart failure and cardiovascular disease in individuals. The first methodology uses a set of classification machine learning algorithms, and the second uses a deep learning algorithm, the Multilayer Perceptron (MLP). Globally, hospitals are dealing with cases of cardiovascular disease and heart failure, which are major causes of death not only for overweight individuals but also for those who do not adopt a healthy diet and lifestyle. Heart failure and cardiovascular disease can be caused by many factors, including cardiomyopathy, high blood pressure, coronary heart disease, and heart inflammation [1]. Other factors, such as irregular shocks or stress, can also contribute to heart failure or a heart attack. While these events cannot be predicted, continuous patient health data can help doctors predict heart failure. This data-driven research therefore applies machine learning and deep learning techniques to analyze the data and to give doctors informative decision-making tools regarding a person's likelihood of experiencing heart failure. The author employed advanced data preprocessing and cleaning techniques, and the dataset was tested with both methodologies to determine the most effective technique for producing optimal predictions. The first methodology employed supervised classification algorithms, including Naïve Bayes (NB), KNN, logistic regression, and SVM. The second methodology used Multilayer Perceptrons (MLPs), which gave the author the flexibility to experiment with different layer sizes and activation functions, such as ReLU, logistic (sigmoid), and tanh. Both methodologies produced models with high accuracy. The supervised algorithms KNN, SVM, AdaBoost, logistic regression, Naïve Bayes, and Decision Tree achieved accuracy rates of 86%, 89%, 89%, 81%, 79%, and 99%, respectively. The author explains that the Decision Tree algorithm is unsuitable for this dataset because of overfitting, so it was discarded as a candidate model. The neural network methodology demonstrated the most stable accuracy, exceeding 87% while adapting well to real-life situations and requiring low computing power overall. A performance assessment based on a confusion matrix report was carried out to demonstrate feasibility and performance. The author concluded that the performance of the model in real-life situations can advance not only medical science but also mathematical concepts. Additionally, the preprocessing approach behind the model can provide value to the data science community. The model can be developed further by employing various optimization techniques to handle even larger datasets related to heart failure. Furthermore, different neural network algorithms can be tested to explore alternative approaches and yield different results.
This research addresses the challenges of training large semantic segmentation models for image analysis, focusing on expediting the annotation process and mitigating imbalanced datasets. In the context of imbalanced datasets, biases related to age and gender in clinical contexts and skewed representation in natural images can affect model performance. Strategies to mitigate these biases are explored to enhance efficiency and accuracy in semantic segmentation analysis. An in-depth exploration of various reinforced active learning methodologies for image segmentation is conducted, optimizing precision and efficiency across diverse domains. The proposed framework integrates Dueling Deep Q-Networks (DQN), Prioritized Experience Replay, Noisy Networks, and Emphasizing Recent Experience. Extensive experimentation and evaluation on diverse datasets reveal both improvements and limitations associated with various approaches in terms of overall accuracy and efficiency. This research contributes to the expansion of reinforced active learning methodologies for image segmentation, paving the way for more sophisticated and precise segmentation algorithms across diverse domains. The findings emphasize the need for a careful balance between exploration and exploitation strategies in reinforcement learning for effective image segmentation.
Background and Objective: Social media (SoMe) has emerged as a tool in health professions education (HPE), particularly amidst the challenges posed by the coronavirus disease 2019 (COVID-19) pandemic. Despite academia's initial skepticism, SoMe has been gaining traction in supporting learning communities and offering opportunities for innovation in HPE. Our study aims to explore the integration of SoMe in HPE. Four key components were outlined as necessary for successful integration: designing learning experiences, defining educator roles, selecting appropriate platforms, and establishing educational objectives. Methods: This article stemmed from the online Teaching Skills Series module on SoMe in education from the Ophthalmology Foundation, and drew upon evidence supporting learning theories relevant to SoMe integration and models of education. Additionally, we conducted a literature review of English-language articles on the application of SoMe in ophthalmology indexed in PubMed over the past decade. Key Content and Findings: Early adopters of SoMe platforms in HPE have leveraged these tools to enhance learning experiences through interaction, dialogue, content sharing, and active learning strategies. By integrating SoMe into educational programs, both online and in person, educators can overcome time and geographical constraints, fostering more diverse and inclusive learning communities. Careful consideration is, however, necessary to address potential limitations within HPE. Conclusions: This article lays the groundwork for expanding SoMe integration in HPE design, emphasizing the supportive scaffold of various learning theories and the need for further robust research examining its advantages over traditional educational formats. Our literature review underscores an ongoing multifaceted, random application of SoMe platforms in ophthalmology education. We advocate for an effective incorporation of SoMe in HPE that complies with good educational practice.
Nowadays, ensuring the quality of network services has become increasingly vital. Experts are turning to knowledge graph technology, with a significant emphasis on entity extraction in the identification of device configurations. This research paper presents a novel entity extraction method that leverages a combination of active learning and attention mechanisms. Initially, an improved active learning approach is employed to select the most valuable unlabeled samples, which are subsequently submitted for expert labeling. This approach successfully addresses the problems of isolated points and sample redundancy within the network configuration sample set. The labeled samples are then used to train the model for network configuration entity extraction. Furthermore, the multi-head self-attention of the transformer model is enhanced by introducing the Adaptive Weighting method based on the Laplace mixture distribution. This enhancement enables the transformer model to dynamically adapt its focus to words in various positions, displaying exceptional adaptability to abnormal data and further elevating the accuracy of the proposed model. In comparisons with Random Sampling (RANDOM), Maximum Normalized Log-Probability (MNLP), Least Confidence (LC), Token Entropy (TE), and Entropy Query by Bagging (EQB), the proposed method, Entropy Query by Bagging and Maximum Influence Active Learning (EQBMIAL), achieves comparable performance with only 40% of the samples on both datasets, while the other algorithms require 50% of the samples. Furthermore, the entity extraction algorithm with the Adaptive Weighted Multi-head Attention mechanism (AW-MHA) is compared with BILSTM-CRF, Mutil_Attention-Bilstm-Crf, Deep_Neural_Model_NER, and BERT_Transformer, achieving precision rates of 75.98% and 98.32% on the two datasets, respectively. Statistical tests demonstrate the statistical significance and effectiveness of the proposed algorithms.
This paper describes a new method for active learning in content-based image retrieval. The proposed method first uses support vector machine (SVM) classifiers to learn an initial query concept. The proposed active learning scheme then employs a similarity measure to check the current version space and selects the images with maximum expected information gain to solicit the user's labels. Finally, the learned query is refined based on the user's further feedback. By combining the SVM classifier with the similarity measure, the proposed method can alleviate the model bias present in each of them. Our experiments on several query concepts show that the proposed method can learn the user's query concept quickly and effectively within only a few iterations.
This paper explores the integration of the bridge-in, objectives, pre-assessment, participatory activities, post-assessment, and summary (BOPPPS) teaching model within the context of the postgraduate Academic English course. It discusses how this structured approach can effectively enhance students' language proficiency, foster critical thinking skills, and align with the multifaceted objectives of advanced English language education. The study provides a detailed examination of each BOPPPS component as applied to the postgraduate Academic English curriculum, supported by theoretical underpinnings and practical implications.
This study investigates the efficacy of the Mathematics Independent Learning Activity Practice and Play Unite Scheme (MILAPlus) as an instructional strategy to improve the proficiency levels of Grade 9 students in quadratic equations and functions, through a study carried out at Quezon National High School. The research involved 116 Grade 9 students and used a quantitative approach incorporating both pre-assessment and post-assessment measures. The research uses a quasi-experimental design, examining the academic performance of students before and after the introduction of MILAPlus. The pre-assessment establishes a baseline, and the subsequent post-assessment measures the impact of the instructional strategy. Statistical analyses, including t-tests, assess the significance of differences in mean scores and mean percentage scores, providing quantitative insights into the effectiveness of MILAPlus. The findings revealed a statistically significant improvement in both mean scores and mean percentage scores after the use of MILAPlus, indicating enhanced proficiency in quadratic equations and functions. The Mean Proficiency Scores (MPS) also showed a substantial increase, demonstrating a marked improvement in overall proficiency levels among Grade 9 students. In light of the results, recommendations include the continued use of MILAPlus as an instructional strategy and the alignment of its development with prescribed learning competencies. Consistent adherence to policies and guidelines for MILAPlus implementation is suggested for sustaining positive effects on students' long-term performance in mathematics. This research contributes valuable insights into the practical application and effectiveness of MILAPlus within the context of Grade 9 mathematics education at Quezon National High School.
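The abstract mentions t-tests on pre- and post-assessment scores without saying which variant was used. Because the same students are measured twice, a paired t-test is one natural choice, and the sketch below assumes it. The scores are invented for illustration and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(pre, post):
    """Return the paired t statistic and its degrees of freedom.

    t = mean(d) / (s_d / sqrt(n)), where d are the per-student score
    differences (post minus pre) and s_d is their sample standard deviation.
    """
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Hypothetical scores for six students before and after the intervention.
pre_scores = [12, 15, 9, 14, 11, 10]
post_scores = [18, 19, 15, 17, 16, 17]
t_stat, dof = paired_t_statistic(pre_scores, post_scores)
```

The statistic would then be compared against the critical value of the t distribution with `dof` degrees of freedom at the chosen significance level.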
In this paper, we present a novel Support Vector Machine active learning algorithm for effective 3D model retrieval using the concept of relevance feedback. The proposed method learns from the most informative objects, which are marked by the user, and then creates a boundary separating the relevant models from the irrelevant ones. It needs only a small number of 3D models labeled by the user, and it can grasp the user's semantic knowledge rapidly and accurately. Experimental results show that the proposed algorithm significantly improves retrieval effectiveness. Compared with four state-of-the-art query refinement schemes for 3D model retrieval, it provides superior retrieval performance after no more than two rounds of relevance feedback.
Funding: Supported by the National Natural Science Foundation of China (No. 52488201), the Natural Science Basic Research Program of Shaanxi (No. 2024JC-YBMS-284), the Key Research and Development Program of Shaanxi (No. 2024GHYBXM-02), and the Fundamental Research Funds for the Central Universities.
Abstract: Machine learning combined with density functional theory (DFT) enables rapid exploration of catalyst descriptor spaces such as adsorption energy, facilitating fast and effective catalyst screening. However, models for predicting adsorption energies on oxides are still lacking, owing to the complexity of the elemental species and the ambiguous coordination environment. This work proposes an active learning workflow (LeNN) founded on local electronic transfer features (e) and the principle of coordinate rotation invariance. By accurately characterizing the electron transfer to adsorption-site atoms and their surrounding geometric structures, LeNN mitigates abrupt feature changes due to different element types and clarifies coordination environments. As a result, it enables the prediction of *H adsorption energy on binary oxide surfaces with a mean absolute error (MAE) below 0.18 eV. Moreover, we incorporate local coverage (θ_l) and leverage a neural network ensemble to establish an active learning workflow, attaining a prediction MAE below 0.2 eV for 5419 multi-*H adsorption structures. These findings validate the universality and capability of the proposed features in predicting *H adsorption energy on binary oxide surfaces.
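Editorial illustration: the abstract does not say how the neural network ensemble drives the active learning loop. A common choice, assumed here and not confirmed by the source, is to send the structures on which the ensemble members disagree most to DFT for labeling. The function names and the energy values (in eV) below are hypothetical.

```python
from statistics import mean, pstdev

def ensemble_uncertainty(predictions):
    """predictions[m][i] is the energy predicted by model m for structure i.

    Returns one (mean, standard deviation) pair per structure.
    """
    n_structures = len(predictions[0])
    per_structure = [[model[i] for model in predictions] for i in range(n_structures)]
    return [(mean(p), pstdev(p)) for p in per_structure]

def pick_for_dft(predictions, n_queries):
    """Indices of the structures with the largest ensemble spread."""
    stats = ensemble_uncertainty(predictions)
    ranked = sorted(range(len(stats)), key=lambda i: -stats[i][1])
    return ranked[:n_queries]

# Hypothetical predictions from three ensemble members for three structures.
preds = [
    [-0.30, 0.10, 0.52],
    [-0.28, 0.45, 0.50],
    [-0.31, -0.20, 0.49],
]
print(pick_for_dft(preds, 1))  # the structure the models disagree on most
```

In such a loop the selected structures would be computed with DFT, added to the training set, and the ensemble retrained before the next round.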
Funding: Supported by the National Science Foundation of China (61971078) and the Chongqing Municipal Education Commission Science and Technology Major Project (KJZDM202301901).
Abstract: The effectiveness of facial expression recognition (FER) algorithms hinges on the model's quality and the availability of a substantial amount of labeled expression data. However, labeling large datasets demands significant human, time, and financial resources. Although active learning methods have mitigated the dependency on extensive labeled data, a cold-start problem persists in small to medium-sized expression recognition datasets. This issue arises because the initial labeled data often fails to represent the full spectrum of facial expression characteristics. This paper introduces an active learning approach that integrates uncertainty estimation, aiming to improve the precision of facial expression recognition regardless of dataset scale. The method is divided into two primary phases. First, the model undergoes self-supervised pre-training using contrastive learning and uncertainty estimation to bolster its feature extraction capabilities. Second, the model is fine-tuned using the prior knowledge obtained from the pre-training phase to significantly improve recognition accuracy. In the pre-training phase, the model employs contrastive learning to extract fundamental feature representations from the complete unlabeled dataset. These features are then weighted through a self-attention mechanism with rank regularization. Subsequently, data from the low-weighted set is relabeled to further refine the model's feature extraction ability. The pre-trained model is then utilized in active learning to select and label information-rich samples more efficiently. Experimental results demonstrate that the proposed method significantly outperforms existing approaches, improving recognition accuracy by 5.09% and 3.82% over the best existing active learning methods, Margin and Least Confidence, respectively, and by 1.61% over the conventional segmented active learning method.
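Editorial illustration: the Margin and Least Confidence baselines named in the abstract are standard uncertainty-sampling rules. The sketch below shows their usual textbook definitions on hypothetical class probabilities; it is not the paper's own method, and the function names are illustrative.

```python
def least_confidence(probs):
    """1 - max class probability per sample (higher means less confident)."""
    return [1.0 - max(p) for p in probs]

def margin(probs):
    """Gap between the two largest class probabilities (smaller means more ambiguous)."""
    gaps = []
    for p in probs:
        top = sorted(p, reverse=True)
        gaps.append(top[0] - top[1])
    return gaps

def select_queries(probs, k, strategy="least_confidence"):
    """Return the indices of the k samples to send to the annotator."""
    if strategy == "least_confidence":
        scores = least_confidence(probs)
        ranked = sorted(range(len(probs)), key=lambda i: -scores[i])
    elif strategy == "margin":
        scores = margin(probs)
        ranked = sorted(range(len(probs)), key=lambda i: scores[i])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:k]

# Hypothetical softmax outputs for four unlabeled images, three classes each.
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
    [0.36, 0.33, 0.31],
]
print(select_queries(probs, 2, "least_confidence"))
print(select_queries(probs, 2, "margin"))
```

The two rules can disagree: a sample with two near-tied classes has a small margin even when its top probability is moderately high.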
Funding: Supported by the UC-National Lab In-Residence Graduate Fellowship Grant L21GF3606; a DOD National Defense Science and Engineering Graduate (NDSEG) Research Fellowship; the Laboratory Directed Research and Development program of Los Alamos National Laboratory under project numbers 20170668PRD1 and 20210213ER; and the NGA under Contract No. HM04762110003.
Abstract: Graph learning, when used as a semi-supervised learning (SSL) method, performs well for classification tasks with a low label rate. We provide a graph-based batch active learning pipeline for pixel/patch neighborhood multi- or hyperspectral image segmentation. Our batch active learning approach selects a collection of unlabeled pixels that satisfy a graph local maximum constraint for the active learning acquisition function that determines the relative importance of each pixel to the classification. This work builds on recent advances in the design of novel active learning acquisition functions (e.g., the Model Change approach in arXiv:2110.07739) while adding important further developments, including patch-neighborhood image analysis and batch active learning methods, to further increase the accuracy and greatly increase the computational efficiency of these methods. In addition to improving accuracy, our approach can greatly reduce the number of labeled pixels needed to reach the same level of accuracy obtained with randomly selected labeled pixels.
Funding: Supported by the DOD National Defense Science and Engineering Graduate (NDSEG) Research Fellowship and by the NGA under Contract No. HM04762110003.
Abstract: Active learning in semi-supervised classification involves introducing additional labels for unlabelled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels. "Model Change" active learning quantifies the resulting change incurred in the classifier by introducing the additional label(s). We pair this idea with graph-based semi-supervised learning (SSL) methods that use the spectrum of the graph Laplacian matrix, which can be truncated to avoid prohibitively large computational and storage costs. We consider a family of convex loss functions for which the acquisition function can be efficiently approximated using the Laplace approximation of the posterior distribution. We show a variety of multiclass examples that illustrate improved performance over the prior state of the art.
Abstract: The sampling of training data is a bottleneck in the development of artificial intelligence (AI) models, owing either to the processing of huge amounts of data or to the difficulty of accessing data in industrial practice. Active learning (AL) approaches are useful in such a context since they maximize the performance of the trained model while minimizing the number of training samples. Such smart sampling methodologies iteratively sample the points that should be labeled and added to the training set based on their informativeness and pertinence. To judge the relevance of a data instance, query rules are defined. In this paper, we propose an AL methodology based on a physics-based query rule. Given some industrial objectives from the physical process in which the AI model is involved, the physics-based AL approach iteratively converges to the data instances fulfilling those objectives while sampling training points. Therefore, the trained surrogate model is accurate where the data instances of potential industrial interest lie, and coarse everywhere else, where the data instances are of no interest in the industrial context studied.
Funding: Supported by the National Natural Science Foundation of China (No. 61906066), the Zhejiang Provincial Philosophy and Social Science Planning Project (No. 21NDJC021Z), the Shenzhen Fund for Guangdong Provincial High-level Clinical Key Specialties (No. SZGSP014), the Sanming Project of Medicine in Shenzhen (No. SZSM202011015), the Shenzhen Science and Technology Planning Project (No. KCXFZ20211020163813019), the Natural Science Foundation of Ningbo City (No. 202003N4072), and the Postgraduate Research and Innovation Project of Huzhou University (No. 2023KYCX52).
Abstract: AIM: To conduct a classification study of high myopic maculopathy (HMM) using limited datasets, including tessellated fundus, diffuse chorioretinal atrophy, patchy chorioretinal atrophy, and macular atrophy, and minimize annotation costs, and to optimize the ALFA-Mix active learning algorithm and apply it to HMM classification. METHODS: The optimized ALFA-Mix algorithm (ALFA-Mix+) was compared with five algorithms, including ALFA-Mix. Four models, including ResNet18, were established. Each algorithm was combined with the four models for experiments on the HMM dataset. Each experiment consisted of 20 active learning rounds, with 100 images selected per round. The algorithm was evaluated by comparing the number of rounds in which ALFA-Mix+ outperformed other algorithms. Finally, this study employed six models, including EfficientFormer, to classify HMM. The best-performing model among these was selected as the baseline model and combined with the ALFA-Mix+ algorithm to achieve satisfactory classification results with a small dataset. RESULTS: ALFA-Mix+ outperforms other algorithms with an average superiority of 16.6, 14.75, 16.8, and 16.7 rounds in terms of accuracy, sensitivity, specificity, and Kappa value, respectively. This study conducted experiments on classifying HMM using several advanced deep learning models with a complete training set of 4252 images. EfficientFormer achieved the best results, with an accuracy, sensitivity, specificity, and Kappa value of 0.8821, 0.8334, 0.9693, and 0.8339, respectively. By combining ALFA-Mix+ with EfficientFormer, this study achieved an accuracy, sensitivity, specificity, and Kappa value of 0.8964, 0.8643, 0.9721, and 0.8537, respectively. CONCLUSION: The ALFA-Mix+ algorithm reduces the required samples without compromising accuracy. Compared to other algorithms, ALFA-Mix+ performs better in more rounds of experiments and selects valuable samples more effectively. In HMM classification, combining ALFA-Mix+ with EfficientFormer enhances model performance, further demonstrating the effectiveness of ALFA-Mix+.
Funding: Financial support from the Fund for Scientific Research Flanders (FWO Flanders) through the doctoral fellowship grants (1185822N, 1S45522N, and 3F018119), and funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (818607).
Abstract: By combining machine learning with the design of experiments, thereby achieving so-called active machine learning, more efficient and cheaper research can be conducted. Machine learning algorithms are more flexible and are better than traditional design of experiment algorithms at investigating processes spanning all length scales of chemical engineering. While active machine learning algorithms are maturing, their applications are falling behind. In this article, three types of challenges presented by active machine learning (namely, convincing the experimental researcher, the flexibility of data creation, and the robustness of active machine learning algorithms) are identified, and ways to overcome them are discussed. A bright future lies ahead for active machine learning in chemical engineering, thanks to increasing automation and more efficient algorithms that can drive novel discoveries.
Funding: Supported by the National Natural Science Foundation of China (60473076, 60573095).
Abstract: Personalized education provides an open learning environment that draws on advanced technologies to establish a paradigm shift toward active and dynamic teaching and learning patterns. E-learning has various established approaches to the creation and sequencing of content-based, single-learner, and self-paced learning objects. However, there is little understanding of how to create sequences of learning activities that involve groups of learners interacting within a structured set of collaborative environments. In this paper, we present an approach for learning activity sequencing based on ontology and activity graphs in a personalized education system. The modeling and management of learning activities and learners are described, and an algorithm is proposed to realize learning activity sequencing and dynamic updating of the learner ontology.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12002246 and No. 52178301), the Knowledge Innovation Program of Wuhan (Grant No. 2022010801020357), the Science Research Foundation of Wuhan Institute of Technology (Grant No. K2021030), the 2020 annual Open Fund of Failure Mechanics & Engineering Disaster Prevention and Mitigation, Key Laboratory of Sichuan Province (Sichuan University) (Grant No. 2020JDS0022), and the Open Research Fund Program of Hubei Provincial Key Laboratory of Chemical Equipment Intensification and Intrinsic Safety (Grant No. 2019KA03).
Abstract: This paper proposes an active-learning-accelerated Monte Carlo simulation method based on a modified K-nearest neighbors algorithm. The core idea of the proposed method is to judge whether the output of a random input point can be postulated through a classifier implemented with the modified K-nearest neighbors algorithm. Compared to other active learning methods that resort to experimental designs, the proposed method samples inputs by Monte Carlo simulation and saves a large portion of the actual output evaluations through accurate classification, which makes it applicable to most structural reliability estimation problems. The validity, efficiency, and accuracy of the proposed method are demonstrated numerically. In addition, the optimal value of K that maximizes the computational efficiency is studied. Finally, the proposed method is applied to the reliability estimation of carbon fiber reinforced silicon carbide composite specimens subjected to random displacements, which further validates its practicability.
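Editorial illustration: the abstract does not specify how the K-nearest neighbors algorithm is modified, so the sketch below substitutes a plain unanimous-vote rule as an assumption. A sampled input is labeled from its K nearest already-evaluated neighbors when they all agree, and the limit-state function is evaluated only otherwise. The limit-state function `g`, the sample sizes, and all names are hypothetical.

```python
import random

def g(x):
    # Hypothetical limit-state function: the structure fails when g(x) < 0.
    return 1.5 - x[0] - x[1]

def knn_votes(x, known, k):
    """Failure labels of the k evaluated points closest to x."""
    def sq_dist(item):
        point = item[0]
        return (point[0] - x[0]) ** 2 + (point[1] - x[1]) ** 2
    return [label for _, label in sorted(known, key=sq_dist)[:k]]

def knn_accelerated_mc(n_samples=1000, k=5, n_init=50, seed=0):
    rng = random.Random(seed)
    samples = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_samples)]
    # Evaluate an initial batch exactly to seed the classifier.
    known = [(x, g(x) < 0) for x in samples[:n_init]]
    n_evals = n_init
    n_fail = sum(label for _, label in known)
    for x in samples[n_init:]:
        votes = knn_votes(x, known, k)
        if all(votes) or not any(votes):
            label = votes[0]          # unanimous neighbors: postulate the label
        else:
            label = g(x) < 0          # ambiguous: pay for a real evaluation
            known.append((x, label))
            n_evals += 1
        n_fail += label
    return n_fail / n_samples, n_evals

pf, n_evals = knn_accelerated_mc()
print(f"estimated failure probability: {pf:.3f} using {n_evals} evaluations")
```

A known weakness of this simplified rule is that if the initial batch contains no failures, every later point is postulated safe and the estimate is biased; a practical scheme needs a safeguard for that case.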
Funding: Supported by the Deanship of Scientific Research at Shaqra University.
Abstract: The Internet revolution has resulted in abundant data from various sources, including social media and traditional media. Although the availability of data is no longer an issue, labelling data for use in supervised machine learning is still an expensive process that involves tedious human effort. The overall purpose of this study is to propose a strategy to automatically label unlabeled textual data with the support of active learning in combination with deep learning. More specifically, this study assesses the performance of different active learning strategies in the automatic labelling of textual datasets at the sentence and document levels. To achieve this objective, different experiments were performed on a publicly available dataset. In the first set of experiments, we randomly choose a subset of instances from the training dataset and train a deep neural network to assess performance on the test set. In the second set of experiments, we replace the random selection with different active learning strategies to choose a subset of the training dataset, train the same model, and reassess its performance on the test set. The experimental results suggest that the active learning strategies yield a performance improvement of 7% on document-level datasets and 3% on sentence-level datasets for automatic labelling.
Funding: Supported by the National Natural Science Foundation of China (62201398), the Natural Science Foundation of Zhejiang Province (LY21F020001), and the Science and Technology Plan Project of Wenzhou (ZG2020026).
Abstract: Active learning (AL) trains a high-precision predictor model from small amounts of labeled data by iteratively annotating the most valuable data sample from an unlabeled data pool with a class label throughout the learning process. However, most current AL methods start from the premise that the labels queried in AL rounds must be free of ambiguity, which may be unrealistic in some real-world applications where only a set of candidate labels can be obtained for the selected data. Besides, most existing AL algorithms only consider centralized processing, which requires gathering all the unlabeled data in one fusion center for selection. Considering that data are collected and stored at different nodes over a network in many real-world scenarios, distributed processing is chosen here. This paper focuses on the distributed classification of partially labeled (PL) data obtained by a fully decentralized AL method, and proposes a distributed active partial label learning (dAPLL) algorithm. The proposed algorithm is composed of a fully decentralized sample selection strategy and a distributed partial label learning (PLL) algorithm. During the sample selection process, both the uncertainty and the representativeness of the data are measured based on the global cluster centers obtained by a distributed clustering method, and the valuable samples are chosen in turn. Meanwhile, using the disambiguation-free strategy, a series of binary classification problems can be constructed, and the corresponding cost-sensitive classifiers can be cooperatively trained in a distributed manner. Experimental results on several datasets demonstrate that the performance of the dAPLL algorithm is comparable to that of the corresponding centralized method and is superior to the existing active PLL (APLL) method in different parameter configurations. Besides, the proposed algorithm outperforms several current PLL methods that use the random selection strategy, especially when only small amounts of data are selected to be assigned candidate labels.
Funding: Supported by the SJTU-HUAWEI TECH Cybersecurity Innovation Lab.
Abstract: Background: With the development of information technology, there has been a significant increase in the number of network traffic logs mixed with various types of cyberattacks. Traditional intrusion detection systems (IDSs) are limited in detecting new, changing patterns and identifying malicious traffic traces in real time. Therefore, there is an urgent need to implement more effective intrusion detection technologies to protect computer security. Methods: In this study, we designed a hybrid IDS by combining our incremental learning model (KAN-SOINN) and active learning to learn new log patterns and detect various network anomalies in real time. Conclusions: Experimental results on the NSL-KDD dataset showed that KAN-SOINN can be continuously improved and can effectively detect malicious logs. Meanwhile, comparative experiments showed that using a hybrid query strategy in active learning can improve the model's learning efficiency.
Abstract: In this study, the author investigates and utilizes advanced machine learning models from two different methodologies to determine the most effective way to predict which individuals have heart failure and cardiovascular diseases. The first methodology involves a list of classification machine learning algorithms, and the second involves a deep learning algorithm known as the Multilayer Perceptron (MLP). Globally, hospitals are dealing with cases of cardiovascular disease and heart failure, as they are major causes of death, not only for overweight individuals but also for those who do not adopt a healthy diet and lifestyle. Heart failure and cardiovascular diseases can be caused by many factors, including cardiomyopathy, high blood pressure, coronary heart disease, and heart inflammation [1]. Other factors, such as irregular shocks or stress, can also contribute to heart failure or a heart attack. While these events cannot be predicted, continuous data on patients' health can help doctors predict heart failure. Therefore, this data-driven research utilizes advanced machine learning and deep learning techniques to better analyze and manipulate the data, providing doctors with informative decision-making tools regarding a person's likelihood of experiencing heart failure. In this paper, the author employed advanced data preprocessing and cleaning techniques. Additionally, the dataset was tested using two different methodologies to determine the most effective machine learning technique for producing optimal predictions. The first methodology involved employing a list of supervised classification machine learning algorithms, including Naïve Bayes (NB), KNN, logistic regression, and the SVM algorithm. The second methodology utilized a deep learning (DL) algorithm known as the Multilayer Perceptron (MLP). 
This algorithm provided the author with the flexibility to experiment with different layer sizes and activation functions, such as ReLU, logistic (sigmoid), and Tanh. Both methodologies produced optimal models with high accuracy rates. The first methodology involves a list of supervised machine learning algorithms, including KNN, SVM, Adaboost, Logistic Regression, Naive Bayes, and Decision Tree algorithms. They achieved accuracy rates of 86%, 89%, 89%, 81%, 79%, and 99%, respectively. The author explained that the Decision Tree algorithm is not suitable for the dataset at hand due to overfitting issues, so it was discarded as a candidate for the optimal model. However, the latter methodology (Neural Network) demonstrated the most stable and optimal accuracy, achieving over 87% accuracy while adapting well to real-life situations and requiring low computing power overall. A performance assessment and evaluation were carried out based on a confusion matrix report to demonstrate feasibility and performance. The author concluded that the performance of the model in real-life situations can advance not only the medical field of science but also mathematical concepts. Additionally, the advanced preprocessing approach behind the model can provide value to the Data Science community. The model can be further developed by employing various optimization techniques to handle even larger datasets related to heart failure. Furthermore, different neural network algorithms can be tested to explore alternative approaches and yield different results.
Funding: This work is partially supported by the Vice President for Research and Partnerships of the University of Oklahoma, the Data Institute for Societal Challenges, and the Stephenson Cancer Center through a DISC/SCC Seed Grant Award.
Abstract: This research addresses the challenges of training large semantic segmentation models for image analysis, focusing on expediting the annotation process and mitigating imbalanced datasets. In the context of imbalanced datasets, biases related to age and gender in clinical contexts and skewed representation in natural images can affect model performance. Strategies to mitigate these biases are explored to enhance efficiency and accuracy in semantic segmentation analysis. An in-depth exploration of various reinforced active learning methodologies for image segmentation is conducted, optimizing precision and efficiency across diverse domains. The proposed framework integrates Dueling Deep Q-Networks (DQN), Prioritized Experience Replay, Noisy Networks, and Emphasizing Recent Experience. Extensive experimentation and evaluation on diverse datasets reveal both improvements and limitations associated with the various approaches in terms of overall accuracy and efficiency. This research contributes to the expansion of reinforced active learning methodologies for image segmentation, paving the way for more sophisticated and precise segmentation algorithms across diverse domains. The findings emphasize the need for a careful balance between exploration and exploitation strategies in reinforcement learning for effective image segmentation.
Abstract: Background and Objective: Social media (SoMe) has emerged as a tool in health professions education (HPE), particularly amidst the challenges posed by the coronavirus disease 2019 (COVID-19) pandemic. Despite academia's initial skepticism, SoMe has been gaining traction in supporting learning communities and offering opportunities for innovation in HPE. Our study aims to explore the integration of SoMe in HPE. Four key components were outlined as necessary for a successful integration: designing learning experiences, defining educator roles, selecting appropriate platforms, and establishing educational objectives. Methods: This article stemmed from the online Teaching Skills Series module on SoMe in education from the Ophthalmology Foundation, and drew upon evidence supporting learning theories relevant to SoMe integration and models of education. Additionally, we conducted a literature review of English-language articles on the application of SoMe in ophthalmology from PubMed over the past decade. Key Content and Findings: Early adopters of SoMe platforms in HPE have leveraged these tools to enhance learning experiences through interaction, dialogue, content sharing, and active learning strategies. By integrating SoMe into educational programs, both online and in-person, educators can overcome time and geographical constraints, fostering more diverse and inclusive learning communities. Careful consideration is, however, necessary to address potential limitations within HPE. Conclusions: This article lays the groundwork for expanding SoMe integration in HPE design, emphasizing the supportive scaffold of various learning theories and the need for further robust research examining its advantages over traditional educational formats. Our literature review underscores an ongoing multifaceted, random application of SoMe platforms in ophthalmology education. We advocate for an effective incorporation of SoMe in HPE, in compliance with good educational practice.
Funding: Supported by the National Key R&D Program of China (2019YFB2103202).
Abstract: Nowadays, ensuring the quality of network services has become increasingly vital. Experts are turning to knowledge graph technology, with a significant emphasis on entity extraction in the identification of device configurations. This research paper presents a novel entity extraction method that leverages a combination of active learning and attention mechanisms. Initially, an improved active learning approach is employed to select the most valuable unlabeled samples, which are subsequently submitted for expert labeling. This approach successfully addresses the problems of isolated points and sample redundancy within the network configuration sample set. The labeled samples are then used to train the model for network configuration entity extraction. Furthermore, the multi-head self-attention of the transformer model is enhanced by introducing the Adaptive Weighting method based on the Laplace mixture distribution. This enhancement enables the transformer model to dynamically adapt its focus to words in various positions, displaying exceptional adaptability to abnormal data and further elevating the accuracy of the proposed model. In comparisons with Random Sampling (RANDOM), Maximum Normalized Log-Probability (MNLP), Least Confidence (LC), Token Entropy (TE), and Entropy Query by Bagging (EQB), the proposed method, Entropy Query by Bagging and Maximum Influence Active Learning (EQBMIAL), achieves comparable performance with only 40% of the samples on both datasets, while the other algorithms require 50% of the samples. Furthermore, the entity extraction algorithm with the Adaptive Weighted Multi-head Attention mechanism (AW-MHA) is compared with BILSTM-CRF, Mutil_Attention-Bilstm-Crf, Deep_Neural_Model_NER, and BERT_Transformer, achieving precision rates of 75.98% and 98.32% on the two datasets, respectively. Statistical tests demonstrate the statistical significance and effectiveness of the algorithms proposed in this paper.
Abstract: This paper describes a new method for active learning in content-based image retrieval. The proposed method first uses support vector machine (SVM) classifiers to learn an initial query concept. The proposed active learning scheme then employs a similarity measure to check the current version space and selects the images with maximum expected information gain to solicit the user's labels. Finally, the learned query is refined based on the user's further feedback. By combining the SVM classifier with the similarity measure, the proposed method can alleviate the model bias present in each of them. Our experiments on several query concepts show that the proposed method can learn the user's query concept quickly and effectively within only a few iterations.
Abstract: This paper explores the integration of the bridge-in, objectives, pre-assessment, participatory activities, post-assessment, and summary (BOPPPS) teaching model within the context of the postgraduate Academic English course. It discusses how this structured approach can effectively enhance students' language proficiency, foster critical thinking skills, and align with the multifaceted objectives of advanced English language education. The study provides a detailed examination of each BOPPPS component as applied to the postgraduate Academic English curriculum, supported by theoretical underpinnings and practical implications.
Abstract: This study investigates the efficacy of the Mathematics Independent Learning Activity Practice and Play Unite Scheme (MILAPlus) as an instructional strategy to improve the proficiency levels of Grade 9 students in quadratic equations and functions, through a study carried out at Quezon National High School. The research involved 116 Grade 9 students and utilized a quantitative approach, incorporating both pre-assessment and post-assessment measures. The research uses a quasi-experimental design, examining the academic performance of students before and after the introduction of MILAPlus. The pre-assessment establishes a baseline, and the subsequent post-assessment measures the impact of the instructional strategy. Statistical analyses, including t-tests, assess the significance of differences in mean scores and mean percentage scores, providing quantitative insights into the effectiveness of MILAPlus. The study found a statistically significant improvement in both mean scores and mean percentage scores after the use of MILAPlus, indicating enhanced proficiency in quadratic equations and functions. The Mean Proficiency Scores (MPS) also showed a substantial increase, demonstrating a marked improvement in overall proficiency levels among Grade 9 students. In light of the results, recommendations include the continued use of MILAPlus as an instructional strategy and the alignment of its development with prescribed learning competencies. Consistent adherence to policies and guidelines for MILAPlus implementation is suggested for sustaining positive effects on students' long-term performance in mathematics. This research contributes valuable insights into the practical application and effectiveness of MILAPlus within the context of Grade 9 mathematics education at Quezon National High School.
Funding: Supported by the National Basic Research Program (973) of China (No. 2004CB719401) and the National Research Foundation for the Doctoral Program of Higher Education of China (No. 20060003060).
Abstract: In this paper, we present a novel Support Vector Machine active learning algorithm for effective 3D model retrieval using the concept of relevance feedback. The proposed method learns from the most informative objects, which are marked by the user, and then creates a boundary separating the relevant models from the irrelevant ones. It needs only a small number of 3D models labelled by the user, and it can grasp the user's semantic knowledge rapidly and accurately. Experimental results show that the proposed algorithm significantly improves retrieval effectiveness. Compared with four state-of-the-art query refinement schemes for 3D model retrieval, it provides superior retrieval performance after no more than two rounds of relevance feedback.
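Editorial illustration: the abstract does not state how the "most informative objects" are chosen. A widely used heuristic in SVM active learning, assumed here and not taken from the paper, is to ask the user about the items closest to the current decision boundary. The linear decision function, its weights, and the feature vectors below are hypothetical stand-ins for a trained SVM and 3D shape descriptors.

```python
def decision_value(w, b, x):
    """Signed score of a linear classifier: positive means 'relevant'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def most_informative(w, b, pool, n_queries):
    """Indices of the unlabeled items nearest to the boundary (smallest |score|)."""
    ranked = sorted(range(len(pool)), key=lambda i: abs(decision_value(w, b, pool[i])))
    return ranked[:n_queries]

# Hypothetical boundary and four unlabeled feature vectors.
w, b = [1.0, -1.0], 0.0
pool = [[2.0, 0.0], [0.5, 0.4], [0.0, 3.0], [1.0, 1.2]]

# Items 1 and 3 sit closest to the boundary, so they are shown to the user first.
print(most_informative(w, b, pool, 2))
```

After the user marks the queried items as relevant or irrelevant, the classifier would be retrained and the remaining models re-ranked by their decision values for the next feedback round.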