In healthcare, the persistent challenge of arrhythmias, a leading cause of global mortality, has sparked extensive research into automating their detection with machine learning (ML) algorithms. However, traditional ML and AutoML approaches have revealed their limitations, notably in feature generalization and automation efficiency. This research gap has motivated the development of AutoRhythmAI, a solution that integrates machine and deep learning to improve the diagnosis of arrhythmias. Our approach comprises two distinct pipelines tailored for binary-class and multi-class arrhythmia detection, bridging the gap between data preprocessing and model selection. To validate our system, we tested AutoRhythmAI on a multimodal dataset, surpassing the accuracy achieved with a single dataset and underscoring the robustness of our methodology. In the first pipeline, we employ signal filtering and ML algorithms for preprocessing, followed by data balancing and splitting for training. The second pipeline is dedicated to feature extraction and classification using deep learning models. Notably, we introduce the ‘RRI-convoluted transformer model’ as a novel addition for binary-class arrhythmias. An ensemble-based approach then combines all models according to their respective weights, resulting in an optimal model pipeline. In our study, the VGGRes model achieved strong results in multi-class arrhythmia detection, with an accuracy of 97.39% and firm performance in precision (82.13%), recall (31.91%), and F1-score (82.61%). In the binary-class task, the proposed model achieved an accuracy of 96.60%. These results highlight the effectiveness of our approach in improving arrhythmia detection, with notably high accuracy and well-balanced performance metrics.
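The abstract says the ensemble combines models "considering their respective weights" but does not say how the weights are obtained. As a hedged illustration only, here is a minimal weighted soft-voting sketch that assumes each model outputs class probabilities and that the weights are already known; the model names and numbers are hypothetical.

```python
from typing import Dict, List


def weighted_soft_vote(probs: Dict[str, List[float]],
                       weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-model class-probability vectors.

    `probs` maps a (hypothetical) model name to its predicted class
    probabilities; `weights` maps the same names to non-negative weights.
    """
    total_weight = sum(weights[name] for name in probs)
    if total_weight <= 0:
        raise ValueError("ensemble weights must sum to a positive value")
    n_classes = len(next(iter(probs.values())))
    combined = [0.0] * n_classes
    for name, p in probs.items():
        if len(p) != n_classes:
            raise ValueError(f"model {name!r} reports {len(p)} classes, expected {n_classes}")
        for k, pk in enumerate(p):
            combined[k] += weights[name] * pk
    # Divide by the total weight so the result is again a probability vector.
    return [c / total_weight for c in combined]


def predict_class(probs: Dict[str, List[float]], weights: Dict[str, float]) -> int:
    """Index of the class with the highest ensemble probability."""
    combined = weighted_soft_vote(probs, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical binary outputs (index 0 = normal, 1 = arrhythmia) from three models.
model_probs = {"cnn": [0.30, 0.70], "transformer": [0.60, 0.40], "gbm": [0.20, 0.80]}
model_weights = {"cnn": 0.5, "transformer": 0.3, "gbm": 0.2}
print(weighted_soft_vote(model_probs, model_weights))
print(predict_class(model_probs, model_weights))
```

Soft voting keeps each model's confidence, so a heavily weighted but uncertain model cannot overrule two confident ones as easily as in hard (majority) voting.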
Long-term time series forecasting is a crucial research domain within automated machine learning (AutoML). At present, forecasting, whether rooted in machine learning or statistical learning, typically relies on expert input and requires substantial manual involvement. This manual effort spans model development, feature engineering, hyperparameter tuning, and the intricate construction of time series models. The complexity of these tasks makes complete automation infeasible, as they inherently demand human intervention at multiple junctures. To overcome these challenges, this article proposes leveraging Long Short-Term Memory (LSTM), a variant of Recurrent Neural Networks that uses memory cells and gating mechanisms to facilitate long-term time series prediction. However, the forecasting accuracy of individual neural networks and traditional models can degrade significantly on long-term time series tasks. Our research demonstrates that the proposed approach outperforms the traditional Autoregressive Integrated Moving Average (ARIMA) method in forecasting long-term univariate time series. ARIMA is a high-quality, competitive model for time series prediction, yet it requires significant preprocessing effort. Using multiple accuracy metrics, we evaluated both ARIMA and the proposed method on simulated and real time series data over both short and long horizons. Furthermore, our findings indicate the method's superiority over alternative network architectures, including Fully Connected Neural Networks, Convolutional Neural Networks, and Non-pooling Convolutional Neural Networks. Our AutoML approach enables non-professionals to attain accurate and effective time series forecasting and can be applied to various domains, particularly business and finance.
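To make "memory cells and gating mechanisms" concrete, here is a minimal single-unit LSTM step in plain Python. It follows the standard LSTM equations (forget, input, and output gates plus a candidate cell value). The scalar weights are hypothetical placeholders; the article's network size, readout layer, and training procedure are not given in the abstract.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class ScalarLSTMWeights:
    """Input weight w_*, recurrent weight u_*, and bias b_* for each gate (hypothetical values)."""
    w_f: float = 0.0
    u_f: float = 0.0
    b_f: float = 0.0
    w_i: float = 0.0
    u_i: float = 0.0
    b_i: float = 0.0
    w_o: float = 0.0
    u_o: float = 0.0
    b_o: float = 0.0
    w_g: float = 0.0
    u_g: float = 0.0
    b_g: float = 0.0


def lstm_step(x: float, h_prev: float, c_prev: float,
              w: ScalarLSTMWeights) -> Tuple[float, float]:
    f = sigmoid(w.w_f * x + w.u_f * h_prev + w.b_f)    # forget gate: share of old memory kept
    i = sigmoid(w.w_i * x + w.u_i * h_prev + w.b_i)    # input gate: share of new content written
    o = sigmoid(w.w_o * x + w.u_o * h_prev + w.b_o)    # output gate: share of memory exposed
    g = math.tanh(w.w_g * x + w.u_g * h_prev + w.b_g)  # candidate memory content
    c = f * c_prev + i * g                             # updated memory cell
    h = o * math.tanh(c)                               # new hidden state
    return h, c


def run_sequence(xs: Sequence[float], w: ScalarLSTMWeights) -> Tuple[List[float], float]:
    """Feed a univariate series through the cell; return all hidden states and the final cell."""
    h, c = 0.0, 0.0
    hidden = []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        hidden.append(h)
    return hidden, c


hidden, cell = run_sequence([0.1, 0.4, 0.35, 0.8], ScalarLSTMWeights(w_g=1.0, b_f=2.0))
print(hidden, cell)
```

The additive update `c = f * c_prev + i * g` is what lets information persist over many steps. A forecasting model would add a trained readout (for example, a linear layer) mapping `h` to the next value; that part is omitted here.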
Landslide hazard mapping is essential for regional landslide hazard management. The main objective of this study is to construct a rainfall-induced landslide hazard map of Luhe County, China, based on an automated machine learning framework (AutoGluon). A total of 2241 landslides were identified from satellite images taken before and after the rainfall event, and 10 impact factors, including elevation, slope, aspect, normalized difference vegetation index (NDVI), topographic wetness index (TWI), lithology, land cover, distance to roads, distance to rivers, and rainfall, were selected as indicators. The WeightedEnsemble model, a weighted ensemble of 13 base machine learning models, was used to output the landslide hazard assessment results. The results indicate that landslides mainly occurred in the central part of the study area, especially in Hetian and Shanghu. Training all the models took 102.44 s in total, and the WeightedEnsemble model achieved an Area Under the Curve (AUC) of 92.36% on the test set. In addition, 14.95% of the study area was classified as very high hazard, with a landslide density of 12.02 per square kilometer. This study serves as a significant reference for the prevention and mitigation of geological hazards and for land use planning in Luhe County.
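An AUC of 92.36% can be read as the probability that a randomly chosen landslide cell receives a higher hazard score than a randomly chosen non-landslide cell. A minimal pairwise (Mann-Whitney) sketch of that computation follows; the labels and scores are made-up toy values, not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half. This pairwise O(P*N) form is for illustration only;
    rank-based implementations are preferable for large rasters.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = landslide cell) and hazard scores; not data from the study.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
print(roc_auc(labels, scores))
```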
By identifying and responding to any malicious behavior that could endanger the system, an Intrusion Detection System (IDS) is crucial for preserving the security of an Industrial Internet of Things (IIoT) network. The benefit of anomaly-based IDSs is that they can recognize zero-day attacks, because they do not rely on a signature database to identify abnormal activity. To improve control over datasets and the process, this study proposes using automated machine learning (AutoML) to automate the machine learning processes for IDS. Our architecture, called AID4I, uses automated machine learning methods for intrusion detection. By automating preprocessing, feature selection, model selection, and hyperparameter tuning, the objective is to identify an appropriate machine learning model for intrusion detection. Experimental studies demonstrate that the AID4I framework successfully proposes a suitable model. Automating the machine learning processes in the IDS enhances its capacity to identify and stop threatening activities, helping to ensure the integrity, security, and confidentiality of data transmitted across the IIoT network. As a comprehensive solution that takes advantage of recent advances in automated machine learning to improve network security, AID4I is a powerful and effective instrument for intrusion detection. In the preprocessing module, three distinct imputation methods are used to handle missing data, ensuring the robustness of the intrusion detection system in the presence of incomplete information. The feature selection module adopts a hybrid approach that combines Shapley values and a genetic algorithm. The parameter optimization module encompasses a diverse set of 14 classification methods, allowing thorough exploration and optimization of the parameters associated with each algorithm. By carefully tuning these parameters, the framework enhances its adaptability and accuracy in identifying potential intrusions. Experimental results demonstrate that the AID4I framework can achieve high levels of accuracy in detecting network intrusions, up to 14.39% on public datasets, outperforming traditional intrusion detection methods while also reducing the elapsed time for training and testing.
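The feature selection module combines Shapley values with a genetic algorithm. As a hedged sketch of the Shapley part only, the function below computes exact Shapley values by enumerating all coalitions, which is feasible only for a handful of features. The value function (here a hypothetical "validation accuracy of a model trained on this feature subset") and the feature names are illustrative; AID4I's actual estimator and the genetic-algorithm stage are not described in enough detail to reproduce.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   v: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley value of each player (feature) under value function v."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight of any coalition S of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (v(s | {p}) - v(s))  # marginal contribution of p
        values[p] = total
    return values


def toy_accuracy(features: FrozenSet[str]) -> float:
    """Hypothetical validation accuracy of a model trained on `features`."""
    acc = 0.5
    acc += 0.2 if "duration" in features else 0.0
    acc += 0.1 if "src_bytes" in features else 0.0
    if {"flag", "src_bytes"} <= features:
        acc += 0.04  # interaction effect shared between the two features
    return acc


phi = shapley_values(["duration", "src_bytes", "flag"], toy_accuracy)
print(sorted(phi.items(), key=lambda kv: -kv[1]))
```

The interaction bonus is split evenly between `flag` and `src_bytes`, and the values sum to the gain of the full feature set over the empty one. A genetic algorithm could then use such scores, for example to seed or bias its initial population of feature masks.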
Epilepsy is a common neurological disease that severely affects patients' daily lives. An automatic epilepsy detection and diagnosis system based on electroencephalogram (EEG) is of great significance in helping patients with epilepsy return to normal life. With the development of deep learning and the growing amount of EEG data, the performance of deep-learning-based automatic detection algorithms for epileptic EEG has gradually surpassed that of traditional hand-crafted approaches. However, designing neural architectures for epileptic EEG analysis is time-consuming and laborious, and the designed structures adapt poorly to changing EEG collection environments, which limits the application of automatic epileptic EEG detection systems. In this paper, we explore the role Automated Machine Learning (AutoML) can play in epileptic EEG detection. We apply the neural architecture search (NAS) algorithm in the AutoKeras platform to design the model for epileptic EEG analysis and use feature interpretability methods to ensure the reliability of the searched model. The experimental results show that the model obtained through NAS outperforms the baseline model: it improves classification accuracy, F1-score, and Cohen's kappa coefficient by 7.68%, 7.82%, and 9.60%, respectively. Furthermore, the NAS-based model is capable of extracting seizure-related EEG features for classification.
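Cohen's kappa, one of the reported metrics, corrects raw accuracy for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e). A small standard-library sketch follows, using made-up seizure/non-seizure labels rather than data from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: plain accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the two marginal label frequencies, summed over labels.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels: 1 = seizure segment, 0 = non-seizure segment.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]
print(cohens_kappa(y_true, y_pred))
```

Here accuracy is 0.75, but with balanced classes half of that agreement is expected by chance, so kappa is noticeably lower than accuracy.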
File labeling techniques have a long history in analyzing anthological trends in computational linguistics. The situation becomes worse for files downloaded into systems from the Internet. Currently, most users either have to rename files manually or leave them with meaningless names, which increases the time needed to search for files and results in redundant and duplicate user files. To date, no significant work has been done on automated file labeling for organizing heterogeneous user files. A few attempts have been made using topic modeling; however, a major drawback of current topic modeling approaches is that their results depend on specific language types and on the domain similarity of the data. In this research, machine learning approaches have been employed to analyze and extract information from a heterogeneous corpus. A different file labeling technique has also been used to obtain meaningful and cohesive topics for the files. The results show that the proposed methodology can generate relevant, context-sensitive names for heterogeneous data files and provides additional insight into automated file labeling in operating systems.
The influence of deep excavations on nearby existing shield tunnels is a vital issue in tunnelling engineering. However, robust methods for predicting excavation-induced tunnel displacements are lacking. In this study, an automated machine learning (AutoML)-based approach is proposed to address this issue precisely. Seven input parameters covering two physical aspects, namely soil properties and the spatial characteristics of the deep excavation, are considered in the database. The 10-fold cross-validation method is employed to overcome the scarcity of data and improve the model's robustness. Six genetic algorithm (GA)-ML models are also established for comparison. The results indicate that the proposed AutoML model is a comprehensive model that integrates efficiency and robustness. Importance analysis reveals that the ratio of the average shear strength to the vertical effective stress, E_ur/σ′_v, the excavation depth H, and the excavation width B are the most influential variables for the displacements. Finally, the AutoML model is further validated against a practical engineering project. The predictions agree well with the monitoring data, indicating that our model can be applied in real projects.
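With a small database, 10-fold cross-validation lets every sample serve exactly once as validation data. A minimal index-splitting sketch follows. `fit` and `evaluate` are hypothetical callables standing in for whatever model the AutoML search produces, and the mean-predictor example is purely illustrative.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0,
                   shuffle: bool = True) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def cross_validate(xs: Sequence, ys: Sequence, fit: Callable, evaluate: Callable,
                   k: int = 10) -> float:
    """Mean validation score over k folds; `fit` and `evaluate` are hypothetical callables."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(evaluate(model, [xs[i] for i in val], [ys[i] for i in val]))
    return sum(scores) / len(scores)


def fit_mean(xs, ys):
    """Illustrative 'model': the mean training displacement."""
    return sum(ys) / len(ys)


def mae(model, xs, ys):
    """Mean absolute error of the constant prediction `model`."""
    return sum(abs(model - y) for y in ys) / len(ys)
```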
Diabetic retinopathy (DR) is a retinal disease that causes irreversible blindness. DR occurs due to the patient's high blood sugar level, and it is difficult to detect at an early stage because no symptoms appear initially. To prevent blindness, early detection and regular treatment are needed. Automated detection based on machine intelligence may assist ophthalmologists in examining patients' conditions more accurately and efficiently. The purpose of this study is to produce an automated screening system for recognizing and grading diabetic retinopathy using machine learning through deep transfer and representational learning. The artificial intelligence technique used is transfer learning on the deep neural network Inception-v4. Two transfer learning configurations are applied to Inception-v4: fine-tune mode and fixed feature extractor mode. Both configurations achieved decent accuracy, but the fine-tuning method outperforms the fixed feature extractor configuration. The fine-tune configuration achieved 96.6% accuracy in early detection of DR and 97.7% accuracy in grading the disease, outperforming state-of-the-art methods in the relevant literature.
The continuous growth in the scale of unmanned aerial vehicle (UAV) applications in transmission line inspection has resulted in a corresponding increase in the demand for UAV inspection image processing. Owing to its excellent performance in computer vision, deep learning has been applied to UAV inspection image processing tasks such as power line identification and insulator defect detection. Despite their excellent performance, deep-learning-based models for electric power UAV inspection image processing face several problems, such as a narrow application scope, the need for constant retraining and optimization, and high R&D monetary and time costs, owing to the black-box and scene-data-driven characteristics of deep learning. In this study, an automated deep learning system for electric power UAV inspection image analysis and processing is proposed as a solution to these problems. The system design is based on three critical design principles: generalizability, extensibility, and automation. Pre-trained models, fine-tuning (downstream task adaptation), and automated machine learning, which are closely related to these design principles, are reviewed. In addition, an automated deep learning system architecture for electric power UAV inspection image analysis and processing is presented. A prototype system was constructed, and experiments were conducted on two electric power UAV inspection image analysis and processing tasks: insulator self-detonation and bird nest recognition. The models constructed using the prototype system achieved 91.36% and 86.13% mAP for insulator self-detonation and bird nest recognition, respectively. This demonstrates that the system design concept is reasonable and the system architecture feasible.
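The reported mAP values average, over object classes, the area under each class's precision-recall curve. Below is a hedged sketch of one common variant, all-point interpolated average precision. It assumes detections have already been matched to ground-truth boxes (the matching rule, such as an IoU threshold, is not shown), and the paper does not state which AP variant or threshold it used.

```python
from typing import Dict, List, Sequence, Tuple

Detection = Tuple[float, bool]  # (confidence, is_true_positive after matching)


def average_precision(detections: Sequence[Detection], n_ground_truth: int) -> float:
    """All-point interpolated AP for a single class."""
    if n_ground_truth <= 0:
        raise ValueError("n_ground_truth must be positive")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_ground_truth)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas over recall increments (false positives add zero width).
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[Sequence[Detection], int]]) -> float:
    """Mean of per-class AP; maps class name -> (detections, number of ground-truth boxes)."""
    aps = [average_precision(dets, n_gt) for dets, n_gt in per_class.values()]
    return sum(aps) / len(aps)


# Toy detections for a hypothetical "bird_nest" class with 4 ground-truth boxes.
nests = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]
print(average_precision(nests, 4))
```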
AutoML (Automated Machine Learning) is an emerging field that aims to automate the process of building machine learning models. AutoML emerged to increase productivity and efficiency by automating, as much as possible, the inefficient work that recurs whenever this process is repeated to apply machine learning. In particular, research has long been conducted on technologies that can effectively develop high-quality models while minimizing the intervention of model developers in the process from data preprocessing to algorithm selection and tuning. In this review, we summarize the data processing requirements for AutoML approaches and explain them in detail. We place greater emphasis on neural architecture search (NAS), as it is currently a highly popular subtopic within AutoML. NAS methods use machine learning algorithms to search a large space of possible architectures and find the one that performs best on a given task. We summarize the performance achieved by representative NAS algorithms on CIFAR-10, CIFAR-100, ImageNet, and other well-known benchmark datasets. Additionally, we examine several noteworthy research directions in NAS, including one-/two-stage NAS, one-shot NAS, and joint hyperparameter and architecture optimization. We discuss how the size and complexity of the NAS search space can vary depending on the specific problem being addressed. To conclude, we examine several open problems (SOTA problems) within current AutoML methods that warrant further investigation in future research.
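Random search is the simplest NAS baseline, and it makes the point about search-space size concrete: the number of candidate architectures is the product of the choices per decision. The sketch below uses a hypothetical four-decision search space and a stand-in scoring function in place of actual training. Practical NAS methods replace the random sampler with strategies such as reinforcement learning, evolution, or gradient-based relaxation.

```python
import math
import random
from typing import Any, Callable, Dict, List, Optional, Tuple

Arch = Dict[str, Any]

# Hypothetical search space: one categorical choice per architectural decision.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "num_cells": [2, 4, 6, 8],
    "width": [32, 64, 128],
    "op": ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool"],
    "skip_connection": [False, True],
}


def search_space_size(space: Dict[str, List[Any]]) -> int:
    """Number of distinct architectures: product of the choices per decision."""
    return math.prod(len(choices) for choices in space.values())


def random_search(space: Dict[str, List[Any]], evaluate: Callable[[Arch], float],
                  n_trials: int = 20,
                  seed: int = 0) -> Tuple[Optional[Arch], float, List[Tuple[Arch, float]]]:
    """Sample architectures uniformly and keep the best-scoring one."""
    rng = random.Random(seed)
    best_arch: Optional[Arch] = None
    best_score = -math.inf
    history: List[Tuple[Arch, float]] = []
    for _ in range(n_trials):
        arch = {name: rng.choice(choices) for name, choices in space.items()}
        score = evaluate(arch)
        history.append((arch, score))
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score, history


def proxy_score(arch: Arch) -> float:
    """Stand-in for 'train briefly and measure validation accuracy'; purely illustrative."""
    score = 0.5 + 0.03 * arch["num_cells"] + 0.0005 * arch["width"]
    score += 0.02 if arch["skip_connection"] else 0.0
    score -= 0.05 if arch["op"] == "max_pool" else 0.0
    return score


print(search_space_size(SEARCH_SPACE))
best, best_score, hist = random_search(SEARCH_SPACE, proxy_score, n_trials=30)
print(best, best_score)
```

Adding one more binary decision doubles the space, which is why cell-based and one-shot formulations are used to keep realistic search spaces tractable.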
BACKGROUND Digital pathology image (DPI) analysis has been developed using machine learning (ML) techniques. However, little attention has been paid to the reproducibility of ML-based histological classification on heterochronously obtained DPIs of the same hematoxylin and eosin (HE) slide. AIM To elucidate the frequency and preventable causes of discordant classification results of ML-based DPI analysis for heterochronously obtained DPIs. METHODS We created paired DPIs by scanning 298 HE-stained slides containing 584 tissues twice with a virtual slide scanner. The paired DPIs were analyzed by our ML-aided classification model. We defined the non-flipped and flipped groups as the paired DPIs with concordant and discordant classification results, respectively. We compared differences in color and blur between the non-flipped and flipped groups using the L1-norm and a blur index, respectively. RESULTS We observed discordant classification results in 23.1% of the paired DPIs obtained by two independent scans of the same microscope slide. We detected no significant difference in the L1-norm of each color channel between the two groups; however, the flipped group showed a significantly higher blur index than the non-flipped group. CONCLUSION Our results suggest that differences in blur, not color, between the paired DPIs may cause discordant classification results. An ML-aided classification model for DPI should be tested for this potential cause of reduced reproducibility. In future work, a slide scanner and/or a preprocessing method that minimizes DPI blur should be developed.
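The abstract does not define its L1-norm comparison or its blur index. As an assumption-labeled sketch, the code below compares one color channel of two scans by the L1 distance between their normalized intensity histograms and measures sharpness with the variance of a 4-neighbor Laplacian. That variance is a common focus measure, but it is not necessarily the paper's blur index, and it runs in the opposite direction: lower values mean more blur.

```python
from typing import List

Channel = List[List[int]]  # one 8-bit color channel, as rows of pixel values


def channel_histogram(channel: Channel, bins: int = 16) -> List[float]:
    """Normalized intensity histogram of one channel."""
    counts = [0] * bins
    n = 0
    for row in channel:
        for v in row:
            counts[min(v * bins // 256, bins - 1)] += 1
            n += 1
    return [c / n for c in counts]


def histogram_l1(a: Channel, b: Channel, bins: int = 16) -> float:
    """L1 distance between the two channels' histograms (0 = identical, 2 = disjoint)."""
    ha, hb = channel_histogram(a, bins), channel_histogram(b, bins)
    return sum(abs(x - y) for x, y in zip(ha, hb))


def laplacian_variance(channel: Channel) -> float:
    """Variance of the 4-neighbor Laplacian over interior pixels (assumed sharpness proxy)."""
    h, w = len(channel), len(channel[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (channel[y - 1][x] + channel[y + 1][x]
                   + channel[y][x - 1] + channel[y][x + 1] - 4 * channel[y][x])
            responses.append(lap)
    if not responses:
        raise ValueError("channel must be at least 3x3 pixels")
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# Toy 5x5 channels: a flat (maximally blurred) patch and a sharp checkerboard.
flat = [[100] * 5 for _ in range(5)]
checker = [[255 if (x + y) % 2 else 0 for x in range(5)] for y in range(5)]
print(histogram_l1(flat, checker), laplacian_variance(flat), laplacian_variance(checker))
```

For RGB scans, the histogram comparison would be applied per channel, which matches the abstract's "L1-norm of each color channel."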
Groundwater contamination source identification (GCSI) is a prerequisite for contamination risk evaluation and efficient groundwater contamination remediation programs. In previous GCSI studies, the boundary condition has generally been set as a known variable. However, in many practical cases, the boundary condition is complicated and cannot be estimated accurately in advance. Treating it as known may seriously deviate from the actual situation and lead to distorted identification results. Moreover, GCSI results are affected by multiple factors, including contaminant source information, model parameters, and the boundary condition; therefore, if the boundary condition is not estimated accurately, the other factors will also be estimated inaccurately. This study focuses on the unknown boundary condition and proposes identifying three types of unknown variables: contaminant source information, model parameters, and the boundary condition. When the simulation-optimization (S-O) method is applied to GCSI, the huge computational load is usually reduced by building surrogate models. However, when building surrogate models, researchers need to select the model and optimize its hyperparameters to make it powerful, which can be a lengthy process. The automated machine learning (AutoML) method was used to build the surrogate model; it automates model selection and hyperparameter optimization in machine learning engineering, greatly reducing manual operations and saving time. The accuracy of the AutoML surrogate model was compared with that of surrogate models built with eXtreme Gradient Boosting (XGBoost), random forest (RF), extra trees regressor (ETR), and elastic net (EN), which are the models automatically selected in the AutoML engineering process. The results show that the surrogate model constructed by the AutoML method has the best accuracy compared with the other four methods. This study provides reliable and strong support for GCSI.
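At its core, the AutoML step compares candidate surrogate models on held-out simulation runs and keeps the most accurate one. The sketch below is deliberately tiny. Two standard-library candidates (a mean predictor and a least-squares line) stand in for XGBoost, RF, ETR, and EN, and a hypothetical linear function stands in for the groundwater simulator. Real AutoML systems also tune each candidate's hyperparameters, which is omitted here.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float]], Model]


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Model:
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least-squares line y = a + b x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return fit_mean(xs, ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def rmse(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def select_surrogate(candidates: Dict[str, Fitter],
                     train: Tuple[Sequence[float], Sequence[float]],
                     val: Tuple[Sequence[float], Sequence[float]]) -> Tuple[str, Dict[str, float]]:
    """Fit every candidate on `train` and return the name with the lowest validation RMSE."""
    scores = {name: rmse(fitter(*train), *val) for name, fitter in candidates.items()}
    return min(scores, key=scores.get), scores


def toy_simulator(x: float) -> float:
    """Hypothetical stand-in for an expensive groundwater simulation run."""
    return 3.0 * x + 0.5


train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
val_x = [5.0, 6.0]
best, scores = select_surrogate(
    {"mean": fit_mean, "linear": fit_linear},
    (train_x, [toy_simulator(x) for x in train_x]),
    (val_x, [toy_simulator(x) for x in val_x]),
)
print(best, scores)
```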
We use an automated research platform combined with machine learning to assess and understand the resilience of organic photovoltaic (OPV) devices against air and light during production, covering over 40 donor and acceptor combinations. The standardized protocol and high reproducibility of the platform result in a dataset of high variety and veracity, which we use to deploy machine learning models that uncover links between stability and chemical, energetic, and morphological structure. We find that the strongest predictor of air/light resilience during production is the effective gap E_g,eff, which points to singlet oxygen, rather than the superoxide anion, being the dominant agent of degradation under processing conditions. A similarly good prediction of air/light resilience can also be achieved by considering only features derived from chemical structure, that is, information that is available prior to any experimentation.
A novel radar-based system for localising machines in longwall coal mines is described. The system, based on a radar-ranging sensor and designed to localise mining equipment with respect to the mine tunnel gate road infrastructure, was developed and trialled in an underground coal mine. The challenges of reliable sensing in the mine environment are considered, and the use of a radar sensor for localisation is justified. The difficulties of achieving reliable positioning using only the radar sensor are examined. Several probabilistic data processing techniques are explored to estimate two key localisation parameters from a single radar signal, namely along-track position and across-track position, with respect to the gate road structures. For across-track position, a conventional Kalman filter approach is sufficient to achieve a reliable estimate. However, for along-track position estimation, specific infrastructure elements on the gate road rib-wall must be identified by a tracking algorithm. Because of the complexity of this data processing problem, a novel visual analytics approach was explored in a 3D interactive display to facilitate identification of significant features for use in a classifier algorithm. Based on the classifier output, identified elements are used as location waypoints to provide a robust and accurate estimate of the mining equipment's location.
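For across-track position, the abstract reports that a conventional Kalman filter suffices. Below is a minimal one-dimensional sketch. It assumes a constant-position motion model and hand-picked noise variances (`q`, `r`); neither the motion model nor the noise levels are given in the source, and the units are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ScalarKalman:
    x: float  # state estimate, e.g. across-track offset (hypothetical units: metres)
    p: float  # variance of the estimate
    q: float  # process noise variance added per step (assumed)
    r: float  # radar measurement noise variance (assumed)

    def update(self, z: float) -> float:
        # Predict: constant-position model, so only the uncertainty grows.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x


def filter_ranges(measurements: Iterable[float], x0: float, p0: float = 1.0,
                  q: float = 1e-4, r: float = 0.05) -> List[float]:
    """Run the filter over a sequence of range readings and return every estimate."""
    kf = ScalarKalman(x=x0, p=p0, q=q, r=r)
    return [kf.update(z) for z in measurements]


estimates = filter_ranges([2.0] * 200, x0=0.0)
print(estimates[:3], estimates[-1])
```

A small `q` relative to `r` makes the filter trust its running estimate over individual noisy radar returns; raising `q` makes it track genuine lateral motion faster at the cost of more jitter.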
Cluster analysis is a crucial technique in unsupervised machine learning, pattern recognition, and data analysis. However, current clustering algorithms suffer from the need for manual determination of parameter values, low accuracy, and inconsistent performance across data sizes and structures. To address these challenges, a novel clustering algorithm called the fully automated density-based clustering method (FADBC) is proposed. The FADBC method consists of two stages: parameter selection and cluster extraction. In the first stage, a proposed method extracts optimal parameters for the dataset, including the epsilon size and the minimum-number-of-points threshold. These parameters are then used in a density-based technique that scans each point in the dataset and evaluates neighborhood densities to find clusters. The proposed method was evaluated on different benchmark datasets and metrics, and the experimental results demonstrate its competitive performance without requiring manual inputs. The results show that FADBC outperforms well-known clustering methods such as agglomerative hierarchical clustering, k-means, spectral clustering, DBSCAN, FCDCSD, Gaussian mixtures, and density-based spatial clustering methods. It can handle any kind of dataset well and performs excellently.
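The abstract does not specify how FADBC selects epsilon and the minimum-points threshold. As a hedged sketch of the general idea only, the code below pairs a standard DBSCAN pass with a hypothetical automatic epsilon rule: sort each point's distance to its k-th nearest neighbor and take the value just before the largest jump. Both the knee rule and the choice `min_pts = k + 1` are assumptions, not FADBC's method.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return [NOISE if lab is None else lab for lab in labels]


def knee_eps(points: Sequence[Point], k: int) -> float:
    """Hypothetical rule: sorted k-NN distances, value just before the largest jump."""
    kd = []
    for i, p in enumerate(points):
        ds = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kd.append(ds[k - 1])
    kd.sort()
    gaps = [kd[t + 1] - kd[t] for t in range(len(kd) - 1)]
    knee = max(range(len(gaps)), key=gaps.__getitem__)
    return kd[knee]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
k = 3
eps = knee_eps(pts, k)
print(eps, dbscan(pts, eps, min_pts=k + 1))
```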
Recent successes in artificial intelligence have revitalized interest in the spacecraft pursuit-evasion game, an interception problem with a non-cooperative maneuvering target. This paper presents an automated machine learning (AutoML)-based method for generating optimal trajectories in long-distance scenarios. Compared with conventional deep neural network (DNN) methods, the proposed method dramatically reduces reliance on manual intervention and machine learning expertise. First, based on differential game theory and the costate normalization technique, the trajectory optimization problem is formulated under the assumption of continuous thrust. Second, an AutoML technique based on the sequential model-based optimization (SMBO) framework is introduced to automate DNN design in the deep learning process. If a recommended DNN architecture exists, the tree-structured Parzen estimator (TPE) is used; otherwise, efficient neural architecture search (NAS) with network morphism is used. The result is a novel trajectory optimization method with high computational efficiency. Finally, numerical results demonstrate the feasibility and efficiency of the proposed method.
Funding (Luhe County landslide study): supported by the State Administration of Science, Technology and Industry for National Defence, PRC (KJSP2020020303) and the National Institute of Natural Hazards, Ministry of Emergency Management of China (ZDJ2021-12).
Abstract: By identifying and responding to any malicious behavior that could endanger the system, an Intrusion Detection System (IDS) is crucial for preserving the security of an Industrial Internet of Things (IIoT) network. The benefit of anomaly-based IDSs is that they can recognize zero-day attacks, because they do not rely on a signature database to identify abnormal activity. To improve control over datasets and the process, this study proposes using an automated machine learning (AutoML) technique to automate the machine learning processes for IDS. Our architecture, known as AID4I, uses automated machine learning methods for intrusion detection. By automating preprocessing, feature selection, model selection, and hyperparameter tuning, the objective is to identify an appropriate machine learning model for intrusion detection. Experimental studies demonstrate that the AID4I framework successfully proposes a suitable model. Automating the machine learning processes in the IDS enhances its capacity to identify and stop threatening activities, helping to ensure the integrity, security, and confidentiality of data transmitted across the IIoT network. By taking advantage of the latest advances in automated machine learning to improve network security, AID4I offers a comprehensive and effective instrument for intrusion detection. In the preprocessing module, three distinct imputation methods are used to handle missing data, ensuring the robustness of the intrusion detection system in the presence of incomplete information. The feature selection module adopts a hybrid approach that combines Shapley values and a genetic algorithm. The parameter optimization module encompasses a diverse set of 14 classification methods, allowing thorough exploration and optimization of the parameters associated with each algorithm. By carefully tuning these parameters, the framework enhances its adaptability and accuracy in identifying potential intrusions. Experimental results demonstrate that the AID4I framework achieves high accuracy in detecting network intrusions, up to 14.39% on public datasets, outperforming traditional intrusion detection methods while also reducing the elapsed time for training and testing.
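The abstract mentions three imputation methods but does not name them. As an assumption, the sketch below uses three common choices (mean, median, and most-frequent value) to fill missing entries in a single numeric feature column. The `impute` helper and the packet-size values are hypothetical.

```python
from statistics import mean, median, mode
from typing import Optional


def impute(column: list[Optional[float]], strategy: str) -> list[float]:
    """Replace None entries using a statistic of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)      # robust to extreme values such as 1500-byte packets
    elif strategy == "most_frequent":
        fill = mode(observed)        # also usable for categorical features
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in column]


# Hypothetical packet sizes (bytes) from network flows, with two missing readings.
packet_sizes = [60.0, None, 60.0, 1500.0, None, 80.0]
by_mean = impute(packet_sizes, "mean")
```

The three strategies can disagree sharply on skewed network features. Here the mean fill is 425 bytes, while the median (70) and mode (60) stay close to typical packets. In practice, scikit-learn's `SimpleImputer` offers the same `"mean"`, `"median"`, and `"most_frequent"` strategies.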
Funding: This work is supported by the Fundamental Research Funds for the Central Universities (Grant No. FRF-TP-19-006A3).
Abstract: Epilepsy is a common neurological disease that severely affects patients' daily lives. An automatic epilepsy detection and diagnosis system based on the electroencephalogram (EEG) can play an important role in helping patients with epilepsy return to normal life. With the development of deep learning and the growing amount of EEG data, deep learning-based algorithms for automatic detection in epileptic EEG have gradually surpassed traditional hand-crafted approaches. However, designing neural architectures for epileptic EEG analysis is time-consuming and laborious, and the resulting structures are difficult to adapt to changing EEG collection environments, which limits the application of automatic epileptic EEG detection systems. In this paper, we explore the role that Automated Machine Learning (AutoML) can play in epileptic EEG detection. We apply the neural architecture search (NAS) algorithm in the AutoKeras platform to design the model for epileptic EEG analysis and use feature interpretability methods to ensure the reliability of the searched model. The experimental results show that the model obtained through NAS outperforms the baseline model: it improves classification accuracy, F1-score, and Cohen's kappa coefficient by 7.68%, 7.82%, and 9.60%, respectively. Furthermore, the NAS-based model is capable of extracting seizure-related EEG features for classification.
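One of the reported metrics, Cohen's kappa, corrects the raw agreement between predictions and ground truth for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the chance agreement from the label marginals. The sketch below is a plain-Python illustration of that formula; the seizure labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred) -> float:
    """Unweighted Cohen's kappa between two label sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    p_o = sum(a == b for a, b in zip(y_true, y_pred)) / n        # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)  # chance agreement
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical EEG segment labels: 1 = seizure, 0 = non-seizure.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
kappa = cohens_kappa(y_true, y_pred)
```

Here the raw accuracy is 0.8, but because both sequences are 60% non-seizure, chance agreement is 0.52, so κ = 0.28 / 0.48 ≈ 0.583. This is why kappa is a stricter check than accuracy on imbalanced EEG data. scikit-learn's `cohen_kappa_score` computes the same unweighted statistic.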
Abstract: File labeling techniques have a long history in analyzing anthological trends in computational linguistics. Labeling becomes harder for files downloaded into systems from the Internet. Currently, most users either rename files manually or leave them with meaningless names, which increases the time needed to search for required files and results in redundant and duplicated user files. No significant work has yet been done on automated file labeling during the organization of heterogeneous user files. A few attempts have been made using topic modeling. However, a major drawback of current topic modeling approaches is that their good results depend on specific language types and on the domain similarity of the data. In this research, machine learning approaches are employed to analyze and extract information from a heterogeneous corpus. A different file labeling technique is also used to obtain meaningful and cohesive topics for the files. The results show that the proposed methodology can generate relevant, context-sensitive names for heterogeneous data files and provides additional insight into automated file labeling in operating systems.
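The abstract does not detail its labeling technique. To make the idea of deriving a meaningful file name from content concrete, the sketch below swaps in a simple, well-known alternative: label each file with its top TF-IDF terms, meaning words frequent in that file but rare across the collection. This is not the authors' method. The stop-word list, `tfidf_labels` helper, file names, and contents are all hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "and", "for", "with", "from", "this", "that"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens, dropping stop words and very short words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS and len(t) > 2]


def tfidf_labels(docs: dict[str, str], top_k: int = 2) -> dict[str, str]:
    """Map each file name to a label built from its top_k TF-IDF terms."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))                # document frequency per term
    labels = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        best = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]  # ties broken alphabetically
        labels[name] = "_".join(best)
    return labels


# Hypothetical downloaded files with meaningless names.
docs = {
    "download(1).pdf": "invoice payment invoice total due payment",
    "download(2).pdf": "neural network training notes neural training",
    "download(3).pdf": "invoice for network hardware purchase",
}
labels = tfidf_labels(docs)
```

Terms shared across files, such as "invoice" and "network" here, are down-weighted by the inverse document frequency, so the labels favor words that distinguish one file from the others.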
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 51978517, 52090082, and 52108381), the Innovation Program of Shanghai Municipal Education Commission (Grant No. 2019-01-07-00-07-456 E00051), and the Shanghai Science and Technology Committee Program (Grant Nos. 21DZ1200601 and 20DZ1201404).
Abstract: The influence of a deep excavation on nearby existing shield tunnels is a vital issue in tunnelling engineering. However, robust methods for predicting excavation-induced tunnel displacements are lacking. In this study, an automated machine learning (AutoML)-based approach is proposed to address this issue precisely. Seven input parameters covering two physical aspects, namely soil properties and the spatial characteristics of the deep excavation, are considered in the database. The 10-fold cross-validation method is employed to overcome the scarcity of data and to improve the model's robustness. Six genetic algorithm (GA)-ML models are also established for comparison. The results indicate that the proposed AutoML model is a comprehensive model that combines efficiency and robustness. Importance analysis reveals that the ratio of the average shear strength to the vertical effective stress, E_ur/σ′_v, the excavation depth H, and the excavation width B are the most influential variables for the displacements. Finally, the AutoML model is further validated on a practical engineering project. The predictions agree well with the monitoring data, indicating that the model can be applied in real projects.
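The abstract uses 10-fold cross-validation to make the most of a small database. The sketch below shows how such folds can be formed: shuffle the sample indices once, cut them into 10 nearly equal folds, and let each fold serve once as the validation set while the remaining nine train the model. The `k_fold_indices` helper, sample count, and seed are hypothetical.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # The first (n_samples % k) folds get one extra sample so every index is used.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


# A hypothetical small database of 23 excavation cases.
folds = list(k_fold_indices(23, k=10, seed=42))
```

Each case is used for validation exactly once, so the 10 validation scores can be averaged into a less noisy estimate than a single train/test split gives on scarce data. scikit-learn's `KFold(n_splits=10, shuffle=True)` provides the same behavior.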
Funding: Supported by the National Research Foundation (NRF) of Korea under the auspices of the Ministry of Science and ICT, Republic of Korea (Grant No. NRF-2020R1G1A1012741), received by M.R. Bhutta. https://nrf.kird.re.kr/main.do.
Abstract: Diabetic retinopathy (DR) is a retinal disease that causes irreversible blindness. DR occurs due to the patient's high blood sugar level, and it is difficult to detect at an early stage because no symptoms appear initially. To prevent blindness, early detection and regular treatment are needed. Automated detection based on machine intelligence may help ophthalmologists examine patients' conditions more accurately and efficiently. The purpose of this study is to produce an automated screening system for recognizing and grading diabetic retinopathy using machine learning through deep transfer and representational learning. The artificial intelligence technique used is transfer learning on the deep neural network Inception-v4. Two configuration variants of transfer learning are applied to Inception-v4: fine-tune mode and fixed feature extractor mode. Both configuration modes achieved decent accuracy, but the fine-tuning method outperforms the fixed feature extractor mode. The fine-tune configuration achieved 96.6% accuracy in early detection of DR and 97.7% accuracy in grading the disease, outperforming state-of-the-art methods in the relevant literature.
Funding: This work was supported by the Science and Technology Project of State Grid Corporation, “Research on Key Technologies of Power Artificial Intelligence Open Platform” (5700-202155260A-0-0-00).
Abstract: The continuous growth in the scale of unmanned aerial vehicle (UAV) applications in transmission line inspection has led to a corresponding increase in the demand for UAV inspection image processing. Owing to its excellent performance in computer vision, deep learning has been applied to UAV inspection image processing tasks such as power line identification and insulator defect detection. Despite their excellent performance, deep learning-based models for electric power UAV inspection image processing face several problems, such as a narrow application scope, the need for constant retraining and optimization, and high R&D costs in money and time, owing to the black-box and scene-data-driven characteristics of deep learning. In this study, an automated deep learning system for electric power UAV inspection image analysis and processing is proposed as a solution to these problems. The system design is based on three critical design principles: generalizability, extensibility, and automation. Pre-trained models, fine-tuning (downstream task adaptation), and automated machine learning, which are closely related to these design principles, are reviewed. In addition, an automated deep learning system architecture for electric power UAV inspection image analysis and processing is presented. A prototype system was constructed, and experiments were conducted on two electric power UAV inspection image analysis and processing tasks: insulator self-detonation and bird nest recognition. The models constructed using the prototype system achieved 91.36% and 86.13% mAP for insulator self-detonation and bird nest recognition, respectively. This demonstrates that the system design concept is reasonable and the system architecture is feasible.
Abstract: AutoML (Automated Machine Learning) is an emerging field that aims to automate the process of building machine learning models. AutoML emerged to increase productivity and efficiency by automating, as far as possible, the inefficient work that recurs whenever machine learning is applied. In particular, technologies that can effectively develop high-quality models while minimizing model developers' intervention, from data preprocessing to algorithm selection and tuning, have long been researched. In this review, we summarize the data processing requirements for AutoML approaches and explain them in detail. We place greater emphasis on neural architecture search (NAS), as it is currently a highly popular sub-topic within AutoML. NAS methods use machine learning algorithms to search a large space of possible architectures and find the one that performs best on a given task. We summarize the performance achieved by representative NAS algorithms on CIFAR-10, CIFAR-100, ImageNet, and other well-known benchmark datasets. Additionally, we examine several noteworthy research directions in NAS, including one-stage and two-stage NAS, one-shot NAS, and joint hyperparameter and architecture optimization. We discuss how the size and complexity of the NAS search space can vary with the specific problem being addressed. Finally, we examine several open problems (SOTA problems) in current AutoML methods that warrant further investigation.
Abstract: BACKGROUND: Digital pathology image (DPI) analysis has been advanced by machine learning (ML) techniques. However, little attention has been paid to the reproducibility of ML-based histological classification on DPIs of the same hematoxylin and eosin (HE) slide obtained at different times (heterochronously). AIM: To elucidate the frequency and preventable causes of discordant ML-based classification results for heterochronously obtained DPIs. METHODS: We created paired DPIs by scanning 298 HE-stained slides containing 584 tissues twice with a virtual slide scanner. The paired DPIs were analyzed by our ML-aided classification model. We defined the non-flipped and flipped groups as the paired DPIs with concordant and discordant classification results, respectively. We compared differences in color and blur between the non-flipped and flipped groups using the L1-norm and a blur index, respectively. RESULTS: We observed discordant classification results in 23.1% of the paired DPIs obtained by two independent scans of the same microscope slide. We detected no significant difference in the L1-norm of any color channel between the two groups; however, the flipped group showed a significantly higher blur index than the non-flipped group. CONCLUSION: Our results suggest that differences in blur, not in color, between the paired DPIs may cause discordant classification results. An ML-aided classification model for DPIs should be tested for this potential cause of reduced reproducibility. In a future study, a slide scanner and/or a preprocessing method that minimizes DPI blur should be developed.
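The study compares color with an L1-norm and blur with a blur index whose definition the abstract does not give. The sketch below illustrates both kinds of measurement on tiny grayscale images stored as nested lists. As an assumption, it uses the variance of the Laplacian as the blur measure, a common sharpness proxy in which lower values mean more blur. It therefore runs in the opposite direction to a blur index, where higher means blurrier. The helper names and images are hypothetical.

```python
def l1_distance(a, b) -> float:
    """Sum of absolute differences between two equal-length channel vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(abs(x - y) for x, y in zip(a, b))


def laplacian_variance(img) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels (lower = blurrier)."""
    h, w = len(img), len(img[0])
    vals = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            vals.append(img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1] - 4 * img[r][c])
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)


def box_blur(img):
    """3x3 mean filter on interior pixels (border copied), simulating scanner blur."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 9
    return out


# Hypothetical 6x6 high-contrast patch and a blurred "second scan" of it.
sharp = [[255 if (r + c) % 2 else 0 for c in range(6)] for r in range(6)]
blurred = box_blur(sharp)
```

The blurred copy has a much lower Laplacian variance than the sharp patch, even though the overall intensity is similar. Blur can therefore differ between two scans of the same slide while a color comparison would show little change.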
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 42272283 and 41972252) and the Graduate Innovation Fund of Jilin University (No. 2022186).
Abstract: Groundwater contamination source identification (GCSI) is a prerequisite for contamination risk evaluation and efficient groundwater contamination remediation programs. In previous GCSI studies, the boundary condition has generally been treated as known. However, in many practical cases, the boundary condition is complicated and cannot be estimated accurately in advance. Treating it as known may deviate seriously from the actual situation and lead to distorted identification results. Moreover, GCSI results are affected by multiple factors, including contaminant source information, model parameters, and the boundary condition. Therefore, if the boundary condition is not estimated accurately, the other factors will also be estimated inaccurately. This study focuses on the unknown boundary condition and proposes to identify three types of unknown variables together: contaminant source information, model parameters, and the boundary condition. When the simulation-optimization (S-O) method is applied to GCSI, the huge computational load is usually reduced by building surrogate models. However, building surrogate models requires researchers to select models and optimize hyperparameters to make them powerful, which can be a lengthy process. In this study, the automated machine learning (AutoML) method was used to build the surrogate model; it automates model selection and hyperparameter optimization, largely reducing human operations and saving time. The accuracy of the AutoML surrogate model is compared with that of surrogate models built with the eXtreme Gradient Boosting (XGBoost), random forest (RF), extra trees regressor (ETR), and elastic net (EN) methods, which were automatically selected during the AutoML process. The results show that the surrogate model constructed by AutoML has higher accuracy than those built with the other four methods. This study provides reliable and strong support for GCSI.
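At its core, the automated model selection described above compares candidate surrogate models by validation error and keeps the best one. The sketch below illustrates only that selection loop. Three deliberately simple regressors (a mean predictor, least-squares line, and 3-nearest-neighbour average) stand in for XGBoost, RF, ETR, and EN, and no hyperparameters are tuned. The "simulator" `2x + 1` and all helper names are hypothetical placeholders for an expensive groundwater simulation.

```python
import math


def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=3):
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def rmse(model, xs, ys):
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def select_surrogate(candidates, train, valid):
    """Fit every candidate on train and return the name with the lowest validation RMSE."""
    scores = {name: rmse(fit(*train), *valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


def simulator(x):  # hypothetical stand-in for an expensive forward model
    return 2 * x + 1


train_x = [float(i) for i in range(10)]
valid_x = [0.5, 2.5, 4.5, 6.5, 8.5]
train = (train_x, [simulator(x) for x in train_x])
valid = (valid_x, [simulator(x) for x in valid_x])
candidates = {"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}
best, scores = select_surrogate(candidates, train, valid)
```

A real AutoML system also searches each model's hyperparameters and often ensembles the survivors. The principle is the same, though: the surrogate is chosen by held-out accuracy rather than by hand.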
Funding: The authors thank the Deutsche Forschungsgemeinschaft (DFG) for financial support within the scope of DFG INST 90/917-1 FUGG and the projects FKZ BR 4031/21-1 and BR 4031/22-1. We also gratefully acknowledge the grants “ELF-PV-Design and development of solution processed functional materials for the next generations of PV technologies” (No. 44-6521a/20/4) and “Solar Factory of the Future” (FKZ 20.2-3410.5-4-5) by the Bavarian State Government. C.J.B. gratefully acknowledges financial support through the Bavarian Initiative “Solar Technologies go Hybrid” (SolTech) and the SFB 953 (DFG, project No. 182849149). N.L. and C.J.B. acknowledge financial support from the DFG research unit project “POPULAR” (FOR 5387, project no. 461909888). X.D. thanks the Natural Science Foundation of Shandong Province (2022HWYQ-012). N.L. acknowledges financial support from the National Natural Science Foundation of China (52394273, 52373179). L. Ding thanks the National Key Research and Development Program of China (2023YFE0116800) for financial support.
Abstract: We use an automated research platform combined with machine learning to assess and understand the resilience against air and light during the production of organic photovoltaic (OPV) devices from over 40 donor and acceptor combinations. The standardized protocol and high reproducibility of the platform result in a dataset of high variety and veracity, which we use to deploy machine learning models that uncover links between stability and chemical, energetic, and morphological structure. We find that the strongest predictor of air/light resilience during production is the effective gap E_g,eff, which points to singlet oxygen rather than the superoxide anion as the dominant agent of degradation under processing conditions. A similarly good prediction of air/light resilience can also be achieved by considering only features derived from chemical structure, that is, information available before any experimentation.
Abstract: A novel radar-based system for longwall coal mine machine localisation is described. The system, based on a radar-ranging sensor and designed to localise mining equipment with respect to the mine tunnel gate road infrastructure, is developed and trialled in an underground coal mine. The challenges of reliable sensing in the mine environment are considered, and the use of a radar sensor for localisation is justified. The difficulties of achieving reliable positioning using only the radar sensor are examined. Several probabilistic data processing techniques are explored in order to estimate two key localisation parameters from a single radar signal, namely along-track position and across-track position, with respect to the gate road structures. For across-track position, a conventional Kalman filter approach is sufficient to achieve a reliable estimate. However, for along-track position estimation, specific infrastructure elements on the gate road rib-wall must be identified by a tracking algorithm. Owing to the complexity of this data processing problem, a novel visual analytics approach using a 3D interactive display was explored to facilitate the identification of significant features for use in a classifier algorithm. The elements identified by the classifier are used as location waypoints to provide a robust and accurate localisation estimate for the mining equipment.
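The abstract notes that a conventional Kalman filter suffices for across-track position. The sketch below is a minimal scalar Kalman filter under assumed conditions. It uses a constant-position (random-walk) state model with process noise `q` and measurement noise `r`. The noise values, initial state, and range readings are hypothetical, not values from the trial.

```python
def kalman_1d(measurements, q, r, x0, p0):
    """Scalar Kalman filter with a constant-position (random-walk) state model.

    q: process noise variance, r: measurement noise variance,
    x0, p0: initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the position is assumed unchanged; uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement via the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy radar ranges (m) to the rib-wall while the true offset is 2.0 m.
ranges = [2.3, 1.8, 2.1, 1.9, 2.2, 2.0, 1.7, 2.1]
est = kalman_1d(ranges, q=1e-4, r=0.04, x0=0.0, p0=1.0)
```

As the estimate's variance shrinks, the gain falls and each new reading moves the estimate less. The output is therefore much smoother than the raw ranges, which is what makes a simple filter adequate for a slowly varying across-track offset.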
Funding: Deanship of Scientific Research at Umm Al-Qura University, Grant Code: (23UQU4361009DSR001).
Abstract: Cluster analysis is a crucial technique in unsupervised machine learning, pattern recognition, and data analysis. However, current clustering algorithms require manual determination of parameter values, suffer from low accuracy, and perform inconsistently across data sizes and structures. To address these challenges, a novel clustering algorithm called the fully automated density-based clustering method (FADBC) is proposed. FADBC consists of two stages: parameter selection and cluster extraction. In the first stage, a proposed method extracts optimal parameters for the dataset, including the epsilon size and the minimum-number-of-points threshold. These parameters are then used in a density-based technique that scans each point in the dataset and evaluates neighborhood densities to find clusters. The proposed method was evaluated on different benchmark datasets and metrics, and the experimental results demonstrate its competitive performance without requiring manual inputs. The results show that FADBC outperforms well-known clustering methods such as agglomerative hierarchical clustering, k-means, spectral clustering, DBSCAN, FCDCSD, Gaussian mixtures, and density-based spatial clustering methods. It handles a wide variety of datasets well and performs excellently.
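The two-stage structure described above, choosing epsilon and the minimum-points threshold and then scanning neighborhood densities, can be illustrated with classic DBSCAN. The abstract does not say how FADBC selects its parameters. The `k_distance_eps` rule below (median distance to each point's k-th nearest neighbour) is therefore a hypothetical heuristic, not FADBC's method, and the points are made up.

```python
import math
import statistics


def k_distance_eps(points, k):
    """Hypothetical heuristic: median distance from each point to its k-th nearest neighbour."""
    kth = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kth.append(dists[k - 1])
    return statistics.median(kth)


def dbscan(points, eps, min_pts):
    """Classic DBSCAN; returns one label per point, with -1 for noise."""
    n = len(points)
    # Neighbourhoods include the point itself, as in the original DBSCAN definition.
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # only core points expand the cluster
                queue.extend(neighbours[j])
    return labels


# Two hypothetical dense groups plus one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (5, 20)]
eps = k_distance_eps(points, k=3)
labels = dbscan(points, eps, min_pts=4)
```

With the derived epsilon of 1.0, both groups are recovered and the outlier is labeled as noise. Choosing epsilon by hand instead (for example, 0.5) would leave every point as noise, which is the sensitivity that automated parameter selection aims to remove.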
Funding: Supported by the National Defense Science and Technology Innovation Program (18-163-15-LZ-001-004-13).
Abstract: Recent successes in artificial intelligence have revitalized interest in the spacecraft pursuit-evasion game, an interception problem with a non-cooperative maneuvering target. This paper presents an automated machine learning (AutoML)-based method to generate optimal trajectories in long-distance scenarios. Compared with conventional deep neural network (DNN) methods, the proposed method dramatically reduces reliance on manual intervention and machine learning expertise. First, based on differential game theory and the costate normalization technique, the trajectory optimization problem is formulated under the assumption of continuous thrust. Second, an AutoML technique based on the sequential model-based optimization (SMBO) framework is introduced to automate DNN design in the deep learning process. If a recommended DNN architecture exists, the tree-structured Parzen estimator (TPE) is used; otherwise, efficient neural architecture search (NAS) with network morphism is used. Thus, a novel trajectory optimization method with high computational efficiency is achieved. Finally, numerical results demonstrate the feasibility and efficiency of the proposed method.