In traditional medicine and ethnomedicine, medicinal plants have long been recognized as the basis for materials in therapeutic applications worldwide. In particular, the remarkable curative effect of traditional Chinese medicine during the coronavirus disease 2019 (COVID-19) pandemic has attracted extensive attention globally. Medicinal plants have, therefore, become increasingly popular among the public. However, with increasing demand for, and profit from, medicinal plants, commercial fraud such as adulteration or counterfeiting sometimes occurs, which poses a serious threat to clinical outcomes and the interests of consumers. With rapid advances in artificial intelligence, machine learning can be used to mine information on various medicinal plants to establish an ideal resource database. We herein present a review that introduces common machine learning algorithms and discusses their application in the multi-source data analysis of medicinal plants. The combination of machine learning algorithms and multi-source data analysis facilitates comprehensive analysis and aids in the effective evaluation of the quality of medicinal plants. The findings of this review provide new possibilities for promoting the development and utilization of medicinal plants.
Mg alloys possess an inherent plastic anisotropy owing to the selective activation of deformation mechanisms depending on the loading condition. This characteristic results in a diverse range of flow curves that vary with the deformation condition. This study proposes a novel approach for accurately predicting the anisotropic deformation behavior of wrought Mg alloys using machine learning (ML) with data augmentation. The developed model combines four key strategies from data science: learning the entire flow curves, generative adversarial networks (GAN), algorithm-driven hyperparameter tuning, and a gated recurrent unit (GRU) architecture. The proposed model, namely the GAN-aided GRU, was extensively evaluated for various predictive scenarios, such as interpolation, extrapolation, and a limited dataset size. The model exhibited significant predictability and improved generalizability for estimating the anisotropic compressive behavior of ZK60 Mg alloys under 11 annealing conditions and for three loading directions. The GAN-aided GRU results were superior to those of previous ML models and constitutive equations. The superior performance was attributed to hyperparameter optimization, GAN-based data augmentation, and the inherent predictivity of the GRU for extrapolation. As a first attempt to employ ML techniques other than artificial neural networks, this study offers a novel perspective on predicting the anisotropic deformation behaviors of wrought Mg alloys.
Accurate capacity estimation is of great importance for the reliable state monitoring, timely maintenance, and second-life utilization of lithium-ion batteries. Despite numerous works on battery capacity estimation using laboratory datasets, most of them are applied to battery cells and lack satisfactory fidelity when extended to real-world electric vehicle (EV) battery packs. The challenges intensify for large EV battery packs, where unpredictable operating profiles and low-quality data acquisition hinder precise capacity estimation. To fill this gap, this study introduces a novel data-driven battery pack capacity estimation method grounded in field data. The proposed approach begins by determining the labeled capacity through an innovative combination of the inverse ampere-hour integral, open-circuit-voltage-based, and resistance-based correction methods. Then, multiple health features are extracted from incremental capacity curves, voltage curves, equivalent circuit model parameters, and operating temperature to thoroughly characterize battery aging behavior. A feature selection procedure is performed to determine the optimal feature set based on the Pearson correlation coefficient. Moreover, a convolutional neural network and a bidirectional gated recurrent unit, enhanced by an attention mechanism, are employed to estimate the battery pack capacity in real-world EV applications. Finally, the proposed method is validated with a field dataset from two EVs, covering approximately 35,000 kilometers. The results demonstrate that the proposed method exhibits better estimation performance, with an error of less than 1.1%, compared with existing methods. This work shows great potential for accurate capacity estimation of large EV battery packs based on field data, and it provides significant insights into reliable labeled-capacity calculation, effective feature extraction, and machine learning-enabled health diagnosis.
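The Pearson-based feature-selection step this abstract describes can be sketched in plain Python. The feature names, values, and the 0.7 threshold below are illustrative assumptions, not details taken from the study:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_features(features, target, threshold=0.7):
    """Keep features whose |correlation| with the capacity label exceeds the threshold."""
    return {name: pearson(vals, target)
            for name, vals in features.items()
            if abs(pearson(vals, target)) >= threshold}

# Toy aging data: a capacity label and two candidate health features.
capacity = [1.00, 0.97, 0.93, 0.90, 0.86]
features = {
    "ic_peak_height":  [2.1, 2.0, 1.8, 1.7, 1.5],       # tracks fade closely
    "avg_temperature": [25.0, 26.1, 24.8, 25.3, 25.1],  # mostly noise
}
selected = select_features(features, capacity)
```

Features that survive the threshold (here only the incremental-capacity peak height) would form the input set for the downstream estimator.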
Leveraging the extraordinary phenomena of quantum superposition and quantum correlation, quantum computing offers unprecedented potential for addressing challenges beyond the reach of classical computers. This paper tackles two pivotal challenges in the realm of quantum computing: first, the development of an effective encoding protocol for translating classical data into quantum states, a critical step for any quantum computation, since different encoding strategies can significantly influence quantum computer performance; second, the need to counteract the inevitable noise that can hinder quantum acceleration. Our primary contribution is the introduction of a novel variational data encoding method, grounded in quantum regression algorithm models. By adapting the learning concept from machine learning, we render data encoding a learnable process, which allows us to study the role of quantum correlation in data encoding. Through numerical simulations of various regression tasks, we demonstrate the efficacy of our variational data encoding, particularly after learning from instructional data. Moreover, we delve into the role of quantum correlation in enhancing task performance, especially in noisy environments. Our findings underscore the critical role of quantum correlation not only in bolstering performance but also in mitigating noise interference, thus advancing the frontier of quantum computing.
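As rough intuition for what a "learnable" encoding means, the toy sketch below simulates a single-qubit angle encoding, where RY(θ)|0⟩ yields the Z-basis expectation ⟨Z⟩ = cos θ, and tunes the encoding parameters of θ = w·x + b by simple stochastic hill-climbing. This is a drastic simplification for illustration only, not the paper's variational model or optimizer:

```python
import math, random

# Single-qubit sketch: RY(theta)|0> = [cos(theta/2), sin(theta/2)], so the
# Z-basis expectation value is <Z> = cos(theta). A "learnable" encoding maps
# a classical input x to theta = w*x + b with trainable parameters w, b.
def predict(x, w, b):
    return math.cos(w * x + b)  # <Z> after encoding x

def mse(data, w, b):
    return sum((predict(x, w, b) - y) ** 2 for x, y in data) / len(data)

# Instructional data produced by a hidden encoding (w = 2.0, b = 0.5).
data = [(x / 20.0, math.cos(2.0 * (x / 20.0) + 0.5)) for x in range(21)]

# Stochastic hill-climbing on the encoding parameters -- a crude stand-in
# for the paper's variational optimisation loop.
rng = random.Random(0)
w, b = 1.0, 0.0
loss0 = best_loss = mse(data, w, b)
for _ in range(3000):
    cw, cb = w + rng.gauss(0, 0.1), b + rng.gauss(0, 0.1)
    loss = mse(data, cw, cb)
    if loss < best_loss:
        w, b, best_loss = cw, cb, loss
```

After the search, the learned (w, b) should reproduce the hidden encoding's regression behaviour far better than the initial guess.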
With the popularization of the Internet and the development of technology, cyber threats are increasing day by day. Threats such as malware, hacking, and data breaches have had a serious impact on cybersecurity. The network security environment in the era of big data is characterized by large data volumes, high diversity, and demanding real-time requirements. Traditional security defense methods and tools are unable to cope with complex and changing network security threats. This paper proposes a machine-learning security defense algorithm based on metadata association features that emphasizes control over unauthorized users through privacy, integrity, and availability. A user model is established, and a mapping between the user model and the metadata of the data sources is generated. By analyzing the user model and its corresponding mapping relationship, a query against the user model can be decomposed into queries against various heterogeneous data sources, and the integration of heterogeneous data sources based on metadata association features can be realized. Customer information is defined and classified, sensitive data are automatically identified and perceived, a behavior audit and analysis platform is built, user behavior trajectories are analyzed, and the construction of a machine learning customer-information security defense system is completed. The experimental results show that when the data volume is 5×10³ bits, the data storage integrity of the proposed method is 92%, the data accuracy is 98%, and the success rate of data intrusion is only 2.6%. It can be concluded that the proposed storage method is safe, the data accuracy remains at a high level, and the data disaster recovery performance is good. The method can effectively resist data intrusion and offers high security for air traffic control. It not only detects all viruses in user data storage but also realizes integrated virus processing, further optimizing the security defense of user big data.
In Decentralized Machine Learning (DML) systems, system participants contribute their resources to assist others in developing machine learning solutions. Identifying malicious contributions in DML systems is challenging, which has led to the exploration of blockchain technology. Blockchain leverages its transparency and immutability to record the provenance and reliability of training data. However, storing massive datasets or implementing model evaluation processes on smart contracts incurs high computational costs. Additionally, current research on preventing malicious contributions in DML systems primarily focuses on protecting models from being exploited by workers who contribute incorrect or misleading data; less attention has been paid to the scenario where malicious requesters intentionally manipulate test data during evaluation to gain an unfair advantage. This paper proposes a transparent and accountable training data sharing method that securely shares data among potentially malicious system participants. First, we introduce a blockchain-based DML system architecture that supports secure training data sharing through the IPFS network. Second, we design a blockchain smart contract that transparently splits datasets into training and test subsets without involving system participants. Within this system, transparent and accountable training data sharing is achieved with attribute-based proxy re-encryption. We present a security analysis of the system and conduct experiments on the Ethereum and IPFS platforms to show its feasibility and practicality.
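One way to picture a participant-independent train/test split is as a deterministic hash-based partition: given a public salt (say, a block hash published by the contract), anyone can reproduce the split, and no participant gets to choose which records become test data. This is an off-chain analogy, not the paper's contract; the record IDs and salt below are invented:

```python
import hashlib

def split_dataset(record_ids, salt, test_ratio=0.2):
    """Deterministic hash-based partition: with a public salt, every
    participant reproduces the same split, and nobody picks the test set."""
    train, test = [], []
    for rid in record_ids:
        digest = hashlib.sha256(f"{salt}:{rid}".encode()).digest()
        bucket = digest[0] / 256.0  # pseudo-uniform value in [0, 1)
        (test if bucket < test_ratio else train).append(rid)
    return train, test

records = [f"rec-{i}" for i in range(100)]             # illustrative record IDs
train, test = split_dataset(records, salt="0xabc123")  # salt would be a block hash
```

Because the partition depends only on public inputs, a malicious requester cannot steer favourable records into the evaluation set.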
One of the biggest dangers to society today is terrorism, where attacks have become one of the most significant risks to international peace and national security. Big data, information analysis, and artificial intelligence (AI) have become the basis for making strategic decisions in many sensitive areas, such as fraud detection, risk management, medical diagnosis, and counter-terrorism. However, there is still a need to assess how terrorist attacks are related, initiated, and detected. For this purpose, we propose a novel framework for classifying and predicting terrorist attacks. The proposed framework posits that neglected text attributes included in the Global Terrorism Database (GTD) can influence the accuracy of the model's classification of terrorist attacks, as each part of the data can provide vital information to enrich classifier learning. Each data point in a multiclass taxonomy has one or more tags attached to it, referred to as "related tags." We applied machine learning classifiers to classify terrorist attack incidents obtained from the GTD. A transformer-based technique called DistilBERT extracts and learns contextual features from text attributes to acquire more information from text data. The extracted contextual features are combined with the "key features" of the dataset and used to perform the final classification. The study explored different experimental setups with various classifiers to evaluate the model's performance. The experimental results show that the proposed framework outperforms the latest techniques for classifying terrorist attacks, achieving an accuracy of 98.7% using a combined feature set and an extreme gradient boosting classifier.
Analyzing big data, especially medical data, helps to provide good health care to patients and reduce the risk of death. The COVID-19 pandemic has had a significant impact on public health worldwide, emphasizing the need for effective risk prediction models. Machine learning (ML) techniques have shown promise in analyzing complex data patterns and predicting disease outcomes, but their accuracy is strongly affected by the choice of parameters, so hyperparameter optimization plays a crucial role in improving model performance. In this work, the Particle Swarm Optimization (PSO) algorithm was used to efficiently search the hyperparameter space and improve the predictive power of machine learning models by identifying the hyperparameters that yield the highest accuracy. A dataset with a variety of clinical and epidemiological characteristics linked to COVID-19 cases was used in this study. Various machine learning models, including random forests, decision trees, support vector machines, and neural networks, were utilized to capture the complex relationships present in the data. The accuracy metric was employed to evaluate the predictive performance of the models. The experimental findings show that the suggested method of estimating COVID-19 risk is effective: the optimized machine learning models achieved consistently better results than the baseline models.
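A minimal PSO loop over a two-dimensional hyperparameter box can illustrate the search. The quadratic objective below stands in for (1 − validation accuracy), and all swarm settings are generic defaults rather than values from the study:

```python
import random

def pso(objective, bounds, n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5, seed=42):
    """Minimise `objective` over a box with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each coordinate back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], bounds[d][0]), bounds[d][1])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Stand-in for (1 - validation accuracy) over two hyperparameters;
# the optimum of this toy surface is at (0.3, 5.0).
loss = lambda p: (p[0] - 0.3) ** 2 + ((p[1] - 5.0) / 10.0) ** 2
best, best_val = pso(loss, bounds=[(0.0, 1.0), (1.0, 20.0)])
```

In practice the objective would train and validate a model per particle, which is exactly where PSO's cost lies.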
As recent information security legislation has endowed users with an unconditional right to be forgotten by any trained machine learning model, personalized IoT service providers have to take unlearning functionality into consideration. The most straightforward way to unlearn a user's contribution is to retrain the model from its initial state, which is not realistic in high-throughput applications with frequent unlearning requests. Although some machine unlearning frameworks have been proposed to speed up the retraining process, they fail to match decentralized learning scenarios. We design a decentralized unlearning framework called the heterogeneous decentralized unlearning framework with seed (HDUS), which uses distilled seed models to construct erasable ensembles for all clients. Moreover, the framework is compatible with heterogeneous on-device models, offering stronger scalability in real-world applications. Extensive experiments on three real-world datasets show that HDUS achieves state-of-the-art performance.
Metallic alloys for a given application are usually designed to achieve the desired properties by devising experiments based on experience, thermodynamic and kinetic principles, and various modeling and simulation exercises. However, the influence of process parameters and material properties is often non-linear and non-colligative. In recent years, machine learning (ML) has emerged as a promising tool to deal with the complex interrelation between composition, properties, and process parameters and to facilitate the accelerated discovery and development of new alloys and functionalities. In this study, we adopt an ML-based approach, coupled with genetic algorithm (GA) principles, to design novel copper alloys that achieve the seemingly contradictory targets of high strength and high electrical conductivity. Initially, we establish a correlation between the alloy composition (binary to multi-component) and the target properties, namely electrical conductivity and mechanical strength. CatBoost, an ML model coupled with a GA, was used for this task; the accuracy of the model was above 93.5%. Next, to obtain optimized compositions, the outputs from the initial model were refined by combining the concepts of data augmentation and the Pareto front. Finally, the ultimate objective of predicting a target composition that would deliver the desired range of properties was achieved by developing an advanced ML model through data segregation and data augmentation. To examine the reliability of this model, the results were rigorously compared and verified against several independent datasets reported in the literature. This comparison substantiates that the model's predictions regarding the variation of conductivity and the evolution of microstructure and mechanical properties with composition are in good agreement with published reports.
Over the past two decades, machine learning techniques have been extensively used in predicting reservoir properties. While this approach has significantly benefited the industry, selecting an appropriate model remains challenging for most researchers, and relying solely on statistical metrics to select the best model for a particular problem may not always be the most effective approach. This study encourages researchers to incorporate data visualization in their analysis and model selection process. To evaluate the suitability of different models for predicting horizontal permeability in the Volve field, wireline logs were used to train Extra-Trees, Ridge, Bagging, and XGBoost models. The random forest feature selection technique was applied to select the relevant logs as inputs for the models. Based on statistical metrics, the Extra-Trees model achieved the highest test accuracy of 0.996, an RMSE of 19.54 mD, and an MAE of 3.18 mD, with XGBoost coming in second. However, when the results were visualized, the XGBoost model turned out to be more suitable for the problem at hand: XGBoost was the better predictor within the sandstone interval, while the Extra-Trees model was more appropriate in non-sandstone intervals. Since this study aims to predict permeability in the reservoir interval, the XGBoost model is the most suitable. These contrasting results demonstrate the importance of incorporating data visualization as an evaluation tool. Given the heterogeneity of the subsurface, relying solely on statistical metrics may not be sufficient to determine which model is best suited to a particular problem.
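The per-interval behaviour that the visualization revealed can also be computed directly: breaking RMSE down by lithology label shows how a model that looks strong in aggregate can still fail inside the interval of interest. The permeability values and labels below are invented for illustration, not the Volve data:

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmse_by_interval(y_true, y_pred, labels):
    """RMSE broken down by interval label (e.g. lithology)."""
    out = {}
    for lab in set(labels):
        idx = [i for i, l in enumerate(labels) if l == lab]
        out[lab] = rmse([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return out

# Invented permeability values (mD): model A fits shale perfectly but misses
# badly in sandstone, which an aggregate metric alone would obscure.
truth   = [100, 120, 110, 5, 6, 4]
model_a = [130, 150, 140, 5, 6, 4]
model_b = [105, 122, 112, 9, 10, 8]
labels  = ["sand", "sand", "sand", "shale", "shale", "shale"]
per_a = rmse_by_interval(truth, model_a, labels)
per_b = rmse_by_interval(truth, model_b, labels)
```

A plot of the same breakdown is what the study means by using visualization as an evaluation metric.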
Broadcasting gateway equipment generally switches to a spare input stream whenever a failure occurs in the main input stream. However, when the transmission environment is unstable, frequent switching can shorten the lifespan of the equipment and cause interruption, delay, and stoppage of services. A machine learning (ML) method is therefore required that can automatically judge and classify network-related service anomalies and, by predicting or quickly determining when errors occur, switch multi-input signals smoothly without dropping or altering them in the presence of problems such as transmission errors. In this paper, we propose an intelligent packet switching method based on ML classification, one of the supervised learning approaches, which assesses from data the risk level of abnormal multi-streams occurring in broadcasting gateway equipment. Furthermore, we subdivide the risk levels obtained from the classification into probabilities, derive vectorized representative values for each attribute of the collected input data, and continuously update them. The resulting reference vector is used in the switching decision through its cosine similarity with the input data observed when a dangerous situation occurs. Broadcasting gateway equipment applying the proposed method can switch more stably and intelligently than before, addressing equipment reliability problems and broadcasting accidents while maintaining stable video streaming.
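The cosine-similarity switching decision can be sketched as follows; the attribute vectors and the 0.9 threshold are illustrative assumptions, not values from the paper:

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def should_switch(reference, observed, threshold=0.9):
    """Trigger stream switching when the observed attribute vector drifts
    away from the continuously updated reference vector."""
    return cosine_similarity(reference, observed) < threshold

reference = [0.8, 0.1, 0.05, 0.05]   # illustrative representative values per attribute
normal    = [0.78, 0.12, 0.05, 0.05]
abnormal  = [0.1, 0.2, 0.4, 0.3]
```

Vectors close to the reference keep similarity near 1 and no switch occurs; a drifting input falls below the threshold and triggers the spare stream.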
The Internet of Things (IoT) is growing rapidly and impacting almost every aspect of our lives, from wearables and healthcare to security, traffic management, and fleet management systems. This has generated massive volumes of data, and security and data privacy risks are increasing with the advancement of technology and network connections. Traditional access control solutions are inadequate for establishing access control in IoT systems owing to their vulnerability to a single point of failure. Additionally, conventional privacy preservation methods impose high latency and overhead on resource-constrained devices, and previous machine learning approaches were unable to detect denial-of-service (DoS) attacks. This study introduces a novel decentralized and secure framework based on blockchain integration. To avoid a single point of failure, an accredited access control scheme is incorporated, combining blockchain with local peers to record each transaction and verify signatures for access. Blockchain-based attribute-based cryptography is implemented to protect data storage privacy by generating threshold parameters, managing keys, and revoking users on the blockchain. An innovative contract-based DoS attack mitigation method is also incorporated, using smart contracts to validate devices as trusted or untrusted and thereby prevent the server from becoming overwhelmed. The proposed framework effectively controls access, safeguards data privacy, and reduces the risk of cyberattacks. The results show that the suggested framework achieves accuracy, precision, sensitivity, recall, and F-measure of 96.9%, 98.43%, 98.8%, 98.43%, and 98.4%, respectively, outperforming existing approaches.
Spatial heterogeneity refers to the variation or differences in characteristics or features across different locations or areas in space. Spatial data refers to information that explicitly or indirectly belongs to a particular geographic region or location, also known as geo-spatial data or geographic information. Focusing on spatial heterogeneity, we present a hybrid machine learning model combining two competitive algorithms: the Random Forest Regressor and a CNN. The model is fine-tuned using cross-validation for hyperparameter adjustment and performance evaluation, ensuring robustness and generalization. Our approach integrates global Moran's I for examining global autocorrelation and local Moran's I for assessing local spatial autocorrelation in the residuals. To validate our approach, we implemented the hybrid model on a real-world dataset and compared its performance with that of traditional machine learning models. Results indicate superior performance, with an R-squared of 0.90, outperforming the RF (0.84) and CNN (0.74) models. This study contributes a detailed understanding of spatial variations in data, considering the geographic information (longitude and latitude) present in the dataset. Our results, also assessed using the root mean squared error (RMSE), indicate that the hybrid model yielded lower errors, with deviations of 53.65% from the RF model and 63.24% from the CNN model. Additionally, the global Moran's I index was observed to be 0.10. This study underscores that the hybrid model correctly predicted house prices both in clusters and in dispersed areas.
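Global Moran's I, used above to quantify spatial autocorrelation in the residuals, has a compact closed form: I = (n/W) · Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where z are the mean-centered values and W is the sum of all spatial weights. A minimal sketch on a toy line of six sites (not the study's data):

```python
def morans_i(values, weights):
    """Global Moran's I. `weights[i][j]` is the spatial weight between sites i and j."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in z)
    return (n / w_sum) * (num / den)

# Six sites on a line with adjacent-neighbour (rook) weights.
n = 6
weights = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
clustered = [1, 1, 1, 0, 0, 0]  # like values adjoin  -> positive autocorrelation
dispersed = [1, 0, 1, 0, 1, 0]  # values alternate    -> negative autocorrelation
```

Positive I (as with the clustered pattern) indicates neighbouring residuals move together; values near zero, like the study's 0.10, suggest little remaining spatial structure.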
Background: Stroke is one of the most dangerous and life-threatening diseases, as it can cause lasting brain damage, long-term disability, or even death. Early detection of the warning signs of a stroke can help save a patient's life. In this paper, we adopted machine learning approaches to predict strokes and identify the three most important factors associated with strokes. Methods: This study used an open-access stroke prediction dataset. We developed 11 machine learning models and compared the results with those found in prior studies. Results: The accuracy, recall, and area under the curve for the random forest model in our study are significantly higher than those of other studies. Machine learning models, particularly the random forest algorithm, can accurately predict the risk of stroke and support medical decision making. Conclusion: Our findings can be applied to the design of clinical prediction systems at the point of care.
Airplanes are a social necessity for the movement of people and goods. They are generally safe modes of transportation; however, incidents and accidents occasionally occur. To prevent aviation accidents, it is necessary to develop a machine-learning model that detects and predicts anomalies in commercial flights using automatic dependent surveillance-broadcast data. This study combined data-quality detection, anomaly detection, and abnormality-classification-model development. The research methodology involved the following stages: problem statement, data selection and labeling, prediction-model development, deployment, and testing. The data labeling process was based on the rules framed by the International Civil Aviation Organization for commercial, jet-engine flights and validated by expert commercial pilots. The results showed that the best prediction model, quadratic discriminant analysis, was 93% accurate, indicating a "good fit". Moreover, the model's area-under-the-curve results for abnormal and normal detection were 0.97 and 0.96, respectively, further confirming its "good fit".
To address the challenges of current college student employment management, this study designed and implemented a machine learning-based decision support system for college student employment management. The system collects and analyzes multidimensional data, uses machine learning algorithms for prediction and matching, provides personalized employment guidance for students, and provides decision support for universities and enterprises. The research results indicate that the system can effectively improve the efficiency and accuracy of employment guidance, promote school-enterprise cooperation, and achieve a win-win situation for all parties.
This work leveraged predictive modeling techniques in machine learning (ML) to predict heart disease using a dataset sourced from the Centers for Disease Control and Prevention in the US. The dataset was preprocessed and used to train five machine learning models: random forest, support vector machine, logistic regression, extreme gradient boosting, and light gradient boosting. The goal was to use the best-performing model to develop a web application capable of reliably predicting heart disease based on user-provided data. The extreme gradient boosting classifier provided the most reliable results, with precision, recall, and F1-score of 97%, 72%, and 83%, respectively, for Class 0 (no heart disease), and 21%, 81%, and 34%, respectively, for Class 1 (heart disease). The model was further deployed as a web application.
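The per-class precision, recall, and F1 figures quoted above follow directly from confusion-matrix counts; a small helper makes the relationships explicit. The counts in the example are illustrative, not the study's:

```python
def prf(tp, fp, fn):
    """Per-class precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)            # of predicted positives, how many were right
    recall = tp / (tp + fn)               # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Illustrative counts for one class (not the study's confusion matrix).
precision, recall, f1 = prf(tp=8, fp=2, fn=4)
```

The large precision/recall gap reported for Class 1 (21% vs 81%) is typical of imbalanced medical data: the classifier finds most positives but raises many false alarms.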
Typically, magnesium alloys have been designed using a so-called hill-climbing approach, with rather incremental advances over the past century. Iterative and incremental alloy design is slow and expensive, but more importantly it does not harness all the data that exists in the field. In this work, a new approach is proposed that utilises data science and provides a detailed understanding of the data that exists in the field of Mg-alloy design to date. In this approach, a consolidated alloy database incorporating 916 datapoints was first developed from the literature and experimental work. To analyse the characteristics of the database, alloying and thermomechanical processing effects on mechanical properties were explored via composition-process-property matrices. An unsupervised machine learning (ML) method, clustering, was also implemented on the unlabelled data with the aim of revealing potentially useful information for an alloy representation space of low dimensionality. In addition, the alloy database was correlated to thermodynamically stable secondary phases to further understand the relationships between microstructure and mechanical properties. This work not only introduces an invaluable open-source database but also provides, for the first time, data-driven insights that enable future accelerated digital Mg-alloy design.
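The clustering step on unlabelled alloy data can be sketched with plain Lloyd's k-means; the two-dimensional "alloy representation" points below are invented stand-ins for the low-dimensional space the study describes, and the study does not specify which clustering algorithm it used:

```python
def kmeans(points, k, iters=20):
    """Lloyd's k-means on 2-D points with deterministic farthest-point init."""
    def d2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    # Farthest-point initialisation: deterministic and well separated.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(d2(p, c) for c in centroids)))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: d2(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels, centroids

# Invented low-dimensional "alloy representation" points: two separated groups.
group_a = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (0.1, 0.1)]
group_b = [(5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2)]
labels, centroids = kmeans(group_a + group_b, k=2)
```

On real composition-process-property data, the recovered clusters would be inspected for shared alloying or processing characteristics rather than taken at face value.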
Predicting the mechanical behavior of a structure and perceiving anomalies in advance are essential to ensuring the safe long-term operation of infrastructure. In addition to the incomplete consideration of influencing factors, the prediction time scale of existing studies is coarse. Therefore, this study develops a real-time prediction model that couples spatio-temporal correlation with external load through an autoencoder network (ATENet) based on structural health monitoring (SHM) data. An autoencoder mechanism is used to acquire a high-level representation of the raw monitoring data at different spatial positions, and a recurrent neural network is applied to capture the temporal correlation in the time series. The obtained temporal-spatial information is then coupled with dynamic loads through a fully connected layer to predict structural performance over the next 12 h. As a case study, the proposed model is formulated on SHM data collected from a representative underwater shield tunnel. A robustness study is carried out to verify the reliability and prediction capability of the proposed model. Finally, the ATENet model is compared with several typical models, and the results indicate that it performs best. The ATENet model is of great value for predicting the real-time evolution of tunnel structural behavior.
Funding: Supported by the National Natural Science Foundation of China (Grant No. U2202213) and the Special Program for the Major Science and Technology Projects of Yunnan Province, China (Grant Nos. 202102AE090051-1-01 and 202202AE090001).
Abstract: In traditional medicine and ethnomedicine, medicinal plants have long been recognized as the basis for materials in therapeutic applications worldwide. In particular, the remarkable curative effect of traditional Chinese medicine during the coronavirus disease 2019 (COVID-19) pandemic has attracted extensive attention globally. Medicinal plants have, therefore, become increasingly popular among the public. However, with increasing demand for, and profit from, medicinal plants, commercial fraud such as adulteration or counterfeiting sometimes occurs, which poses a serious threat to clinical outcomes and the interests of consumers. With rapid advances in artificial intelligence, machine learning can be used to mine information on various medicinal plants to establish an ideal resource database. We herein present a review that introduces common machine learning algorithms and discusses their application in multi-source data analysis of medicinal plants. The combination of machine learning algorithms and multi-source data analysis facilitates comprehensive analysis and aids in the effective evaluation of the quality of medicinal plants. The findings of this review provide new possibilities for promoting the development and utilization of medicinal plants.
Funding: Supported by a Korea Institute of Energy Technology Evaluation and Planning (KETEP) grant funded by the Korea government (Grant No. 20214000000140, Graduate School of Convergence for Clean Energy Integrated Power Generation), a Korea Basic Science Institute (National Research Facilities and Equipment Center) grant funded by the Ministry of Education (2021R1A6C101A449), and a National Research Foundation of Korea grant funded by the Ministry of Science and ICT (2021R1A2C1095139), Republic of Korea.
Abstract: Mg alloys possess an inherent plastic anisotropy owing to the selective activation of deformation mechanisms depending on the loading condition. This characteristic results in a diverse range of flow curves that vary with the deformation condition. This study proposes a novel approach for accurately predicting the anisotropic deformation behavior of wrought Mg alloys using machine learning (ML) with data augmentation. The developed model combines four key strategies from data science: learning the entire flow curves, generative adversarial networks (GAN), algorithm-driven hyperparameter tuning, and a gated recurrent unit (GRU) architecture. The proposed model, namely the GAN-aided GRU, was extensively evaluated for various predictive scenarios, such as interpolation, extrapolation, and a limited dataset size. The model exhibited significant predictability and improved generalizability for estimating the anisotropic compressive behavior of ZK60 Mg alloys under 11 annealing conditions and for three loading directions. The GAN-aided GRU results were superior to those of previous ML models and constitutive equations. The superior performance was attributed to hyperparameter optimization, GAN-based data augmentation, and the inherent predictivity of the GRU for extrapolation. As a first attempt to employ ML techniques other than artificial neural networks, this study offers a novel perspective on predicting the anisotropic deformation behaviors of wrought Mg alloys.
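The gated recurrence at the core of a GRU model like the one above can be sketched as a single-step cell rolled over a strain sequence. This is a minimal illustrative sketch, not the paper's trained GAN-aided GRU: the weight shapes, random initialization, and linear read-out head are all assumptions.

```python
import numpy as np

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU recurrence step: gates decide how much of the
    previous hidden state to keep versus overwrite."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate state
    return (1 - z) * h_prev + z * h_tilde

def predict_flow_curve(strains, hidden_size=8, seed=0):
    """Roll the GRU over a strain sequence and read out one stress
    value per step through a linear head (random placeholder weights)."""
    rng = np.random.default_rng(seed)
    W = [rng.standard_normal((hidden_size, 1)) * 0.1 for _ in range(3)]
    U = [rng.standard_normal((hidden_size, hidden_size)) * 0.1 for _ in range(3)]
    w_out = rng.standard_normal(hidden_size) * 0.1
    h = np.zeros(hidden_size)
    stresses = []
    for eps in strains:
        h = gru_step(np.array([eps]), h, W[0], U[0], W[1], U[1], W[2], U[2])
        stresses.append(float(w_out @ h))
    return stresses
```

In the paper's setting the weights would be trained on GAN-augmented flow-curve data rather than sampled randomly.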
Funding: Supported in part by the National Key Research and Development Program of China (No. 2022YFB3305403), the Project of Basic Research Funds for Central Universities (2022CDJDX006), the Talent Plan Project of Chongqing (No. cstc2021ycjhbgzxm0295), and the National Natural Science Foundation of China (No. 52111530194).
Abstract: Accurate capacity estimation is of great importance for the reliable state monitoring, timely maintenance, and second-life utilization of lithium-ion batteries. Despite numerous works on battery capacity estimation using laboratory datasets, most are applied to battery cells and lack satisfactory fidelity when extended to real-world electric vehicle (EV) battery packs. The challenges intensify for large-sized EV battery packs, where unpredictable operating profiles and low-quality data acquisition hinder precise capacity estimation. To fill the gap, this study introduces a novel data-driven battery pack capacity estimation method grounded in field data. The proposed approach begins by determining labeled capacity through an innovative combination of the inverse ampere-hour integral, open-circuit-voltage-based, and resistance-based correction methods. Then, multiple health features are extracted from incremental capacity curves, voltage curves, equivalent circuit model parameters, and operating temperature to thoroughly characterize battery aging behavior. A feature selection procedure is performed to determine the optimal feature set based on the Pearson correlation coefficient. Moreover, a convolutional neural network and bidirectional gated recurrent unit, enhanced by an attention mechanism, are employed to estimate the battery pack capacity in real-world EV applications. Finally, the proposed method is validated with a field dataset from two EVs, covering approximately 35,000 kilometers. The results demonstrate that the proposed method exhibits better estimation performance, with an error of less than 1.1%, compared to existing methods. This work shows great potential for accurate large-sized EV battery pack capacity estimation based on field data, providing significant insights into reliable labeled capacity calculation, effective feature extraction, and machine learning-enabled health diagnosis.
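The Pearson-based feature selection step can be sketched as follows. The feature names and the 0.8 threshold are illustrative assumptions, not values reported in the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(features, capacity, threshold=0.8):
    """Keep features whose |correlation| with the capacity label
    meets the threshold. features: dict of name -> series."""
    return [name for name, series in features.items()
            if abs(pearson(series, capacity)) >= threshold]
```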
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12105090 and 12175057).
Abstract: Leveraging the extraordinary phenomena of quantum superposition and quantum correlation, quantum computing offers unprecedented potential for addressing challenges beyond the reach of classical computers. This paper tackles two pivotal challenges in quantum computing. The first is the development of an effective encoding protocol for translating classical data into quantum states, a critical step for any quantum computation, since different encoding strategies can significantly influence quantum computer performance. The second is the need to counteract the inevitable noise that can hinder quantum acceleration. Our primary contribution is the introduction of a novel variational data encoding method, grounded in quantum regression algorithm models. By adapting the learning concept from machine learning, we render data encoding a learnable process, which allows us to study the role of quantum correlation in data encoding. Through numerical simulations of various regression tasks, we demonstrate the efficacy of our variational data encoding, particularly after learning from instructional data. Moreover, we delve into the role of quantum correlation in enhancing task performance, especially in noisy environments. Our findings underscore the critical role of quantum correlation not only in bolstering performance but also in mitigating noise interference, thus advancing the frontier of quantum computing.
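A minimal single-qubit sketch of the learnable-encoding idea: a classical value x is mapped to a qubit state by a rotation whose angle is scaled by a trainable parameter theta, and the gradient with respect to theta is obtained via the parameter-shift rule. The specific RY-rotation form is an illustrative assumption, not the paper's circuit.

```python
import math

def ry_encode(x, theta):
    """Encode x as |psi> = RY(theta * x)|0>, giving real amplitudes
    [cos(a/2), sin(a/2)] with encoded angle a = theta * x."""
    a = theta * x
    return [math.cos(a / 2), math.sin(a / 2)]

def prob_one(x, theta):
    """Probability of measuring |1>, used as the model's scalar output."""
    return ry_encode(x, theta)[1] ** 2

def _p_of_angle(a):
    return math.sin(a / 2) ** 2

def grad_theta(x, theta):
    """Parameter-shift gradient d prob_one / d theta: shift the encoded
    angle a = theta * x by +/- pi/2 and apply the chain rule factor x."""
    a = theta * x
    return x * (_p_of_angle(a + math.pi / 2) - _p_of_angle(a - math.pi / 2)) / 2
```

Making theta trainable in this way is what turns the encoding itself into a learnable process.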
Funding: This work was supported by the National Natural Science Foundation of China (U2133208, U20A20161).
Abstract: With the popularization of the Internet and the development of technology, cyber threats are increasing day by day. Threats such as malware, hacking, and data breaches have had a serious impact on cybersecurity. The network security environment in the era of big data is characterized by large data volumes, high diversity, and stringent real-time requirements, and traditional security defense methods and tools can no longer cope with complex and changing network security threats. This paper proposes a machine-learning security defense algorithm based on metadata association features that emphasizes control over unauthorized users through privacy, integrity, and availability. A user model is established and a mapping between the user model and the metadata of the data sources is generated. By analyzing the user model and its corresponding mapping relationship, a query against the user model can be decomposed into queries against various heterogeneous data sources, and the integration of heterogeneous data sources based on metadata association features can be realized. Customer information is defined and classified, sensitive data are automatically identified and perceived, and a behavior audit and analysis platform is built to analyze user behavior trajectories, completing the construction of a machine learning customer information security defense system. The experimental results show that when the data volume is 5×10³ bit, the data storage integrity of the proposed method is 92%, the data accuracy is 98%, and the success rate of data intrusion is only 2.6%. It can be concluded that the proposed data storage method is safe, the data accuracy remains at a high level, and the data disaster recovery performance is good. The method can effectively resist data intrusion and provides a high level of security: it can not only detect all viruses in user data storage but also realize integrated virus processing, further optimizing the security defense of user big data.
Funding: Supported by the MSIT (Ministry of Science and ICT), Korea, under the Special R&D Zone Development Project (R&D), Development of R&D Innovation Valley support program (2023-DD-RD-0152), supervised by the Innovation Foundation. It was also partially supported by the Ministry of Science and ICT (MSIT), Korea, under the Information Technology Research Center (ITRC) support program (IITP-2024-2020-0-01797), supervised by the Institute for Information & Communications Technology Planning & Evaluation (IITP).
Abstract: In Decentralized Machine Learning (DML) systems, participants contribute their resources to assist others in developing machine learning solutions. Identifying malicious contributions in DML systems is challenging, which has led to the exploration of blockchain technology. Blockchain leverages its transparency and immutability to record the provenance and reliability of training data. However, storing massive datasets or implementing model evaluation processes on smart contracts incurs high computational costs. Additionally, current research on preventing malicious contributions in DML systems primarily focuses on protecting models from being exploited by workers who contribute incorrect or misleading data; less attention has been paid to the scenario where malicious requesters intentionally manipulate test data during evaluation to gain an unfair advantage. This paper proposes a transparent and accountable training data sharing method that securely shares data among potentially malicious system participants. First, we introduce a blockchain-based DML system architecture that supports secure training data sharing through the IPFS network. Second, we design a blockchain smart contract that transparently splits datasets into training and test sets without involving system participants. Under this system, transparent and accountable training data sharing can be achieved with attribute-based proxy re-encryption. We present a security analysis of the system and conduct experiments on the Ethereum and IPFS platforms to show its feasibility and practicality.
Abstract: One of the biggest dangers to society today is terrorism, where attacks have become one of the most significant risks to international peace and national security. Big data, information analysis, and artificial intelligence (AI) have become the basis for making strategic decisions in many sensitive areas, such as fraud detection, risk management, medical diagnosis, and counter-terrorism. However, there is still a need to assess how terrorist attacks are related, initiated, and detected. For this purpose, we propose a novel framework for classifying and predicting terrorist attacks. The proposed framework posits that neglected text attributes included in the Global Terrorism Database (GTD) can influence the accuracy of the model's classification of terrorist attacks, where each part of the data can provide vital information to enrich the ability of classifier learning. Each data point in a multiclass taxonomy has one or more tags attached to it, referred to as "related tags." We applied machine learning classifiers to classify terrorist attack incidents obtained from the GTD. A transformer-based technique called DistilBERT extracts and learns contextual features from the text attributes to acquire more information from the text data. The extracted contextual features are combined with the "key features" of the dataset and used to perform the final classification. The study explored different experimental setups with various classifiers to evaluate the model's performance. The experimental results show that the proposed framework outperforms the latest techniques for classifying terrorist attacks, with an accuracy of 98.7% using a combined feature set and an extreme gradient boosting classifier.
Abstract: Analyzing big data, especially medical data, helps to provide good health care to patients and face the risks of death. The COVID-19 pandemic has had a significant impact on public health worldwide, emphasizing the need for effective risk prediction models. Machine learning (ML) techniques have shown promise in analyzing complex data patterns and predicting disease outcomes. The accuracy of these techniques is greatly affected by their parameter settings, so hyperparameter optimization plays a crucial role in improving model performance. In this work, the Particle Swarm Optimization (PSO) algorithm was used to effectively search the hyperparameter space and improve the predictive power of the machine learning models by identifying the optimal hyperparameters that provide the highest accuracy. A dataset with a variety of clinical and epidemiological characteristics linked to COVID-19 cases was used in this study. Various machine learning models, including Random Forests, Decision Trees, Support Vector Machines, and Neural Networks, were utilized to capture the complex relationships present in the data. To evaluate the predictive performance of the models, the accuracy metric was employed. The experimental findings showed that the suggested method of estimating COVID-19 risk is effective: when compared to baseline models, the optimized machine learning models performed better and produced better results.
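The PSO search described above can be sketched as follows, minimizing a toy objective in place of the cross-validated model error. The swarm size, inertia, and acceleration coefficients are illustrative defaults, not the paper's settings.

```python
import random

def pso_minimize(f, bounds, n_particles=20, n_iters=60,
                 w=0.7, c1=1.5, c2=1.5, seed=42):
    """Particle swarm optimization: each particle tracks its personal
    best position, and the swarm shares a global best that pulls all
    particles toward promising regions of the search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # clamp the updated position to the search bounds
                pos[i][d] = min(max(pos[i][d] + vel[i][d],
                                    bounds[d][0]), bounds[d][1])
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In the paper's setting, f would wrap a cross-validation run returning 1 minus accuracy for a candidate hyperparameter vector.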
Funding: Australian Research Council, Grant/Award Numbers: FT210100624, DP190101985, DE230101033.
Abstract: As some recent information security legislation has endowed users with unconditional rights to be forgotten by any trained machine learning model, personalised IoT service providers have to take unlearning functionality into consideration. The most straightforward way to unlearn users' contributions is to retrain the model from the initial state, which is not realistic in high-throughput applications with frequent unlearning requests. Though some machine unlearning frameworks have been proposed to speed up the retraining process, they fail to match decentralised learning scenarios. A decentralised unlearning framework called the heterogeneous decentralised unlearning framework with seed (HDUS) is designed, which uses distilled seed models to construct erasable ensembles for all clients. Moreover, the framework is compatible with heterogeneous on-device models, representing stronger scalability in real-world applications. Extensive experiments on three real-world datasets show that HDUS achieves state-of-the-art performance.
Abstract: Metallic alloys for a given application are usually designed to achieve the desired properties by devising experiments based on experience, thermodynamic and kinetic principles, and various modeling and simulation exercises. However, the influence of process parameters and material properties is often non-linear and non-colligative. In recent years, machine learning (ML) has emerged as a promising tool to deal with the complex interrelation between composition, properties, and process parameters, facilitating the accelerated discovery and development of new alloys and functionalities. In this study, we adopt an ML-based approach, coupled with genetic algorithm (GA) principles, to design novel copper alloys that achieve the seemingly contradictory targets of high strength and high electrical conductivity. Initially, we establish a correlation between the alloy composition (binary to multi-component) and the target properties, namely electrical conductivity and mechanical strength. CatBoost, an ML model coupled with GA, was used for this task, and the accuracy of the model was above 93.5%. Next, to obtain optimized compositions, the outputs from the initial model were refined by combining the concepts of data augmentation and the Pareto front. Finally, the ultimate objective of predicting the target composition that would deliver the desired range of properties was achieved by developing an advanced ML model through data segregation and data augmentation. To examine the reliability of this model, the results were rigorously compared and verified against several independent datasets reported in the literature. This comparison substantiates that the model's predictions regarding the variation of conductivity and the evolution of microstructure and mechanical properties with composition are in good agreement with published reports.
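The Pareto-front refinement step can be sketched as follows for the two objectives of strength and conductivity, both maximized. The candidate (strength, conductivity) points are made-up examples, not data from the study.

```python
def pareto_front(points):
    """Return the non-dominated points, where point a dominates point b
    if a is >= b in both objectives and strictly > in at least one."""
    def dominates(a, b):
        return (a[0] >= b[0] and a[1] >= b[1]
                and (a[0] > b[0] or a[1] > b[1]))
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

The optimized compositions would then be drawn from this front, trading strength against conductivity.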
Abstract: Over the past two decades, machine learning techniques have been extensively used in predicting reservoir properties. While this approach has significantly contributed to the industry, selecting an appropriate model is still challenging for most researchers, and relying solely on statistical metrics to select the best model for a particular problem may not always be the most effective approach. This study encourages researchers to incorporate data visualisation in their analysis and model selection process. To evaluate the suitability of different models in predicting horizontal permeability in the Volve field, wireline logs were used to train Extra-Trees, Ridge, Bagging, and XGBoost models. The Random Forest feature selection technique was applied to select the relevant logs as inputs for the models. Based on statistical metrics, the Extra-Trees model achieved the highest test accuracy of 0.996, RMSE of 19.54 mD, and MAE of 3.18 mD, with XGBoost coming in second. However, when the results were visualised, it was discovered that the XGBoost model was more suitable for the problem being tackled: XGBoost was a better predictor within the sandstone interval, while the Extra-Trees model was more appropriate in non-sandstone intervals. Since this study aims to predict permeability in the reservoir interval, the XGBoost model is the most suitable. These contrasting results demonstrate the importance of incorporating data visualisation techniques as an evaluation metric. Given the heterogeneity of the subsurface, relying solely on statistical metrics may not be sufficient to determine which model is best suited for a particular problem.
Funding: This work was supported by a research grant from Seoul Women's University (2023-0183).
Abstract: Broadcasting gateway equipment generally uses a method of simply switching to a spare input stream when a failure occurs in the main input stream. However, when the transmission environment is unstable, problems such as reduced equipment lifespan due to frequent switching, as well as interruption, delay, and stoppage of services, may occur. Therefore, a machine learning (ML) method is required that can automatically judge and classify network-related service anomalies and switch multi-input signals without dropping or changing signals, by predicting or quickly determining the time of error occurrence for smooth stream switching when problems such as transmission errors arise. In this paper, we propose an intelligent packet switching method based on classification, one of the supervised learning methods, that estimates from data the risk level of abnormal multi-streams occurring in broadcasting gateway equipment. Furthermore, we subdivide the risk levels obtained from the classification techniques into probabilities, derive vectorized representative values for each attribute of the collected input data, and continuously update them. The obtained reference vector is used for the switching judgment through its cosine similarity with the input data observed when a dangerous situation occurs. Broadcasting gateway equipment to which the proposed method is applied can switch more stably and intelligently than before, solving equipment reliability problems and preventing broadcasting accidents while maintaining stable video streaming.
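The switching judgment against a continuously updated reference vector can be sketched as follows. The exponential-moving-average update rate and the similarity threshold are illustrative assumptions, not the paper's values.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def update_reference(ref, sample, alpha=0.1):
    """Exponential moving average keeps the per-attribute reference
    vector tracking recent healthy-stream statistics."""
    return [(1 - alpha) * r + alpha * s for r, s in zip(ref, sample)]

def should_switch(ref, sample, threshold=0.9):
    """Switch to the spare stream when the incoming attribute vector
    drifts too far from the reference."""
    return cosine_similarity(ref, sample) < threshold
```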
Abstract: The Internet of Things (IoT) is growing rapidly and impacting almost every aspect of our lives, from wearables and healthcare to security, traffic management, and fleet management systems. This has generated massive volumes of data, and security and data privacy risks are increasing with the advancement of technology and network connections. Traditional access control solutions are inadequate for establishing access control in IoT systems to provide data protection, owing to their vulnerability to a single point of failure. Additionally, conventional privacy preservation methods have high latency costs and overhead for resource-constrained devices. Previous machine learning approaches were also unable to detect denial-of-service (DoS) attacks. This study introduces a novel decentralized and secure framework for blockchain integration. To avoid a single point of failure, an accredited access control scheme is incorporated, combining blockchain with local peers to record each transaction and verify signatures for access. Blockchain-based attribute-based cryptography is implemented to protect data storage privacy by generating threshold parameters, managing keys, and revoking users on the blockchain. An innovative contract-based DoS attack mitigation method is also incorporated to effectively validate devices as trusted or untrusted using smart contracts, preventing the server from becoming overwhelmed. The proposed framework effectively controls access, safeguards data privacy, and reduces the risk of cyberattacks. The results show that the suggested framework achieves accuracy, precision, sensitivity, recall, and F-measure of 96.9%, 98.43%, 98.8%, 98.43%, and 98.4%, respectively.
Abstract: Spatial heterogeneity refers to the variation or differences in characteristics or features across different locations or areas in space. Spatial data refers to information that explicitly or implicitly belongs to a particular geographic region or location, also known as geospatial data or geographic information. Focusing on spatial heterogeneity, we present a hybrid machine learning model combining two competitive algorithms: the Random Forest regressor and a CNN. The model is fine-tuned using cross-validation for hyperparameter adjustment and performance evaluation, ensuring robustness and generalization. Our approach integrates global Moran's I for examining global autocorrelation and local Moran's I for assessing local spatial autocorrelation in the residuals. To validate our approach, we implemented the hybrid model on a real-world dataset and compared its performance with that of traditional machine learning models. Results indicate superior performance, with an R-squared of 0.90, outperforming the RF (0.84) and CNN (0.74) models. This study contributes to a detailed understanding of spatial variations in data, considering the geographical information (longitude and latitude) present in the dataset. Our results, also assessed using the root mean squared error (RMSE), indicated that the hybrid model yielded lower errors: a deviation of 53.65% from the RF model and 63.24% from the CNN model. Additionally, the global Moran's I index was observed to be 0.10. This study underscores that the hybrid model predicted house prices correctly both in clusters and in dispersed areas.
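The global Moran's I statistic used for the residual autocorrelation check can be computed as follows. The rook-contiguity grid weights in the example are a common illustrative assumption, not the study's weight matrix.

```python
def morans_i(values, weights):
    """Global Moran's I. values: list of n observations; weights: n x n
    spatial weight matrix with weights[i][j] > 0 when i and j are
    neighbours. I near +1 means clustering, near -1 means dispersion."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                 # deviations from the mean
    w_sum = sum(sum(row) for row in weights)       # total weight
    num = sum(weights[i][j] * z[i] * z[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in z)
    return (n / w_sum) * (num / den)
```

A perfectly dispersed checkerboard of residuals yields I = -1, while spatially clustered residuals push I toward +1, which is the signal the study inspects.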
Abstract: Background: Stroke is one of the most dangerous and life-threatening diseases, as it can cause lasting brain damage, long-term disability, or even death. The early detection of the warning signs of a stroke can help save a patient's life. In this paper, we adopted machine learning approaches to predict strokes and identify the three most important factors associated with strokes. Methods: This study used an open-access stroke prediction dataset. We developed 11 machine learning models and compared the results to those found in prior studies. Results: The accuracy, recall, and area under the curve for the random forest model in our study are significantly higher than those of other studies. Machine learning models, particularly the random forest algorithm, can accurately predict the risk of stroke and support medical decision making. Conclusion: Our findings can be applied to design clinical prediction systems at the point of care.
Abstract: Airplanes are a social necessity for the movement of people and goods. They are generally a safe mode of transportation; however, incidents and accidents occasionally occur. To prevent aviation accidents, it is necessary to develop a machine-learning model that detects and predicts anomalies in commercial flights using automatic dependent surveillance-broadcast (ADS-B) data. This study combined data-quality detection, anomaly detection, and abnormality-classification-model development. The research methodology involved the following stages: problem statement, data selection and labeling, prediction-model development, deployment, and testing. The data labeling process was based on the rules framed by the International Civil Aviation Organization for commercial jet-engine flights and validated by expert commercial pilots. The results showed that the best prediction model, quadratic discriminant analysis, was 93% accurate, indicating a "good fit". Moreover, the model's area-under-the-curve results for abnormal and normal detection were 0.97 and 0.96, respectively, confirming its "good fit".
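A minimal two-class quadratic discriminant analysis classifier, like the best-performing model reported above, can be sketched with numpy. The toy feature values stand in for ADS-B-derived features and are invented; this is not the study's trained model.

```python
import numpy as np

class SimpleQDA:
    """Quadratic discriminant analysis: fit a Gaussian (mean and full
    covariance) per class, then classify by the largest log-posterior.
    Unlike LDA, each class keeps its own covariance, so the decision
    boundary is quadratic."""

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            # small ridge keeps the covariance invertible for tiny samples
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.params_[c] = (mu, np.linalg.inv(cov),
                               np.log(np.linalg.det(cov)),
                               np.log(len(Xc) / len(X)))
        return self

    def predict(self, X):
        X = np.asarray(X, float)

        def score(x, c):
            mu, inv, logdet, logprior = self.params_[c]
            d = x - mu
            return -0.5 * (d @ inv @ d) - 0.5 * logdet + logprior

        return np.array([max(self.classes_, key=lambda c: score(x, c))
                         for x in X])
```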
Abstract: To address the challenges of current college student employment management, this study designed and implemented a machine learning-based decision support system for college student employment management. The system collects and analyzes multidimensional data, uses machine learning algorithms for prediction and matching, provides personalized employment guidance for students, and offers decision support for universities and enterprises. The research results indicate that the system can effectively improve the efficiency and accuracy of employment guidance, promote school-enterprise cooperation, and achieve a win-win situation for all parties.
Abstract: This work leveraged predictive modeling techniques in machine learning (ML) to predict heart disease using a dataset sourced from the Centers for Disease Control and Prevention in the US. The dataset was preprocessed and used to train five machine learning models: random forest, support vector machine, logistic regression, extreme gradient boosting, and light gradient boosting. The goal was to use the best performing model to develop a web application capable of reliably predicting heart disease based on user-provided data. The extreme gradient boosting classifier provided the most reliable results, with precision, recall, and F1-score of 97%, 72%, and 83%, respectively, for Class 0 (no heart disease) and 21% (precision), 81% (recall), and 34% (F1-score) for Class 1 (heart disease). The model was further deployed as a web application.
Funding: The authors acknowledge the support of the Monash-IITB Academy Scholarship, funded in part by the Australian Research Council (DP190103592).
Abstract: Typically, magnesium alloys have been designed using a so-called hill-climbing approach, with rather incremental advances over the past century. Iterative and incremental alloy design is slow and expensive, but more importantly it does not harness all the data that exists in the field. In this work, a new approach is proposed that utilises data science and provides a detailed understanding of the data that exists in the field of Mg-alloy design to date. In this approach, a consolidated alloy database incorporating 916 datapoints was first developed from the literature and experimental work. To analyse the characteristics of the database, alloying and thermomechanical processing effects on mechanical properties were explored via composition-process-property matrices. An unsupervised machine learning (ML) method of clustering was also implemented, using unlabelled data, with the aim of revealing potentially useful information for an alloy representation space of low dimensionality. In addition, the alloy database was correlated to thermodynamically stable secondary phases to further understand the relationships between microstructure and mechanical properties. This work not only introduces an invaluable open-source database but also provides, for the first time, data insights that enable future accelerated digital Mg-alloy design.
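The unsupervised clustering step over a low-dimensional alloy representation space can be sketched with a small k-means implementation. The 2-D feature points are invented placeholders, not entries from the 916-datapoint database, and k-means itself is only one of several possible clustering choices.

```python
import random

def kmeans(points, k, n_iters=50, seed=1):
    """Plain k-means: alternate assigning each point to its nearest
    centroid and recomputing centroids as cluster means."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return labels, centroids
```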
Funding: This work is supported by the National Natural Science Foundation of China (Grant No. 51991392), Key Deployment Projects of the Chinese Academy of Sciences (Grant No. ZDRW-ZS-2021-3-3), and the Second Tibetan Plateau Scientific Expedition and Research Program (STEP) (Grant No. 2019QZKK0904).
Abstract: Predicting the mechanical behaviors of structures and perceiving anomalies in advance are essential to ensuring the safe long-term operation of infrastructure. In addition to the incomplete consideration of influencing factors, the prediction time scale of existing studies is coarse. Therefore, this study focuses on the development of a real-time prediction model that couples the spatio-temporal correlation with external load through an autoencoder network (ATENet) based on structural health monitoring (SHM) data. An autoencoder mechanism is used to acquire a high-level representation of raw monitoring data at different spatial positions, and a recurrent neural network is applied to capture the temporal correlation from the time series. Then, the obtained temporal-spatial information is coupled with dynamic loads through a fully connected layer to predict structural performance over the next 12 h. As a case study, the proposed model is formulated on SHM data collected from a representative underwater shield tunnel. A robustness study is carried out to verify the reliability and prediction capability of the proposed model. Finally, the ATENet model is compared with several typical models, and the results indicate that it has the best performance. The ATENet model is of great value for predicting the real-time evolution trend of tunnel structures.
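The coupling of encoded spatial features, a recurrent temporal summary, and the external load through a fully connected layer can be sketched as a forward pass. All layer sizes and random weights here are illustrative placeholders; this is the data flow of an ATENet-like architecture, not the trained model.

```python
import numpy as np

def atenet_forward(monitor_seq, load, hidden=16, code=8, seed=0):
    """monitor_seq: array of shape (T, n_sensors) of raw SHM readings;
    load: scalar external load. An encoder compresses each time step's
    sensor vector, a simple RNN summarises the sequence, and a final
    dense layer couples the temporal summary with the load."""
    rng = np.random.default_rng(seed)
    T, n_sensors = monitor_seq.shape
    W_enc = rng.standard_normal((code, n_sensors)) * 0.1  # encoder half of the autoencoder
    W_x = rng.standard_normal((hidden, code)) * 0.1       # RNN input weights
    W_h = rng.standard_normal((hidden, hidden)) * 0.1     # RNN recurrent weights
    W_out = rng.standard_normal(hidden + 1) * 0.1         # couples state with load
    h = np.zeros(hidden)
    for t in range(T):
        z = np.tanh(W_enc @ monitor_seq[t])   # spatial encoding per step
        h = np.tanh(W_x @ z + W_h @ h)        # temporal correlation
    # fully connected coupling of temporal-spatial state and dynamic load
    return float(W_out @ np.concatenate([h, [load]]))
```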