Technological advancements in recent decades have greatly transformed the field of materials chemistry. Given rising energy demand and the pollution associated with it, urgent measures are required to maximize energy output while shortening the long experimental cycles involved in energy production. In light of this, the importance of catalysts in chemical reactions, particularly energy-related reactions, cannot be overlooked, and it is critical to discover and design catalysts for optimizing chemical processes and generating sustainable energy. Most recently, artificial intelligence (AI) has been incorporated into several fields, notably in advancing catalytic processes. The integration of large data sets, machine learning models, and robotics provides a powerful tool for materials synthesis and optimization by generating diverse datasets amenable to machine learning techniques. Robots automate the integration of datasets and machine learning models in screening intermetallic catalyst surfaces, with accuracy and speed comparable to those of a number of human researchers. Although the use of robots in catalyst discovery is still in its infancy, in this review we summarize the current influence of artificial intelligence on catalyst discovery and briefly describe the application of databases, machine learning models, and robots in this field, with emphasis on consolidating these individual components into a tripartite flow process. We point out current trends in machine learning and hybrid models based on first-principles calculations (DFT) for generating datasets that can be integrated into an autonomous flow process for catalyst discovery. We also discuss catalyst discovery for renewable-energy-related reactions using this tripartite flow process with predetermined descriptors.
The discovery of novel materials with desired properties is essential to the advancement of energy-related technologies. Despite the rapid development of computational infrastructures and theoretical approaches, progress so far has been limited by the empirical and serial nature of experimental work. Fortunately, the situation is changing thanks to the maturation of theoretical tools such as density functional theory, high-throughput screening, crystal structure prediction, and emerging approaches based on machine learning. Together, these recent innovations in computational chemistry, data informatics, and machine learning have acted as catalysts for revolutionizing material design and will hopefully lead to faster kinetics in the development of energy-related industries. In this report, recent advances in material discovery methods are reviewed for energy devices. Three paradigms based on empiricism-driven experiments, database-driven high-throughput screening, and data informatics-driven machine learning are discussed critically. Key methodological advancements are reviewed, including high-throughput screening, crystal structure prediction, and generative models for target material design. Their applications in energy-related devices such as batteries, catalysts, and photovoltaics are selectively showcased.
Discovering useful forecasting rules from observational weather data is a topic of considerable interest. Traditionally, forecasting knowledge has been acquired through manual analysis and investigation by human scientists. This paper presents the experimental results of an automatic machine learning system that derives forecasting rules from real observational data. We tested the system on two large real data sets, from central China and from Victoria, Australia. The experimental results show that the forecasting rules discovered by the system are very competitive with those of human experts. The forecasting accuracy rates on the two data sets are 86.4% and 78%, respectively.
This research examines industry-based dissertation research in a doctoral computing program through the lens of machine learning algorithms to determine whether natural language processing-based categorization of abstracts alone is adequate for classification. We categorize dissertations by both their abstracts and their full text using the GraphLab Create library from Apple's Turi to establish whether abstract analysis is an adequate measure of content categorization, which we found it was not. We also compare the dissertation categorizations using IBM's Watson Discovery deep machine learning tool. Our research provides perspectives on the practicality of manually classifying technical documents, and it provides insights into (1) the categories of academic work created by experienced full-time working professionals in a computing doctoral program, (2) the viability and performance of automated categorization based on abstracts compared with full-text dissertation analysis, and (3) natural language processing versus manual human text classification.
Finding materials with specific properties is a hot topic in materials science. Traditional materials design relies on empirical and trial-and-error methods, requiring extensive experiments and time and resulting in high costs. With the development of physics, statistics, computer science, and other fields, machine learning offers opportunities for systematically discovering new materials. In particular, in machine learning-based inverse design, machine learning algorithms analyze the mapping between materials and their properties to find materials with desired properties. This paper first outlines the basic concepts of materials inverse design and the challenges faced by machine learning-based approaches to it. Then, three main inverse design methods (exploration-based, model-based, and optimization-based) are analyzed in the context of different application scenarios. Finally, the applications of inverse design methods in alloys, optical materials, and acoustic materials are elaborated on, and the prospects for materials inverse design are discussed. The authors hope to accelerate the discovery of new materials and provide new possibilities for advancing materials science and innovative design methods.
The screening of novel materials with good performance and the modelling of quantitative structure-activity relationships (QSARs), among other issues, are hot topics in the field of materials science. Traditional experiments and computational modelling often consume tremendous time and resources and are limited by their experimental conditions and theoretical foundations. Thus, it is imperative to develop a new method of accelerating the discovery and design process for novel materials. Recently, materials discovery and design using machine learning have been receiving increasing attention and have achieved great improvements in both time efficiency and prediction accuracy. In this review, we first outline the typical mode of and basic procedures for applying machine learning in materials science, and we classify and compare the main algorithms. Then, the current research status is reviewed with regard to applications of machine learning in material property prediction, in new materials discovery, and for other purposes. Finally, we discuss problems related to machine learning in materials science, propose possible solutions, and forecast potential directions of future research. By directly combining computational studies with experiments, we hope to provide insight into the parameters that affect the properties of materials, thereby enabling more efficient and target-oriented research on materials discovery and design.
The discovery of new materials is one of the driving forces behind the development of modern society and technological innovation. Traditional materials research has mainly depended on trial and error, which is time-consuming and laborious. Recently, with the arrival of the big-data era, machine learning (ML) methods have made great progress in materials science research, bringing profound changes to society and greatly advancing science. However, there are few systematic overviews and summaries of the applications of ML methods in materials science. In this review, we first give a brief account of the progress of materials science research that employs ML, with emphasis on the main ideas and basic procedures of the method. Then the ML algorithms frequently used in materials science research are classified and compared. Finally, recent notable applications of ML to metal materials, battery materials, photovoltaic materials, and metallic glass are reviewed.
The current rise of artificial intelligence and machine learning has been significant. It has reduced human workload and improved quality of life significantly. This article describes the use of artificial intelligence and machine learning to augment drug discovery and development and make them more efficient and accurate. In this study, a systematic evaluation of studies was carried out; these were selected based on the authors' prior knowledge and a keyword search in publicly available databases, and were filtered by related context, abstract, methodology, and full text. This body of work supported the role of machine learning and artificial intelligence in facilitating drug development and discovery processes, making them more cost-effective or eliminating the need for clinical trials altogether, owing to the ability to conduct simulations using these technologies. They also enabled researchers to study different molecules more extensively without any trials. The results of this paper demonstrate the prevalent application of machine learning and artificial intelligence methods in drug discovery and indicate a promising future for these technologies; these results should enable researchers, students, and the pharmaceutical industry to dive deeper into machine learning and artificial intelligence in a drug discovery and development context.
Discovering new materials with excellent performance is a hot issue in the Materials Genome Initiative. Traditional experiments and calculations often consume large amounts of time and money and are also limited by various conditions. Therefore, it is imperative to develop a new method to accelerate the discovery and design of new materials. In recent years, material discovery and design methods using machine learning have attracted much attention from materials experts and have made some progress. This review first outlines available materials databases and material data analytics tools and then elaborates on the machine learning algorithms used in materials science. Next, the fields of application of machine learning in materials science are summarized, focusing on structure determination, performance prediction, fingerprint prediction, and new material discovery. Finally, the review points out the problems of data and machine learning in materials science and suggests directions for future research. The authors hope that machine learning algorithms will deliver strong results in material discovery and design.
Breast cancer is presently one of the most common malignancies worldwide, with a high fatality rate. In this study, a quantitative structure-activity relationship (QSAR) model of compound biological activity and a prediction model for ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) properties were built using estrogen receptor alpha (ERα) antagonist information collected from compound samples. We first used grey relation analysis (GRA) in conjunction with the random forest (RF) algorithm to identify the top 20 molecular descriptor variables with the greatest influence on biological activity, and then used Spearman correlation analysis to identify 16 independent variables. Second, QSAR models of the compounds were developed based on a BP neural network (BPNN), a genetic algorithm-optimized BP neural network (GA-BPNN), and support vector regression (SVR). The BPNN, SVR, and logistic regression (LR) models were then used to identify and predict the ADMET properties of substances, and the predictive performance of each model was compared and assessed. Based on the results, the SVR model was adopted for quantitative QSAR prediction. For classification of ADMET properties, the SVR model predicts the Caco-2 and hERG (human Ether-a-go-go Related Gene) properties, the LR model predicts the cytochrome P450 enzyme 3A4 subtype (CYP3A4) and Micronucleus (MN) properties, and the BPNN model predicts the Human Oral Bioavailability (HOB) property. Finally, information entropy theory is used to validate the rationality of the variable screening, and sensitivity analysis demonstrates that the constructed model has high accuracy and stability, so it can serve as a reference for screening probable active compounds and for drug discovery.
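The Spearman-based screening step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the threshold or the exact selection procedure, so the greedy rule, the 0.9 cutoff, and the descriptor names below are assumptions made purely for illustration, not the authors' method.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; assumes neither input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def drop_redundant(descriptors, threshold=0.9):
    """Keep descriptors in order, skipping any that is strongly
    rank-correlated with one already kept (assumed greedy rule)."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(spearman(values, descriptors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data with hypothetical descriptor names.
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mol_weight": [10.0, 20.0, 35.0, 50.0, 80.0],  # monotone in logP
    "h_donors": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = drop_redundant(descriptors)
print(selected)
```

Because Spearman's rho depends only on ranks, the monotone but non-linear `mol_weight` column is treated as redundant with `logP`, which a plain Pearson filter might miss.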
The screening of advanced materials, coupled with the modeling of their quantitative structure-activity relationships, has recently become one of the hot and trending topics in energy materials because of the diverse challenges associated with traditional methods of developing energy materials, including low success probabilities, high time consumption, and high computational cost. New research concepts and technologies to promote the research and development of energy materials have therefore become necessary. The latest advancements in artificial intelligence and machine learning have increased the expectation that data-driven materials science will revolutionize scientific discovery and provide new paradigms for the development of energy materials. Furthermore, current advances in data-driven materials engineering demonstrate that applying machine learning technology would not only significantly facilitate the design and development of advanced energy materials but also enhance their discovery and deployment. In this article, the importance and necessity of developing new energy materials to contribute to global carbon neutrality are presented. A comprehensive introduction to the fundamentals of machine learning is also provided, including open-source databases, feature engineering, machine learning algorithms, and analysis of machine learning models. Afterwards, the latest progress in data-driven materials science and engineering is discussed, covering alkaline ion battery materials, photovoltaic materials, catalytic materials, and carbon dioxide capture materials. Finally, relevant clues to the successful application of machine learning and the remaining challenges in developing advanced energy materials are highlighted.
When detecting deletions in complex human genomes, split-read approaches using short reads generated with next-generation sequencing still face the problem that either the false discovery rate is high or the sensitivity is low. To address this problem, an integrated strategy is proposed. It combines the fundamental theories of the three mainstream methods (read-pair approaches, split-read technologies, and read-depth analysis) with modern machine learning algorithms, using feature extraction as a bridge. Compared with state-of-the-art split-read methods for deletion detection at both low and high sequence coverage, the machine-learning-aided strategy is able to balance sensitivity and false discovery rate and to produce a call set that is both more sensitive and more precise at single-base-pair resolution. Users therefore no longer need to rely on prior experience to make an unnecessary trade-off beforehand or to adjust parameters repeatedly. Modern machine learning models can thus play an important role in the field of structural variation prediction.
AIM: To develop a framework for incorporating background domain knowledge into classification rule learning for knowledge discovery in biomedicine.
METHODS: Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief networks (BNs) to find the optimal BN to explain the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs. In this paper, we extended the Bayesian score to include informative structure priors, which encode our prior domain knowledge about the dataset. We call this extension of BRL BRL_p. The structure prior has a hyperparameter λ that allows the user to tune the degree to which prior knowledge is incorporated in the model learning process. We studied the effect of λ on model learning using a simulated dataset and a real-world lung cancer prognostic biomarker dataset by measuring the degree of incorporation of our specified prior knowledge. We also monitored its effect on the model's predictive performance. Finally, we compared BRL_p to other state-of-the-art classifiers commonly used in biomedicine.
RESULTS: With simulated data, we evaluated the degree of incorporation of prior knowledge into BRL_p by measuring the Graph Edit Distance between the true data-generating model and the model learned by BRL_p. We specified the true model using informative structure priors. We observed that increasing the value of λ increased the influence of the specified structure priors on model learning. A large value of λ caused BRL_p to return the true model. This also led to a gain in predictive performance as measured by the area under the receiver operating characteristic curve (AUC). We then obtained a publicly available real-world lung cancer prognostic biomarker dataset and specified a known biomarker from the literature [the epidermal growth factor receptor (EGFR) gene]. We again observed that larger values of λ led to increased incorporation of EGFR into the final BRL_p model. This relevant background knowledge also led to a gain in AUC.
CONCLUSION: BRL_p enables tunable structure priors to be incorporated during Bayesian classification rule learning, integrating data and knowledge, as demonstrated using lung cancer biomarker data.
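The Graph Edit Distance measurement described in the results can be sketched for the simplest case. This is an illustrative assumption rather than the authors' implementation: both graphs are taken to share the same labeled nodes, and every edge insertion or deletion costs one, so the distance reduces to the size of the symmetric difference of the edge sets. The node names are hypothetical.

```python
def edge_edit_distance(true_edges, learned_edges):
    """Graph edit distance for two directed graphs over the same
    labeled node set, with unit cost per edge insertion or deletion.
    A reversed edge therefore costs 2 (one deletion, one insertion)."""
    true_set, learned_set = set(true_edges), set(learned_edges)
    missing = true_set - learned_set   # edges the learner failed to recover
    extra = learned_set - true_set     # edges the learner added spuriously
    return len(missing) + len(extra)


# Toy structures with hypothetical node names; "Y" plays the role of the class.
true_model = {("A", "Y"), ("B", "Y")}
learned_weak_prior = {("C", "Y")}
learned_strong_prior = {("A", "Y"), ("B", "Y")}

print(edge_edit_distance(true_model, learned_weak_prior))    # 3
print(edge_edit_distance(true_model, learned_strong_prior))  # 0
```

A distance of zero means the learned structure matches the specified one exactly, which is the situation the abstract reports for large values of λ on simulated data.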
Causality, the science of cause and effect, has made it possible to create a new family of models, often referred to as causal models. Unlike models of a mathematical, numerical, empirical, or machine learning (ML) nature, causal models aim to tie the cause(s) to the effect(s) pertaining to a phenomenon (i.e., a data-generating process) through causal principles. This paper presents one of the first attempts at creating causal models in the area of structural and construction engineering. To this end, the paper starts with a brief review of the principles of causality and then adopts four causal discovery algorithms, namely PC (Peter-Clark), FCI (fast causal inference), GES (greedy equivalence search), and GRaSP (greedy relaxation of the sparsest permutation), to examine four phenomena: the load-bearing capacity of axially loaded members, the fire resistance of structural members, the shear strength of beams, and the resistance of walls against impulsive (blast) loading. Findings from this study reveal the possibility and merit of discovering complete and partial causal models. Finally, this study also proposes two simple metrics that can help assess the performance of causal discovery algorithms.
The present article outlines progress made in designing an intelligent information system for automatic management and knowledge discovery in large numeric and scientific databases, with a validating application to the CAST-NEONS environmental databases used for ocean modeling and prediction. We describe a discovery-learning process (Automatic Data Analysis System) that combines the features of two machine learning techniques to generate sets of production rules that efficiently describe the raw observational data contained in the database. Data clustering allows the system to classify the raw data into meaningful conceptual clusters, which the system then learns by induction to build decision trees, from which the production rules are automatically deduced.
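The last step described here, deducing production rules from a decision tree, can be sketched as a path enumeration: each root-to-leaf path becomes one IF-THEN rule. The tree encoding, the attribute names, and the cluster labels below are hypothetical and are not taken from the system in the abstract.

```python
def tree_to_rules(node, conditions=()):
    """Turn every root-to-leaf path of a decision tree into one
    production rule. Internal nodes are dicts with an 'attribute'
    and its 'branches'; leaves are plain cluster labels."""
    if not isinstance(node, dict):
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN cluster = {node}"]
    rules = []
    for value, child in node["branches"].items():
        test = f"{node['attribute']} = {value}"
        rules.extend(tree_to_rules(child, conditions + (test,)))
    return rules


# Hypothetical tree learned over previously clustered observations.
tree = {
    "attribute": "temperature",
    "branches": {
        "warm": "surface_water",
        "cold": {
            "attribute": "salinity",
            "branches": {"high": "deep_water", "low": "coastal_water"},
        },
    },
}

rules = tree_to_rules(tree)
for rule in rules:
    print(rule)
```

One rule is produced per leaf, so the size of the rule set is bounded by the number of leaves in the induced tree.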
This paper describes an equation discovery approach based on machine learning that uses LAGRAMGE as the equation discovery tool, with two sources of input: a dataset and a model expressed as a context-free grammar. The approach searches a large range of potential equations defined by the given model. The parameters of each equation are fitted to find the best equations. The experiments are illustrated with commodity prices from the London Metal Exchange for the period January to October 2009. The experiments produce a large number of equations, some of which yield predicted prices that follow the market trends closely.
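The general idea of grammar-constrained equation discovery (enumerate candidate equation forms, fit their parameters, rank by error) can be sketched in a few lines. This is a toy illustration only and does not reproduce LAGRAMGE: the candidate set stands in for the sentences a small grammar might generate, each form has a single constant fitted by closed-form least squares, and the data are synthetic.

```python
import math

# Hypothetical candidate forms y = c * f(x), standing in for the
# equations that a small context-free grammar could derive.
CANDIDATES = {
    "y = c * x": lambda x: x,
    "y = c * x**2": lambda x: x * x,
    "y = c * sqrt(x)": math.sqrt,
    "y = c * log(x)": math.log,
}


def fit_constant(basis, xs, ys):
    """Least-squares fit of c in y = c * basis(x), plus the resulting
    sum of squared errors. Closed form: c = sum(f*y) / sum(f*f)."""
    numerator = sum(basis(x) * y for x, y in zip(xs, ys))
    denominator = sum(basis(x) ** 2 for x in xs)
    c = numerator / denominator
    sse = sum((y - c * basis(x)) ** 2 for x, y in zip(xs, ys))
    return c, sse


def discover(xs, ys):
    """Fit every candidate form and return (sse, form, c) sorted by error."""
    results = []
    for form, basis in CANDIDATES.items():
        c, sse = fit_constant(basis, xs, ys)
        results.append((sse, form, c))
    return sorted(results)


# Synthetic data generated from y = 3 * x**2 (positive x so sqrt/log are defined).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * x * x for x in xs]

best_sse, best_form, best_c = discover(xs, ys)[0]
print(best_form, best_c)
```

A real equation discovery tool searches a far larger space and fits several parameters per equation, typically with numerical optimization; the closed-form fit is used here only to keep the sketch self-contained.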
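Intrusion detection, as an active network defense technology, can effectively block many kinds of attacks by hackers. With the development of machine learning, related techniques have begun to be applied to intrusion detection. This paper preprocesses the KDD CUP 99 dataset using functions from the preprocessing module of the sklearn library, builds network intrusion detection models based on the naive Bayes and logistic regression algorithms, uses an information gain algorithm to select intrusion-related features, and then performs training and prediction. The experimental results show that training and predicting on the selected feature subset maintains prediction accuracy while greatly improving detection efficiency. The results can serve as a reference for the design and construction of network intrusion detection models for high-speed railway signaling systems.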
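The information gain criterion used for feature selection can be sketched as follows. This is a minimal standard-library illustration for discrete features, not the paper's code; the feature names and the four toy records are hypothetical and are not drawn from KDD CUP 99.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy records with hypothetical feature names.
labels = ["attack", "attack", "normal", "normal"]
features = {
    "flag": ["SF", "REJ", "SF", "REJ"],          # tells nothing about the label
    "protocol": ["icmp", "icmp", "tcp", "tcp"],  # separates the classes fully
}

# Rank features from most to least informative.
ranked = sorted(features,
                key=lambda name: information_gain(features[name], labels),
                reverse=True)
print(ranked)
```

Keeping only the top-ranked features shrinks the input to the classifier, which is the mechanism behind the efficiency gain the abstract reports.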
Funding (review of artificial intelligence in catalyst discovery): Shenzhen-Hong Kong-Macao Technology Research Programme (Type C, 202011033000145); Shenzhen Excellent Science and Technology Innovation Talent Training Project, Outstanding Youth Project (RCJC20200714114435061); Functional Materials Interfaces Genome (FIG) project.
Funding (review of machine learning-based materials inverse design): National Natural Science Foundation of China (52061020); Major Science and Technology Projects in Yunnan Province (202302AG050009); Yunnan Fundamental Research Projects (202301AV070003).
Funding (review of materials discovery and design using machine learning): National Natural Science Foundation of China (Grant Nos. U1630134, 51622207 and 51372228); National Key Research and Development Program of China (Grant Nos. 2017YFB0701600 and 2017YFB0701500); Shanghai Institute of Materials Genome from the Shanghai Municipal Science and Technology Commission (Grant No. 14DZ2261200); Shanghai Municipal Education Commission (Grant No. 14ZZ099); Natural Science Foundation of Shanghai (Grant No. 16ZR1411200).
Funding: This work was financially supported by the National Natural Science Foundation of China (No. 51627802).
Abstract: The discovery of new materials is one of the driving forces behind the development of modern society and technological innovation. Traditional materials research has depended mainly on the trial-and-error method, which is time-consuming and laborious. Recently, with the arrival of the big-data era, machine learning (ML) methods have made great progress in materials science research, bringing profound changes to human society and greatly advancing science. However, there are few systematic overviews of the applications of ML methods in materials science. In this review, we first give a brief account of the progress of ML-based research in materials science, with emphasis on the main ideas and basic procedures of the method. Then the ML algorithms frequently used in materials science research are classified and compared. Finally, recent notable applications of ML in metal materials, battery materials, photovoltaic materials, and metallic glass are reviewed.
Abstract: The current rise of artificial intelligence and machine learning has been significant. It has reduced the human workload and improved quality of life significantly. This article describes the use of artificial intelligence and machine learning to augment drug discovery and development to make them more efficient and accurate. In this study, a systematic evaluation of studies was carried out; these were selected based on prior knowledge of the authors and a keyword search in publicly available databases, and were filtered based on related context, abstract, methodology, and full text. This body of work supported the roles of machine learning and artificial intelligence in facilitating drug development and discovery processes, making them more cost-effective or altogether eliminating the need for clinical trials, owing to the ability to conduct simulations using these technologies. They also enabled researchers to study different molecules more extensively, without any trials. The results of this paper demonstrate the prevalent application of machine learning and artificial intelligence methods in drug discovery, and indicate a promising future for these technologies; these results should enable researchers, students, and the pharmaceutical industry to dive deeper into machine learning and artificial intelligence in a drug discovery and development context.
Funding: Financially supported by the National Natural Science Foundation of China (Nos. 61971208, 61671225 and 51864027); the Yunnan Applied Basic Research Projects (No. 2018FA034); the Yunnan Reserve Talents of Young and Middle-aged Academic and Technical Leaders (Shen Tao, 2018); the Yunnan Young Top Talents of Ten Thousands Plan (Shen Tao, Zhu Yan, Yunren Social Development No. 2018 73); and the Scientific Research Foundation of Kunming University of Science and Technology (No. KKSY201703016).
Abstract: Discovering new materials with excellent performance is a hot issue in the materials genome initiative. Traditional experiments and calculations often waste large amounts of time and money and are also limited by various conditions. Therefore, it is imperative to develop a new method to accelerate the discovery and design of new materials. In recent years, material discovery and design methods using machine learning have attracted much attention from materials experts and have made some progress. This review first outlines available materials databases and material data analytics tools and then elaborates on the machine learning algorithms used in materials science. Next, the fields of application of machine learning in materials science are summarized, focusing on structure determination, performance prediction, fingerprint prediction, and new material discovery. Finally, the review points out the problems of data and machine learning in materials science and points to future research. Using machine learning algorithms, the authors hope to achieve amazing results in material discovery and design.
Funding: Supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX23_0082).
Abstract: Breast cancer is presently one of the most common malignancies worldwide, with a high fatality rate. In this study, a quantitative structure-activity relationship (QSAR) model of compound biological activity and a prediction model for ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) properties were built using estrogen receptor alpha (ERα) antagonist information collected from compound samples. We first utilized grey relation analysis (GRA) in conjunction with the random forest (RF) algorithm to identify the top 20 molecular descriptor variables that have the greatest influence on biological activity, and then we used Spearman correlation analysis to identify 16 independent variables. Second, QSAR models of the compounds were developed based on a BP neural network (BPNN), a genetic algorithm optimized BP neural network (GA-BPNN), and support vector regression (SVR). The BPNN, SVR, and logistic regression (LR) models were then used to identify and predict the ADMET properties of substances, and the prediction performance of each model was compared and assessed. The results reveal that an SVR model was used in QSAR quantitative prediction, and in the classification prediction of ADMET properties: the SVR model predicts the Caco-2 and hERG (human Ether-a-go-go Related Gene) properties, the LR model predicts the cytochrome P450 enzyme 3A4 subtype (CYP3A4) and Micronucleus (MN) properties, and the BPNN model predicts the Human Oral Bioavailability (HOB) properties. Finally, information entropy theory is used to validate the rationality of variable screening, and sensitivity analysis of the model demonstrates that the constructed model has high accuracy and stability, which can be used as a reference for screening probable active compounds and drug discovery.
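The Spearman-based step that reduces 20 descriptors to 16 independent ones can be illustrated with a minimal sketch. The abstract does not state the correlation cutoff or the descriptor names, so the 0.9 threshold, the feature names, and the greedy keep-the-more-important-one rule below are all assumptions made for illustration, not the authors' procedure.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_redundant(features, threshold=0.9):
    """Greedy filter. `features` maps name -> values, most important first.

    A feature is kept only if |rho| with every already-kept feature is
    below `threshold` (an assumed cutoff, not taken from the paper).
    """
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor columns, ordered by an importance ranking.
features = {
    "logP": [1, 2, 3, 4, 5],
    "logP_squared": [1, 4, 9, 16, 25],  # monotonic in logP, so redundant
    "ring_count": [2, 1, 2, 3, 1],
}
selected = drop_redundant(features)
print(selected)
```

Because Spearman correlation works on ranks, the filter removes any descriptor that is a monotonic transform of one already kept, which a Pearson-based filter could miss.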
Funding: This work was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. 15222018).
Abstract: The screening of advanced materials coupled with the modeling of their quantitative structure-activity relationships has recently become one of the hot and trending topics in energy materials due to the diverse challenges, including low success probabilities, high time consumption, and high computational cost, associated with the traditional methods of developing energy materials. Following this, new research concepts and technologies to promote the research and development of energy materials become necessary. The latest advancements in artificial intelligence and machine learning have therefore increased the expectation that data-driven materials science would revolutionize scientific discoveries towards providing new paradigms for the development of energy materials. Furthermore, the current advances in data-driven materials engineering also demonstrate that the application of machine learning technology would not only significantly facilitate the design and development of advanced energy materials but also enhance their discovery and deployment. In this article, the importance and necessity of developing new energy materials towards contributing to global carbon neutrality are presented. A comprehensive introduction to the fundamentals of machine learning is also provided, including open-source databases, feature engineering, machine learning algorithms, and analysis of machine learning models. Afterwards, the latest progress in data-driven materials science and engineering, including alkaline ion battery materials, photovoltaic materials, catalytic materials, and carbon dioxide capture materials, is discussed. Finally, relevant clues to the successful applications of machine learning and the remaining challenges towards the development of advanced energy materials are highlighted.
Funding: Project (61472026) supported by the National Natural Science Foundation of China; Project (2014J410081) supported by Guangzhou Scientific Research Program, China.
Abstract: When detecting deletions in complex human genomes, split-read approaches using short reads generated with next-generation sequencing still face the challenge that either the false discovery rate is high or the sensitivity is low. To address the problem, an integrated strategy is proposed. It combines the fundamental theories of the three mainstream methods (read-pair approaches, split-read technologies and read-depth analysis) with modern machine learning algorithms, using feature extraction as a bridge. Compared with the state-of-the-art split-read methods for deletion detection at both low and high sequence coverage, the machine-learning-aided strategy shows great ability in balancing sensitivity and false discovery rate, producing a call set that is both more sensitive and more precise at single-base-pair resolution. Thus, users no longer need to rely on prior experience to make an unnecessary trade-off beforehand or adjust parameters over and over again. It should be noted that modern machine learning models can play an important role in the field of structural variation prediction.
Funding: Supported by National Institute of General Medical Sciences of the National Institutes of Health, No. R01GM100387.
Abstract: AIM: To develop a framework to incorporate background domain knowledge into classification rule learning for knowledge discovery in biomedicine. METHODS: Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief-networks (BN) to find the optimal BN to explain the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs. In this paper, we extended the Bayesian score to include informative structure priors, which encode our prior domain knowledge about the dataset. We call this extension of BRL BRL_p. The structure prior has a λ hyperparameter that allows the user to tune the degree of incorporation of the prior knowledge in the model learning process. We studied the effect of λ on model learning using a simulated dataset and a real-world lung cancer prognostic biomarker dataset, by measuring the degree of incorporation of our specified prior knowledge. We also monitored its effect on the model predictive performance. Finally, we compared BRL_p to other state-of-the-art classifiers commonly used in biomedicine. RESULTS: We evaluated the degree of incorporation of prior knowledge into BRL_p with simulated data by measuring the Graph Edit Distance between the true data-generating model and the model learned by BRL_p. We specified the true model using informative structure priors. We observed that by increasing the value of λ we were able to increase the influence of the specified structure priors on model learning. A large value of λ caused BRL_p to return the true model. This also led to a gain in predictive performance measured by area under the receiver operating characteristic curve (AUC). We then obtained a publicly available real-world lung cancer prognostic biomarker dataset and specified a known biomarker from literature [the epidermal growth factor receptor (EGFR) gene]. We again observed that larger values of λ led to an increased incorporation of EGFR into the final BRL_p model. This relevant background knowledge also led to a gain in AUC. CONCLUSION: BRL_p enables tunable structure priors to be incorporated during Bayesian classification rule learning that integrates data and knowledge, as demonstrated using lung cancer biomarker data.
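The Graph Edit Distance comparison described in the RESULTS can be sketched in a simplified form. The sketch assumes that both networks are defined over the same named variables, so only edge insertions and deletions matter, and that each costs one unit. The abstract does not give the cost model, and the variable names below are hypothetical.

```python
def edge_edit_distance(true_edges, learned_edges):
    """Count edge insertions plus deletions needed to turn one graph into the other.

    Edges are (parent, child) tuples. With node identities fixed, this equals
    the size of the symmetric difference of the two edge sets. A reversed
    edge counts as one deletion plus one insertion (assumed unit costs).
    """
    true_set, learned_set = set(true_edges), set(learned_edges)
    missing = true_set - learned_set   # edges the learner failed to recover
    extra = learned_set - true_set     # edges the learner added spuriously
    return len(missing) + len(extra)


# Hypothetical true data-generating structure.
true_model = [("A", "Outcome"), ("B", "Outcome")]

# Hypothetical structures learned with a weak and a strong structure prior.
learned_weak_prior = [("A", "Outcome"), ("C", "Outcome")]
learned_strong_prior = [("A", "Outcome"), ("B", "Outcome")]

distance_weak = edge_edit_distance(true_model, learned_weak_prior)
distance_strong = edge_edit_distance(true_model, learned_strong_prior)
print(distance_weak, distance_strong)
```

A distance of zero means the learned structure matches the specified one exactly, which is the outcome the abstract reports for large λ.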
Abstract: Causality, the science of cause and effect, has made it possible to create a new family of models. Such models are often referred to as causal models. Unlike those of mathematical, numerical, empirical, or machine learning (ML) nature, causal models hope to tie the cause(s) to the effect(s) pertaining to a phenomenon (i.e., data generating process) through causal principles. This paper presents one of the first efforts to create causal models in the area of structural and construction engineering. To this end, this paper starts with a brief review of the principles of causality and then adopts four causal discovery algorithms, namely PC (Peter-Clark), FCI (fast causal inference), GES (greedy equivalence search), and GRaSP (greedy relaxation of the sparsest permutation), to examine four phenomena: the load-bearing capacity of axially loaded members, the fire resistance of structural members, the shear strength of beams, and the resistance of walls against impulsive (blast) loading. Findings from this study reveal the possibility and merit of discovering complete and partial causal models. Finally, this study also proposes two simple metrics that can help assess the performance of causal discovery algorithms.
Abstract: The present article outlines progress made in designing an intelligent information system for automatic management and knowledge discovery in large numeric and scientific databases, with a validating application to the CAST-NEONS environmental databases used for ocean modeling and prediction. We describe a discovery-learning process (Automatic Data Analysis System) which combines the features of two machine learning techniques to generate sets of production rules that efficiently describe the observational raw data contained in the database. Data clustering allows the system to classify the raw data into meaningful conceptual clusters, which the system learns by induction to build decision trees, from which the production rules are automatically deduced.
Abstract: This paper describes an equation discovery approach based on machine learning using LAGRAMGE as an equation discovery tool, with two sources of input: a dataset and a model expressed as a context-free grammar. The approach searches a large range of potential equations constrained by the specified model. The parameters of each equation are fitted to find the best equations. The experiments are illustrated with commodity prices from the London Metal Exchange for the period of January-October 2009. The outputs of the experiments are a large number of equations; some of the equations show predicted prices that closely follow the market trends.
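The search-then-fit loop described above can be shown with a toy sketch. This is not LAGRAMGE and does not use its grammar format. The candidate set stands in for the equations a grammar would generate, every candidate has the assumed form y = a * term(x) + b, and the data are synthetic rather than London Metal Exchange prices.

```python
import math

# Stand-in for grammar-generated structures: each entry is one candidate term.
CANDIDATE_TERMS = {
    "x": lambda x: x,
    "x**2": lambda x: x * x,
    "log(x)": math.log,
    "sqrt(x)": math.sqrt,
}


def fit_linear(u, y):
    """Least-squares fit of y = a * u + b; returns (a, b, sum of squared errors)."""
    n = len(u)
    mean_u, mean_y = sum(u) / n, sum(y) / n
    var_u = sum((ui - mean_u) ** 2 for ui in u)
    a = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y)) / var_u
    b = mean_y - a * mean_u
    sse = sum((a * ui + b - yi) ** 2 for ui, yi in zip(u, y))
    return a, b, sse


def discover(xs, ys):
    """Fit every candidate structure and return them sorted by error, best first."""
    results = []
    for name, term in CANDIDATE_TERMS.items():
        transformed = [term(x) for x in xs]
        a, b, sse = fit_linear(transformed, ys)
        results.append((sse, name, a, b))
    results.sort()
    return results


# Synthetic data generated from y = 3 * x**2 + 2 (positive x so log/sqrt are defined).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0 * x * x + 2.0 for x in xs]

ranking = discover(xs, ys)
best_sse, best_name, best_a, best_b = ranking[0]
print(f"best: y = {best_a:.3f} * {best_name} + {best_b:.3f}")
```

A real grammar-based tool enumerates far richer expression trees and usually needs nonlinear parameter fitting; the closed-form fit here works only because each candidate is linear in its two parameters.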
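Abstract: Intrusion detection, as an active network defense technique, can effectively block many kinds of attacks from hackers. With the development of machine learning, related techniques have begun to be applied to intrusion detection. This paper uses functions from the preprocessing module of the sklearn library to preprocess the KDD CUP 99 dataset, builds network intrusion detection models based on the naive Bayes and logistic regression algorithms, uses the information gain algorithm to select intrusion-related features, and then performs training and prediction. The experimental results show that training and predicting on the selected feature subset maintains prediction accuracy while greatly improving detection efficiency. The results can serve as a reference for the design and construction of network intrusion detection models for high-speed railway signaling systems.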
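The information-gain feature selection step can be sketched in plain Python. The rows below are a tiny made-up sample, not KDD CUP 99 data, although the column names mirror features of that dataset. The sketch assumes categorical features, so continuous ones would need discretizing first, and the choice of k = 2 is arbitrary.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy sample: six connections with three categorical features.
labels = ["attack", "attack", "normal", "normal", "attack", "normal"]
columns = {
    "flag": ["SF", "S0", "SF", "S0", "SF", "S0"],                  # weakly informative
    "land": ["0", "0", "0", "0", "0", "0"],                        # constant, no information
    "protocol_type": ["icmp", "icmp", "tcp", "tcp", "icmp", "tcp"],  # separates classes
}

gains = {name: information_gain(values, labels) for name, values in columns.items()}
selected = select_top_k(columns, labels, k=2)
print(selected)
```

The classifiers would then be trained on the selected columns only, which is where the efficiency gain reported in the abstract would come from.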