Funding: Supported by the Sichuan Provincial Science and Technology Program (Grant No. 2022YFH0075), the Opening Project of the State Key Laboratory of Performance Monitoring and Protecting of Rail Transit Infrastructure (Grant No. HJGZ2021113), and the Independent Research Project of the State Key Laboratory of Traction Power (Grant No. 2022TPL_T03).
Abstract: Machine learning (ML) has powerful nonlinear processing and multivariate learning capabilities, so it has been widely used in the fatigue field. However, most ML methods are opaque black-box models that are difficult to apply in engineering practice. Symbolic regression (SR) is an interpretable machine learning method for determining the optimal fitting equation for a dataset. In this study, domain knowledge-guided SR was used to determine a new fatigue crack growth (FCG) rate model. Three variable-subtree terms, for $\Delta K$, the R-ratio, and $\Delta K_{th}$, were obtained by analysing eight traditional semi-empirical FCG rate models. The SR model was constructed from published FCG rate test data for Al-7055-T7511. It was subsequently extended to other alloys (Ti-10V-2Fe-3Al, Ti-6Al-4V, Cr-Mo-V, LC9cs, Al-6013-T651, and Al-2324-T3) using multiple linear regression. Compared with three semi-empirical FCG rate models, the SR model yielded higher prediction accuracy. This result demonstrates the potential of domain knowledge-guided SR for building FCG rate models.
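For readers unfamiliar with semi-empirical FCG rate models, the sketch below fits the classic Paris law, da/dN = C(ΔK)^m, by least squares in log-log space. The abstract does not say which eight models were analysed, so the Paris law is used here only as a well-known representative, and the data are synthetic values generated from made-up constants. The function name `fit_paris_law` is hypothetical.

```python
import math

def fit_paris_law(delta_k, dadn):
    """Fit da/dN = C * (dK)^m by ordinary least squares in log-log space."""
    xs = [math.log10(k) for k in delta_k]
    ys = [math.log10(r) for r in dadn]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx                      # slope of the log-log line is the exponent m
    c = 10 ** (y_mean - m * x_mean)    # intercept gives log10(C)
    return c, m

# Synthetic data from C = 1e-11, m = 3 (illustrative values, not from the paper).
delta_k = [5.0, 8.0, 12.0, 20.0, 30.0]
dadn = [1e-11 * k ** 3 for k in delta_k]
c, m = fit_paris_law(delta_k, dadn)
print(f"C = {c:.3e}, m = {m:.3f}")
```

A model of this form ignores the R-ratio and the threshold $\Delta K_{th}$, which is one reason richer semi-empirical forms, and the SR model described above, include those variables.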
Funding: Support of the College of Energy, Soochow Institute for Energy and Materials Innovations (SIEMIS); Jiangsu Provincial Key Laboratory for Advanced Carbon Materials and Wearable Energy Technologies of Soochow University; Shanghai Qi Zhi Institute; Light Industry Institute of Electrochemical Power Sources of Soochow University.
Abstract: In recent years, machine-learning methods have profoundly impacted research in the interdisciplinary fields of physics. However, most machine-learning models lack interpretability, and physicists doubt the credibility of their conclusions because they cannot be combined with prior physical knowledge. Therefore, this review focuses on symbolic regression, which is an interpretable machine-learning method. First, the relevant concepts of machine learning are introduced in conjunction with induction. Next, we provide an overview of symbolic regression methods. Subsequently, recent directions for the application of symbolic regression in different subfields of physics are outlined, together with an overview of how these applications have evolved. The major aim of this review is to introduce the basic principles of symbolic regression and explain its applications in the field of physics.
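To make the basic principle concrete, here is a deliberately tiny sketch of symbolic regression as a search over candidate expressions. It exhaustively scores a hand-picked set of forms a·f(x) and a·f(x)·g(x) and keeps the one with the lowest mean squared error. This is not a method from the review: practical SR tools typically use genetic programming or other search strategies, and the grammar, function names, and data below are all assumptions made for illustration.

```python
import itertools
import math

# Hypothetical, deliberately tiny set of building blocks.
UNARY = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin(x)": math.sin,
    "exp(x)": math.exp,
}

def fit_scale(f, xs, ys):
    """Least-squares coefficient a for the model y ~ a * f(x)."""
    num = sum(f(x) * y for x, y in zip(xs, ys))
    den = sum(f(x) ** 2 for x in xs)
    return num / den if den else 0.0

def mse(f, a, xs, ys):
    """Mean squared error of a * f(x) against the targets."""
    return sum((a * f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def toy_symbolic_regression(xs, ys):
    """Return (error, expression string) for the best candidate form."""
    candidates = dict(UNARY)
    # Add pairwise products of distinct building blocks.
    for (n1, f1), (n2, f2) in itertools.combinations(UNARY.items(), 2):
        candidates[f"{n1}*{n2}"] = lambda x, f1=f1, f2=f2: f1(x) * f2(x)
    best = None
    for name, f in candidates.items():
        a = fit_scale(f, xs, ys)
        err = mse(f, a, xs, ys)
        if best is None or err < best[0]:
            best = (err, f"{a:.3g}*{name}")
    return best

# Synthetic data generated from y = 3 * x * sin(x); not taken from any paper.
xs = [i / 10 for i in range(1, 21)]
ys = [3.0 * x * math.sin(x) for x in xs]
print(toy_symbolic_regression(xs, ys))
```

The result is a readable formula rather than a set of opaque weights, which is the interpretability argument made in the abstract. Exhaustive enumeration does not scale beyond toy grammars, which is why real SR methods rely on heuristic search.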
Funding: National Natural Science Foundation of China, Grant/Award Number: 52332005; National Key Research and Development Program of China, Grant/Award Number: 2022YFB3807200; China Postdoctoral Science Foundation, Grant/Award Number: 2022TQ0019.
Abstract: Symbolic regression (SR), which explores mathematical expressions from a given data set to construct an interpretable model, is emerging as a powerful computational technique with the potential to transform "black box" machine learning methods into physically and chemically interpretable expressions in materials science research. In this review, the current advancements in SR are investigated, focusing on the underlying theories, fundamental flowcharts, various techniques, implemented codes, and application fields. More importantly, the challenges and future opportunities that must be addressed to unlock the full potential of SR in materials design and research are discussed, including graphics processing unit acceleration and transfer learning algorithms, the trade-off between expression accuracy and complexity, physically or chemically interpretable SR with generative large language models, and multimodal SR methods.
Funding: Supported by the National Natural Science Foundation (60173046) and the Natural Science Foundation of Province (2002AB040).
Abstract: A new point-tree data structure genetic programming (PTGP) method is proposed. For the discontinuous function regression problem, the proposed method can identify both the function structure and the discontinuity points simultaneously. It can also be readily applied to continuous function regression problems. Numerical experiments demonstrate that point-tree GP is an efficient alternative for complex function identification problems.
Funding: Financial support from the National Natural Science Foundation of China (21676216), the Special Project of Shaanxi Provincial Education Department (20JC034), GHfund B (202202022563), and Hefei Advanced Computing Center.
Abstract: Rational design of high-performance electrocatalysts for the hydrogen evolution reaction (HER) is vital for future renewable energy systems. Incorporating foreign metal ions into catalysts can be an effective approach to optimizing their performance. However, there is a lack of systematic theoretical studies that reveal the quantitative relationships at the electronic level. Here, we develop a multi-level screening methodology to search for highly stable and active dopants for CoP catalysts. Density functional theory (DFT) calculations and symbolic regression (SR) were performed to investigate the relationship between the adsorption free energy ($\Delta G_{H^*}$) and 10 electronic parameters. The mathematical formulas derived from SR indicate that the difference in work function ($\Delta\Phi$) between the doped metal and the acceptor plays the most important role in regulating $\Delta G_{H^*}$, followed by the d-band center (d-BC) of the doped system. The HER descriptor can be expressed as $\Delta G_{H^*} = 1.59 \times |0.188\,\Delta\Phi + d\text{-BC} + 0.120|^{1/2} - 0.166$, with a high determination coefficient ($R^2 = 0.807$). Consistent with the theoretical prediction, experimental results show that Al-CoP delivers superior electrocatalytic HER activity, with a low overpotential of 75 mV to drive a current density of 10 mA cm$^{-2}$, while the overpotentials for undoped CoP, Mo-CoP, and V-CoP are 206, 134, and 83 mV, respectively. The current work shows that $\Delta\Phi$ is the most significant regulatory parameter of $\Delta G_{H^*}$ for ion-doped electrocatalysts. This finding can drive the discovery of high-performance ion-doped electrocatalysts, which is crucial for electrocatalytic water splitting.
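As a worked example, the descriptor quoted in the abstract can be evaluated directly. The sketch below reads the formula as 1.59 times the square root of the absolute value of (0.188ΔΦ + d-BC + 0.120), minus 0.166. The function name `delta_g_h` is hypothetical, the input values are invented for illustration rather than taken from the paper, and the abstract does not state the units of each quantity.

```python
import math

def delta_g_h(delta_phi: float, d_band_center: float) -> float:
    """Evaluate the SR-derived HER descriptor quoted in the abstract.

    delta_phi:      work-function difference between dopant and acceptor
    d_band_center:  d-band center (d-BC) of the doped system
    """
    inner = 0.188 * delta_phi + d_band_center + 0.120
    return 1.59 * math.sqrt(abs(inner)) - 0.166

# Hypothetical inputs, chosen only to show the call; not values from the paper.
value = delta_g_h(delta_phi=0.5, d_band_center=-1.2)
print(f"Delta G_H* = {value:.3f}")
```

Because the d-BC term enters with unit weight while $\Delta\Phi$ is scaled by 0.188, the relative influence of the two inputs depends on their typical ranges across the dopants, which the abstract does not report.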
Funding: Financially supported by the National Key Research and Development Program of China (No. 2016YFB0700500) and the Guangdong Province Key Area R&D Program (No. 2019B010940001).
Abstract: There is growing interest in applying machine learning techniques in the field of materials science. However, interpreting machine learning models and extracting knowledge from them remains a major concern, particularly when the goal of learning is to formulate an explicit model that provides physical insight. In the present study, we propose a framework that uses the filtering ability of feature engineering in conjunction with symbolic regression to extract explicit, quantitative expressions for the band gap energy from materials data. We propose enhancements to genetic programming, namely dimensional consistency and artificial constraints, to improve the search efficiency of symbolic regression. We show how two descriptors attributed to volumetric and electronic factors, selected from 32 possible candidates, explicitly express the band gap energy of NaCl-type compounds. Our approach provides a basis for capturing the underlying physical relationships between materials descriptors and target properties.
Funding: Financially supported by the National Natural Science Foundation of China (Nos. 52122408, 52071023, 51901069 and 51901013), the Program for Science & Technology Innovation Talents in the University of Henan Province (No. 22HASTIT1006), the Program for Central Plains Talents (No. ZYYCYU202012172), the Ministry of Education, Singapore (No. RG70/20), the PolyU Grant (No. 1-W196), and the Opening Project of National Joint Engineering Research Center for Abrasion Control and Molding of Metal Materials, Henan University of Science and Technology (No. HKDNM201906).
Abstract: Bulk modulus is an important mechanical property in the optimal design and selection of intermetallic compounds. In this study, bulk modulus datasets of intermetallic compounds were collected, and the features affecting the bulk modulus of intermetallics were screened via feature engineering. Three features, $B_{cal}$, $dB_{avg}$, and TIE (corresponding to calculated bulk modulus, mean bulk modulus, and third ionization energy, respectively), were found to be the dominant factors influencing bulk modulus and can be extended to other multi-component alloys. In particular, we predicted the bulk modulus with an accuracy of 95% using surrogate machine learning models with the selected features, and these features were also shown to be effective for high-entropy alloys. Moreover, symbolic regression provided an expression for the relationship between bulk modulus and the screened features. The machine learning models provide a new approach for optimizing and predicting the bulk moduli of intermetallic compounds.
Funding: Supported by the National Natural Science Foundation of China (Nos. 52122408, 52071023, 52071038, 51901013) and by financial support from the Fundamental Research Funds for the Central Universities (University of Science and Technology Beijing) (Nos. FRF-TP-2021-04C1 and 06500135).
Abstract: High toughness is highly desirable for low-alloy steel in engineering structure applications, where Charpy impact toughness (CIT) is a critical factor determining toughness performance. In the current work, CIT data for low-alloy steel were collected, and CIT prediction models based on machine learning (ML) algorithms were established. Three feature construction strategies were proposed: one based solely on alloy composition, another based on alloy composition and heat treatment parameters, and a third based on alloy composition, heat treatment parameters, and physical features. A series of ML methods were used to select models and material descriptors from a large number of alternatives. Compared with the strategy based solely on alloy composition, the strategy combining alloy composition, heat treatment parameters, and physical features performs much better. Finally, a genetic programming (GP) based symbolic regression (SR) approach was developed to establish a physically meaningful formula relating the selected features to the target CIT data.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2018YFB0704404), the Hong Kong Polytechnic University (Internal Grant Nos. 1-ZE8R and G-YBDH), and the 111 Project of the State Administration of Foreign Experts Affairs and the Ministry of Education, China (Grant No. D16002).
Abstract: Knowledge of the mechanical properties of structural materials is essential for their practical applications. In the present work, 360 data samples on four mechanical properties of steels (fatigue strength, tensile strength, fracture strength, and hardness) were selected from the Japan National Institute of Material Science database, comprising data on carbon steels and low-alloy steels. Five machine learning algorithms were used to predict the mechanical properties of the materials represented by the 360 data samples, and random forest regression showed the best predictive performance. Feature selection conducted by random forest and symbolic regression revealed the four features that most influence the mechanical properties of steels: the tempering temperature and the contents of the alloying elements carbon, chromium, and molybdenum. Mathematical expressions were generated via symbolic regression, and these expressions explicitly predicted how each of the four mechanical properties varies quantitatively with the four most important features. This study demonstrates the great potential of symbolic regression in the discovery of novel advanced materials.
Funding: Financial support from the National Key Research and Development Program of China (No. 2016YFB0700501) and the National Natural Science Foundation of China (No. 51571020).
Abstract: Continuous cooling transformation diagrams for the synthetic weld heat-affected zone (SH-CCT diagrams) show the phase transition temperature and hardness at different cooling rates, and are an important basis for formulating the welding process or predicting the performance of the welding heat-affected zone. However, the experimental determination of SH-CCT diagrams is time-consuming and costly, which is at odds with the pace of new materials development. In addition, predicting SH-CCT diagrams with metallurgical models remains a challenge because of the complexity of alloying elements and welding processes. In this study, a hybrid machine learning model consisting of a multilayer perceptron classifier, k-Nearest Neighbors, and random forest is therefore established to predict the phase transformation temperature and hardness of low alloy steel from chemical composition and cooling rate. The SH-CCT diagrams of six steels are then calculated with the hybrid machine learning model. The results show that the accuracy of the classification model is up to 100% and that the predicted values of the regression models agree well with the experimental results, with high correlation coefficients and low error values. Moreover, mathematical expressions for hardness in the welding heat-affected zone of low alloy steel are obtained by symbolic regression, quantitatively expressing the relationship between alloy composition, cooling time, and hardness. This study demonstrates the great potential of materials informatics in the field of welding technology.