Funding: Project supported by the “Materials Research by Information Integration” Initiative (MI2I) project of the Support Program for Starting Up Innovation Hub from the Japan Science and Technology Agency (JST).
Abstract: The history and current status of materials data activities, from handbooks to databases, are reviewed, with an introduction to some important products. Through an example of predicting interfacial thermal resistance with data and data science methods, we show the advantages and potential of materials informatics for studying materials problems that are too complicated or time-consuming for conventional theoretical and experimental methods. Materials big data is the foundation of materials informatics. The challenges of and strategies for constructing materials big data are discussed, and some solutions are proposed based on our experience in constructing the National Institute for Materials Science (NIMS) materials databases.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62125402 and 92061113).
Abstract: With the rapid development of artificial intelligence and machine learning (ML) methods, materials science is rapidly entering the era of data-driven materials informatics. ML models serve as the most crucial component, closely bridging material structure and material properties. The prediction performance of different ML methods differs considerably across material systems. Herein, we evaluated three categories of models (linear, kernel, and nonlinear methods), covering twelve ML algorithms commonly used in the materials field. Halide perovskites were chosen as an example to evaluate the fitting performance of the different models. We constructed a dataset of 540 halide perovskites and 72 features, with formation energy and bandgap as target properties. We found that the different categories of ML models show similar trends for the two target properties. The difference between the models is large for the formation energy, with the coefficient of determination (R^2) ranging from 0.69 to 0.953. The fitting performance of the models is closer for the bandgap, with R^2 ranging from 0.941 to 0.997. The nonlinear-ensemble model shows the best fitting performance for both the formation energy and the bandgap. This indicates that the nonlinear-ensemble model, constructed by combining multiple weak learners, effectively describes the nonlinear relationship between material features and target property. In addition, the extreme gradient boosting decision tree model gives the best results among all the models and identifies two new descriptors that are crucial for formation energy and bandgap. Our work provides useful guidance for selecting effective machine learning methods in data-mining studies of specific material systems.
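The model comparison in the abstract above is reported through the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The following is a minimal sketch of how such a comparison could be scored. The function name `r2_score`, the model names, and all numbers are hypothetical toy values for illustration and are not taken from the paper.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy targets and predictions (not data from the paper).
y_true = [1.0, 2.0, 3.0, 4.0]
model_predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [2.0, 2.0, 3.0, 3.0],
}

# Score every candidate model on the same targets and pick the best one.
scores = {name: r2_score(y_true, pred) for name, pred in model_predictions.items()}
best = max(scores, key=scores.get)
```

Scoring all candidates against the same held-out targets keeps the comparison fair; in practice the predictions would come from fitted models rather than hard-coded lists.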
Funding: Project supported by the National Key Research and Development Program of China (Grant Nos. 2017YFB0701702 and 2016YFB0700501), the National Natural Science Foundation of China (Grant Nos. 61472394 and 11534012), and the Science and Technology Department of Sichuan Province, China (Grant No. 2017JZ0001).
Abstract: MatCloud provides a high-throughput computational materials infrastructure for the integrated management of materials simulation, data, and computing resources. In comparison to AFLOW, Materials Project, and NOMAD, MatCloud delivers two-fold functionality: a computational materials platform where users can set up, submit, and monitor jobs online using only a Web browser, and a database of simulated materials properties. It is developed under the Chinese Materials Genome Initiative and is China's own proprietary high-throughput computational materials infrastructure. MatCloud has been online for about one year and has received a considerable number of registered users, as well as feedback and encouragement. Many users have provided valuable input and requirements. In this paper, we describe the present MatCloud, future visions, and major challenges. Building on what we have achieved, we will endeavour to further develop MatCloud in an open and collaborative manner and to make it a world-known, China-developed software platform in the pressing area of high-throughput materials calculations and materials properties simulation databases within the Materials Genome Initiative.
Funding: Support from the National Science Foundation (NSF) Cyberinfrastructure for Sustained Scientific Innovation program (OAC-1835782), the NSF Designing Materials to Revolutionize and Engineer Our Future program (CMMI-1729743), the Center for Hierarchical Materials Design (NIST 70NANB19H005) at Northwestern University, and the Advanced Research Projects Agency-Energy (ARPA-E, DE-AR0001209).
Abstract: Building processing, structure, and property (PSP) relations for computational materials design is at the heart of the Materials Genome Initiative in the era of high-throughput computational materials science. Recent technological advancements in data acquisition and storage, microstructure characterization and reconstruction (MCR), machine learning (ML), materials modeling and simulation, data processing, manufacturing, and experimentation have significantly advanced researchers' abilities in building PSP relations and in inverse material design. In this article, we examine these advancements from the perspective of design research. In particular, we introduce a data-centric approach whose fundamental aspects fall into three categories: design representation, design evaluation, and design synthesis. Developments in each of these aspects are guided by and benefit from domain knowledge. Hence, for each aspect, we present a wide range of computational methods whose integration realizes data-centric materials discovery and design.
Funding: Supported by the National Natural Science Foundation of China (12074015) and the Beijing Outstanding Young Scientists Projects (BJJWZYJH01201910005018).
Abstract: As an implementation tool of data-intensive scientific research methods, machine learning (ML) can effectively shorten the research and development (R&D) cycle of new materials by half or even more. ML shows great potential in combination with other scientific research technologies, especially in the processing and classification of large amounts of material data from theoretical calculation and experimental characterization. A systematic understanding of the research ideas of materials informatics is important for accelerating the exploration of new materials. Here, we provide a comprehensive introduction to the most commonly used ML modeling methods in materials informatics, with classic cases. Then, we review the latest progress in prediction models, which focus on new processing–structure–properties–performance (PSPP) relationships in some popular material systems, such as perovskites, catalysts, alloys, two-dimensional materials, and polymers. In addition, we summarize recent pioneering research on innovations in material research technology, such as inverse design, ML interatomic potentials, and microtopography characterization assistance, as new research directions of materials informatics. Finally, we present the most significant challenges and outlooks related to future innovation and development in the field of materials informatics. This review provides a critical and concise appraisal of the applications of materials informatics, and systematic and coherent guidance for materials scientists in choosing modeling methods based on the required materials and technologies.
Funding: Supported by the European Commission, the European Social Fund and the Calabria Region (C39B18000080002), and by the UK Engineering and Physical Sciences Research Council (EPSRC) (EP/M026981/1, EP/T021063/1, EP/T024917/1).
Abstract: The manufacturing of nanomaterials by the electrospinning process requires accurate and meticulous inspection of scanning electron microscope (SEM) images of the electrospun nanofiber to ensure that no structural defects are produced. The presence of anomalies prevents practical application of the electrospun nanofibrous material in nanotechnology. Hence, the automatic monitoring and quality control of nanomaterials is a relevant challenge in the context of Industry 4.0. In this paper, a novel automatic classification system for homogeneous (anomaly-free) and non-homogeneous (with defects) nanofibers is proposed. The inspection procedure aims at avoiding direct processing of the redundant full SEM image. Specifically, the image to be analyzed is first partitioned into subimages (nanopatches) that are then used as input to a hybrid unsupervised and supervised machine learning system. In the first step, an autoencoder (AE) is trained with unsupervised learning to generate a code representing the input image as a vector of relevant features. Next, a multilayer perceptron (MLP), trained with supervised learning, uses the extracted features to classify non-homogeneous nanofiber (NH-NF) and homogeneous nanofiber (H-NF) patches. The resulting AE-MLP system is shown to outperform other standard machine learning models and other recent state-of-the-art techniques, reporting an accuracy rate of up to 92.5%. In addition, the proposed approach reduces model complexity with respect to other deep learning strategies such as convolutional neural networks (CNN). The encouraging performance achieved in this benchmark study can stimulate the application of the proposed scheme to other challenging industrial manufacturing tasks.
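The first stage described in the abstract above, partitioning the full image into subimages before any learning takes place, can be sketched as follows. This is only an illustration under stated assumptions: the image is a plain 2D list of pixel values, patches are square and non-overlapping, and rows or columns that do not fill a whole patch are dropped. The function name `partition_into_patches` is hypothetical, and the abstract does not specify the patch size or overlap actually used.

```python
def partition_into_patches(image, patch_size):
    """Split a 2D list of pixel values into non-overlapping square patches.

    Assumption: trailing rows/columns that do not fill a full patch are dropped.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    height = len(image)
    width = len(image[0]) if height else 0
    patches = []
    # Walk over the top-left corner of every full patch, row-major order.
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# Toy 4x4 "image" whose pixel values are 0..15 (hypothetical data).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = partition_into_patches(image, 2)
```

Each returned patch would then be flattened and passed to the feature extractor (the autoencoder in the abstract) in place of the full image.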
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2018YFB0704404), the Hong Kong Polytechnic University (Internal Grant Nos. 1-ZE8R and G-YBDH), and the 111 Project of the State Administration of Foreign Experts Affairs and the Ministry of Education, China (Grant No. D16002).
Abstract: Knowledge of the mechanical properties of structural materials is essential for their practical applications. In the present work, 360 data samples on four mechanical properties of steels (fatigue strength, tensile strength, fracture strength, and hardness) were selected from the database of Japan's National Institute for Materials Science, comprising data on carbon steels and low-alloy steels. Five machine learning algorithms were used to predict the mechanical properties of the materials represented by the 360 data samples, and random forest regression showed the best predictive performance. Feature selection conducted by random forest and symbolic regression revealed the four features that most influence the mechanical properties of steels: the tempering temperature and the contents of the alloying elements carbon, chromium, and molybdenum. Mathematical expressions were generated via symbolic regression, and these expressions explicitly predict how each of the four mechanical properties varies quantitatively with the four most important features. This study demonstrates the great potential of symbolic regression in the discovery of novel advanced materials.
Funding: Supported by the National Key R&D Program of China (No. 2018YFB0704404), the Hong Kong Polytechnic University (Internal Grant Nos. 1-ZE8R and G-YBDH), and the 111 Project of the State Administration of Foreign Experts Affairs and the Ministry of Education, China (Grant No. D16002).
Abstract: The mechanical properties of complex concentrated alloys (CCAs) depend on their formed phases and corresponding microstructures. The data-driven prediction of phase formation and the associated mechanical properties is essential to discovering novel CCAs. The present work collects 557 samples of various chemical compositions, comprising 61 amorphous, 167 single-phase crystalline, and 329 multiphase crystalline CCAs. Three classification models are developed with high accuracy to categorize and understand the formed phases of CCAs. In addition, two regression models are constructed to predict the hardness and ultimate tensile strength of CCAs, and the correlation coefficient of the random forest regression model is greater than 0.9 for both target properties. Furthermore, Shapley additive explanation (SHAP) values are calculated, and the four most important features are identified accordingly. A significant finding from the SHAP values is that there exists a critical value for each of the top four features, which provides an easy and fast assessment in designing CCAs with improved mechanical properties. The present work demonstrates the great potential of machine learning in the design of advanced CCAs.
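SHAP values, as used in the abstract above, are built on Shapley values from cooperative game theory: each feature is credited with its average marginal contribution over all subsets of the other features. The sketch below computes exact Shapley values by brute force for a tiny toy function. It is an illustration only, not the paper's procedure: the SHAP library uses faster model-specific or sampling-based estimators, and the choice to replace "absent" features with a fixed baseline value is an assumption made here. The names `shapley_values`, `toy_model`, and all numbers are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Assumption: a feature that is "absent" from a coalition takes its
    baseline value. Cost grows as 2^n, so this only suits tiny n.
    """
    n = len(x)

    def value(subset):
        # Evaluate the model with coalition features from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus one pairwise interaction.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

In this toy case the additive feature receives its full contribution, the two interacting features share the interaction term equally, and the values sum to `toy_model(x) - toy_model(baseline)`, which is the additivity property that makes the attribution easy to read.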
Funding: The National Key R&D Program of China (No. 2018YFB0704404), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515110798), the National Natural Science Foundation of China (Grant No. 91860115), and the Stable Supporting Fund of Shenzhen (GXWD20201230155427003-20200728114835006).
Abstract: A data augmentation technique is employed in the current work on a training dataset of 610 bulk metallic glasses (BMGs), randomly selected from 762 collected data points. An ensemble machine learning (ML) model is developed on the augmented training dataset and tested on the remaining 152 data points. The results show that the ML model predicts the maximal diameter Dmax of BMGs more accurately than all previously reported ML models. In addition, the novel ML model gives the following glass forming ability (GFA) rules: an average atomic radius ranging from 140 pm to 165 pm, a value of TT/(T-T)(T-T) higher than 2.5, an entropy of mixing higher than 10 J/K/mol, and an enthalpy of mixing ranging from -32 kJ/mol to -26 kJ/mol. The ML model is interpretable, thereby deepening the understanding of GFA.
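The random selection of 610 training samples out of 762, with the remaining 152 held out for testing, can be sketched as a seeded shuffle of sample indices. The function name `split_train_test` and the seed are hypothetical, and the abstract does not describe how the authors' own split or augmentation was implemented.

```python
import random


def split_train_test(n_total, n_train, seed=0):
    """Randomly partition indices 0..n_total-1 into train and test sets."""
    if not 0 <= n_train <= n_total:
        raise ValueError("n_train must lie between 0 and n_total")
    indices = list(range(n_total))
    # A dedicated, seeded generator keeps the split reproducible.
    rng = random.Random(seed)
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


# Sizes taken from the abstract: 762 collected samples, 610 used for training.
train_idx, test_idx = split_train_test(762, 610)
```

Any augmentation would be applied only to the samples in `train_idx`, so that the held-out test samples stay untouched.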