Funding: This study is based upon work performed by the Joint Center for Artificial Photosynthesis, a DOE Energy Innovation Hub, supported through the Office of Science of the U.S. Department of Energy (Award No. DE-SC0004993).
Abstract: Machine learning for materials science envisions the acceleration of basic science research through automated identification of key data relationships to augment human interpretation and gain scientific understanding. A primary role of scientists is extraction of fundamental knowledge from data, and we demonstrate that this extraction can be accelerated using neural networks via analysis of the trained data model itself rather than its application as a prediction tool. Convolutional neural networks excel at modeling complex data relationships in multi-dimensional parameter spaces, such as that mapped by a combinatorial materials science experiment. Measuring a performance metric in a given materials space provides direct information about (locally) optimal materials but not the underlying materials science that gives rise to the variation in performance. Once a model is built to predict performance (in this case photoelectrochemical power generation of a solar fuels photoanode) from materials parameters (in this case composition and Raman signal), analysis of gradients in the trained model reveals key data relationships that are not readily identified by human inspection or traditional statistical analyses. Human interpretation of these key relationships produces the desired fundamental understanding, demonstrating a framework in which machine learning accelerates data interpretation by leveraging the expertise of the human scientist. We also demonstrate the use of neural network gradient analysis to automate prediction of the directions in parameter space, such as the addition of specific alloying elements, that may increase performance by moving beyond the confines of existing data.
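The gradient analysis described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: a tiny fixed-weight tanh network stands in for the trained convolutional network, the weights and the feature names (`element_A`, `element_B`, `raman_peak`) are hypothetical, and the gradient is estimated by central finite differences rather than backpropagation. The point is only to show how the gradient of predicted performance with respect to the inputs indicates which parameter, if increased, the model expects to raise performance.

```python
import math

# Hypothetical fixed weights standing in for a trained network (not from the paper).
W1 = [[0.8, -0.5, 0.3],    # hidden unit 0: weights on inputs 0..2
      [-0.2, 0.9, 0.4]]    # hidden unit 1: weights on inputs 0..2
B1 = [0.1, -0.1]
W2 = [1.5, -0.7]
B2 = 0.2
FEATURES = ["element_A", "element_B", "raman_peak"]  # illustrative names only


def predict(x):
    """Predicted performance for input vector x (toy one-hidden-layer tanh network)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def input_gradient(f, x, eps=1e-5):
    """Central finite-difference estimate of df/dx_i at the point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


# Evaluate the gradient at one (hypothetical) measured sample.
x0 = [0.2, 0.5, 0.3]
grad = input_gradient(predict, x0)

# The input with the largest positive gradient is the direction in which the
# model predicts the steepest increase in performance.
best = max(range(len(grad)), key=lambda i: grad[i])
for name, g in zip(FEATURES, grad):
    print(f"d(performance)/d({name}) = {g:+.3f}")
print("steepest predicted increase along:", FEATURES[best])
```

In a real composition space the inputs are constrained (fractions sum to one), so a gradient would normally be projected onto the allowed directions before being interpreted; that step is omitted here for brevity.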
Funding: The development of the MCBL algorithm, inkjet printing synthesis, and Raman measurements were supported by an Accelerated Materials Design and Discovery grant from the Toyota Research Institute. Initial design of the algorithm and data procurement were supported by the NSF Expedition award for Computational Sustainability CCF-1522054 and by Army Research Office (ARO) award W911-NF-14-1-0498. The implementation of the algorithm for automated, unsupervised operation was supported by MURI/AFOSR grant FA9550. Compute infrastructure was provided by NSF award CNS-0832782 and by ARO DURIP award W911NF-17-1-0187. The sputter deposition and XRD measurements were supported through the Office of Science of the U.S. Department of Energy under Award No. DE-SC0004993.
Abstract: Automated experimentation has yielded data acquisition rates that exceed human processing capabilities. Artificial intelligence offers new possibilities for automating data interpretation to generate large, high-quality datasets. Background subtraction is a long-standing challenge, particularly in settings where multiple sources of the background signal coexist, and automatic extraction of signals of interest from measured signals accelerates data interpretation. Herein, we present an unsupervised probabilistic learning approach that analyzes large data collections to identify multiple background sources and establish the probability that any given data point contains a signal of interest. The approach is demonstrated on X-ray diffraction and Raman spectroscopy data and is suitable for any type of data where the signal of interest is a positive addition to the background signals. While the model can incorporate prior knowledge, it does not require knowledge of the signals, since the shapes of the background signals, the noise levels, and the signal of interest are simultaneously learned via a probabilistic matrix factorization framework. Automated identification of interpretable signals by unsupervised probabilistic learning avoids the injection of human bias and expedites signal extraction in large datasets, a transformative capability with many applications in the physical sciences and beyond.
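To make the idea in this abstract concrete, the following is a deliberately simplified sketch and not the authors' algorithm. It assumes a single background source (rank-1 factorization, with a scale `a[i]` per spectrum and a shape `b[j]` per channel), Gaussian noise, and a flat (uniform) model for positive signal. The data are synthetic, and all function names and parameter values are hypothetical. An EM-style loop alternates between fitting the background with points weighted by their probability of being background-only and updating the probability that each point contains signal.

```python
import math


def fit_background(Y, n_iter=30, warmup=5):
    """Rank-1 background fit with a per-point signal probability (toy EM sketch)."""
    n, m = len(Y), len(Y[0])
    a = [1.0] * n                                  # per-spectrum background scale
    b = [sorted(Y[i][j] for i in range(n))[n // 2] for j in range(m)]  # shape guess
    p = [[0.0] * m for _ in range(n)]              # P(point contains signal)
    prior = 0.05                                   # assumed initial signal fraction
    flat = [v for row in Y for v in row]
    span = max(flat) - min(flat)                   # support of the uniform signal model
    sigma = 1.0
    for it in range(n_iter):
        # M-step: weighted least squares, trusting each point in proportion to 1 - p.
        for i in range(n):
            num = sum((1 - p[i][j]) * Y[i][j] * b[j] for j in range(m))
            den = sum((1 - p[i][j]) * b[j] ** 2 for j in range(m))
            a[i] = num / (den + 1e-12)
        for j in range(m):
            num = sum((1 - p[i][j]) * Y[i][j] * a[i] for i in range(n))
            den = sum((1 - p[i][j]) * a[i] ** 2 for i in range(n))
            b[j] = num / (den + 1e-12)
        resid = [[Y[i][j] - a[i] * b[j] for j in range(m)] for i in range(n)]
        wsum = sum(1 - p[i][j] for i in range(n) for j in range(m))
        wss = sum((1 - p[i][j]) * resid[i][j] ** 2 for i in range(n) for j in range(m))
        sigma = max(math.sqrt(wss / (wsum + 1e-12)), 1e-6)   # learned noise level
        if it < warmup:
            continue                               # plain least squares to start
        # E-step: signal is a positive addition, so only positive residuals qualify.
        for i in range(n):
            for j in range(m):
                r = resid[i][j]
                if r <= 0:
                    p[i][j] = 0.0
                    continue
                noise = ((1 - prior) * math.exp(-0.5 * (r / sigma) ** 2)
                         / (sigma * math.sqrt(2 * math.pi)))
                signal = prior / span
                p[i][j] = signal / (signal + noise)
        mean_p = sum(map(sum, p)) / (n * m)
        prior = min(max(mean_p, 1e-3), 0.5)
    return a, b, sigma, p


def make_demo():
    """Synthetic data: sloped background, small deterministic ripple, two peaks."""
    Y = [[(1 + 0.2 * i) * (10 + 0.5 * j)
          + 0.1 * math.sin(1.7 * i * j + 0.9 * j + 2.3 * i)
          for j in range(20)] for i in range(6)]
    Y[2][7] += 5.0      # signal of interest in spectrum 2
    Y[4][12] += 5.0     # signal of interest in spectrum 4
    return Y


Y = make_demo()
a, b, sigma, p = fit_background(Y)
flagged = [(i, j) for i in range(6) for j in range(20) if p[i][j] > 0.5]
print("points flagged as signal:", flagged)
```

Handling several coexisting background sources, as the abstract describes, would require a higher-rank factorization with suitable constraints; the rank-1 case is used here only to keep the sketch short.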
Funding: This study and the acquisition of all data are based upon work performed by the Joint Center for Artificial Photosynthesis, a DOE Energy Innovation Hub, supported through the Office of Science of the US Department of Energy (Award No. DE-SC0004993). Use of the Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, is supported by the US Department of Energy, Office of Science, Office of Basic Energy Sciences under Contract No. DE-AC02-76SF00515.
Abstract: In an era of rapid advancement of algorithms that extract knowledge from data, data and metadata management are increasingly critical to research success. In materials science, there are few examples of experimental databases that contain many different types of information, and compared with other disciplines, the database sizes are relatively small. Underlying these issues are the challenges in managing and linking data across disparate synthesis and characterization experiments, which we address with the development of a lightweight data management framework that is generally applicable for experimental science and beyond. Five years of managing experiments with this system has yielded the Materials Experiment and Analysis Database (MEAD), which contains raw data and metadata from millions of materials synthesis and characterization experiments, as well as the analysis and distillation of that data into property and performance metrics via software in an accompanying open source repository. The unprecedented quantity and diversity of experimental data are searchable by experiment and analysis attributes generated by both researchers and data processing software. The search web interface allows users to visualize their search results and download zipped packages of data with full annotations of their lineage. The enormity of the data provides substantial challenges and opportunities for incorporating data science in the physical sciences, and MEAD’s data and algorithm management framework will foster increased incorporation of automation and autonomous discovery in materials and chemistry research.
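As a rough illustration of attribute search and lineage annotation, the sketch below links records for synthesis, characterization, and analysis through parent references. The schema, record IDs, and attribute names are hypothetical and are not MEAD's actual data model; the aim is only to show how a search by attributes and a lineage trace can operate over linked records.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One experiment or analysis entry (hypothetical schema, not MEAD's)."""
    record_id: str
    kind: str                       # e.g. "synthesis", "characterization", "analysis"
    attributes: dict
    parents: list = field(default_factory=list)   # ids of records this one derives from


def search(records, **criteria):
    """Return records whose attributes match every key=value criterion."""
    return [r for r in records.values()
            if all(r.attributes.get(k) == v for k, v in criteria.items())]


def lineage(records, record_id):
    """Return the ids of all ancestors of a record, nearest first (breadth-first)."""
    seen, order = set(), []
    queue = list(records[record_id].parents)
    while queue:
        pid = queue.pop(0)
        if pid in seen:
            continue
        seen.add(pid)
        order.append(pid)
        queue.extend(records[pid].parents)
    return order


# Illustrative records: a synthesized plate, a measurement on it, and an analysis.
records = {r.record_id: r for r in [
    Record("plate-001", "synthesis", {"method": "inkjet"}),
    Record("raman-001", "characterization", {"technique": "raman"}, ["plate-001"]),
    Record("analysis-001", "analysis", {"metric": "peak_area"}, ["raman-001"]),
]}

print([r.record_id for r in search(records, technique="raman")])
print(lineage(records, "analysis-001"))
```

Storing only parent references keeps each record lightweight while still letting a downloaded analysis be traced back to the raw measurement and the synthesis step that produced the sample.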