Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as "grinding" and "heating", "dissolving" and "centrifuging", etc. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning approach offers a scalable way to unlock the large amount of inorganic materials synthesis information in the literature and to process it into a standardized, machine-readable database.
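As an illustration of the Markov chain representation mentioned in the abstract above, the following minimal sketch estimates first-order transition probabilities between synthesis steps from ordered step sequences. It is not the authors' implementation: the function name `transition_probabilities`, the `START`/`END` marker states, and the three toy procedures are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def transition_probabilities(procedures):
    """Estimate first-order Markov transition probabilities between steps.

    `procedures` is a list of ordered step lists. Hypothetical START and END
    states are added so that the first and last steps are also modeled.
    """
    counts = defaultdict(Counter)
    for steps in procedures:
        path = ["START", *steps, "END"]
        # Count every observed (current step -> next step) pair.
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    # Normalize each row of counts into probabilities.
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Toy step sequences (invented for illustration, not taken from the paper).
procedures = [
    ["grinding", "heating"],
    ["grinding", "heating", "grinding", "heating"],
    ["dissolving", "heating", "centrifuging"],
]

chain = transition_probabilities(procedures)
for state, row in chain.items():
    print(state, row)
```

Each row of `chain` sums to one, and the non-zero entries are the edges of a flowchart of possible procedures. In this toy data, for example, "grinding" is always followed by "heating".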
First-principles-based cluster expansion models are the dominant approach in ab initio thermodynamics of crystalline mixtures, enabling the prediction of phase diagrams and novel ground states. However, despite recent advances, the construction of accurate models still requires a careful and time-consuming manual parameter tuning process for ground-state preservation, since this property is not guaranteed by default. In this paper, we present a systematic and mathematically sound method to obtain cluster expansion models that are guaranteed to preserve the ground states of their reference data. The method builds on the recently introduced compressive sensing paradigm for cluster expansion and employs quadratic programming to impose constraints on the model parameters. The robustness of our methodology is illustrated for two lithium transition metal oxides with relevance for Li-ion battery cathodes, i.e., Li_(2x)Fe_(2(1−x))O_(2) and Li_(2x)Ti_(2(1−x))O_(2), for which the construction of cluster expansion models with compressive sensing alone has proven to be challenging. We demonstrate that our method not only guarantees ground-state preservation on the set of reference structures used for the model construction, but also achieves out-of-sample ground-state preservation up to relatively large supercell sizes through a rapidly converging iterative refinement. This method provides a general tool for building robust, compressed and constrained physical models with predictive power.
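One way to picture the constrained fit described in the abstract above is as an l1-regularized least-squares problem with linear inequality constraints. The formulation below is only a simplified sketch under stated assumptions, not the paper's exact method: the symbols $\Pi$ (matrix of cluster correlations), $E$ (reference energies), $J$ (effective cluster interactions), $\mu$ (sparsity weight) and $\mathcal{G}$ (set of reference ground states) are notation introduced here, and the constraints are restricted to structures sharing a composition with a ground state, whereas the actual constraints may be defined relative to the full convex hull.

```latex
\begin{aligned}
\min_{J}\quad & \lVert \Pi J - E \rVert_2^2 + \mu \lVert J \rVert_1 \\
\text{s.t.}\quad & (\Pi_s - \Pi_g)\, J \;\ge\; 0
  \qquad \text{for all } g \in \mathcal{G} \text{ and all reference structures } s
  \text{ with the same composition as } g .
\end{aligned}
```

Here $\Pi_s$ denotes the row of $\Pi$ belonging to structure $s$, so each constraint requires the model energy of $s$ to be no lower than that of the ground state $g$. Splitting $J = J^{+} - J^{-}$ with $J^{\pm} \ge 0$ turns the l1 term into a linear one, which leaves a quadratic objective with linear constraints, i.e. a standard quadratic program.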
Funding: Funding to support this work was provided by the Energy & Biosciences Institute through the EBI-Shell program, Office of Naval Research (ONR) Award #N00014-14-1-0444, and the National Science Foundation under Grant No. 5710003959.
Funding: Supported primarily by the US Department of Energy (DOE) under Contract No. DE-FG02-96ER45571.