Information on the physicochemical properties of chemical species is an important prerequisite when performing tasks such as process design and product design. However, the lack of extensive data and high experimental costs hinder the development of prediction techniques for these properties. Moreover, accuracy and predictive capabilities still limit the scope and applicability of most property estimation methods. This paper proposes a new Gaussian process-based modeling framework that aims to manage a discrete and high-dimensional input space related to molecular structure representation with the group-contribution approach. A warping function is used to map discrete input into a continuous domain in order to adjust the correlation between different compounds. Prior selection techniques, including prior elicitation and prior predictive checking, are also applied during the building procedure to provide the model with more information from previous research findings. The framework is assessed using datasets of varying sizes for 20 pure component properties. For 18 out of the 20 pure component properties, the new models are found to give improved accuracy and predictive power in comparison with other published models, with and without machine learning.
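As a rough illustration of the idea of warping discrete group counts before applying a Gaussian process, the following minimal sketch may help. It is not the model from the paper: the logarithmic `warp` function, the per-group `scale` values, the three example groups, the RBF kernel, and the toy target values are all assumptions chosen only to make the example runnable.

```python
import numpy as np

def warp(counts, scale):
    # Hypothetical warping: map integer group counts to a continuous domain.
    # The log form and the per-group scales are illustrative assumptions.
    return scale * np.log1p(counts)

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of a and the rows of b.
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    # Standard GP regression posterior mean and variance (zero prior mean).
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_test, x_train)
    k_ss = rbf_kernel(x_test, x_test)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_s.T)
    return k_s @ alpha, np.diag(k_ss) - np.sum(v**2, axis=0)

# Toy group-count vectors: rows are compounds, columns are three
# hypothetical groups (for example CH3, CH2, OH).
counts = np.array([[2, 0, 0],
                   [2, 1, 0],
                   [2, 2, 0],
                   [1, 1, 1],
                   [1, 2, 1]], dtype=float)

# Synthetic targets, NOT real property data: a made-up linear rule.
y = counts @ np.array([10.0, 5.0, 30.0])
y_mean = y.mean()

scale = np.array([1.0, 0.5, 2.0])  # assumed per-group warping scales
x_train = warp(counts, scale)
x_new = warp(np.array([[2.0, 3.0, 0.0]]), scale)

# Fit on centered targets, then add the mean back to the prediction.
mean, var = gp_predict(x_train, y - y_mean, x_new)
prediction = mean + y_mean
```

In this sketch the warping changes the distances between compounds, and therefore their kernel correlation, which is the role the abstract attributes to the warping function. In practice the warping parameters and kernel hyperparameters would be inferred rather than fixed by hand.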
Funding: Support from the National Natural Science Foundation of China (22150410338 and 61973268) is gratefully acknowledged.