Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFE0199600).
Abstract: Predictive models based on graph neural networks (GNNs) have attracted increasing interest in recent years for quantitative structure-property relationship (QSPR) modeling of organic species, including biofuel components. For property prediction of biofuel-relevant species, the present work applies the Directed Message Passing Neural Network (D-MPNN) framework, an emerging type of GNN, and incorporates graph attention into the D-MPNN architecture to improve its capability. Modeling with other common machine learning methods is also conducted for comparison and confirms the advantage of D-MPNN. Graph Edge Attention (GEA) is proposed for the D-MPNN layers and is shown to increase model accuracy. A relatively sizable subset of the QM9 data and four other datasets, together covering a wide range of target properties (e.g., thermodynamic properties, ignition properties, and surface tension), are selected for the models. A breakdown analysis of the species distribution in these datasets is conducted for more informed modeling. As data availability for biofuel species is often a main obstacle in related modeling tasks, this study shows that the performance of D-MPNN with the proposed GEA mechanism is most enhanced at a medium data size of 2000 to 5000. Data issues and the use of machine learning methods and graph attention for predictive modeling of biofuel properties are discussed, pointing out the need for more data with a species distribution that better represents biofuels.