Funding: Within the framework of the program of state support for the centers of the National Technology Initiative (NTI) based at higher education institutions and scientific organizations (NTI Center "Digital Materials Science: New Materials and Substances" at the Bauman Moscow State Technical University).
Abstract: Machine learning-assisted prediction of polymer properties prior to synthesis has the potential to significantly accelerate the discovery and development of new polymer materials. To date, several approaches have been implemented to represent chemical structure in machine learning models, among which Mol2Vec embeddings have attracted considerable attention in the cheminformatics community since their introduction in 2018. However, for small datasets, chemical structure representations typically increase the dimensionality of the input dataset, which degrades model performance. Furthermore, the limited diversity of polymer chemical structures hinders the training of reliable embeddings and necessitates complex task-specific architectures. To address these challenges, we examined the efficacy of pre-trained Mol2Vec embeddings in deriving vectorized representations of polymers. This study assesses how adding Mol2Vec compound vectors to the input features affects the performance of a model that relies on the physical properties of 214 polymers. The results are intended to highlight the potential for improving prediction accuracy in polymer studies by incorporating pre-trained embeddings, and to encourage their use with modestly sized polymer databases.
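The feature-augmentation step described in the abstract can be illustrated with a minimal sketch. It assumes that a compound vector is the sum of the vectors of a structure's substructure identifiers, as in Mol2Vec, and that this vector is simply appended to each polymer's physical-property row. The function names, the toy four-dimensional embedding table, the identifiers, and the property values are hypothetical placeholders, not the authors' pipeline or data, and skipping unknown identifiers is a simplification made only for this sketch.

```python
import numpy as np


def compound_vector(identifiers, embedding, dim):
    """Sum the embedding vectors of a structure's substructure identifiers.

    Identifiers missing from `embedding` are skipped; this is an assumption
    of the sketch, not a statement about how Mol2Vec treats them.
    """
    vec = np.zeros(dim)
    for ident in identifiers:
        if ident in embedding:
            vec += embedding[ident]
    return vec


def augment_features(physical, vectors):
    """Append one compound vector per polymer to its physical-property row."""
    physical = np.asarray(physical, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    if physical.shape[0] != vectors.shape[0]:
        raise ValueError("one compound vector is needed per polymer")
    return np.hstack([physical, vectors])


# Toy stand-in for a pre-trained embedding table (hypothetical values, dim=4).
toy_embedding = {
    "a": np.array([1.0, 0.0, 0.0, 0.0]),
    "b": np.array([0.0, 2.0, 0.0, 0.0]),
    "c": np.array([0.0, 0.0, 3.0, 1.0]),
}

# Two hypothetical repeat units, each given as a list of substructure identifiers.
units = [["a", "b", "a"], ["c", "zzz"]]
vectors = np.array([compound_vector(u, toy_embedding, dim=4) for u in units])

# Two placeholder physical-property columns per polymer (not real measurements).
physical = np.array([[1.2, 350.0], [0.9, 410.0]])

X = augment_features(physical, vectors)
print(X.shape)  # (2, 6): 2 physical features + 4 embedding dimensions
```

With a real pre-trained embedding, the appended block would be much wider than the physical-property block, which is the dimensionality concern the abstract raises for small datasets such as the 214-polymer set.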