Abstract
Accurate and rapid prediction of hydrogen embrittlement performance is critical for the service of metal materials. However, because existing hydrogen embrittlement data come from multiple heterogeneous sources, they contain many missing values, which makes it impractical to train reliable machine learning models. In this study, we proposed an ensemble learning training strategy for missing data based on the AdaBoost algorithm. The method introduced a mask matrix that records which data are missing, so that each round of training generated sub-datasets that take the missing-value information into account. The strategy first trained on a subset of features drawn from the existing dataset using a selected method, and then iteratively focused training on the feature combinations with the highest error. The mask matrix of the missing data was used as the input to a neural network that fitted the weight of each base learner. Compared with modeling directly on the highly sparse data, the predictive ability of this strategy improved by approximately 20%. In addition, when tested on new samples, the mean absolute error of the new model's predictions fell from 0.2 to 0.09. The strategy adapts well to hydrogen embrittlement sensitivity data of different sizes and avoids the distortion of feature importance that filling in missing data can cause.
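The mask-matrix idea can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `is_missing`, `mask_matrix`, and `complete_subset`, the toy data, and the choice to mark observed entries with 1 and missing entries with 0 are all assumptions made for illustration. The sketch only shows how a mask can select, for a chosen feature subset, the rows that are fully observed, which is one plausible way to form a sub-dataset for a base learner without filling in any values.

```python
import math

def is_missing(v):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return v is None or (isinstance(v, float) and math.isnan(v))

def mask_matrix(X):
    """Return a matrix with 1 where a value is observed and 0 where it is missing."""
    return [[0 if is_missing(v) else 1 for v in row] for row in X]

def complete_subset(X, y, features):
    """Keep only rows in which every selected feature is observed.

    `features` is a list of column indices (a hypothetical feature subset).
    Returns the reduced feature rows and the matching targets.
    """
    M = mask_matrix(X)
    keep = [i for i, m in enumerate(M) if all(m[j] for j in features)]
    X_sub = [[X[i][j] for j in features] for i in keep]
    y_sub = [y[i] for i in keep]
    return X_sub, y_sub

# Toy data: 4 samples, 3 features, with scattered missing values.
nan = float("nan")
X = [
    [1.0, nan, 3.0],
    [2.0, 5.0, nan],
    [4.0, 6.0, 7.0],
    [nan, 8.0, 9.0],
]
y = [0.1, 0.2, 0.3, 0.4]

M = mask_matrix(X)
X_sub, y_sub = complete_subset(X, y, [0, 1])
print(M)
print(X_sub, y_sub)
```

In a full pipeline, each row of `M` could also serve as the input to a model that weights the base learners, as the abstract describes; that step is omitted here.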
Funding
This work was supported by the National Key Research and Development Program of China (2022YFB3707500, 2021YFB3802101).