Abstract
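The development of artificial intelligence is transforming the field of materials science. However, challenges remain, such as erroneous data in large-scale materials datasets and overfitting when machine learning is used to predict temperature-dependent properties. Taking thermoelectric materials as an example, this work first applies a series of rational procedures to remove problematic data, obtaining 92,291 data points covering 7295 compositions at different temperatures from the Starrydata2 database. A composition-based cross-validation method is then proposed to avoid overfitting. Next, a machine learning model is built with the gradient boosting decision tree method and achieves a high R² (approximately 0.9). Finally, the model is used to evaluate materials in the Materials Project database, where Ge2Te5As2 and Ge3(Te3As)2 show high zT values. Theoretical calculations give maximum zT values of 1.98 and 2.12 for n-type and p-type Ge2Te5As2, and 0.58 and 0.74 for n-type and p-type Ge3(Te3As)2, indicating that both are promising thermoelectric materials. This work offers an example of handling and overcoming the big-data challenges of artificial intelligence in materials science.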
The development of artificial intelligence (AI), particularly data science and machine learning (ML), is revolutionizing the field of materials science. Yet some key challenges remain, including errors contained in large-scale material datasets and overfitting when predicting temperature-dependent properties. In this work, using thermoelectric (TE) materials as an archetypal example, we first performed a series of rational actions to identify and discard questionable data, and obtained 92,291 data points covering 7295 compositions at different temperatures from the Starrydata2 database. Next, we proposed a composition-based cross-validation method, which keeps data points that share a composition but differ in temperature within the same set, to avoid overfitting. Then, we built ML models using the gradient boosting decision tree (GBDT) method and achieved remarkable R² values of ~0.89, ~0.90, and ~0.89 on the training dataset, the test dataset, and new out-of-sample experimental data published in 2023, respectively, verifying the model's high accuracy in predicting newly available materials. Using this ML model, we carried out a large-scale evaluation of the stable materials from the Materials Project database, and Ge2Te5As2 and Ge3(Te3As)2 were predicted to exhibit high zT values. Density functional theory calculations were then performed, and the calculated maximum zT values were 1.98 and 2.12 for n- and p-type Ge2Te5As2, and 0.58 and 0.74 for n- and p-type Ge3(Te3As)2, respectively, indicating their potential as TE materials and supporting our ML model. This work presents an example of dealing with and overcoming big data challenges in AI for materials science.
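The composition-based cross-validation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `composition_kfold`, the toy records, and the round-robin fold assignment are all assumptions made for illustration. The only property it is meant to show is that all rows sharing a composition land in the same fold, so the same material measured at different temperatures never appears in both the training and test sets.

```python
import random


def composition_kfold(compositions, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which all rows that share a
    composition are kept together in the same fold.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    unique = sorted(set(compositions))
    if len(unique) < n_splits:
        raise ValueError("need at least n_splits distinct compositions")

    # Shuffle the distinct compositions (not the rows), then assign each
    # composition to a fold in round-robin order.
    rng = random.Random(seed)
    rng.shuffle(unique)
    fold_of = {comp: i % n_splits for i, comp in enumerate(unique)}

    for k in range(n_splits):
        test_idx = [i for i, c in enumerate(compositions) if fold_of[c] == k]
        train_idx = [i for i, c in enumerate(compositions) if fold_of[c] != k]
        yield train_idx, test_idx


# Hypothetical toy records: (composition, temperature in K).
records = [
    ("Bi2Te3", 300), ("Bi2Te3", 400), ("Bi2Te3", 500),
    ("PbTe", 300), ("PbTe", 600),
    ("SnSe", 300), ("SnSe", 700), ("SnSe", 900),
    ("GeTe", 400), ("GeTe", 700),
]
compositions = [comp for comp, _ in records]

for train_idx, test_idx in composition_kfold(compositions, n_splits=2):
    train_comps = {compositions[i] for i in train_idx}
    test_comps = {compositions[i] for i in test_idx}
    # No composition is shared between the two sets.
    assert train_comps.isdisjoint(test_comps)
    print("test compositions:", sorted(test_comps))
```

A plain row-wise random split would instead scatter the 300 K and 400 K rows of one compound across both sets, letting a model score well by recognizing the compound rather than generalizing to unseen ones. Where scikit-learn is available, its `GroupKFold` splitter provides comparable grouped splitting when the composition is passed as the group label.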
Authors
Xue Jia
Alex Aziz
Yusuke Hashimoto
Hao Li
Xue Jia; Alex Aziz; Yusuke Hashimoto; Hao Li (Advanced Institute for Materials Research (WPI-AIMR), Tohoku University, Sendai 980-8577, Japan; Tohoku Forum for Creativity, Tohoku University, Sendai 980-8577, Japan)
Funding
Supported by JSPS KAKENHI (JP23K13599) and the Hirose Foundation.
Keywords
artificial intelligence
big data
machine learning
overfitting
thermoelectric materials
materials data
database
temperature-dependent
thermoelectric
cross-validation
density functional theory calculations