A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing (Cited by: 1)

Abstract: The ever-increasing number of materials science articles makes it hard to infer chemistry-structure-property relations from literature. We used natural language processing methods to automatically extract material property data from the abstracts of polymer literature. As a component of our pipeline, we trained MaterialsBERT, a language model, using 2.4 million materials science abstracts, which outperforms other baseline models in three out of five named entity recognition datasets. Using this pipeline, we obtained ~300,000 material property records from ~130,000 abstracts in 60 hours. The extracted data was analyzed for a diverse range of applications such as fuel cells, supercapacitors, and polymer solar cells to recover non-trivial insights. The data extracted through our pipeline is made available at polymerscholar.org, which can be used to locate material property data recorded in abstracts. This work demonstrates the feasibility of an automatic pipeline that starts from published literature and ends with extracted material property information.

Authors: Pranav Shetty, Arunkumar Chitteth Rajan, Chris Kuenneth, Sonakshi Gupta, Lakshmi Prerana Panchumarti, Lauren Holm, Chao Zhang, Rampi Ramprasad

Affiliations: School of Computational Science & Engineering; School of Materials Science and Engineering; Department of Metallurgy Engineering and Materials Science

Source: npj Computational Materials (SCIE, EI, CSCD), 2023, Issue 1, pp. 1826-1837, 12 pages. Chinese journal title: 计算材料学（英文） [Computational Materials (English)]

Funding: This work was supported by the Office of Naval Research through grants N00014-19-1-2103 and N00014-20-1-2175. Helpful discussions and feedback from Dr. Lihua Chen are acknowledged. Pranav Shetty was partially funded by a fellowship from JPMorgan Chase & Co. that helped to support this research. Any views or opinions expressed herein are solely those of the authors listed, and may differ from the views and opinions expressed by JPMorgan Chase & Co. or its affiliates.

Keywords: PROPERTY; INSIGHT; PIPELINE

Classification: TP39 [Automation and Computer Technology: Computer Application Technology]


