Abstract
[Objective] This paper identifies the experimental material information needed for organic materials research and development in a given set of topic-specific research papers, extracting organic battery material entities and their type instances under donor-acceptor systems from existing studies. [Methods] We used a locally deployed Large Language Model (LLM) and prompt engineering to recast the information extraction task as a dialog-based extraction task that requires no fine-tuning. We identified the relevant instance information by adding a few examples to the prompt template and allowing the LLM to give negative answers. [Results] Without fine-tuning on any dataset, the method extracted material entities and types: entity recognition accuracy was 0.98, surpassing the fine-tuned approach, and material type recognition accuracy reached 0.94. [Limitations] Constrained by local computing resources, we reduced the parameter precision of the LLM, and recognition performance on long entities is relatively low. [Conclusions] A base LLM deployed locally on low-end hardware, combined with well-constructed prompt instructions and a human-machine collaboration mode, can flexibly and efficiently extract experimental information on the required topic.
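The Methods description (a prompt template with a few examples, plus permission for the model to answer in the negative) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the instruction wording, the `NO_ANSWER` sentinel, the two example sentences, and the helper names `build_prompt` and `parse_reply`. The call to the locally deployed model is deliberately left out, since the abstract does not say which model or interface was used.

```python
from typing import List, Tuple

# Hypothetical sentinel the model is told to return when nothing matches.
NO_ANSWER = "None"

# Hypothetical few-shot examples (illustrative only, not from the paper).
EXAMPLES: List[Tuple[str, str]] = [
    ("The blend of P3HT and PCBM was spin-coated onto the substrate.",
     "P3HT; PCBM"),
    ("All samples were stored in the dark for 24 hours.", NO_ANSWER),
]


def build_prompt(sentence: str,
                 examples: List[Tuple[str, str]] = EXAMPLES) -> str:
    """Assemble a few-shot prompt that explicitly allows a negative answer."""
    lines = [
        "List every material name mentioned in the sentence, "
        "separated by semicolons.",
        f"If the sentence mentions no material, answer exactly: {NO_ANSWER}",
        "",
    ]
    # Each example shows the model the expected answer format,
    # including one case where the correct answer is the sentinel.
    for text, answer in examples:
        lines += [f"Sentence: {text}", f"Answer: {answer}", ""]
    lines += [f"Sentence: {sentence}", "Answer:"]
    return "\n".join(lines)


def parse_reply(reply: str) -> List[str]:
    """Turn the model's reply into a list of entities (empty if negative)."""
    reply = reply.strip()
    if reply.lower() == NO_ANSWER.lower():
        return []
    return [part.strip() for part in reply.split(";") if part.strip()]
```

Including a negative example alongside the sentinel instruction gives the model a sanctioned way to decline, which is the idea the abstract describes; whether the authors phrased it this way is not stated.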
Authors
时宗彬
朱丽雅
乐小虬
Shi Zongbin; Zhu Liya; Le Xiaoqiu (National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China)
Source
《数据分析与知识发现》
EI
CSSCI
CSCD
PKU Core Journals
2024, Issue 7, pp. 23-31 (9 pages)
Data Analysis and Knowledge Discovery
Funding
A research outcome of the National Social Science Fund of China project (Grant No. 23BTQ102).
Keywords
Large Language Model
Prompt Engineering
Information Extraction
Organic Battery Materials