A reliable knowledge processing framework for combustion science using foundation models


Abstract: This research explores the integration of large language models (LLMs) into scientific data assimilation, using combustion science as a case study. Leveraging foundation models integrated with a Retrieval-Augmented Generation (RAG) framework, the study introduces an approach to process diverse combustion research data spanning experimental studies, simulations, and literature. The multifaceted nature of combustion research makes knowledge processing critical for navigating and extracting valuable information from a vast and diverse pool of sources. The developed approach minimizes computational and economic expenses while optimizing data privacy and accuracy. It incorporates prompt engineering and offline open-source LLMs, giving users autonomy in selecting base models. The study examines text segmentation strategies, compares LLMs, and explores various optimized prompts to demonstrate the effectiveness of the framework. By incorporating an external vector database, the framework outperforms a conventional LLM in generating accurate responses and constructing robust arguments. The study also investigates optimized prompt templates for efficient extraction of scientific literature. Furthermore, a targeted scaling study quantifies the algorithmic performance of the framework as the number of prompt tokens increases. The research addresses concerns about hallucinations and false research articles by introducing a custom workflow with a detection algorithm that filters out inaccuracies. Despite identified areas for improvement, the framework consistently delivers accurate domain-specific responses with minimal human oversight. The prompt-agnostic approach introduced holds promise for future improvements. The study underscores the significance of integrating LLMs and knowledge processing techniques in scientific research, providing a foundation for advancements in data assimilation and utilization.

Authors: Vansh Sharma, Venkat Raman

Affiliation: Department of Aerospace Engineering

Source: Energy and AI (EI), 2024, Issue 2, pp. 396-416, 21 pages

Funding: Support from the Defense Threat Reduction Agency (DTRA) under Grant No. HDTRA12110012 with Dr. Richard Fry as the Program Officer, and partial project support from the Air Force Office of Scientific Research (AFOSR) under Grant No. FA9550-24-1-0017 with Dr. Chiping Li as the Program Officer.

Keywords: Large language models (LLM); Foundation models; Combustion; Knowledge processing; Retrieval-augmented generation (RAG)

Classification: TP3 [Automation and Computer Technology - Computer Science and Technology]

