Abstract
Although today's large language models (LLMs) are highly capable, they still suffer from hallucinations, outdated knowledge, and opaque reasoning processes. To address these problems, researchers are turning to retrieval-augmented generation (RAG), which incorporates knowledge from external databases. The approach is considered the most promising solution for improving the accuracy and credibility of LLMs, especially on knowledge-intensive tasks. By combining the intrinsic knowledge of an LLM with the large, dynamic repositories of external databases, RAG allows the model's knowledge to be updated continuously and domain-specific information to be integrated. This article examines the RAG paradigm in detail, analyzes its three basic components (retrieval, generation, and augmentation), highlights the advanced techniques applied in key components such as embedding, and reports on the current overall state of RAG systems.
Authors
JIANG Lei (蒋雷)
TANG Hailin (汤海林)
Department of Data Engineering, School of Big Data and Computer Science, Guangdong Baiyun University, Guangzhou 510450, China
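Funding
2021 Quality Engineering Project of the Department of Education of Guangdong Province: Construction and Practical Exploration of "Big Data+"-Enabled Program Integration (CXQX-JY202101).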
Keywords
retrieval-augmented generation
large language model
database