
A Plan Reuse Mechanism for LLM-Driven Agents
Abstract: Integrating large language models (LLMs) into personal assistants, such as Xiao Ai and Blue Heart V, effectively enhances their ability to interact with humans, solve complex tasks, and manage IoT devices. Such assistants are also termed LLM-driven agents. Upon receiving a user request, the LLM-driven agent first generates a plan using an LLM, then executes the plan through various tools and returns the response to the user. During this process, the latency of generating a plan with an LLM can reach tens of seconds, significantly degrading the user experience. Analysis of a real-world dataset shows that about 30% of the requests received by LLM-driven agents are identical or similar, which allows previously generated plans to be reused to reduce response latency. However, it is difficult to accurately determine the similarity between requests by directly evaluating their original texts. Moreover, the diverse expressions of natural language and the unstructured format of LLM-generated plan texts make effective plan reuse challenging. To address these issues, we present and implement AgentReuse, a plan reuse mechanism for LLM-driven agents. AgentReuse leverages the semantic similarities and differences among requests, using intent classification to evaluate request similarity and enable plan reuse. Experimental results on a real-world dataset demonstrate that AgentReuse achieves a 93% effective plan reuse rate, an F1 score of 0.9718 and an accuracy of 0.9459 in evaluating request similarity, and reduces latency by 93.12% compared with a baseline without the reuse mechanism.
Authors: Li Guopeng, Wu Ruiqi, Tan Haisheng, Chen Guoliang (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027)
Source: Journal of Computer Research and Development (EI, CSCD, PKU Core Journal), 2024, No. 11, pp. 3706-3720 (15 pages)
Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0110400); Key Program of the National Natural Science Foundation of China (62132009); Fundamental Research Funds for the Central Universities.
Keywords: artificial intelligence of things; large language models (LLMs); agent; semantic cache; similarity evaluation
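The core idea described in the abstract — classify each incoming request by intent, and on a match reuse the plan cached for that intent instead of invoking the slow LLM planner — can be sketched as follows. This is a minimal illustrative sketch only: the class and function names are hypothetical, and the keyword-overlap classifier is a toy stand-in for the paper's actual intent-classification method and plan format.

```python
def intent_of(request, intents):
    """Toy intent classifier (stand-in for the real one): pick the intent
    whose keyword list overlaps most with the request; None if no match."""
    words = set(request.lower().split())
    best, best_score = None, 0
    for intent, keywords in intents.items():
        score = len(words & set(keywords))
        if score > best_score:
            best, best_score = intent, score
    return best

class PlanCache:
    """Semantic cache keyed by intent: similar requests map to the same
    intent and therefore reuse the same previously generated plan."""
    def __init__(self, intents, llm_plan):
        self.intents = intents    # intent -> keyword list (toy classifier data)
        self.llm_plan = llm_plan  # fallback planner (stand-in for an LLM call)
        self.plans = {}           # intent -> cached plan
        self.hits = 0

    def get_plan(self, request):
        intent = intent_of(request, self.intents)
        if intent is not None and intent in self.plans:
            self.hits += 1                  # fast path: reuse, no LLM call
            return self.plans[intent]
        plan = self.llm_plan(request)       # slow path: generate a new plan
        if intent is not None:
            self.plans[intent] = plan       # remember it for similar requests
        return plan
```

Two differently phrased requests with the same intent then share one plan: `get_plan("please turn on the light")` generates and caches a plan, and `get_plan("turn the lights on")` returns it from the cache without another planner call.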

