
Smaller & Smarter: Score-Driven Network Chaining of Smaller Language Models

Abstract: With the continuous evolution and expanding applications of Large Language Models (LLMs), there has been a noticeable surge in the size of emerging models. Beyond the growth in model size, primarily measured by the number of parameters, comes an escalation in computational demands and in the hardware and software prerequisites for training, all of which culminate in substantial financial investment. In this paper, we present techniques such as supervision, parallelization, and scoring functions to obtain better results from chains of smaller language models, rather than relying solely on scaling up model size. First, we propose an approach to quantify the performance of a Smaller Language Model (SLM) by introducing a corresponding supervisor model that incrementally corrects the errors it encounters. Second, we propose an approach that runs two smaller language models (in a network) on the same task and retrieves the more relevant of the two outputs, ensuring peak performance for a specific task. Experimental evaluations establish quantitative accuracy improvements on financial reasoning and arithmetic calculation tasks from using supervisor models (in a network-of-models scenario), threshold scoring, and parallel processing over a baseline study.
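Below is a minimal, hypothetical sketch of the score-driven chaining described in the abstract. The functions slm_a, slm_b, score, and supervisor_correct are illustrative stand-ins (not the authors' models or scoring function); the sketch only shows the idea of running two SLMs on the same task, keeping the higher-scoring output, and falling back to a supervisor model when the score is below a threshold.

```python
# Hypothetical sketch only: slm_a, slm_b, score, and supervisor_correct are
# illustrative stand-ins, not the paper's actual models or scoring function.

def slm_a(prompt: str) -> str:
    # Stand-in for the first smaller language model in the network.
    return "Answer: 42"

def slm_b(prompt: str) -> str:
    # Stand-in for the second smaller language model performing the same task.
    return "Answer: 41"

def score(prompt: str, answer: str) -> float:
    # Stand-in scoring function; a real one might check numeric consistency
    # or agreement on the financial-reasoning / arithmetic task.
    return 0.9 if "42" in answer else 0.4

def supervisor_correct(prompt: str, answer: str) -> str:
    # Stand-in supervisor model that incrementally corrects encountered errors.
    return answer.replace("41", "42")

def chain(prompt: str, threshold: float = 0.8) -> str:
    # Run both SLMs on the same task (these calls could be parallelized),
    # score each candidate, and keep the higher-scoring output.
    candidates = [slm_a(prompt), slm_b(prompt)]
    best = max(candidates, key=lambda ans: score(prompt, ans))
    # If even the best candidate falls below the threshold, hand it to the
    # supervisor model for correction before returning it.
    if score(prompt, best) < threshold:
        best = supervisor_correct(prompt, best)
    return best

if __name__ == "__main__":
    print(chain("What is 6 * 7?"))
```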
Authors: Gunika Dhingra, Siddansh Chawla, Vijay K. Madisetti, Arshdeep Bahga (School of Computer Science Engineering & Technology, Bennett University, Greater Noida, India; School of Cybersecurity and Privacy, Georgia Institute of Technology, Atlanta, USA; Cloudemy Technology Labs, Chandigarh, India)
Source: Journal of Software Engineering and Applications, 2024, No. 1, pp. 23-42 (20 pages)
Keywords: Large Language Models (LLMs); Smaller Language Models (SLMs); Finance; Networking; Supervisor Model; Scoring Function