Smaller & Smarter: Score-Driven Network Chaining of Smaller Language Models

Abstract: With the continuous evolution and expanding applications of Large Language Models (LLMs), there has been a noticeable surge in the size of emerging models. The growth is not only in model size, primarily measured by the number of parameters; it also brings higher computational demands and heavier hardware and software prerequisites for training, all of which culminate in a substantial financial investment. In this paper, we present novel techniques such as supervision, parallelization, and scoring functions to get better results out of chains of smaller language models, rather than relying solely on scaling up model size. First, we propose an approach to quantify the performance of a Smaller Language Model (SLM) by introducing a corresponding supervisor model that incrementally corrects the errors encountered. Second, we propose an approach that uses two smaller language models (in a network) to perform the same task and retrieves the more relevant of the two outputs, to ensure peak performance for a specific task. Experimental evaluations establish quantitative accuracy improvements over a baseline study on financial reasoning and arithmetic calculation tasks when using supervisor models (in a network-of-models scenario), threshold scoring, and parallel processing.
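The abstract names the building blocks (two models run in parallel, a scoring function, a threshold, and a supervisor) but not how they are wired together. The following is only a minimal sketch of one plausible arrangement, not the authors' implementation: the models are stand-in Python callables, and `best_of`, `answer_with_supervisor`, `toy_score`, and the threshold value are all hypothetical names and choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple

# Assumption: a "model" is any callable mapping a prompt to an answer string.
Model = Callable[[str], str]
# Assumption: a scorer maps (prompt, answer) to a number; higher is better.
Scorer = Callable[[str, str], float]
# Assumption: a supervisor sees the prompt and a draft and returns a correction.
Supervisor = Callable[[str, str], str]


def best_of(models: Sequence[Model], score: Scorer, prompt: str) -> Tuple[str, float]:
    """Run every model on the same prompt in parallel; keep the top-scoring answer."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda m: m(prompt), models))
    scored = [(a, score(prompt, a)) for a in answers]
    return max(scored, key=lambda pair: pair[1])


def answer_with_supervisor(models: Sequence[Model], supervisor: Supervisor,
                           score: Scorer, prompt: str, threshold: float) -> str:
    """Accept the best answer if it clears the threshold, else ask the supervisor."""
    answer, s = best_of(models, score, prompt)
    if s >= threshold:
        return answer
    return supervisor(prompt, answer)


# Toy stand-ins (hypothetical): real models and scorers would replace these.
def slm_a(prompt: str) -> str:
    return "four"


def slm_b(prompt: str) -> str:
    return "4"


def toy_score(prompt: str, answer: str) -> float:
    # Rewards purely numeric answers, a crude proxy for an arithmetic task.
    return 1.0 if answer.strip().isdigit() else 0.0


def toy_supervisor(prompt: str, draft: str) -> str:
    # Pretends to correct a spelled-out number into digits.
    return {"four": "4"}.get(draft, draft)
```

With these stand-ins, `best_of([slm_a, slm_b], toy_score, "2 + 2 = ?")` selects the numeric answer from `slm_b`, while a network in which no model clears the threshold falls back to the supervisor's correction.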

Authors: Gunika Dhingra; Siddansh Chawla; Vijay K. Madisetti; Arshdeep Bahga (School of Computer Science Engineering & Technology, Bennett University, Greater Noida, India; School of Cybersecurity and Privacy, Georgia Institute of Technology, Atlanta, USA; Cloudemy Technology Labs, Chandigarh, India)

Affiliations: School of Computer Science Engineering & Technology; School of Cybersecurity and Privacy; Cloudemy Technology Labs

Source: Journal of Software Engineering and Applications, 2024, No. 1, pp. 23-42 (20 pages)

Keywords: Large Language Models (LLMs); Smaller Language Models (SLMs); Finance; Networking; Supervisor Model; Scoring Function

Classification: H31 [Language and Linguistics: English]
