
State Space Models Based Efficient Long Documents Classification

Abstract: Large language models like the Generative Pretrained Transformer (GPT) have significantly advanced natural language processing (NLP) in recent times. They have excelled in tasks such as language translation, question answering, and text generation. However, their effectiveness is limited by the quadratic training complexity of Transformer models, O(L²), which makes it challenging to handle complex tasks like classifying long documents. To overcome this challenge, researchers have explored architectures and techniques such as sparse attention mechanisms, hierarchical processing, and efficient attention modules. A recent innovation called Mamba, based on a state space model approach, offers fast inference and scalability in sequence length thanks to its unique selection mechanism. By incorporating this selection mechanism, Mamba allows for context-based reasoning and targeted focus on particular inputs, thereby reducing computational costs and enhancing performance. Despite its advantages, the application of Mamba to long document classification has not been thoroughly investigated. This study aims to fill that gap by developing a Mamba-based model for long document classification and assessing its efficacy on four datasets: Hyperpartisan, 20 Newsgroups, EURLEX, and CMU Book Summary. Our study reveals that the Mamba model surpasses NLP models such as BERT and Longformer, showcasing exceptional performance and highlighting Mamba's efficiency in handling lengthy document classification tasks. These results hold implications for NLP applications, empowering advanced language models to address challenging tasks with extended sequences and enhanced effectiveness. This study opens doors for further exploration of Mamba's abilities and its potential utilization across diverse NLP domains.
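The abstract's central claim is that Mamba's selection mechanism makes the state-space parameters functions of the input, so the model can decide per token what to write into and read out of its state while still scanning the sequence in linear time. Below is a minimal, simplified sketch of such a selective scan, written for clarity rather than speed; the parameter names (W_delta, W_B, W_C) and the unfused Python loop are illustrative assumptions, not the paper's or the Mamba library's implementation.

import torch
import torch.nn.functional as F

def selective_scan(x, A, W_delta, W_B, W_C):
    """Simplified selective SSM scan over an input x of shape (L, D).

    Delta, B, and C are computed *from the input at each step* --
    this input dependence is the selection mechanism. A is a fixed
    (D, N) state matrix (kept negative for stability).
    """
    L, D = x.shape
    h = torch.zeros(D, A.shape[1])                # hidden state, (D, N)
    ys = []
    for t in range(L):                            # O(L) scan, not O(L^2) attention
        delta = F.softplus(x[t] @ W_delta)        # (D,) input-dependent step size
        B = x[t] @ W_B                            # (N,) input-dependent write gate
        C = x[t] @ W_C                            # (N,) input-dependent readout
        Abar = torch.exp(delta[:, None] * A)      # (D, N) discretized transition
        h = Abar * h + (delta[:, None] * B) * x[t][:, None]  # state update
        ys.append((h * C).sum(-1))                # (D,) output at step t
    return torch.stack(ys)                        # (L, D)

To make concrete what a Mamba-based long-document classifier of the kind the abstract describes might look like, the sketch below stacks Mamba blocks from the open-source mamba_ssm package over token embeddings and mean-pools the sequence into a linear classification head. The depth, widths, residual wiring, and pooling choice are assumptions for illustration, not the paper's published architecture.

import torch
import torch.nn as nn
from mamba_ssm import Mamba   # open-source Mamba block (requires a CUDA GPU)

class MambaDocClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, d_model=256, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Each Mamba block processes the sequence in linear time O(L),
        # unlike the O(L^2) self-attention of a Transformer layer.
        self.layers = nn.ModuleList(
            [Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
             for _ in range(n_layers)]
        )
        self.norm = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, token_ids):            # token_ids: (batch, length)
        x = self.embed(token_ids)            # (batch, length, d_model)
        for layer in self.layers:
            x = x + layer(x)                 # residual around each block
        x = self.norm(x).mean(dim=1)         # mean-pool over the sequence
        return self.head(x)                  # (batch, num_classes) logits

# Usage: classify a batch of four 4096-token documents (a binary task,
# e.g. Hyperpartisan). The vocabulary size is an arbitrary placeholder.
model = MambaDocClassifier(vocab_size=32000, num_classes=2).cuda()
ids = torch.randint(0, 32000, (4, 4096), device="cuda")
logits = model(ids)                          # shape (4, 2)

Mean pooling is one of several reasonable readouts for classification with a unidirectional scan; taking the final position's hidden state, which has seen the entire document, is a common alternative.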
Authors: Bo Song, Yuanhao Xu, Penghao Liang, Yichao Wu (Khoury College of Computer Science, Northeastern University, Boston, MA, USA)
Source: Journal of Intelligent Learning Systems and Applications, 2024, No. 3, pp. 143-154 (12 pages)
Keywords: Mamba, Transformer, NLP