Abstract
This paper examines the evolution of Chinese text classification technology through a rigorous empirical comparison, measuring the performance differences between traditional methods and advanced algorithms based on large models across a range of text classification tasks. Experiments are conducted on foundational sentiment-analysis datasets and on multi-class text datasets rich in specialized domain information, systematically comparing traditional statistical learning approaches, classical deep learning algorithms, and currently influential pre-trained large models such as BERT and LLMs. The central goal is to improve classification accuracy while also assessing each model's resource efficiency and training time. For the pre-trained large models, prompt engineering and model fine-tuning are employed to optimize performance. The experimental results demonstrate the substantial advantages of large models in understanding and exploiting linguistic context and in improving generalization, reducing error rates by more than 10% across diverse datasets and validation sets, while also confirming that traditional techniques retain unique and effective value in specific scenarios. Through this systematic comparative analysis, the study aims to provide solid evidence and guidance for the scientific selection and future development of Chinese text classification technologies.
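As one illustration of the prompt-engineering setup the abstract refers to, a few-shot classification prompt for an LLM might be assembled as below. This is a minimal sketch only: the paper's actual prompt templates are not reproduced in this record, and all function names, example texts, and labels here are hypothetical.

```python
def build_fewshot_prompt(examples, query, labels):
    """Assemble a simple few-shot Chinese text-classification prompt.

    `examples` is a list of (text, label) pairs used as in-context
    demonstrations; `labels` is the allowed label set. The returned
    string ends with an open "类别：" slot for the model to complete.
    """
    lines = ["请判断下列文本的类别，可选类别：" + "、".join(labels), ""]
    for text, label in examples:
        lines.append(f"文本：{text}")
        lines.append(f"类别：{label}")
        lines.append("")
    lines.append(f"文本：{query}")
    lines.append("类别：")
    return "\n".join(lines)


# Hypothetical usage for a binary sentiment task:
prompt = build_fewshot_prompt(
    examples=[("服务态度很好，下次还来", "正面"), ("质量太差，非常失望", "负面")],
    query="物流很快，包装也很结实",
    labels=["正面", "负面"],
)
```

In a few-shot setting like the one the paper evaluates, the resulting prompt would be sent to the LLM, and the completion after the final "类别：" would be read off as the predicted label.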
Author
文飞
WEN Fei (ZhongZhuoxin (Beijing) Technology Co., Ltd., Beijing 100085, China)
Source
《智能计算机与应用》
2024, No. 6, pp. 88-94 (7 pages)
Intelligent Computer and Applications
Keywords
text classification
BERT
pre-trained large language models
prompt engineering
fine-tuning
few-shot learning