
Comparative study on traditional and large model-based techniques for Chinese text classification: Leveraging both paradigms (original title: 传统与大模型并举:中文文本分类技术对比研究)
Abstract: This paper examines the evolution of Chinese text classification techniques and, through a rigorous empirical comparison, tests how traditional methods and advanced large-model-based algorithms differ in performance across a range of text classification tasks. Experiments are run on basic sentiment-analysis datasets and on multi-class text datasets rich in complex domain-specific information, systematically comparing traditional statistical learning methods, classical deep learning algorithms, and currently influential pre-trained large models (such as BERT and LLMs). The study centers on improving classification accuracy while also assessing each model's resource efficiency and training time. For the pre-trained large models, prompt engineering and model fine-tuning are used to optimize performance. The results show that large models hold a clear advantage in understanding and exploiting linguistic context and in generalization, generally reducing the error rate by more than 10% across the different datasets and validation sets, while traditional techniques retain distinct and effective value in specific scenarios. Through this systematic comparison, the paper aims to provide a sound basis and direction for selecting Chinese text classification techniques and for their future development.
Author: WEN Fei (文飞), ZhongZhuoxin (Beijing) Technology Co., Ltd., Beijing 100085, China
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2024, No. 6, pp. 88-94 (7 pages)
Keywords: text classification; BERT; pre-trained large language models; prompt engineering; fine-tuning; few-shot learning
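
To make the "traditional statistical learning" side of the comparison concrete, the following is a minimal sketch of one such baseline: a multinomial Naive Bayes classifier over character bigrams, together with the error-rate metric the abstract refers to. It is not the paper's implementation. The choice of Naive Bayes, the bigram features, the names `char_bigrams`, `NaiveBayes`, and `error_rate`, and the four toy sentences are all assumptions made for illustration; the abstract does not say which traditional models or datasets were actually used.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Split text into overlapping character bigrams (no word segmentation needed)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # bigram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_bigrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for gram in char_bigrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def error_rate(model, texts, labels):
    """Fraction of examples the model labels incorrectly."""
    wrong = sum(model.predict(t) != y for t, y in zip(texts, labels))
    return wrong / len(labels)


# Invented toy sentiment data, NOT taken from the paper.
train_texts = ["这部电影很好看", "服务态度很好", "质量太差了", "非常失望太差"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_texts, train_labels)
```

Calling `model.predict("电影很好看")` on this toy model selects the `pos` class, and `error_rate` can be applied to any held-out list of texts and labels. Character bigrams are used here only because they avoid a dependency on a Chinese word segmenter; a real baseline would more likely pair segmentation with TF-IDF features.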