Abstract
This paper addresses the difficulty of improving classification accuracy for short texts, which carry little information, and proposes FT_BDCNN, a short text classification network model that incorporates probabilistic category feature enhancement. The N-gram dictionary produced by N-gram processing is passed through TF-IDF to separate out feature information with probabilistic category discrimination (the FT module). The vectorized text is fed into an improved feature extraction module. The outputs of the two modules are then fused to complete the classification. Experimental results show that the proposed model reaches an F1 score of 91.91% on the THUCNews dataset. The FT module can also be integrated with existing classification models to improve their classification performance.
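The idea behind the FT module can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the weighting formula, the N-gram order, or the fusion operator, so the code below assumes character N-grams up to length 2, a TF-IDF variant in which each class is treated as one "document", and fusion by simple concatenation. All function names (`char_ngrams`, `class_tfidf`, `category_features`, `fuse`) are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n_max=2):
    """Return all character n-grams of length 1..n_max (assumed tokenization)."""
    return [text[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(text) - n + 1)]


def class_tfidf(texts, labels, n_max=2):
    """Weight every n-gram per class, treating each class as one 'document'.

    Assumption: weight = (relative frequency in class) * log(#classes / #classes
    containing the n-gram). N-grams seen in every class get weight 0.
    """
    classes = sorted(set(labels))
    counts = {c: Counter() for c in classes}
    for text, label in zip(texts, labels):
        counts[label].update(char_ngrams(text, n_max))
    # Number of classes in which each n-gram occurs.
    class_freq = Counter(g for c in classes for g in counts[c])
    weights = {}
    for c in classes:
        total = sum(counts[c].values())
        weights[c] = {g: (k / total) * math.log(len(classes) / class_freq[g])
                      for g, k in counts[c].items()}
    return classes, weights


def category_features(text, classes, weights, n_max=2):
    """Score a text against each class and normalize the scores to sum to 1."""
    grams = char_ngrams(text, n_max)
    scores = [sum(weights[c].get(g, 0.0) for g in grams) for c in classes]
    total = sum(scores)
    if total == 0:
        # No discriminative n-gram found: fall back to a uniform vector.
        return [1.0 / len(classes)] * len(classes)
    return [s / total for s in scores]


def fuse(category_vec, dense_vec):
    """Fuse the two feature vectors by concatenation (an assumed operator)."""
    return list(category_vec) + list(dense_vec)


if __name__ == "__main__":
    texts = ["stock market rises", "bank cuts rates",
             "team wins final", "player scores goal"]
    labels = ["finance", "finance", "sports", "sports"]
    classes, weights = class_tfidf(texts, labels)
    cat_vec = category_features("market rates", classes, weights)
    # dense_vec stands in for the output of a neural feature extractor.
    print(classes, cat_vec, fuse(cat_vec, [0.1, 0.2, 0.3]))
```

In a full model, the placeholder dense vector would come from the neural feature extraction module, and the fused vector would be passed to a classification layer.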
Authors
LIAO Lie-fa (廖列法)
LI Kui (李奎)
YAO Xiu (姚秀)
School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China; Dean Office, Jiangxi Modern Polytechnic College, Nanchang 330095, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2024, Issue 7, pp. 2074-2081 (8 pages)
Funding
National Natural Science Foundation of China (Grant Nos. 71462018, 71761018).
Keywords
category feature enhancement
short text
double pooling
feature fusion
statistical algorithms
fast classification
deep learning