Abstract
Classifiers trained on imbalanced text tend to favor the majority class and neglect the minority class, which degrades classification performance. This paper studies a deep-learning-based method for imbalanced text classification. Features are selected with the category discrimination ability (DA) method, whose scoring criterion is the minimum of the differences in document probability relevance, so that the selected features are distributed evenly across the majority and minority classes and the feature set becomes better balanced. The resulting feature subset is fed to a deep belief network composed of multiple restricted Boltzmann machines (RBMs). Each RBM is pre-trained to obtain the best probability distribution of the training samples, with its weights determined by the contrastive divergence algorithm. Once the RBM parameters are set, the RBMs are trained iteratively with a greedy algorithm until all texts have been classified. Experimental results show that the method classifies imbalanced text effectively, with a classification accuracy above 99.5%.
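The pre-training step described in the abstract (restricted Boltzmann machines stacked into a deep belief network, weights set by contrastive divergence, layers trained greedily) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes binary (Bernoulli) units, a single Gibbs step (CD-1), full-batch updates, and hypothetical names (`cd1_step`, `pretrain_dbn`) and hyperparameters chosen only for illustration. The DA feature selection and the final classifier are not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1, rng=None):
    """One CD-1 update for a Bernoulli RBM on a batch v0 of shape (n, n_vis)."""
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden probabilities and a binary sample given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # Approximate gradient: data statistics minus reconstruction statistics.
    n = v0.shape[0]
    W = W + lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis = b_vis + lr * (v0 - p_v1).mean(axis=0)
    b_hid = b_hid + lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid

def pretrain_dbn(X, layer_sizes, epochs=5, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: each RBM is trained on the output of the previous one."""
    rng = np.random.default_rng(seed)
    layers, data = [], X
    for n_hid in layer_sizes:
        n_vis = data.shape[1]
        W = 0.01 * rng.standard_normal((n_vis, n_hid))
        b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)
        for _ in range(epochs):
            W, b_vis, b_hid = cd1_step(data, W, b_vis, b_hid, lr, rng)
        layers.append((W, b_vis, b_hid))
        # Hidden activations become the input of the next RBM.
        data = sigmoid(data @ W + b_hid)
    return layers, data
```

In this sketch, `X` would be a binary document-by-feature matrix restricted to the selected features, and the returned top-layer activations would feed whatever classifier sits on top of the network.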
Authors
李晓英
杨名
全睿
谭保华
LI Xiao-ying; YANG Ming; QUAN Rui; TAN Bao-hua (Industrial Design Engineering, Hubei University of Technology, Wuhan 430068, China; Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan 430068, China; School of Science, Hubei University of Technology, Wuhan 430068, China)
Source
《吉林大学学报(工学版)》
EI
CAS
CSCD
Peking University Core Journal
2022, Issue 8, pp. 1889-1895 (7 pages)
Journal of Jilin University:Engineering and Technology Edition
Funding
National Natural Science Foundation of China, General Program (51977061).
Keywords
deep learning
imbalance
text
classification method
deep belief network
document probability
pre-training
contrastive divergence algorithm