Abstract
Real-world garbage datasets usually follow a long-tailed distribution with severe class imbalance, so conventional deep learning models generalize poorly on garbage classification and recognition tasks. To address this, a new data relabeling algorithm and framework is proposed to improve the generalization and accuracy with which cleaning robots recognize and classify garbage. The algorithm consists of feature extraction, feature clustering, and label mapping modules. When training a commonly used classification model, it analyzes the data distribution of the dataset, feeds the feature vectors from the feature extraction module into the feature clustering module to generate several subclasses for each class, and assigns each subclass a corresponding pseudo-label, which alleviates data imbalance at the label level. At prediction time, the label mapping module converts pseudo-labels back into real labels. Experiments show that the proposed algorithm significantly improves tail-class performance on long-tailed garbage datasets without degrading head-class performance, and that the relabeling algorithm significantly improves the classification accuracy of the different class-imbalance learning methods used as baselines on long-tailed garbage datasets.
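The relabeling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`kmeans`, `relabel`), the `target_size` parameter, and the rule that a class with n samples is split into ceil(n / target_size) subclasses are all assumptions made for illustration, and a tiny pure-Python k-means stands in for whatever clustering method the paper actually uses. Features are assumed to be already extracted as plain lists of floats.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means (illustrative stand-in); returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def relabel(features, labels, target_size):
    """Split each class into subclasses and give each subclass a pseudo-label.

    Assumption: a class with n samples gets ceil(n / target_size) subclasses,
    so head classes are split more finely than tail classes.
    Returns (pseudo_labels, pseudo_to_real).
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    pseudo = [None] * len(labels)
    pseudo_to_real = {}  # the "label mapping" used at prediction time
    next_id = 0
    for y in sorted(by_class):
        idx = by_class[y]
        k = max(1, -(-len(idx) // target_size))  # ceiling division
        assign = kmeans([features[i] for i in idx], k) if k > 1 else [0] * len(idx)
        for c in range(k):
            pseudo_to_real[next_id + c] = y
        for i, a in zip(idx, assign):
            pseudo[i] = next_id + a
        next_id += k
    return pseudo, pseudo_to_real


# Hypothetical toy data: class 0 is the head class, class 1 the tail class.
features = [[0.0], [0.1], [5.0], [5.1], [9.0], [9.1], [20.0], [20.1]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
pseudo, pseudo_to_real = relabel(features, labels, target_size=2)
# A classifier would be trained on `pseudo`; its predictions are mapped back:
recovered = [pseudo_to_real[p] for p in pseudo]
```

In this sketch the classifier would be trained on the pseudo-labels, whose class sizes are closer to one another than the original ones, and `pseudo_to_real` plays the role of the label mapping module by converting each predicted pseudo-label back to its real class.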
Authors
王中磐
袁野
李清都
万里红
刘娜
WANG Zhongpan; YUAN Ye; LI Qingdu; WAN Lihong; LIU Na (School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; School of Electronics, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, Shanghai 200030, China; Origin Dynamics Intelligent Robot Co., Ltd., Zhengzhou 450018, China)
Source
《软件导刊》
2023, No. 9, pp. 52-58 (7 pages)
Software Guide
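Funding
National Natural Science Foundation of China (62006165)
Shanghai Pujiang Talent Program (2019PJD035)
Shanghai Artificial Intelligence Innovation and Development Special Fund (2019RGZN01041)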
Keywords
garbage classification
deep learning
class-imbalance learning
data relabeling
dataset analysis
feature clustering
image processing
computer vision