Abstract
Event cameras offer high temporal resolution, high dynamic range, and low power consumption, and are therefore commonly used for object detection in scenarios where conventional cameras are limited (high speed, strong light, low light, etc.). However, because event camera pixels fire asynchronously, the output event sequences are difficult to annotate manually, so existing methods obtain event sequence annotations by transferring annotations from RGB images. The transferred annotations contain many noisy annotations, and some objects in the event sequences have blurred textures, which makes it difficult to achieve good model performance. To address this problem, an event camera object detection algorithm with cross-modal noise filtering is proposed. The algorithm uses a pre-trained event camera detector to filter open-source RGB object detection datasets and select the RGB images that are most valuable for training the event camera detector. These selected RGB images are combined with event images to construct cross-modal mixed images, helping the detector identify and localize objects in event images more accurately. To mitigate the impact of noisy annotations on detector performance, a multi-stage joint optimization strategy for object detection is designed: after each training stage is completed, noisy annotations are identified among the global annotations and corrected for use in the next stage. Experimental results show that, on the 1Mpx Detection Dataset, the proposed algorithm provides a model gain of 8.35% over the baseline model, significantly outperforming noisy-label learning methods such as Co-teaching and O2U-net. Specifically, training with cross-modal mixed images and the joint optimization framework provide model gains of 6.44% and 4.77%, respectively.
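The abstract does not say how noisy annotations are identified or corrected between training stages. The sketch below is therefore only one plausible reading, not the authors' procedure. It assumes (hypothetically) that an annotation counts as noisy when the current-stage detector makes a confident prediction that overlaps it only partially, and that the correction replaces the annotation with that prediction. The function names and the thresholds `iou_thresh` and `score_thresh` are illustrative choices and do not come from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def correct_annotations(
    annotations: List[Box],
    predictions: List[Tuple[Box, float]],
    iou_thresh: float = 0.5,    # hypothetical: below this, the label is treated as noisy
    score_thresh: float = 0.8,  # hypothetical: only confident predictions may correct labels
) -> Tuple[List[Box], List[int]]:
    """One between-stage pass: return (corrected annotations, indices flagged as noisy)."""
    corrected: List[Box] = []
    noisy: List[int] = []
    for i, ann in enumerate(annotations):
        # Find the confident prediction that overlaps this annotation the most.
        best_box, best_iou = None, 0.0
        for box, score in predictions:
            if score < score_thresh:
                continue
            overlap = iou(ann, box)
            if overlap > best_iou:
                best_box, best_iou = box, overlap
        if best_box is not None and best_iou < iou_thresh:
            # Partial overlap with a confident prediction: flag and replace.
            noisy.append(i)
            corrected.append(best_box)
        else:
            # Good agreement, or no overlapping confident prediction: keep the label.
            corrected.append(ann)
    return corrected, noisy
```

In a multi-stage loop, the corrected list returned after stage k would serve as the training annotations for stage k+1. Annotations with no overlapping confident prediction are kept unchanged here, which is a conservative choice made for this sketch.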
Authors
HU Gang (胡刚)
LIANG Dong (梁栋)
HUANG Shengjun (黄圣君)
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2024, Issue S02, pp. 242-247 (6 pages)
Keywords
Event-based camera
Object detection
Noisy annotations
Cross-modal
Joint optimization