Abstract
Object detection is an important branch of computer vision. Although current deep-learning-based object detection algorithms outperform traditional ones in detection accuracy and detection time, they still struggle to achieve high speed and high accuracy at the same time. To address this problem, this paper proposes Mul-YOLO, an object detection network based on an improved YOLOv3. Mul-YOLO uses the Haar wavelet for data preprocessing: the low-frequency features of the image are decomposed layer by layer at different resolutions to obtain high-frequency features in the horizontal, vertical, and diagonal directions. The information recorded by these high-frequency features reduces the negative effect on detection accuracy of changes in the target's geometric state, illumination, and background. Higher-order (third-order) computation is integrated into the upsampling, convolution, and concatenation of the feature layers, which strengthens feature representation within the limited receptive field and makes the network pay more attention to the salient information in the feature maps. This enhances image resolution and effectively reduces the information loss caused by successive convolution and pooling during training. Experimental results on the PASCAL VOC dataset show that Mul-YOLO clearly improves on earlier object detection models. For example, compared with Faster R-CNN using ResNet for feature extraction, mAP is improved by 8.97% and the detection time for a single image is reduced by 172 ms; compared with YOLOv3, mAP is improved by 30.48% (the Chinese-language abstract gives 33.48%). The method thus balances detection accuracy and detection time, and together with the other comparisons these results support its effectiveness.
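The Haar preprocessing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `haar_dwt2` and `haar_pyramid`, the orthonormal scaling, and the subband naming are assumptions made for illustration, and the abstract does not specify how the subbands are fed into Mul-YOLO. The sketch only shows how a low-frequency approximation can be decomposed level by level into horizontal, vertical, and diagonal high-frequency bands.

```python
import numpy as np


def haar_dwt2(image):
    """One level of a 2D Haar decomposition (illustrative, orthonormal scaling).

    Returns the low-frequency approximation and a tuple of three
    high-frequency detail bands, each half the input size per axis.
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")

    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    approx = (a + b + c + d) / 2.0      # low-frequency content
    horizontal = (a + b - c - d) / 2.0  # top row minus bottom row: horizontal edges
    vertical = (a - b + c - d) / 2.0    # left column minus right column: vertical edges
    diagonal = (a - b - c + d) / 2.0    # diagonal detail
    return approx, (horizontal, vertical, diagonal)


def haar_pyramid(image, levels):
    """Repeatedly decompose the low-frequency band, finest level first."""
    approx = np.asarray(image, dtype=np.float64)
    details = []
    for _ in range(levels):
        approx, bands = haar_dwt2(approx)
        details.append(bands)
    return approx, details


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8))  # stand-in for a grayscale image
    low, detail_levels = haar_pyramid(img, levels=2)
    print(low.shape, [band[0].shape for band in detail_levels])
```

With this scaling the transform preserves the total energy of the image, and a constant image produces detail bands that are all zero, so only edges and texture survive in the high-frequency bands. Subband naming conventions differ between libraries, so the labels above should be read as one possible choice.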
Authors
YAN Chen-xu (严晨旭)
SHAO Hai-jian (邵海见)
DENG Xing (邓星)
School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212003, Jiangsu, China; College of Automation, Key Laboratory of Ministry of Education of Complex Engineering System Measurement and Control, Southeast University, Nanjing 210009, Jiangsu, China
Source
Journal of Shandong University (Natural Science) (《山东大学学报(理学版)》)
CAS
CSCD
PKU Core (北大核心)
2022, Issue 3, pp. 20-30 (11 pages)
Funding
National Natural Science Foundation of China, Young Scientists Fund (61806087, 61902158).