Funding: Supported by the West Light Foundation of the Chinese Academy of Sciences (2019-XBQNXZ-A-007) and the National Natural Science Foundation of China (12071458, 71731009).
Abstract: In recent years, deep convolutional neural networks have exhibited excellent performance in computer vision and have had a far-reaching impact. Traditional plant taxonomic identification requires high expertise and is time-consuming. Most nature reserves have problems such as incomplete species surveys, inaccurate taxonomic identification, and untimely updating of status data. Simple and accurate recognition of plant images can be achieved by applying convolutional neural network technology and selecting the best network model. Taking 24 typical desert plant species that are widely distributed in the nature reserves of the Xinjiang Uygur Autonomous Region of China as the research objects, this study established an image database and selected the optimal network model for the image recognition of desert plant species, using deep learning, to provide decision support for fine management of the nature reserves in Xinjiang, such as species investigation and monitoring. Since desert plant species are not included in public datasets, the images used in this study were mainly obtained through field photography and downloads from the Plant Photo Bank of China (PPBC). After sorting and statistical analysis, a total of 2331 plant images were collected (2071 from field collection and 260 from the PPBC), covering 24 plant species belonging to 14 families and 22 genera. A large number of numerical experiments were carried out to compare, from different perspectives, a series of 37 convolutional neural network models with good performance and to find the network model most suitable for the image recognition of desert plant species in Xinjiang. The results revealed 24 models with a recognition Accuracy greater than 70.000%. Among them, RegNetX_8GF performed the best, with Accuracy, Precision, Recall, and F1 (the harmonic mean of Precision and Recall) values of 78.33%, 77.65%, 69.55%, and 71.26%, respectively. Considering hardware requirements and inference time, MobileNetV2 achieves the best balance among Accuracy, the number of parameters, and the number of floating-point operations: its number of parameters is 1/16 that of RegNetX_8GF, and its number of floating-point operations is 1/24. Our findings can facilitate efficient decision-making for species survey, cataloging, inspection, and monitoring in the nature reserves of Xinjiang, providing a scientific basis for the protection and utilization of natural plant resources.
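As an illustration of the kind of backbone comparison described above, the following is a minimal sketch that builds 24-class classifiers on the two networks highlighted in the abstract and counts their parameters with torchvision (torchvision >= 0.13 assumed); the head replacement, weight choice, and parameter counting are illustrative assumptions, not the authors' exact pipeline.

```python
# Illustrative sketch (not the authors' code): build 24-class classifiers on top of the
# two backbones compared in the abstract and count their parameters with torchvision.
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 24  # the 24 desert plant species

def build_regnet_x_8gf(num_classes: int = NUM_CLASSES) -> nn.Module:
    m = models.regnet_x_8gf(weights=None)  # pretrained weights could be loaded instead
    m.fc = nn.Linear(m.fc.in_features, num_classes)  # replace the ImageNet head
    return m

def build_mobilenet_v2(num_classes: int = NUM_CLASSES) -> nn.Module:
    m = models.mobilenet_v2(weights=None)
    m.classifier[1] = nn.Linear(m.classifier[1].in_features, num_classes)
    return m

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

if __name__ == "__main__":
    for name, model in [("RegNetX_8GF", build_regnet_x_8gf()),
                        ("MobileNetV2", build_mobilenet_v2())]:
        print(f"{name}: {count_params(model) / 1e6:.1f} M parameters")
```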
Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2018R1D1A1B07042967) and by the Soonchunhyang University Research Fund.
Abstract: Violence recognition is crucial because of its applications in activities related to security and law enforcement. Existing semi-automated systems have issues such as tedious manual surveillance, which causes human errors and makes these systems less effective. Several approaches have been proposed using trajectory-based, non-object-centric, and deep-learning-based methods. Previous studies have shown that deep learning techniques attain higher accuracy and lower error rates than other methods; however, their performance must still be improved. This study explores state-of-the-art deep learning architectures, convolutional neural networks (CNNs) and Inception V4, to detect and recognize violence from video data. In the proposed framework, a keyframe extraction technique eliminates duplicate consecutive frames. This keyframing phase reduces the training data size and hence decreases the computational cost by avoiding duplicate frames. For feature selection and classification, the applied sequential CNN uses a single kernel size, whereas the Inception V4 CNN uses multiple kernel sizes across different layers of the architecture. For empirical analysis, four widely used standard datasets with diverse activities are used. The results confirm that the proposed approach attains 98% accuracy, reduces the computational cost, and outperforms existing violence detection and recognition techniques.
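The keyframe-extraction step is described only at a high level; a minimal frame-difference sketch of the idea might look as follows (OpenCV; the grayscale mean-absolute-difference criterion and the threshold value are assumptions, not necessarily the paper's method).

```python
# Hedged sketch of keyframe extraction: drop near-duplicate consecutive frames
# by thresholding the mean absolute difference between grayscale frames.
import cv2
import numpy as np

def extract_keyframes(video_path: str, diff_threshold: float = 12.0):
    cap = cv2.VideoCapture(video_path)
    keyframes, prev_gray = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Keep a frame only if it differs enough from the last kept frame.
        if prev_gray is None or np.mean(cv2.absdiff(gray, prev_gray)) > diff_threshold:
            keyframes.append(frame)
            prev_gray = gray
    cap.release()
    return keyframes
```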
Abstract: Load identification is one of the major technical difficulties of non-intrusive composite monitoring. Binary V-I trajectory images can reflect the characteristics of the original V-I trajectory to a large extent, so they are widely used in load identification. However, using a single binary V-I trajectory feature for load identification has certain limitations. To improve the accuracy of load identification, this paper adds a power feature on top of the binary V-I trajectory feature: the initial binary V-I trajectory is turned into a new 3D feature by mapping the power feature to the third dimension. To reduce the impact of imbalanced samples on load identification, the SVM-SMOTE algorithm is used to balance the samples. Based on deep learning, a convolutional neural network model is then used to extract the new 3D feature and perform load identification. The results indicate that the new 3D feature has better observability and that the proposed model achieves higher identification performance than other classification models on the public PLAID dataset.
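A minimal sketch of the 3D-feature construction described above, assuming the binary V-I trajectory is rasterized onto a fixed grid and the power value is mapped to a third channel (grid size, normalization constant, and channel layout are illustrative assumptions, not the paper's exact encoding):

```python
# Illustrative sketch (not the paper's code): rasterize one cycle of V and I into a binary
# trajectory image and map a normalized power value onto a third channel.
import numpy as np

def vi_power_feature(voltage, current, power, grid: int = 32, power_scale: float = 3000.0):
    v = np.asarray(voltage, dtype=float)
    i = np.asarray(current, dtype=float)
    # Scale the V and I samples of one cycle into [0, grid-1] pixel coordinates.
    vx = np.clip(((v - v.min()) / (np.ptp(v) + 1e-9) * (grid - 1)).astype(int), 0, grid - 1)
    iy = np.clip(((i - i.min()) / (np.ptp(i) + 1e-9) * (grid - 1)).astype(int), 0, grid - 1)
    feat = np.zeros((grid, grid, 3), dtype=np.float32)
    feat[iy, vx, 0] = 1.0                                   # channel 0: binary V-I trajectory
    feat[..., 2] = np.clip(power / power_scale, 0.0, 1.0)   # channel 2: power as 3rd dimension
    return feat
```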
Abstract: Object detection is still one of the most fundamental and difficult areas of computer vision and image understanding. Deep neural network models and enhanced object representations have led to significant progress in object detection. This research investigates in greater detail how object detection has changed in recent years in the deep learning age. We provide an overview of the literature on a range of cutting-edge object detection algorithms and the theoretical underpinnings of these techniques. Deep learning technologies are contributing to substantial innovations in the field of object detection. While Convolutional Neural Networks (CNNs) have laid a solid foundation, newer models such as You Only Look Once (YOLO) and Vision Transformers (ViTs) have expanded the possibilities even further by providing high accuracy and fast detection in a variety of settings. Even with these developments, integrating CNNs, YOLO, and ViTs into a coherent framework still poses challenges in balancing computing demand, speed, and accuracy, especially in dynamic contexts. Real-time processing in applications such as surveillance and autonomous driving necessitates improvements that take advantage of each model type's strengths. The goal of this work is to provide an object detection system that maximizes detection speed and accuracy while decreasing processing requirements by integrating YOLO, CNNs, and ViTs. Improving real-time detection performance under changing weather and lighting conditions, as well as detecting small or partially occluded objects in crowded cities, are among the goals. We provide a hybrid architecture that leverages CNNs for robust feature extraction, YOLO for rapid detection, and ViTs for global context capture via self-attention. Using a training regimen that prioritizes flexible learning rates and data augmentation, the model is trained on an extensive dataset of urban scenes. Compared to standalone YOLO, CNN, or ViT models, the proposed model exhibits an increase in detection accuracy. This improvement is especially noticeable in difficult situations such as settings with high occlusion and low light. In addition, it attains a decrease in inference time compared to baseline models, allowing real-time object detection without performance loss. This work introduces a novel method of object detection that integrates CNNs, YOLO, and ViTs in a synergistic way. The resulting framework extends the use of integrated deep learning models in practical applications while also setting a new standard for detection performance under a variety of conditions. Our research advances computer vision by providing a scalable and effective approach to object detection problems. Its possible uses include autonomous navigation, security, and other areas.
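Purely as a schematic illustration of such a hybrid, the sketch below chains a CNN backbone, a small self-attention encoder for global context, and a YOLO-style dense prediction head; the ResNet-50 backbone, layer sizes, and the 5-values-plus-classes head are assumptions and not the authors' architecture.

```python
# Schematic sketch of a CNN -> self-attention -> YOLO-style-head pipeline.
import torch
import torch.nn as nn
from torchvision import models

class HybridDetector(nn.Module):
    def __init__(self, num_classes: int = 80, d_model: int = 256):
        super().__init__()
        backbone = models.resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # CNN feature maps
        self.proj = nn.Conv2d(2048, d_model, kernel_size=1)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.attn = nn.TransformerEncoder(enc_layer, num_layers=2)  # global context
        # YOLO-style dense head: per-cell box (4) + objectness (1) + class scores
        self.head = nn.Conv2d(d_model, 5 + num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.proj(self.cnn(x))                  # (B, d, H/32, W/32)
        b, d, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)       # (B, H*W, d) cell tokens
        tokens = self.attn(tokens)                  # self-attention over all cells
        f = tokens.transpose(1, 2).reshape(b, d, h, w)
        return self.head(f)                         # (B, 5 + num_classes, H/32, W/32)

if __name__ == "__main__":
    out = HybridDetector()(torch.randn(1, 3, 416, 416))
    print(out.shape)  # torch.Size([1, 85, 13, 13])
```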
Abstract: The real-time object detection algorithm YOLO (You Only Look Once) suffers from low detection accuracy and slow network-model training. To address this, the advantages of the Batch Renormalization algorithm in handling small mini-batches and non-independent, non-identically distributed data are exploited, and Batch Renormalization is introduced to improve the YOLO network structure: the feature maps produced by the convolution operations in the convolutional layers are treated as neurons and normalized accordingly. At the same time, dropout is removed from the network structure and the learning rate used for training is increased. Experimental results show that, compared with the original YOLO algorithm, the improved algorithm achieves higher detection accuracy and faster real-time detection, and that an appropriately chosen mini-batch size can reduce the training-time and hardware cost of the network model to some extent.
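For reference, a per-channel Batch Renormalization layer for convolutional feature maps, in the spirit of Ioffe (2017), can be sketched as follows; the momentum, r_max, and d_max values are illustrative choices rather than the settings used in the paper summarized above.

```python
# Hedged sketch of Batch Renormalization over conv feature maps (per channel).
import torch
import torch.nn as nn

class BatchRenorm2d(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.01, r_max=3.0, d_max=5.0):
        super().__init__()
        self.eps, self.momentum, self.r_max, self.d_max = eps, momentum, r_max, d_max
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_std", torch.ones(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            mean = x.mean(dim=(0, 2, 3))
            std = x.std(dim=(0, 2, 3), unbiased=False) + self.eps
            # r and d correct the batch statistics toward the running statistics
            # and are treated as constants during backpropagation (detach).
            r = (std / self.running_std).clamp(1 / self.r_max, self.r_max).detach()
            d = ((mean - self.running_mean) / self.running_std).clamp(-self.d_max, self.d_max).detach()
            x_hat = (x - mean[None, :, None, None]) / std[None, :, None, None]
            x_hat = x_hat * r[None, :, None, None] + d[None, :, None, None]
            with torch.no_grad():
                self.running_mean += self.momentum * (mean - self.running_mean)
                self.running_std += self.momentum * (std - self.running_std)
        else:
            x_hat = (x - self.running_mean[None, :, None, None]) / self.running_std[None, :, None, None]
        return self.weight[None, :, None, None] * x_hat + self.bias[None, :, None, None]
```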
Abstract: Existing deep-learning face-detection algorithms are computationally expensive, difficult to port to embedded platforms, and unable to meet the real-time and convenience requirements of mobile devices. To address this, a small face-detection network suitable for embedded platforms, E-YOLO (Enhance-YOLO), is proposed based on the YOLO (You Only Look Once) algorithm. Following the idea of YOLO, face detection is converted into a regression problem: the image to be detected is divided evenly into S×S cells, and each cell detects the targets that fall within it. The convolutional neural network structure of the YOLO model is modified to improve detection accuracy, while the number of convolution kernels in the network is reduced to shrink the model size. Experimental results show that the E-YOLO model is 43 MB in size and achieves a video detection rate of 26 FPS, with high accuracy and detection speed on both the WIDER FACE and FDDB datasets, enabling real-time face detection on embedded platforms.
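A minimal sketch of the S×S grid assignment described above, YOLO-style: each ground-truth box is encoded into the grid cell containing its center (the value of S, the normalized-coordinate convention, and the (x, y, w, h, confidence) target layout are illustrative assumptions, not E-YOLO's exact encoding).

```python
# Hedged sketch of YOLO-style target encoding on an S x S grid.
import numpy as np

def encode_yolo_targets(boxes, S: int = 7):
    """boxes: iterable of (cx, cy, w, h) with coordinates normalized to [0, 1]."""
    target = np.zeros((S, S, 5), dtype=np.float32)
    for cx, cy, w, h in boxes:
        col, row = min(int(cx * S), S - 1), min(int(cy * S), S - 1)
        # Store the center offset within its cell, the box size, and objectness.
        target[row, col] = [cx * S - col, cy * S - row, w, h, 1.0]
    return target

if __name__ == "__main__":
    t = encode_yolo_targets([(0.52, 0.31, 0.20, 0.25)])
    print(np.argwhere(t[..., 4] == 1.0))  # index of the responsible cell -> [[2 3]]
```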