Camouflaged people are highly adept at concealing themselves by exploiting cover and the surrounding environment. Despite advances in optical detection through imaging systems, including spectral, polarization, and infrared technologies, there is still no effective real-time method for accurately detecting small, highly camouflaged people in complex real-world scenes. This study proposes a snapshot multispectral image-based camouflaged-target detection model, multispectral YOLO (MS-YOLO), which uses the SPD-Conv and SimAM modules to represent targets effectively and suppress background interference by exploiting spatial-spectral target information. The study also constructs the first real-shot multispectral camouflaged people dataset (MSCPD), which covers diverse scenes, target scales, and attitudes. To minimize information redundancy, MS-YOLO selects an optimal subset of 12 bands with strong feature representation and minimal inter-band correlation as input. In experiments on the MSCPD, MS-YOLO achieves a mean Average Precision of 94.31% and real-time detection at 65 frames per second, which confirms the effectiveness and efficiency of the method in detecting camouflaged people in typical desert and forest scenes. The approach offers valuable support for improving the perception capabilities of unmanned aerial vehicles in detecting enemy forces and rescuing personnel on the battlefield.
Automatic control technology is the basis for improving road construction robots. According to the characteristics and functions of the construction equipment, the research treats positioning acquisition and real-world monitoring as the input perception. The process uses RTK-GNSS positioning technology, projects the geodetic coordinates with the Gauss-Krüger projection method, and then converts them to Cartesian coordinates according to the characteristics of the drawing. The steering control system is the core of the electric-drive unmanned module. Based on an analysis of the composition of the steering system of unmanned engineering vehicles, models of the key steering components, such as the direction and torque sensors and the drive motor, are established, a joint simulation model of the unmanned engineering vehicle is built, and the steering controller is designed using the PID method. The simulation results show that the control method can meet the construction path requirements for automatic steering. Path planning first defines the construction area with preset values, corrects the steering angle during driving with the PID algorithm, and then realizes construction-based path planning. The results show that the method keeps the error within 10 cm on straight paths and within 20 cm on curves. With the modules working together, the automatic construction simulation results for this robot show that the designed path and control method are effective.
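As a rough illustration of the PID steering-angle correction mentioned in the abstract above, a minimal discrete PID controller might look like the following sketch. The class name, gains, and time step are illustrative assumptions; the abstract does not give the authors' controller structure or parameters.

```python
class PIDController:
    """Minimal discrete PID controller (illustrative, not the authors' design)."""

    def __init__(self, kp, ki, kd):
        self.kp = kp              # proportional gain (assumed value)
        self.ki = ki              # integral gain (assumed value)
        self.kd = kd              # derivative gain (assumed value)
        self.integral = 0.0       # accumulated error over time
        self.prev_error = None    # error from the previous step, if any

    def update(self, error, dt):
        """Return a steering correction for the current path error."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0      # no history on the first step
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical usage: lateral path error in metres, sampled every 0.1 s.
controller = PIDController(kp=2.0, ki=0.5, kd=0.1)
correction = controller.update(error=0.08, dt=0.1)
```

In practice the output would be clamped to the physical steering range, which this sketch omits.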
Detecting highly overlapped objects in crowded scenes remains a challenging problem, especially for one-stage detectors. In this paper, we extricate YOLOv4 from this dilemma by fine-tuning its detection scheme, and name the result YOLO-CS. Specifically, we give YOLOv4 the ability to detect multiple objects in one cell. Central to our method is a carefully designed joint prediction scheme, which is executed through an assignment of bounding boxes and a joint loss. Equipped with the derived joint-object augmentation (DJA), refined regression loss (RL), and Score-NMS (SN), YOLO-CS achieves competitive detection performance on the CrowdHuman and CityPersons benchmarks compared with state-of-the-art detectors, at little additional time cost. Furthermore, on the widely used general benchmark COCO, YOLO-CS still performs well, indicating its robustness across scenes.
The analysis of overcrowded areas is essential for flow monitoring, assembly control, and security. The primary goal of crowd counting is to estimate the population in a given region, which requires real-time analysis of congested scenes so that prompt action can be taken. Crowds are unpredictable, and the available benchmark datasets vary widely, which limits the performance of trained models on unseen test data. In this paper, we propose an end-to-end deep neural network that takes an input image and generates a density map of a crowd scene. The proposed model consists of encoder and decoder networks built with batch-free normalization layers known as evolving normalization (EvoNorm). This allows the network to generalize to unseen data because EvoNorm does not use statistics from the training samples. The decoder network uses dilated 2D convolutional layers, which provide large receptive fields with fewer parameters; this enables real-time processing and addresses the density drift problem. Five benchmark datasets are used to assess the proposed model, and the results show that it outperforms conventional models.
Weather is a key factor affecting air traffic control. Accurate recognition and classification of similar weather scenes in the terminal area supports rapid decision-making in air traffic flow management. Current research mostly uses traditional machine learning methods to extract features of weather scenes and clustering algorithms to group similar scenes. Inspired by the excellent performance of deep learning in image recognition, this paper proposes a method for classifying similar weather scenes in the terminal area based on improved deep convolution embedded clustering (IDCEC). It uses the combination of the encoding layer and the decoding layer to reduce the dimensionality of the weather image while retaining as much useful information as possible, and then uses the combination of the pre-trained encoding layer and the clustering layer to train the clustering model of similar scenes in the terminal area. Finally, the terminal area of Guangzhou Airport is selected as the research object, the proposed method is used to classify historical weather data into similar scenes, and its performance is compared with other state-of-the-art methods. The experimental results show that the proposed IDCEC method identifies similar scenes more accurately based on the spatial distribution characteristics and severity of weather. In addition, compared with the actual flight volume in the Guangzhou terminal area, IDCEC's recognition results for similar weather scenes are consistent with those of experts in the field.
In European thought and culture, there is a group of passionate artists who are fascinated by the intention, passion, and richness of artistic expression. They strive to establish connections between different art forms. Musicians not only attempt to represent masterpieces through the language of music but also aim to convey subjective experiences of emotion and personal imagination to listeners by adding titles to their musical works. This study examines two pieces, "Scenes of Childhood" and "Children's Garden", and analyzes the different approaches the composers take in portraying similar content.
Research on neural radiance fields for novel view synthesis has grown explosively with the development of new models and extensions. NeRF (Neural Radiance Fields) variants suited to underwater scenes and scattering media are also evolving. Existing underwater 3D reconstruction systems still face challenges such as long training times and low rendering efficiency. This paper proposes an improved underwater 3D reconstruction system that achieves rapid, high-quality 3D reconstruction. First, we enhance underwater videos captured by a monocular camera to correct the image quality degradation caused by the physical properties of the water medium and to keep the enhancement consistent across frames. Then, we perform keyframe selection to optimize resource usage and reduce the impact of dynamic objects on the reconstruction results. After pose estimation using COLMAP, the selected keyframes undergo 3D reconstruction using NeRF based on multi-resolution hash encoding for model construction and rendering. For image enhancement, our method has been optimized for certain scenarios, demonstrating effective enhancement and better continuity between consecutive frames of the same data. For 3D reconstruction, our method achieves a peak signal-to-noise ratio (PSNR) of 18.40 dB and a structural similarity (SSIM) of 0.6677, indicating a good balance between operational efficiency and reconstruction quality.
Scene text detection is an important task in computer vision. In this paper, we present YOLOv5 Scene Text (YOLOv5ST), an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection. Our primary goal is to increase inference speed without sacrificing significant detection accuracy, thereby enabling robust performance on resource-constrained devices such as drones, closed-circuit television cameras, and other embedded systems. To achieve this, we propose key modifications to the network architecture that lighten the original backbone and improve feature aggregation: replacing standard convolution with depth-wise convolution, adopting the C2 sequence module in place of C3, employing Spatial Pyramid Pooling Global (SPPG) instead of Spatial Pyramid Pooling Fast (SPPF), and integrating a Bi-directional Feature Pyramid Network (BiFPN) into the neck. Experimental results show a 26% improvement in inference speed compared with the baseline, with only marginal reductions of 1.6% and 4.2% in mean average precision (mAP) at intersection over union (IoU) thresholds of 0.5 and 0.5:0.95, respectively. Our work strikes a balance between speed and accuracy in scene text detection, making it well suited to performance-constrained environments.
Crime scene investigation (CSI) images are key carriers of evidence during criminal investigation, and CSI image retrieval can help the police obtain criminal clues. Moreover, with the rapid development of deep learning, the data-driven paradigm has become the mainstream method of CSI image feature extraction and representation, and in this process datasets provide effective support for CSI retrieval performance. However, systematic research on CSI image retrieval methods and datasets is lacking. Therefore, we present an overview of existing work on one-class and multi-class CSI image retrieval based on deep learning. Based on their technical functionalities and implementation methods, CSI image retrieval methods are roughly classified into five categories: feature representation, metric learning, generative adversarial networks, autoencoder networks, and attention networks. Furthermore, we analyze the remaining challenges and discuss future work directions in this field.
The proposed robust reversible watermarking algorithm addresses the conflict between robustness and reversibility in existing video watermarking techniques by using scene smoothness to group video frames. Built on the H.264 video coding standard, the algorithm first uses traditional robust watermark stitching technology to embed watermark information in the low-frequency coefficient domain of the U channel. It then uses histogram migration techniques in the high-frequency coefficient domain of the U channel to embed auxiliary information, enabling successful watermark extraction and lossless recovery of the original video content. Experimental results demonstrate the algorithm's strong imperceptibility: each embedded frame in the experimental videos achieves a mean peak signal-to-noise ratio of 49.3830 dB and a mean structural similarity of 0.9996. Compared with the three comparison algorithms, these two indexes improve by 7.59% and 0.4% on average. The proposed algorithm is also strongly robust to both offline and online attacks. Under offline attacks, the average normalized correlation coefficient between the extracted and original watermarks is 0.9989, and the average bit error rate is 0.0089. Under online attacks, the normalized correlation coefficient between the extracted and original watermarks is 0.8840, and the mean bit error rate is 0.2269. Compared with the three comparison algorithms, these two indexes improve by 1.27% and 18.16% on average, highlighting the algorithm's robustness. Furthermore, the algorithm has low computational complexity: the mean encoding and decoding time differentials during experimental video processing are 3.934 s and 2.273 s, respectively, underscoring its practical utility.
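For readers unfamiliar with the imperceptibility metric quoted above, the following is a minimal sketch of the standard peak signal-to-noise ratio (PSNR) formula, 10·log10(peak² / MSE). It assumes 8-bit samples (peak value 255) given as flat sequences; the function name and the data layout are illustrative and are not taken from the paper.

```python
import math


def psnr(original, distorted, peak=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(original) != len(distorted) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the original and watermarked samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical usage: a frame where every pixel is off by one grey level.
value = psnr([10, 20, 30, 40], [11, 21, 31, 41])
```

Higher values mean the watermarked frame is closer to the original, so a mean near 49 dB corresponds to a very small mean squared error.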
To improve the precision, accuracy, execution efficiency, and application range of target localization for unmanned aerial vehicles (UAVs) based on scene matching, a ground target localization method for unmanned aerial vehicles based on scene matching (GTLUAVSM) is proposed. The approach first completes scene matching through a feature matching algorithm. Then, multi-sensor registration is optimized by robust estimation based on homologous registration. Finally, basemap generation and model solution are used to improve basemap correspondence and accomplish aerial image positioning. Theoretical evidence and experimental verification demonstrate that GTLUAVSM can improve localization accuracy, speed, and precision while minimizing reliance on task equipment.
In some important object recognition applications, such as intelligent robots and unmanned driving, images are collected consecutively and are associated with one another, and the scenes have stable prior features. Yet existing technologies do not take full advantage of this information. To take object recognition further than existing algorithms in these applications, an object recognition method that fuses temporal sequence information with scene prior information is proposed. The method first employs YOLOv3 as the basic algorithm to recognize objects in single-frame images, then uses the DeepSort algorithm to associate potential objects recognized in images from different moments, and finally applies the confidence fusion method and temporal boundary processing method designed herein to fuse, at the decision level, temporal sequence information with scene prior information. Experiments using public datasets and self-built industrial scene datasets show that, because the information sources are expanded, the quality of single-frame images has less impact on the recognition results, and object recognition is greatly improved. The method is presented as a widely applicable framework for fusing information across multiple classes. Any object recognition algorithm that outputs object class, location information, and recognition confidence at the same time can be integrated into this information fusion framework to improve performance.
Real-time indoor camera localization is a significant problem in indoor robot navigation and surveillance systems. The scene can change during an image sequence, and such changes play a vital role in the localization performance of robotic applications in terms of accuracy and speed. This research proposes a real-time indoor camera localization system based on a recurrent neural network that detects scene changes during the image sequence. The proposed system is trained on an annotated image dataset and predicts the camera pose in real time. It improves the localization performance of indoor cameras by predicting the camera pose more accurately. It also recognizes scene changes during the sequence and evaluates their effects. The system achieves high accuracy and real-time performance. Scene change detection is performed using visual rhythm and the proposed recurrent deep architecture, which performs camera pose prediction and scene change impact evaluation. Overall, this study proposes a novel real-time localization system for indoor cameras that detects scene changes and shows how they affect localization performance.
Because eye tracking can record moment-to-moment changes in eye movements as people inspect pictures of natural scenes and comprehend information, this paper uses eye-movement technology to investigate how the order of presentation and the characteristics of information affect the semantic mismatch effect in the picture-sentence paradigm. A 3 (syntax) × 2 (semantic relation) factorial design is adopted, with syntax and semantic relation as within-participant variables. The experiment finds that semantic mismatch is most likely to increase cognitive load, as people spend more time on processing, as measured by first-pass time, regression path duration, and total fixation duration. Double negation does not significantly increase the processing difficulty of pictures and information. Experimental results show that people can retrieve a special syntactic strategy from long-term memory to process pictures and sentences with different semantic relations, which enables readers to comprehend double negation as affirmation. These results suggest that the constituent comparison model may not be a general model that applies to other languages.
In digital video analysis, browsing, retrieval, and querying, the shot alone cannot meet users' needs. A scene is a cluster of a series of shots and partially meets these demands. In this paper, an algorithm for clustering video scenes based on shot key frame sets is proposed. We use χ² histogram matching and twin histogram comparison for shot detection. A method is presented for extracting key frame sets based on the distance between non-adjacent frames. Furthermore, the minimum distance between key frame sets is computed as the distance between shots, and scenes are then clustered according to the distance between shots. Experiments show that the algorithm performs satisfactorily in both correctness and computing speed.
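The two distance computations described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses one common form of the χ² histogram distance, sum of (a − b)² / (a + b), a single fixed threshold for cut detection, and hypothetical function names. The abstract does not specify which χ² variant, thresholds, or histogram features the authors use, and the twin histogram comparison step is not shown.

```python
def chi_square_distance(hist_a, hist_b):
    """Chi-square distance between two histograms with the same bins."""
    total = 0.0
    for a, b in zip(hist_a, hist_b):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return total


def detect_cuts(frame_histograms, threshold):
    """Indices of frames whose histogram differs sharply from the previous frame."""
    return [
        i
        for i in range(1, len(frame_histograms))
        if chi_square_distance(frame_histograms[i - 1], frame_histograms[i]) > threshold
    ]


def shot_distance(key_frames_a, key_frames_b):
    """Distance between two shots: the minimum distance over key frame pairs."""
    return min(
        chi_square_distance(a, b) for a in key_frames_a for b in key_frames_b
    )


# Hypothetical usage with tiny two-bin histograms.
cuts = detect_cuts([[4, 0], [4, 0], [0, 4]], threshold=1.0)
```

Taking the minimum over key frame pairs means two shots count as close when any pair of their key frames looks alike, which suits shots that revisit the same setting.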
The rapid growth of air traffic has continuously increased the workload of controllers, which has become an important factor restricting sector capacity. If similar traffic scenes can be identified, historical decision-making experience may be used to help controllers decide on control strategies quickly. Because there are many traffic scenes and it is hard to label them all, we propose an active SVM metric learning (ASVM2L) algorithm to measure and identify similar traffic scenes. First, we obtain some traffic scene samples correctly labeled by experienced air traffic controllers. We design an active sampling strategy based on voting difference to choose the most valuable unlabeled samples and label them. Then the metric matrix of all the labeled samples is learned and used to classify the traffic scenes. We verify the effectiveness of ASVM2L on standard data sets, and then use it to measure and classify the traffic scenes in the historical air traffic data set of the Central South Sector of China. The experimental results show that, compared with other existing methods, the proposed method uses the information in traffic scene samples more thoroughly and achieves better classification performance with limited labeled samples.
Air traffic controllers face challenging initiatives due to uncertainty in air traffic. One way to support their initiatives is to identify similar operation scenes. Based on the operation characteristics of typical busy area control airspace, a complexity measurement indicator system is established. We find that operation in an area sector is characterized by aggregation and continuity, and that reducing dimensionality and information redundancy is feasible for dynamic operation data based on principal components. Using principal components, discrete features and time series features are constructed. Based on a Gaussian kernel function, Euclidean distance and dynamic time warping (DTW) are used to measure the similarity of the features. The similarity matrices are then input to spectral clustering. The clustering results show that, based on the indicator system, similar scenes of trend are not ideal while similar scenes of modes are good. Finally, actual vertical operation decisions for the area sector are compared with the identification results, and both are visualized with metric multidimensional scaling (MDS) plots. We find that the identification results reflect operation at peak hours well, but controllers make different decisions under similar conditions before dawn. The compliance rate between the busy operation mode and division decisions at peak hours is 96.7%. The results also show the subjectivity of actual operation and the objectivity of identification. In most scenes, we observe that similar air traffic activities provide regularity for initiatives, validating the potential of this approach for initiatives and other artificial intelligence support.
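The similarity measure for time series features described above can be sketched as follows: a classic dynamic time warping (DTW) distance, mapped to a similarity in (0, 1] with a Gaussian kernel. This is a minimal sketch for one-dimensional series with absolute difference as the local cost; the kernel width `sigma`, the function names, and the sample values are assumptions, since the abstract does not state the authors' exact settings.

```python
import math


def dtw_distance(series_a, series_b):
    """Classic DTW distance between two 1-D sequences (O(n*m) dynamic program)."""
    n, m = len(series_a), len(series_b)
    # cost[i][j] = best alignment cost of the first i and first j samples.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(series_a[i - 1] - series_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat a sample of series_b
                cost[i][j - 1],      # repeat a sample of series_a
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


def gaussian_similarity(distance, sigma=1.0):
    """Map a distance to a similarity with a Gaussian kernel (sigma is assumed)."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


# Hypothetical usage: build a similarity matrix for three short series.
series = [[1, 2, 3], [1, 2, 2, 3], [3, 2, 1]]
similarity = [
    [gaussian_similarity(dtw_distance(a, b), sigma=2.0) for b in series]
    for a in series
]
```

A matrix like `similarity` is the kind of input a spectral clustering routine expects as a precomputed affinity matrix.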
Crowded scene analysis is currently a hot and challenging topic in the field of computer vision. The ability to analyze motion patterns from videos is a difficult but critical part of this problem. In this paper, we propose a novel approach to analyzing motion patterns by clustering tracklets with an unsupervised hierarchical clustering algorithm, where the similarity between tracklets is measured by the Longest Common Subsequence. The tracklets are obtained by tracking dense points under three effective rules, which enables them to capture the motion patterns in crowded scenes. The analysis of motion patterns is implemented in a completely unsupervised way, and the tracklets are clustered automatically by a hierarchical clustering algorithm based on a graphical model. To validate the performance of our approach, we conducted experimental evaluations on two datasets. The results reveal the precise distributions of motion patterns in current crowded videos and demonstrate the effectiveness of our approach.
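A Longest Common Subsequence (LCS) similarity between two tracklets can be sketched as below. This is a minimal illustration, assuming tracklets are lists of (x, y) points and that two points "match" when they lie within a distance `eps` of each other; the matching rule, the normalization by the shorter tracklet, and the names are assumptions rather than details given in the abstract.

```python
def lcs_length(track_a, track_b, eps=1.0):
    """Length of the longest common subsequence of two 2-D point sequences."""
    n, m = len(track_a), len(track_b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (xa, ya), (xb, yb) = track_a[i - 1], track_b[j - 1]
            if (xa - xb) ** 2 + (ya - yb) ** 2 <= eps ** 2:
                # Points are close enough to count as the same element.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcs_similarity(track_a, track_b, eps=1.0):
    """LCS length normalized by the shorter tracklet, in [0, 1]."""
    if not track_a or not track_b:
        return 0.0
    return lcs_length(track_a, track_b, eps) / min(len(track_a), len(track_b))


# Hypothetical usage: two tracklets that move together, then diverge.
a = [(0, 0), (1, 0), (2, 0)]
b = [(0, 0.5), (1, 0.5), (5, 5)]
score = lcs_similarity(a, b, eps=1.0)
```

Pairwise scores of this kind can be converted to distances (for example, 1 minus the similarity) before being passed to a hierarchical clustering step.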
The airport apron scene contains rich contextual information about spatial position relationships. Traditional object detectors consider only visual appearance and ignore this contextual information. In addition, the detection accuracy for some categories in the apron dataset is low. Therefore, an improved object detection method using spatial-aware features in apron scenes, called SA-FRCNN, is presented. The method uses graph convolutional networks to capture the relative spatial relationships between objects in the apron scene and incorporates this spatial context into feature learning. Moreover, an attention mechanism is introduced into the feature extraction process to focus on spatial position and key features, and distance-IoU loss is used to achieve more accurate regression. The experimental results show that the mean average precision of apron object detection based on SA-FRCNN reaches 95.75%, and detection of some hard-to-detect categories is significantly improved. The proposed method effectively improves detection accuracy on the apron dataset and outperforms the other methods compared.
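The distance-IoU (DIoU) loss mentioned above is commonly defined as 1 − IoU + d² / c², where d is the distance between the two box centres and c is the diagonal of the smallest box enclosing both. The sketch below implements that general formula for axis-aligned boxes given as (x1, y1, x2, y2); it is an illustration of the published loss, not the SA-FRCNN code, and the function name is an assumption.

```python
def diou_loss(pred, target):
    """Distance-IoU loss for two axis-aligned boxes (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection over union of the two boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the box centres.
    center_dist_sq = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + (
        (py1 + py2) / 2 - (ty1 + ty2) / 2
    ) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enclose_w = max(px2, tx2) - min(px1, tx1)
    enclose_h = max(py2, ty2) - min(py1, ty1)
    diag_sq = enclose_w ** 2 + enclose_h ** 2

    penalty = center_dist_sq / diag_sq if diag_sq > 0 else 0.0
    return 1.0 - iou + penalty


# Hypothetical usage: a prediction shifted diagonally from its target.
loss = diou_loss((0, 0, 2, 2), (1, 1, 3, 3))
```

Unlike a plain IoU loss, the centre-distance penalty still produces a useful signal when the boxes do not overlap at all.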
This paper briefly analyzes the scenes and implications of the famous ancient Chinese poem "River Snow". It compares and contrasts five typical English versions from the perspective of how they translate the poem's static and dynamic states. The paper also discusses the meaning of "du diao han jiang xue" and how to better translate the key word "diao".
Funding (MS-YOLO camouflaged people detection study): Supported by the National Natural Science Foundation of China (Grant No. 62005049), the Natural Science Foundation of Fujian Province (Grant Nos. 2020J01451, 2022J05113), and the Education and Scientific Research Program for Young and Middle-aged Teachers in Fujian Province (Grant No. JAT210035).
Funding (YOLO-CS study): Supported by the China National Key Research and Development Program (No. 2016YFC0802904), the National Natural Science Foundation of China (61671470), and the 62nd batch of funded projects of the China Postdoctoral Science Foundation (No. 2017M623423).
Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2021R1I1A1A01055652).
Abstract: The analysis of overcrowded areas is essential for flow monitoring, assembly control, and security. Crowd counting's primary goal is to calculate the population in a given region, which requires real-time analysis of congested scenes for prompt reactionary actions. Crowds are unpredictable, and the available benchmark datasets vary widely, which limits the performance of trained models on unseen test data. In this paper, we propose an end-to-end deep neural network that takes an input image and generates a density map of a crowd scene. The proposed model consists of encoder and decoder networks comprising batch-free normalization layers known as evolving normalization (EvoNorm). This allows our network to generalize to unseen data because EvoNorm does not use statistics from the training samples. The decoder network uses dilated 2D convolutional layers to provide large receptive fields with fewer parameters, which enables real-time processing and addresses the density drift problem. Five benchmark datasets are used in this study to assess the proposed model, leading to the conclusion that it outperforms conventional models.
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grant NS2020045. Y.L.G. received the grant.
Abstract: Weather is a key factor affecting the control of air traffic. Accurate recognition and classification of similar weather scenes in the terminal area is helpful for rapid decision-making in air traffic flow management. Current research mostly uses traditional machine learning methods to extract features of weather scenes, and clustering algorithms to divide similar scenes. Inspired by the excellent performance of deep learning in image recognition, this paper proposes a terminal area similar weather scene classification method based on improved deep convolution embedded clustering (IDCEC). It uses the combination of the encoding layer and the decoding layer to reduce the dimensionality of the weather image while retaining useful information to the greatest extent, and then uses the combination of the pre-trained encoding layer and the clustering layer to train the clustering model of the similar scenes in the terminal area. Finally, the terminal area of Guangzhou Airport is selected as the research object, the proposed method is used to classify historical weather data into similar scenes, and its performance is compared with other state-of-the-art methods. The experimental results show that the proposed IDCEC method can identify similar scenes more accurately based on the spatial distribution characteristics and severity of weather; at the same time, compared with the actual flight volume in the Guangzhou terminal area, IDCEC's recognition results for similar weather scenes are consistent with the recognition of experts in the field.
Abstract: In European thought and culture, there exists a group of passionate artists who are fascinated by the intention, passion, and richness of artistic expression. They strive to establish connections between different art forms. Musicians not only attempt to represent masterpieces through the language of music but also aim to convey subjective experiences of emotions and personal imagination to listeners by adding titles to their musical works. This study examines two pieces, “Scenes of Childhood” and “Children’s Garden”, and analyzes the different approaches employed by the composers in portraying similar content.
Funding: This work was supported by the Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2023GXJS163, ZDYF2024GXJS014); National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022, 62162024); the Major Science and Technology Project of Hainan Province (Grant No. ZDKJ2020012); Hainan Provincial Natural Science Foundation of China (Grant No. 620MS021); Youth Foundation Project of Hainan Natural Science Foundation (621QN211).
Abstract: Research on neural radiance fields for novel view synthesis has experienced explosive growth with the development of new models and extensions. The NeRF (Neural Radiance Fields) algorithm, adapted to underwater scenes or scattering media, is also evolving. Existing underwater 3D reconstruction systems still face challenges such as long training times and low rendering efficiency. This paper proposes an improved underwater 3D reconstruction system to achieve rapid and high-quality 3D reconstruction. First, we enhance underwater videos captured by a monocular camera to correct the image quality degradation caused by the physical properties of the water medium and ensure consistency in enhancement across frames. Then, we perform keyframe selection to optimize resource usage and reduce the impact of dynamic objects on the reconstruction results. After pose estimation using COLMAP, the selected keyframes undergo 3D reconstruction using neural radiance fields (NeRF) based on multi-resolution hash encoding for model construction and rendering. In terms of image enhancement, our method has been optimized for certain scenarios, demonstrating effective enhancement and better continuity between consecutive frames of the same data. In terms of 3D reconstruction, our method achieved a peak signal-to-noise ratio (PSNR) of 18.40 dB and a structural similarity (SSIM) of 0.6677, indicating a good balance between operational efficiency and reconstruction quality.
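For readers unfamiliar with the PSNR figure quoted in the abstract above, the standard definition is PSNR = 10 · log10(MAX² / MSE). A minimal sketch follows; the flat list representation of pixel values and the 8-bit default `max_value` are assumptions for illustration, and nothing here reproduces the paper's evaluation pipeline.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstructed: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(f"{psnr([100, 100], [110, 90]):.2f} dB")
```

Higher values indicate a reconstruction closer to the reference.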
Funding: National Natural Science Foundation of PR China (42075130); Nari Technology Co., Ltd. (4561655965).
Abstract: Scene text detection is an important task in computer vision. In this paper, we present YOLOv5 Scene Text (YOLOv5ST), an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection. Our primary goal is to enhance inference speed without sacrificing significant detection accuracy, thereby enabling robust performance on resource-constrained devices like drones, closed-circuit television cameras, and other embedded systems. To achieve this, we propose key modifications to the network architecture to lighten the original backbone and improve feature aggregation, including replacing standard convolution with depth-wise convolution, adopting the C2 sequence module in place of C3, employing Spatial Pyramid Pooling Global (SPPG) instead of Spatial Pyramid Pooling Fast (SPPF), and integrating the Bi-directional Feature Pyramid Network (BiFPN) into the neck. Experimental results demonstrate a 26% improvement in inference speed compared to the baseline, with only marginal reductions of 1.6% and 4.2% in mean average precision (mAP) at the intersection over union (IoU) thresholds of 0.5 and 0.5:0.95, respectively. Our work strikes a balance between speed and accuracy in scene text detection, making it well suited for performance-constrained environments.
Abstract: Crime scene investigation (CSI) images are key evidence carriers during criminal investigation, and CSI image retrieval can help the police obtain criminal clues. Moreover, with the rapid development of deep learning, the data-driven paradigm has become the mainstream method of CSI image feature extraction and representation, and in this process, datasets provide effective support for CSI retrieval performance. However, there is a lack of systematic research on CSI image retrieval methods and datasets. Therefore, we present an overview of the existing works on one-class and multi-class CSI image retrieval based on deep learning. According to the research, based on their technical functionalities and implementation methods, CSI image retrieval methods are roughly classified into five categories: feature representation, metric learning, generative adversarial networks, autoencoder networks, and attention networks. Furthermore, we analyze the remaining challenges and discuss future work directions in this field.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62202496 and 62272478, and by the Basic Frontier Innovation Project of Engineering University of People Armed Police under Grants WJY202314 and WJY202221.
Abstract: The proposed robust reversible watermarking algorithm addresses the compatibility challenges between robustness and reversibility in existing video watermarking techniques by leveraging scene smoothness to group video frames. Grounded in the H.264 video coding standard, the algorithm first employs traditional robust watermark stitching technology to embed watermark information in the low-frequency coefficient domain of the U channel. Subsequently, it utilizes histogram migration techniques in the high-frequency coefficient domain of the U channel to embed auxiliary information, enabling successful watermark extraction and lossless recovery of the original video content. Experimental results demonstrate the algorithm's strong imperceptibility, with each embedded frame in the experimental videos achieving a mean peak signal-to-noise ratio of 49.3830 dB and a mean structural similarity of 0.9996. Compared with the three comparison algorithms, these two experimental indexes are improved by 7.59% and 0.4% on average. At the same time, the proposed algorithm has strong robustness to both offline and online attacks. In the face of offline attacks, the average normalized correlation coefficient between the extracted watermark and the original watermark is 0.9989, and the average bit error rate is 0.0089. In the face of online attacks, the normalized correlation coefficient between the extracted watermark and the original watermark is 0.8840, and the mean bit error rate is 0.2269. Compared with the three comparison algorithms, these two experimental indexes are improved by 1.27% and 18.16% on average, highlighting the algorithm's robustness. Furthermore, the algorithm exhibits low computational complexity, with the mean encoding and mean decoding time differentials during experimental video processing being 3.934 s and 2.273 s, respectively, underscoring its practical utility.
Funding: National Key R&D Program of China (2022YFF0604502).
Abstract: In order to improve the target localization precision, accuracy, execution efficiency, and application range of the unmanned aerial vehicle (UAV) based on scene matching, a ground target localization method for unmanned aerial vehicles based on scene matching (GTLUAVSM) is proposed. The suggested approach entails completing scene matching through a feature matching algorithm. Then, multi-sensor registration is optimized by robust estimation based on homologous registration. Finally, basemap generation and model solution are utilized to improve basemap correspondence and accomplish aerial image positioning. Theoretical evidence and experimental verification demonstrate that GTLUAVSM can improve localization accuracy, speed, and precision while minimizing reliance on task equipment.
Abstract: For some important object recognition applications such as intelligent robots and unmanned driving, images are collected consecutively and are associated with one another; in addition, the scenes have steady prior features. Yet existing technologies do not take full advantage of this information. In order to take object recognition further than existing algorithms in the above applications, an object recognition method that fuses temporal sequence with scene prior information is proposed. This method first employs YOLOv3 as the basic algorithm to recognize objects in single-frame images, then the DeepSort algorithm to establish associations among potential objects recognized in images from different moments, and finally the confidence fusion method and temporal boundary processing method designed herein to fuse, at the decision level, temporal sequence information with scene prior information. Experiments using public datasets and self-built industrial scene datasets show that, due to the expansion of information sources, the quality of single-frame images has less impact on the recognition results, whereby object recognition is greatly improved. It is presented herein as a widely applicable framework for the fusion of information under multiple classes. All object recognition algorithms that output object class, location information, and recognition confidence at the same time can be integrated into this information fusion framework to improve performance.
Abstract: Real-time indoor camera localization is a significant problem in indoor robot navigation and surveillance systems. The scene can change during the image sequence, which plays a vital role in the localization performance of robotic applications in terms of accuracy and speed. This research proposed a real-time indoor camera localization system based on a recurrent neural network that detects scene change during the image sequence. The proposed system is trained on an annotated image dataset and predicts the camera pose in real time. The system mainly improved the localization performance of indoor cameras by more accurately predicting the camera pose. It also recognizes the scene changes during the sequence and evaluates the effects of these changes. This system achieved high accuracy and real-time performance. The scene change detection process was performed using visual rhythm and the proposed recurrent deep architecture, which performed camera pose prediction and scene change impact evaluation. Overall, this study proposed a novel real-time localization system for indoor cameras that detects scene changes and shows how they affect localization performance.
Funding: The National Social Science Foundation of China (No. CBA080236); the Graduate Innovation Project of Jiangsu Province (No. CX08B-016R).
Abstract: As eye tracking can be used to record moment-to-moment changes of eye movements as people inspect pictures of natural scenes and comprehend information, this paper attempts to use eye-movement technology to investigate how the order of presentation and the characteristics of information affect the semantic mismatch effect in the picture-sentence paradigm. A 3 (syntax) × 2 (semantic relation) factorial design is adopted, with syntax and semantic relations as within-participant variables. The experiment finds that the semantic mismatch is most likely to increase cognitive loads as people have to spend more time, including first-pass time, regression path duration, and total fixation duration. Double negation does not significantly increase the processing difficulty of pictures and information. Experimental results show that people can extract the special syntactic strategy from long-term memory to process pictures and sentences with different semantic relations. It enables readers to comprehend double negation as affirmation. These results demonstrate that the constituent comparison model may not be a general model regarding other languages.
Funding: Supported by the Natural Science Foundation of Hubei Province (2004ABA174).
Abstract: In digital video analysis, browsing, retrieval, and query, the shot alone is incapable of meeting users' needs. A scene is a cluster of a series of shots, which partially meets the above demands. In this paper, an algorithm for video scene clustering based on shot key frame sets is proposed. We use χ² histogram matching and twin histogram comparison for shot detection. A method is presented for key frame set extraction based on the distance between non-adjacent frames; furthermore, the minimum distance between key frame sets is computed as the distance between shots, and scenes are finally clustered according to the distance between shots. Experiments show that the algorithm performs satisfactorily in correctness and computing speed.
Funding: Supported by the National Natural Science Foundation of China (No. 61501229) and the Fundamental Research Funds for the Central Universities (Nos. 2019054, 2020045).
Abstract: The rapid growth of air traffic has continuously increased the workload of controllers, which has become an important factor restricting sector capacity. If similar traffic scenes can be identified, historical decision-making experience may be used to help controllers decide on control strategies quickly. Considering that there are many traffic scenes and it is hard to label them all, in this paper we propose an active SVM metric learning (ASVM2L) algorithm to measure and identify similar traffic scenes. First, we obtain some traffic scene samples correctly labeled by experienced air traffic controllers. We design an active sampling strategy based on voting difference to choose the most valuable unlabeled samples and label them. Then the metric matrix of all the labeled samples is learned and used to complete the classification of traffic scenes. We verify the effectiveness of ASVM2L on standard data sets, and then use it to measure and classify the traffic scenes in the historical air traffic data set of the Central South Sector of China. The experimental results show that, compared with other existing methods, the proposed method can use the information of traffic scene samples more thoroughly and achieve better classification performance with limited labeled samples.
Funding: National Natural Science Foundation of China (Nos. 71731001, 61573181, 71971114); the Fundamental Research Funds for the Central Universities (No. NS2020045).
Abstract: Air traffic controllers face challenging initiatives due to uncertainty in air traffic. One way to support their initiatives is to identify similar operation scenes. Based on the operation characteristics of typical busy area control airspace, a complexity measurement indicator system is established. We find that operation in an area sector is characterized by aggregation and continuity, and that dimensionality and information redundancy reduction are feasible for dynamic operation data based on principal components. Using principal components, discrete features and time series features are constructed. Based on the Gaussian kernel function, Euclidean distance and dynamic time warping (DTW) are used to measure the similarity of the features. Then the similarity matrices are input into spectral clustering. The clustering results show that, based on the indicator system, similar scenes of trend are not ideal while similar scenes of modes are good. Finally, actual vertical operation decisions for the area sector and the identification results are compared and visualized by metric multidimensional scaling (MDS) plots. We find that identification results can well reflect the operation at peak hours, but controllers make different decisions under similar conditions before dawn. The compliance rate of busy operation mode and division decisions at peak hours is 96.7%. The results also show the subjectivity of actual operation and the objectivity of identification. In most scenes, we observe that similar air traffic activities provide regularity for initiatives, validating the potential of this approach for initiatives and other artificial intelligence support.
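To make the similarity measure in the abstract above more concrete, here is a minimal sketch of a DTW distance between two one-dimensional time series, mapped to a similarity in (0, 1] by a Gaussian kernel. The absolute-difference local cost, the `sigma` bandwidth, and the function names are assumptions for illustration, not the paper's settings.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with absolute difference as local cost."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def gaussian_similarity(distance: float, sigma: float = 1.0) -> float:
    """Map a distance to a similarity with a Gaussian (RBF) kernel."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    d = dtw_distance([1, 2, 3], [1, 2, 2, 3])
    print(d, gaussian_similarity(d))
```

A matrix of such pairwise similarities is the kind of input a spectral clustering routine expects.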
Funding: Supported in part by the National Basic Research Program of China (973 Program) under Grant No. 2011CB302203 and the National Natural Science Foundation of China under Grant No. 61273285.
Abstract: Crowded scene analysis is currently a hot and challenging topic in the computer vision field. The ability to analyze motion patterns from videos is a difficult but critical part of this problem. In this paper, we propose a novel approach for the analysis of motion patterns by clustering the tracklets using an unsupervised hierarchical clustering algorithm, where the similarity between tracklets is measured by the Longest Common Subsequences. The tracklets are obtained by tracking dense points under three effective rules, enabling the approach to capture the motion patterns in crowded scenes. The analysis of motion patterns is implemented in a completely unsupervised way, and the tracklets are clustered automatically through a hierarchical clustering algorithm based on a graphic model. To validate the performance of our approach, we conducted experimental evaluations on two datasets. The results reveal the precise distributions of motion patterns in current crowded videos and demonstrate the effectiveness of our approach.
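A Longest Common Subsequence similarity between two tracklets, as mentioned in the abstract above, might be sketched like this. The matching threshold `eps`, the normalization by the shorter tracklet, and the function names are assumptions for illustration; the paper's exact formulation may differ.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def lcs_length(t1: Sequence[Point], t2: Sequence[Point], eps: float = 1.0) -> int:
    """Length of the longest common subsequence; two points match if within eps."""
    n, m = len(t1), len(t2)
    table: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if math.dist(t1[i - 1], t2[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcs_similarity(t1: Sequence[Point], t2: Sequence[Point], eps: float = 1.0) -> float:
    """LCS length normalized by the shorter tracklet, giving a value in [0, 1]."""
    if not t1 or not t2:
        return 0.0
    return lcs_length(t1, t2, eps) / min(len(t1), len(t2))


if __name__ == "__main__":
    a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    b = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
    print(lcs_similarity(a, b))
```

One minus this similarity can serve as the pairwise distance fed to a hierarchical clustering routine.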
Funding: Supported by the Fundamental Research Funds for Central Universities of the Civil Aviation University of China (No. 3122021088).
Abstract: The airport apron scene contains rich contextual information about spatial position relationships. Traditional object detectors consider only visual appearance and ignore this contextual information. In addition, the detection accuracy of some categories in the apron dataset is low. Therefore, an improved object detection method using spatial-aware features in apron scenes, called SA-FRCNN, is presented. The method uses graph convolutional networks to capture the relative spatial relationships between objects in the apron scene, incorporating this spatial context into feature learning. Moreover, an attention mechanism is introduced into the feature extraction process to focus on the spatial position and key features, and distance-IoU loss is used to achieve a more accurate regression. The experimental results show that the mean average precision of apron object detection based on SA-FRCNN can reach 95.75%, and the detection of some hard-to-detect categories is significantly improved. The proposed method effectively improves the detection accuracy on the apron dataset and compares favorably with other methods.
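The distance-IoU loss named in the abstract above is commonly written as 1 − IoU + ρ²/c², where ρ is the distance between the two box centers and c is the diagonal of the smallest box enclosing both. A minimal sketch follows, assuming axis-aligned boxes given as `(x1, y1, x2, y2)` with positive area; this box format and the function name are illustrative assumptions, and the code is not the paper's implementation.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def diou_loss(pred: Box, target: Box) -> float:
    """Distance-IoU loss: 1 - IoU + (center distance)^2 / (enclosing diagonal)^2."""
    # Intersection area (zero when the boxes do not overlap).
    iw = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    ih = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter)

    # Squared distance between the box centers.
    rho2 = ((pred[0] + pred[2]) / 2 - (target[0] + target[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (target[1] + target[3]) / 2) ** 2

    # Squared diagonal of the smallest enclosing box.
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = cw ** 2 + ch ** 2

    return 1.0 - iou + rho2 / c2


if __name__ == "__main__":
    print(diou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Unlike a plain IoU loss, the center-distance term still gives a useful signal when the boxes do not overlap at all.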
Abstract: This paper briefly analyzes the scenes and implications of the famous ancient Chinese poem River Snow. It compares and contrasts five typical English versions from the perspective of how they translate the poem's static and dynamic states. The paper also discusses the meaning of “du diao han jiang xue” and how to better translate the key word “diao”.