Crime scene investigation (CSI) images are key evidence carriers in criminal investigations, and CSI image retrieval can help the police obtain criminal clues. Moreover, with the rapid development of deep learning, the data-driven paradigm has become the mainstream method of CSI image feature extraction and representation, and in this process datasets provide effective support for CSI retrieval performance. However, there is a lack of systematic research on CSI image retrieval methods and datasets. Therefore, we present an overview of existing work on one-class and multi-class CSI image retrieval based on deep learning. Based on their technical functionalities and implementation methods, CSI image retrieval approaches are roughly classified into five categories: feature representation, metric learning, generative adversarial networks, autoencoder networks, and attention networks. Furthermore, we analyze the remaining challenges and discuss future work directions in this field.
Automatic control technology is the basis for improving road construction robots. According to the characteristics and functions of the construction equipment, this research treats positioning acquisition and real-world monitoring as the perception inputs. Positioning uses RTK-GNSS technology: positions are projected with the Gauss-Krueger projection method and then converted to Cartesian coordinates based on the characteristics of the construction drawing. The steering control system is the core of the electric-drive unmanned module. Based on an analysis of the composition of the steering system of unmanned engineering vehicles, models of its key components, such as the steering gear, torque sensor, and drive motor, are established, a joint simulation model of the unmanned engineering vehicle is built, and the steering controller is designed using the PID method. The simulation results show that the control method meets the construction path demand for automatic steering. Path planning first defines the construction area with preset values and corrects the steering angle during driving with the PID algorithm, and then realizes construction-based path planning. The results show that the method keeps the straight-path error within 10 cm and the curve error within 20 cm. With the collaboration of the various modules, the automatic construction simulation results of this robot show that the designed path and control method are effective.
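The PID steering correction described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the gains, the time step, and the toy plant in which the steering command directly sets the lateral velocity are not taken from the paper, whose joint simulation model is not described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete PID controller; the gains are illustrative, not from the paper."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        # Accumulate the integral term and approximate the derivative
        # with a backward difference.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(offset_m: float = 0.5, steps: int = 400, dt: float = 0.05) -> float:
    """Toy plant (assumed): the command directly sets the lateral velocity."""
    pid = PID(kp=1.2, ki=0.1, kd=0.05)
    y = offset_m  # lateral offset from the planned straight path, in metres
    for _ in range(steps):
        command = pid.step(-y, dt)  # error = target offset (0) - current offset
        y += command * dt
    return y
```

Calling simulate() drives the initial 0.5 m offset toward zero; a real vehicle model would replace the one-line plant update.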
In order to improve the target localization precision, accuracy, execution efficiency, and application range of unmanned aerial vehicles (UAVs) based on scene matching, a ground target localization method for unmanned aerial vehicles based on scene matching (GTLUAVSM) is proposed. The suggested approach first completes scene matching through a feature matching algorithm. Then, multi-sensor registration is optimized by robust estimation based on homologous registration. Finally, basemap generation and model solution are used to improve basemap correspondence and accomplish aerial image positioning. Theoretical evidence and experimental verification demonstrate that GTLUAVSM can improve localization accuracy, speed, and precision while minimizing reliance on task equipment.
Real-time indoor camera localization is a significant problem in indoor robot navigation and surveillance systems. The scene can change during the image sequence, and such changes play a vital role in the localization performance of robotic applications in terms of accuracy and speed. This research proposes a real-time indoor camera localization system based on a recurrent neural network that detects scene changes during the image sequence. The proposed system is trained on an annotated image dataset and predicts the camera pose in real time. The system mainly improves the localization performance of indoor cameras by predicting the camera pose more accurately. It also recognizes scene changes during the sequence and evaluates their effects. The system achieved high accuracy and real-time performance. Scene change detection was performed using visual rhythm and the proposed recurrent deep architecture, which also performed camera pose prediction and scene change impact evaluation. Overall, this study proposes a novel real-time localization system for indoor cameras that detects scene changes and shows how they affect localization performance.
For some important object recognition applications, such as intelligent robots and unmanned driving, images are collected consecutively and are associated with one another; in addition, the scenes have steady prior features. Yet existing technologies do not take full advantage of this information. To take object recognition further than existing algorithms in these applications, an object recognition method that fuses temporal sequence information with scene prior information is proposed. The method first employs YOLOv3 as the basic algorithm to recognize objects in single-frame images, then uses the DeepSort algorithm to associate potential objects recognized in images taken at different moments, and finally applies the confidence fusion method and temporal boundary processing method designed in the paper to fuse, at the decision level, temporal sequence information with scene prior information. Experiments on public datasets and self-built industrial scene datasets show that, because the information sources are expanded, the quality of single-frame images has less impact on the recognition results, and object recognition is greatly improved. The method is presented as a widely applicable framework for fusing information under multiple classes. Any object recognition algorithm that outputs object class, location information, and recognition confidence at the same time can be integrated into this information fusion framework to improve performance.
Weather is a key factor affecting the control of air traffic. Accurate recognition and classification of similar weather scenes in the terminal area is helpful for rapid decision-making in air traffic flow management. Current research mostly uses traditional machine learning methods to extract features of weather scenes and clustering algorithms to group similar scenes. Inspired by the excellent performance of deep learning in image recognition, this paper proposes a terminal-area similar weather scene classification method based on improved deep convolution embedded clustering (IDCEC). The method uses the combination of the encoding layer and the decoding layer to reduce the dimensionality of the weather image while retaining useful information to the greatest extent, and then uses the combination of the pre-trained encoding layer and the clustering layer to train the clustering model of similar scenes in the terminal area. Finally, the terminal area of Guangzhou Airport is selected as the research object, the proposed method is used to classify historical weather data into similar scenes, and the performance is compared with other state-of-the-art methods. The experimental results show that the proposed IDCEC method can identify similar scenes more accurately based on the spatial distribution characteristics and severity of weather; at the same time, compared with the actual flight volume in the Guangzhou terminal area, IDCEC's recognition results for similar weather scenes are consistent with the recognition of experts in the field.
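As a rough illustration of the clustering layer mentioned in this abstract, the sketch below computes the soft cluster assignment used in the standard deep embedded clustering formulation (a Student's t kernel between an embedded point and each cluster centre). It is an assumption that IDCEC keeps this form; the abstract does not state its improvements, and the function name and the toy vectors are hypothetical.

```python
def soft_assign(z, centers, alpha=1.0):
    """Soft assignment of embedding z to cluster centres (Student's t kernel).

    q_j is proportional to (1 + ||z - mu_j||^2 / alpha) ** (-(alpha + 1) / 2),
    normalised so that the assignments sum to 1.
    """
    weights = []
    for mu in centers:
        sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
        weights.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
    total = sum(weights)
    return [w / total for w in weights]


# An embedding sitting on the first centre is assigned mostly to it.
q = soft_assign([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0]])
```

In a full pipeline the embedding z would come from the pre-trained encoder, and the centres would be learned jointly with it.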
Regional cultural patterns and characteristics play a positive role in economic and social development. By planning and constructing cultural amenities and creating cultural scenes, the spatial quality and quality of life in a region can be enhanced, facilitating the expansion of cultural consumption. Shaanxi, with its rich historical and cultural resources, positions its capital city of Xi’an as a “world historical city”, boasting a vast number of cultural amenities represented by “cultural facilities”, “cultural activities”, “cultural experiences”, and “cultural services”. Developing urban cultural scenes with the aim of upgrading regional cultural consumption in Shaanxi requires comprehensive planning and a multifaceted approach, particularly in integrating provincial cultural scenes, clarifying the positioning of cultural scenes, innovating cultural scene experience projects, creating cultural scene intellectual property (IP), and empowering cultural scenes through the application of science and technology.
In European thought and culture, there exists a group of passionate artists who are fascinated by the intention, passion, and richness of artistic expression. They strive to establish connections between different art forms. Musicians not only attempt to represent masterpieces through the language of music but also aim to convey subjective experiences of emotion and personal imagination to listeners by adding titles to their musical works. This study examines two pieces, “Scenes of Childhood” and “Children’s Garden”, and analyzes the different approaches the composers employ in portraying similar content.
Camouflaged people are highly skilled at actively concealing themselves by making effective use of cover and the surrounding environment. Despite advances in optical detection through imaging systems, including spectral, polarization, and infrared technologies, there is still no effective real-time method for accurately detecting small, effectively camouflaged people in complex real-world scenes. This study proposes a snapshot multispectral image-based camouflage detection model, multispectral YOLO (MS-YOLO), which uses the SPD-Conv and SimAM modules to represent targets effectively and suppress background interference by exploiting spatial-spectral target information. The study also constructs the first real-shot multispectral camouflaged people dataset (MSCPD), which covers diverse scenes, target scales, and attitudes. To minimize information redundancy, MS-YOLO selects an optimal subset of 12 bands with strong feature representation and minimal inter-band correlation as input. In experiments on the MSCPD, MS-YOLO achieves a mean Average Precision of 94.31% and real-time detection at 65 frames per second, which confirms the effectiveness and efficiency of the method in detecting camouflaged people in various typical desert and forest scenes. The approach offers valuable support for improving the perception capabilities of unmanned aerial vehicles in detecting enemy forces and rescuing personnel on the battlefield.
Scene text detection is an important task in computer vision. In this paper, we present YOLOv5 Scene Text (YOLOv5ST), an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection. Our primary goal is to enhance inference speed without sacrificing significant detection accuracy, thereby enabling robust performance on resource-constrained devices such as drones, closed-circuit television cameras, and other embedded systems. To achieve this, we propose key modifications to the network architecture that lighten the original backbone and improve feature aggregation, including replacing standard convolution with depth-wise convolution, adopting the C2 sequence module in place of C3, employing Spatial Pyramid Pooling Global (SPPG) instead of Spatial Pyramid Pooling Fast (SPPF), and integrating a Bi-directional Feature Pyramid Network (BiFPN) into the neck. Experimental results demonstrate a 26% improvement in inference speed compared to the baseline, with only marginal reductions of 1.6% and 4.2% in mean average precision (mAP) at the intersection over union (IoU) thresholds of 0.5 and 0.5:0.95, respectively. Our work strikes a balance between speed and accuracy in scene text detection, making it well suited for performance-constrained environments.
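To see why swapping standard convolution for depth-wise convolution lightens a backbone, the following sketch compares weight counts. It assumes the depth-wise layer is followed by a 1×1 point-wise convolution (the usual depth-wise separable arrangement) and ignores biases; the abstract does not say which variant YOLOv5ST uses, and the channel sizes are hypothetical.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depth-wise convolution plus a 1 x 1 point-wise one."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 128 input channels, 256 output channels, 3 x 3 kernel.
standard = standard_conv_params(128, 256, 3)          # 294912
separable = depthwise_separable_params(128, 256, 3)   # 33920
```

For this hypothetical layer the separable form needs roughly one ninth of the weights, which is the kind of saving that speeds up inference on embedded devices.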
Research on neural radiance fields for novel view synthesis has experienced explosive growth with the development of new models and extensions. NeRF (Neural Radiance Fields) algorithms suitable for underwater scenes or scattering media are also evolving. Existing underwater 3D reconstruction systems still face challenges such as long training times and low rendering efficiency. This paper proposes an improved underwater 3D reconstruction system to achieve rapid, high-quality 3D reconstruction. First, we enhance underwater videos captured by a monocular camera to correct the image quality degradation caused by the physical properties of the water medium and to ensure that the enhancement is consistent across frames. Then, we perform keyframe selection to optimize resource usage and reduce the impact of dynamic objects on the reconstruction results. After pose estimation using COLMAP, the selected keyframes undergo 3D reconstruction using NeRF based on multi-resolution hash encoding for model construction and rendering. In terms of image enhancement, our method has been optimized for certain scenarios, demonstrating effective enhancement and better continuity between consecutive frames of the same data. In terms of 3D reconstruction, our method achieved a peak signal-to-noise ratio (PSNR) of 18.40 dB and a structural similarity (SSIM) of 0.6677, indicating a good balance between operational efficiency and reconstruction quality.
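For readers unfamiliar with the PSNR figure quoted in this abstract, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE). The flat pixel lists and the 8-bit peak value of 255 are assumptions for illustration; the paper's evaluation pipeline is not described in the abstract.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Two-pixel toy example with a mean squared error of 100.
value = psnr([100, 100], [110, 90])
```

Higher values mean the rendered image is closer to the reference, so 18.40 dB corresponds to a fairly large residual error, as is common for difficult underwater footage.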
The proposed robust reversible watermarking algorithm addresses the compatibility challenges between robustness and reversibility in existing video watermarking techniques by leveraging scene smoothness to group video frames. Grounded in the H.264 video coding standard, the algorithm first employs traditional robust watermark stitching technology to embed watermark information in the low-frequency coefficient domain of the U channel. Subsequently, it uses histogram migration techniques in the high-frequency coefficient domain of the U channel to embed auxiliary information, enabling successful watermark extraction and lossless recovery of the original video content. Experimental results demonstrate the algorithm’s strong imperceptibility, with each embedded frame in the experimental videos achieving a mean peak signal-to-noise ratio of 49.3830 dB and a mean structural similarity of 0.9996. Compared with the three comparison algorithms, performance on these two indexes is improved by 7.59% and 0.4% on average. The proposed algorithm is also strongly robust to both offline and online attacks. Under offline attacks, the average normalized correlation coefficient between the extracted watermark and the original watermark is 0.9989, and the average bit error rate is 0.0089. Under online attacks, the normalized correlation coefficient between the extracted watermark and the original watermark is 0.8840, and the mean bit error rate is 0.2269. Compared with the three comparison algorithms, performance on these two indexes is improved by 1.27% and 18.16% on average, highlighting the algorithm’s robustness. Furthermore, the algorithm exhibits low computational complexity, with the mean encoding and mean decoding time differentials during experimental video processing being 3.934 s and 2.273 s, respectively, underscoring its practical utility.
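The two robustness indexes reported above can be sketched as follows. The bit error rate is the fraction of mismatched bits; for the normalized correlation coefficient the sketch assumes one common definition (inner product divided by the product of the vector norms), since the abstract does not give the exact formula used. The example bit strings are hypothetical.

```python
import math


def bit_error_rate(original, extracted):
    """Fraction of watermark bits that differ after extraction."""
    if len(original) != len(extracted) or not original:
        raise ValueError("watermarks must be non-empty and of equal length")
    errors = sum(o != e for o, e in zip(original, extracted))
    return errors / len(original)


def normalized_correlation(original, extracted):
    """Inner product of the two bit vectors divided by their norms (assumed form)."""
    if len(original) != len(extracted):
        raise ValueError("watermarks must be of equal length")
    dot = sum(o * e for o, e in zip(original, extracted))
    norm = math.sqrt(sum(o * o for o in original) * sum(e * e for e in extracted))
    return dot / norm if norm else 0.0


# Hypothetical 4-bit watermark with one flipped bit after an attack.
ber = bit_error_rate([1, 0, 1, 1], [1, 0, 0, 1])
nc = normalized_correlation([1, 0, 1, 1], [1, 0, 0, 1])
```

A correlation near 1 together with a bit error rate near 0, as in the offline-attack results, means the watermark survives almost intact.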
With a history of more than 800 years, Lijiang boasts picturesque scenery, diverse cultures, and an abundance of well-preserved historical sites. The old town of Lijiang, also known as the town of Dayan, is located in Gucheng District in Lijiang City, southwest China’s Yunnan Province. The current structures of the town can be traced back to the end of the Song Dynasty (960-1279) and the early era of the Yuan Dynasty (1271-1368).
The 3D reconstruction pipeline uses the Bundle Adjustment algorithm to refine the camera and point parameters. Bundle Adjustment is compute-intensive, and many researchers have improved its performance by implementing it on GPUs. In the previous research work, “Improving Accuracy and Computational Burden of Bundle Adjustment Algorithm using GPUs,” the authors first demonstrated an algorithmic improvement, reducing the mean square error by using an additional radial distortion parameter and explicitly computed analytical derivatives, and then reduced the computational burden of the algorithm using GPUs. With the naïve CUDA implementation, a speedup of 10× was achieved for the largest dataset of 13,678 cameras, 4,455,747 points, and 28,975,571 projections. In this paper, we present the optimization of the Bundle Adjustment CUDA code on GPUs to achieve a higher speedup. We propose a new data memory layout for the parameters in the Bundle Adjustment algorithm, resulting in contiguous memory access. We demonstrate that it improves the memory throughput on the GPUs, thereby improving the overall performance. We also demonstrate an increase in the computational throughput of the algorithm by optimizing the CUDA kernels to use the GPU resources effectively. A comparative performance study of explicitly computing an algorithm parameter versus using the Jacobians instead is presented. In the previous work, the Bundle Adjustment algorithm failed to converge for certain datasets because several block matrices of the cameras in the augmented normal equation were rank-deficient. In this work, we identify the cameras that cause rank-deficient matrices and preprocess the datasets to ensure the convergence of the Bundle Adjustment algorithm. Our optimized CUDA implementation converges in around 22 seconds for the largest dataset compared to 654 seconds for the sequential implementation, a speedup of 30×. It also achieves a 3× speedup for the largest dataset compared to the previous naïve CUDA implementation.
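The benefit of a contiguous memory layout can be sketched with simple index arithmetic. The abstract does not describe the paper's actual layout, so this is a generic comparison under stated assumptions: an array-of-structures layout versus a structure-of-arrays layout, with a hypothetical count of 9 parameters per camera and one GPU thread per camera.

```python
def aos_index(cam: int, param: int, n_params: int) -> int:
    """Array of structures: all parameters of one camera are stored together."""
    return cam * n_params + param


def soa_index(cam: int, param: int, n_cams: int) -> int:
    """Structure of arrays: the same parameter of all cameras is stored together."""
    return param * n_cams + cam


N_CAMS = 13678   # camera count of the largest dataset named in the abstract
N_PARAMS = 9     # hypothetical number of parameters per camera

# Address stride between neighbouring threads (cameras 0 and 1) reading
# the same parameter: large in AoS, exactly 1 (contiguous) in SoA.
aos_stride = aos_index(1, 0, N_PARAMS) - aos_index(0, 0, N_PARAMS)
soa_stride = soa_index(1, 0, N_CAMS) - soa_index(0, 0, N_CAMS)

# Speedup implied by the timings quoted in the abstract.
speedup = 654 / 22
```

A stride of 1 lets neighbouring GPU threads read neighbouring addresses, which is what allows memory accesses to be coalesced; the timing ratio of about 29.7 matches the reported 30× speedup.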
The purpose of the present study was to describe and provide insight into Detroit’s music industry, with the practical goal of informing the current state of music in Detroit and drawing conclusive suggestions for bringing music back to the forefront of Detroit. The study drew on a convenience sample (N=4) obtained through researcher networking and past work in the Detroit music industry. Eight themes were identified in a content analysis of interview responses. The study revealed a deep and common vision shared among diverse industry professionals: to bring national recognition back to Detroit. Everyone wants Detroit’s music industry to make a comeback; however, they realize that achieving that goal is a slow process.
Because eye tracking can record moment-to-moment changes in eye movements as people inspect pictures of natural scenes and comprehend information, this paper uses eye-movement technology to investigate how the order of presentation and the characteristics of information affect the semantic mismatch effect in the picture-sentence paradigm. A 3 (syntax) × 2 (semantic relation) factorial design is adopted, with syntax and semantic relation as within-participant variables. The experiment finds that semantic mismatch is most likely to increase cognitive load, as people have to spend more time, including first-pass time, regression path duration, and total fixation duration. Double negation does not significantly increase the processing difficulty of pictures and information. The results show that people can retrieve a special syntactic strategy from long-term memory to process pictures and sentences with different semantic relations, which enables readers to comprehend double negation as affirmation. These results demonstrate that the constituent comparison model may not be a general model that holds for other languages.
In the study of automatic driving, understanding the road scene is key to improving driving safety. Semantic segmentation divides the image at the pixel level into areas associated with semantic categories, helping vehicles perceive and obtain information about the surrounding road environment, which improves driving safety. Deeplabv3+ is a currently popular semantic segmentation model. During its segmentation tasks, small targets are missed and similar objects are easily misjudged, which leads to rough segmentation boundaries and reduces semantic accuracy. To address this issue, this study builds on the Deeplabv3+ network structure and combines it with the attention mechanism to increase the weight of the segmentation area, proposing an improved Deeplabv3+ method with a fused attention mechanism for road scene semantic segmentation. First, a parallel position attention module and channel attention module are introduced at the Deeplabv3+ encoding end to capture more spatial context information and high-level semantic information. Then, at the decoding end, an attention mechanism is introduced to restore spatial detail information, and the data are normalized to accelerate the convergence of the model. The segmentation effects of models with different attention mechanisms are compared and tested on the CamVid and Cityscapes datasets. The experimental results show that the mean Intersection over Union of the improved model on the two datasets is boosted by 6.88% and 2.58%, respectively, compared with Deeplabv3+. The method does not significantly increase the amount of network computation or complexity, and it achieves a good balance of speed and accuracy.
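The mean Intersection over Union metric reported above can be computed from a confusion matrix as sketched below. This is the standard definition (per-class TP / (TP + FP + FN), averaged over classes that occur); the tiny 2-class matrix is hypothetical and not taken from the paper.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: truth, columns: prediction)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # class-c pixels missed
        fp = sum(confusion[r][c] for r in range(n)) - tp     # wrongly labelled as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 2-class example: IoUs are 3/5 and 5/7.
score = mean_iou([[3, 1], [1, 5]])
```

Because every class contributes equally to the average, missed small targets lower the score noticeably, which is why the metric suits the problem the abstract describes.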
The rapid growth of air traffic has continuously increased the workload of controllers, which has become an important factor restricting sector capacity. If similar traffic scenes can be identified, historical decision-making experience may be used to help controllers decide on control strategies quickly. Considering that there are many traffic scenes and it is hard to label them all, in this paper we propose an active SVM metric learning (ASVM2L) algorithm to measure and identify similar traffic scenes. First, we obtain some traffic scene samples correctly labeled by experienced air traffic controllers. We design an active sampling strategy based on voting difference to choose the most valuable unlabeled samples and label them. Then the metric matrix of all the labeled samples is learned and used to classify the traffic scenes. We verify the effectiveness of ASVM2L on standard datasets and then use it to measure and classify the traffic scenes in the historical air traffic dataset of the Central South Sector of China. The experimental results show that, compared with other existing methods, the proposed method uses the information in traffic scene samples more thoroughly and achieves better classification performance with limited labeled samples.
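To make the role of a learned metric matrix concrete, the sketch below evaluates a Mahalanobis-style distance d(x, y) = sqrt((x - y)ᵀ M (x - y)). This is the generic form used in metric learning; it is an assumption that ASVM2L uses exactly this distance, and the matrices and feature vectors are hypothetical.

```python
import math


def metric_distance(x, y, M):
    """Distance between feature vectors x and y under the metric matrix M."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    quad = sum(d[i] * Md[i] for i in range(len(d)))
    return math.sqrt(max(quad, 0.0))  # guard against tiny negative round-off


identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to the Euclidean distance
stretched = [[4.0, 0.0], [0.0, 1.0]]  # weights the first feature more heavily

d_euclid = metric_distance([0.0, 0.0], [3.0, 4.0], identity)
d_learned = metric_distance([1.0, 0.0], [0.0, 0.0], stretched)
```

A learned M stretches the feature directions that separate differently labeled scenes, so nearest-neighbour style classification under this distance groups similar traffic scenes more reliably than the plain Euclidean distance.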
This paper briefly analyzes the scenes and implications of the famous ancient Chinese poem River Snow. It compares and contrasts five typical English versions from the perspective of how they translate the poem's static and dynamic states. The paper also discusses the meaning of “du diao han jiang xue” and how to better translate the key word “diao”.
In digital video analysis, browsing, retrieval, and querying, the shot alone cannot meet users' needs. A scene is a cluster of a series of shots, which partially meets these demands. In this paper, an algorithm for clustering video scenes based on shot key frame sets is proposed. We use χ² histogram matching and twin histogram comparison for shot detection. A method is presented for extracting key frame sets based on the distance between non-adjacent frames. Furthermore, the minimum distance between key frame sets is computed as the distance between shots, and scenes are finally clustered according to the distance between shots. Experiments show that the algorithm performs satisfactorily in correctness and computing speed.
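The two distance computations named in this abstract can be sketched as follows. The χ² histogram distance is written in one common form, the sum over bins of (a - b)² / (a + b); the abstract does not give the exact variant, so that form is an assumption, and the toy histograms are hypothetical. The shot distance is the minimum distance over all pairs of key frames, as the abstract states.

```python
def chi_square_distance(h1, h2):
    """Chi-square distance between two histograms (one common variant, assumed)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def shot_distance(keyframes_a, keyframes_b):
    """Minimum histogram distance between the key frame sets of two shots."""
    return min(
        chi_square_distance(a, b) for a in keyframes_a for b in keyframes_b
    )


# Hypothetical 2-bin histograms: the shots share one similar key frame.
shot_a = [[2, 0], [1, 1]]
shot_b = [[0, 2], [1, 1]]
distance = shot_distance(shot_a, shot_b)
```

Using the minimum over key frame pairs means two shots count as close when any of their representative frames look alike, which suits grouping shots that return to the same setting.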
Funding: the National Key R&D Program of China (2022YFF0604502).
文摘In order to improve target localization precision,accuracy,execution efficiency,and application range of the unmanned aerial vehicle(UAV)based on scene matching,a ground target localization method for unmanned aerial vehicle based on scene matching(GTLUAVSM)is proposed.The sugges-ted approach entails completing scene matching through a feature matching algorithm.Then,multi-sensor registration is optimized by robust estimation based on homologous registration.Finally,basemap generation and model solution are utilized to improve basemap correspondence and accom-plish aerial image positioning.Theoretical evidence and experimental verification demonstrate that GTLUAVSM can improve localization accuracy,speed,and precision while minimizing reliance on task equipment.
文摘Real-time indoor camera localization is a significant problem in indoor robot navigation and surveillance systems.The scene can change during the image sequence and plays a vital role in the localization performance of robotic applications in terms of accuracy and speed.This research proposed a real-time indoor camera localization system based on a recurrent neural network that detects scene change during the image sequence.An annotated image dataset trains the proposed system and predicts the camera pose in real-time.The system mainly improved the localization performance of indoor cameras by more accurately predicting the camera pose.It also recognizes the scene changes during the sequence and evaluates the effects of these changes.This system achieved high accuracy and real-time performance.The scene change detection process was performed using visual rhythm and the proposed recurrent deep architecture,which performed camera pose prediction and scene change impact evaluation.Overall,this study proposed a novel real-time localization system for indoor cameras that detects scene changes and shows how they affect localization performance.
Abstract: In some important object recognition applications, such as intelligent robots and unmanned driving, images are collected consecutively and are associated with one another, and the scenes have steady prior features. Yet existing technologies do not take full advantage of this information. To take object recognition in these applications further than existing algorithms, an object recognition method that fuses temporal sequence information with scene prior information is proposed. The method first employs YOLOv3 as the basic algorithm to recognize objects in single-frame images, then uses the DeepSort algorithm to associate potential objects recognized in images from different moments, and finally applies the confidence fusion method and temporal boundary processing method designed herein to fuse, at the decision level, temporal sequence information with scene prior information. Experiments on public datasets and self-built industrial scene datasets show that, because the information sources are expanded, the quality of single-frame images has less impact on the recognition results, and object recognition is greatly improved. The method is presented as a widely applicable framework for information fusion across multiple classes. Any object recognition algorithm that outputs object class, location information, and recognition confidence at the same time can be integrated into this information fusion framework to improve performance.
Funding: supported by the Fundamental Research Funds for the Central Universities under Grant NS2020045. Y.L.G received the grant.
Abstract: Weather is a key factor affecting the control of air traffic. Accurate recognition and classification of similar weather scenes in the terminal area helps rapid decision-making in air traffic flow management. Current research mostly uses traditional machine learning methods to extract features of weather scenes and clustering algorithms to group similar scenes. Inspired by the excellent performance of deep learning in image recognition, this paper proposes a terminal area similar weather scene classification method based on improved deep convolution embedded clustering (IDCEC). It uses the combination of the encoding layer and the decoding layer to reduce the dimensionality of the weather image while retaining useful information to the greatest extent, and then uses the combination of the pre-trained encoding layer and the clustering layer to train the clustering model of similar scenes in the terminal area. Finally, the terminal area of Guangzhou Airport is selected as the research object, the proposed method is used to classify historical weather data into similar scenes, and its performance is compared with other state-of-the-art methods. The experimental results show that the proposed IDCEC method can identify similar scenes more accurately based on the spatial distribution characteristics and severity of weather. In addition, when compared against the actual flight volume in the Guangzhou terminal area, IDCEC's recognition results for similar weather scenes are consistent with the judgments of experts in the field.
Funding: the Major Theoretical and Practical Issues Research Project in Social Sciences of Shaanxi Province (2021ND0076).
Abstract: Regional cultural patterns and characteristics play a positive role in economic and social development. By planning and constructing cultural amenities and creating cultural scenes, the spatial quality and quality of life in a region can be enhanced, facilitating the expansion of cultural consumption. Shaanxi, with its rich historical and cultural resources, positions its capital city of Xi’an as a “world historical city”, boasting a vast number of cultural amenities represented by “cultural facilities”, “cultural activities”, “cultural experiences”, and “cultural services”. Developing urban cultural scenes with the aim of upgrading regional cultural consumption in Shaanxi requires comprehensive planning and a multifaceted approach, particularly in integrating provincial cultural scenes, clarifying the positioning of cultural scenes, innovating cultural scene experience projects, creating cultural scene intellectual property (IP), and empowering cultural scenes through the application of science and technology.
Abstract: In European thought and culture, there exists a group of passionate artists who are fascinated by the intention, passion, and richness of artistic expression. They strive to establish connections between different art forms. Musicians not only attempt to represent masterpieces through the language of music but also aim to convey subjective experiences of emotion and personal imagination to listeners by adding titles to their musical works. This study examines two pieces, “Scenes of Childhood” and “Children’s Garden”, and analyzes the different approaches employed by the composers in portraying similar content.
Funding: supported by the National Natural Science Foundation of China (Grant No. 62005049), the Natural Science Foundation of Fujian Province (Grant Nos. 2020J01451, 2022J05113), and the Education and Scientific Research Program for Young and Middle-aged Teachers in Fujian Province (Grant No. JAT210035).
Abstract: Camouflaged people are extremely adept at actively concealing themselves by making effective use of cover and the surrounding environment. Despite advances in optical detection capabilities through imaging systems, including spectral, polarization, and infrared technologies, there is still a lack of effective real-time methods for accurately detecting small and highly effectively camouflaged people in complex real-world scenes. This study proposes a snapshot multispectral image-based camouflage detection model, multispectral YOLO (MS-YOLO), which uses the SPD-Conv and SimAM modules to represent targets effectively and suppress background interference by exploiting spatial-spectral target information. In addition, the study constructs the first real-shot multispectral camouflaged people dataset (MSCPD), which covers diverse scenes, target scales, and attitudes. To minimize information redundancy, MS-YOLO selects an optimal subset of 12 bands with strong feature representation and minimal inter-band correlation as input. In experiments on the MSCPD, MS-YOLO achieves a mean Average Precision of 94.31% and real-time detection at 65 frames per second, which confirms the effectiveness and efficiency of the method in detecting camouflaged people in various typical desert and forest scenes. The approach offers valuable support for improving the perception capabilities of unmanned aerial vehicles in detecting enemy forces and rescuing personnel on the battlefield.
Funding: the National Natural Science Foundation of P.R. China (42075130); Nari Technology Co., Ltd. (4561655965).
Abstract: Scene text detection is an important task in computer vision. In this paper, we present YOLOv5 Scene Text (YOLOv5ST), an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection. Our primary goal is to enhance inference speed without sacrificing significant detection accuracy, thereby enabling robust performance on resource-constrained devices such as drones, closed-circuit television cameras, and other embedded systems. To achieve this, we propose key modifications to the network architecture to lighten the original backbone and improve feature aggregation, including replacing standard convolution with depth-wise convolution, adopting the C2 sequence module in place of C3, employing Spatial Pyramid Pooling Global (SPPG) instead of Spatial Pyramid Pooling Fast (SPPF), and integrating a Bi-directional Feature Pyramid Network (BiFPN) into the neck. Experimental results demonstrate a 26% improvement in inference speed compared to the baseline, with only marginal reductions of 1.6% and 4.2% in mean average precision (mAP) at the intersection over union (IoU) thresholds of 0.5 and 0.5:0.95, respectively. Our work advances scene text detection by striking a balance between speed and accuracy, making it well suited for performance-constrained environments.
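The IoU thresholds quoted in the abstract above can be made concrete with a short sketch. It computes the intersection over union of two axis-aligned boxes and lists the thresholds that the "0.5:0.95" notation conventionally denotes (0.50 to 0.95 in steps of 0.05). The box format `(x1, y1, x2, y2)` and the function names are assumptions for illustration, not part of YOLOv5ST.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2); the format is an assumption."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_thresholds():
    """Thresholds averaged over by the 0.5:0.95 mAP convention."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def is_match(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a true positive when its IoU reaches the threshold."""
    return box_iou(pred_box, gt_box) >= threshold
```

A detection that passes at threshold 0.5 can still fail at 0.95, which is why the mAP drop reported over 0.5:0.95 is larger than the drop at 0.5 alone.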
Funding: This work was supported by the Key Research and Development Program of Hainan Province (Grant Nos. ZDYF2023GXJS163, ZDYF2024GXJS014), the National Natural Science Foundation of China (NSFC) (Grant Nos. 62162022, 62162024), the Major Science and Technology Project of Hainan Province (Grant No. ZDKJ2020012), the Hainan Provincial Natural Science Foundation of China (Grant No. 620MS021), and the Youth Foundation Project of Hainan Natural Science Foundation (621QN211).
Abstract: Research on neural radiance fields for novel view synthesis has experienced explosive growth with the development of new models and extensions. The NeRF (Neural Radiance Fields) algorithm, in variants suitable for underwater scenes or scattering media, is also evolving. Existing underwater 3D reconstruction systems still face challenges such as long training times and low rendering efficiency. This paper proposes an improved underwater 3D reconstruction system to achieve rapid and high-quality 3D reconstruction. First, we enhance underwater videos captured by a monocular camera to correct the image quality degradation caused by the physical properties of the water medium and to ensure consistent enhancement across frames. Then, we perform keyframe selection to optimize resource usage and reduce the impact of dynamic objects on the reconstruction results. After pose estimation using COLMAP, the selected keyframes undergo 3D reconstruction using NeRF based on multi-resolution hash encoding for model construction and rendering. In terms of image enhancement, our method has been optimized for certain scenarios, demonstrating effective enhancement and better continuity between consecutive frames of the same data. In terms of 3D reconstruction, our method achieved a peak signal-to-noise ratio (PSNR) of 18.40 dB and a structural similarity (SSIM) of 0.6677, indicating a good balance between operational efficiency and reconstruction quality.
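For readers unfamiliar with the PSNR figure reported in the abstract above, the following minimal sketch computes PSNR between a reference image and a rendered image. Images are represented as flat lists of pixel intensities and the peak value of 255 assumes 8-bit data; both are assumptions for illustration rather than details taken from the paper.

```python
import math


def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, halving the error raises the score by about 3 dB.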
Funding: supported in part by the National Natural Science Foundation of China under Grants 62202496 and 62272478, and the Basic Frontier Innovation Project of Engineering University of People's Armed Police under Grants WJY202314 and WJY202221.
Abstract: The proposed robust reversible watermarking algorithm addresses the difficulty of reconciling robustness and reversibility in existing video watermarking techniques by using scene smoothness to group video frames. Grounded in the H.264 video coding standard, the algorithm first employs traditional robust watermark stitching technology to embed watermark information in the low-frequency coefficient domain of the U channel. It then uses histogram migration techniques in the high-frequency coefficient domain of the U channel to embed auxiliary information, enabling successful watermark extraction and lossless recovery of the original video content. Experimental results demonstrate the algorithm’s strong imperceptibility, with each embedded frame in the experimental videos achieving a mean peak signal-to-noise ratio of 49.3830 dB and a mean structural similarity of 0.9996. Compared with the three comparison algorithms, these two indexes are improved by 7.59% and 0.4% on average. The proposed algorithm is also strongly robust to both offline and online attacks. Under offline attacks, the average normalized correlation coefficient between the extracted watermark and the original watermark is 0.9989, and the average bit error rate is 0.0089. Under online attacks, the normalized correlation coefficient between the extracted watermark and the original watermark is 0.8840, and the mean bit error rate is 0.2269. Compared with the three comparison algorithms, these two indexes are improved by 1.27% and 18.16% on average, highlighting the algorithm’s robustness. Furthermore, the algorithm exhibits low computational complexity, with the mean encoding and mean decoding time differentials during experimental video processing being 3.934 and 2.273 s, respectively, underscoring its practical utility.
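The two robustness measures quoted in the abstract above can be sketched as follows for binary watermarks. The normalized correlation shown here is one common definition (the inner product divided by the product of the norms); the abstract does not state which variant the authors used, so treat the formula and the function names as assumptions.

```python
import math


def bit_error_rate(original, extracted):
    """Fraction of watermark bits that differ between the two sequences."""
    if len(original) != len(extracted) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = sum(1 for a, b in zip(original, extracted) if a != b)
    return errors / len(original)


def normalized_correlation(original, extracted):
    """Inner product normalized by both energies (one common NC definition, assumed here)."""
    if len(original) != len(extracted) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    dot = sum(a * b for a, b in zip(original, extracted))
    energy = sum(a * a for a in original) * sum(b * b for b in extracted)
    return dot / math.sqrt(energy) if energy > 0 else 0.0
```

An NC close to 1 together with a bit error rate close to 0 indicates that the extracted watermark closely matches the embedded one.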
Abstract: With a history of more than 800 years, Lijiang boasts picturesque scenery, diverse cultures, and an abundance of well-preserved historical sites. The old town of Lijiang, also known as the town of Dayan, is located in Gucheng District in Lijiang City, southwest China’s Yunnan Province. The current structures of the town can be traced back to the end of the Song Dynasty (960-1279) and the early era of the Yuan Dynasty (1271-1368).
Abstract: The 3D reconstruction pipeline uses the Bundle Adjustment algorithm to refine the camera and point parameters. The Bundle Adjustment algorithm is compute-intensive, and many researchers have improved its performance by implementing it on GPUs. In the previous research work, “Improving Accuracy and Computational Burden of Bundle Adjustment Algorithm using GPUs,” the authors first demonstrated an improvement in the algorithm's accuracy, reducing the mean square error by using an additional radial distortion parameter and explicitly computed analytical derivatives, and then reduced the computational burden of the Bundle Adjustment algorithm using GPUs. With the naïve CUDA implementation, a speedup of 10× was achieved for the largest dataset of 13,678 cameras, 4,455,747 points, and 28,975,571 projections. In this paper, we present the optimization of the Bundle Adjustment algorithm CUDA code on GPUs to achieve higher speedup. We propose a new data memory layout for the parameters in the Bundle Adjustment algorithm, resulting in contiguous memory access. We demonstrate that it improves the memory throughput on the GPUs, thereby improving the overall performance. We also demonstrate an increase in the computational throughput of the algorithm by optimizing the CUDA kernels to utilize the GPU resources effectively. A comparative performance study of explicitly computing an algorithm parameter versus using the Jacobians instead is presented. In the previous work, the Bundle Adjustment algorithm failed to converge for certain datasets because several block matrices of the cameras in the augmented normal equation were rank-deficient. In this work, we identify the cameras that cause rank-deficient matrices and preprocess the datasets to ensure the convergence of the BA algorithm.
Our optimized CUDA implementation achieves convergence of the Bundle Adjustment algorithm in around 22 seconds for the largest dataset compared to 654 seconds for the sequential implementation, resulting in a speedup of 30×. Our optimized CUDA implementation presented in this paper has achieved a 3× speedup for the largest dataset compared to the previous naïve CUDA implementation.
Abstract: The purpose of the present study was to describe and provide insight into Detroit’s music industry, with the practical goal of documenting the current state of music in Detroit and drawing conclusive suggestions for bringing music back to the forefront of Detroit. The study drew on a convenience sample (N=4) obtained through researcher networking and past work in the Detroit music industry. Eight themes were identified in a content analysis of interview responses. The study revealed a deep and common vision shared among diverse industry professionals: to bring national recognition back to Detroit. Everyone wants Detroit’s music industry to make a comeback; however, they realize that achieving that goal is a slow process.
Funding: The National Social Science Foundation of China (No. CBA080236); the Graduate Innovation Project of Jiangsu Province (No. CX08B-016R).
Abstract: Because eye tracking can record moment-to-moment changes in eye movements as people inspect pictures of natural scenes and comprehend information, this paper uses eye-movement technology to investigate how the order of presentation and the characteristics of information affect the semantic mismatch effect in the picture-sentence paradigm. A 3 (syntax) × 2 (semantic relation) factorial design is adopted, with syntax and semantic relation as within-participant variables. The experiment finds that semantic mismatch is most likely to increase cognitive load, as people have to spend more time, including first-pass time, regression path duration, and total fixation duration. Double negation does not significantly increase the processing difficulty of pictures and information. Experimental results show that people can retrieve a special syntactic strategy from long-term memory to process pictures and sentences with different semantic relations, which enables readers to comprehend double negation as affirmation. These results demonstrate that the constituent comparison model may not be a general model applicable to other languages.
Funding: National Natural Science Foundation of China (Nos. 61941109, 62061023); Distinguished Young Scholars of Gansu Province of China (No. 21JR7RA345).
Abstract: In the study of automatic driving, understanding the road scene is key to improving driving safety. Semantic segmentation divides an image at the pixel level into areas associated with semantic categories, helping vehicles perceive and obtain information about the surrounding road environment and thereby improving driving safety. Deeplabv3+ is a currently popular semantic segmentation model. During segmentation, it tends to miss small targets and misjudge similar objects, which leads to rough segmentation boundaries and reduces semantic accuracy. This study addresses the issue by combining the Deeplabv3+ network structure with an attention mechanism to increase the weight of the segmentation area, and proposes an improved road scene semantic segmentation method that fuses attention mechanisms into Deeplabv3+. First, a parallel pair of a position attention module and a channel attention module is introduced at the Deeplabv3+ encoding end to capture more spatial context information and high-level semantic information. Then, at the decoding end, an attention mechanism is introduced to restore spatial detail information, and the data are normalized to accelerate the convergence of the model. The segmentation effects of models with different attention mechanisms are compared and tested on the CamVid and Cityscapes datasets. The experimental results show that the mean Intersection over Union (mIoU) of the improved model on the two datasets is increased by 6.88% and 2.58%, respectively, outperforming the original Deeplabv3+. The method does not significantly increase the amount of network computation or complexity, and achieves a good balance of speed and accuracy.
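The mean Intersection over Union metric cited in the abstract above can be sketched as follows. Label maps are flattened into lists of integer class IDs, and classes that appear in neither map are skipped when averaging; that skipping rule is a common convention assumed here, not something the abstract specifies.

```python
def mean_iou(pred, target, num_classes):
    """Mean per-class IoU over two flattened label maps of equal length."""
    if len(pred) != len(target):
        raise ValueError("label maps must have the same number of pixels")
    ious = []
    for c in range(num_classes):
        # Pixels labeled c in both maps, and pixels labeled c in at least one map.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps (assumed convention)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

Because every class contributes equally to the mean, missed small targets lower mIoU noticeably even when overall pixel accuracy stays high.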
Funding: supported by the National Natural Science Foundation of China (No. 61501229) and the Fundamental Research Funds for the Central Universities (Nos. 2019054, 2020045).
Abstract: The rapid growth of air traffic has continuously increased the workload of controllers, which has become an important factor restricting sector capacity. If similar traffic scenes can be identified, historical decision-making experience may be used to help controllers decide on control strategies quickly. Considering that there are many traffic scenes and it is hard to label them all, in this paper we propose an active SVM metric learning (ASVM2L) algorithm to measure and identify similar traffic scenes. First, we obtain some traffic scene samples correctly labeled by experienced air traffic controllers. We design an active sampling strategy based on voting difference to choose the most valuable unlabeled samples and label them. Then the metric matrix of all the labeled samples is learned and used to classify the traffic scenes. We verify the effectiveness of ASVM2L on standard datasets, and then use it to measure and classify the traffic scenes in the historical air traffic dataset of the Central South Sector of China. The experimental results show that, compared with other existing methods, the proposed method uses the information in traffic scene samples more thoroughly and achieves better classification performance with limited labeled samples.
Abstract: This paper briefly analyzes the scenes and implications of the famous ancient Chinese poem River Snow. It compares and contrasts five typical English versions from the perspective of how they translate the poem's static and dynamic states. The paper also discusses the meaning of “du diao han jiang xue” and how to better translate the key word “diao”.
Funding: Supported by the Natural Science Foundation of Hubei Province (2004ABA174)
Abstract: In digital video analysis, browsing, retrieval, and querying, the shot alone cannot meet users' needs. A scene is a cluster of a series of shots and partially meets these demands. In this paper, an algorithm for video scene clustering based on shot key frame sets is proposed. We use χ² histogram matching and twin histogram comparison for shot detection. A method is presented for key frame set extraction based on the distance between non-adjacent frames. Furthermore, the minimum distance between key frame sets is computed as the distance between shots, and scenes are finally clustered according to this shot distance. Experiments show that the algorithm performs satisfactorily in correctness and computing speed.
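To illustrate the histogram-based shot detection step mentioned in the abstract above, the following minimal sketch flags hard cuts between consecutive frames with a chi-square histogram distance. The frame representation (flat lists of 8-bit gray levels), the bin count, the threshold, and the particular chi-square form are all assumptions for illustration; the paper's twin histogram comparison for gradual transitions is not reproduced here.

```python
def gray_histogram(frame, bins=16, max_value=256):
    """Histogram of 8-bit gray levels; frame is a flat list of pixel values."""
    hist = [0] * bins
    for px in frame:
        hist[px * bins // max_value] += 1
    return hist


def chi_square_distance(h1, h2):
    """One common chi-square form: sum of (a - b)^2 / (a + b) over non-empty bins."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def detect_cuts(frames, threshold):
    """Indices i where the histogram distance between frame i-1 and frame i exceeds the threshold."""
    hists = [gray_histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if chi_square_distance(hists[i - 1], hists[i]) > threshold
    ]
```

Each returned index marks the first frame of a new shot; key frame extraction and scene clustering would then operate on the shots delimited this way.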