Stream velocity measurement is important in many industrial sectors, and the need for precise measurements has made it increasingly so. It is used to determine correct production rates, quantify the volumetric production of undesired fluids, build automated controls that avoid over-flooding or over-production, and support accurate predictive maintenance. A persistent difficulty is measuring the velocity of one fluid embedded in another, for example the velocity of a stream of gas bubbles flowing through a liquid phase. Several methods have been researched and are already used in industry, but a non-intrusive, automated way of obtaining these stream velocities remains valuable and could have a large impact on project budgets. The script developed here breaks real-time video down into frame images and analyzes pixel correlations between frames to find superposition matches, from which the gas-bubble stream velocity is estimated. The script relies on image-processing functions and procedures already available in MATLAB. In testing, its accuracy was around 97%; the commented source code was almost 3,000 characters long; and it ran on an Intel Core Duo 2.13 GHz workstation with 2 GB of RAM. Despite these good results, only the end-point correlations actually contributed to the final solution. Using self-learning functions or a neural network could therefore improve the application's ability to run in real time without being slowed down by iterative loops.
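As a rough illustration of the correlation step, the Python sketch below (not the authors' MATLAB script) takes two intensity profiles sampled along the flow direction in consecutive frames. It finds the pixel shift that maximizes their normalized cross-correlation and converts that shift to a velocity. The 1-D simplification, the restriction to non-negative shifts (flow in one known direction), the function names, and the calibration parameters `metres_per_pixel` and `fps` are all illustrative assumptions.

```python
from statistics import mean

def normalized_correlation(a, b):
    """Pearson-style correlation of two equal-length sequences (0.0 if either is flat)."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0

def best_shift(prev_profile, next_profile, max_shift):
    """Forward shift (pixels) that best superposes prev_profile onto next_profile."""
    n = len(prev_profile)
    best, best_score = 0, float("-inf")
    for s in range(max_shift + 1):
        # Pixel i in the previous frame is assumed to reappear at i + s in the next frame.
        score = normalized_correlation(prev_profile[: n - s], next_profile[s:])
        if score > best_score:
            best, best_score = s, score
    return best

def bubble_velocity(prev_profile, next_profile, max_shift, metres_per_pixel, fps):
    """Velocity = displacement per frame x spatial calibration x frame rate."""
    return best_shift(prev_profile, next_profile, max_shift) * metres_per_pixel * fps
```

A full 2-D version would correlate image patches instead of profiles, but the search-for-the-best-superposition logic is the same.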
Video steganography plays an important role in secret communication: it conceals a secret video in a cover video by perturbing pixel values in the cover frames. Imperceptibility is the first and foremost requirement of any steganographic approach. Human eyes perceive pixel perturbations differently in different video areas. Inspired by this, a novel, effective, and efficient Deeply-Recursive Attention Network (DRANet) for video steganography is proposed. It finds areas suitable for information hiding by modelling spatio-temporal attention. DRANet has two main components: a Non-Local Self-Attention (NLSA) block and a Non-Local Co-Attention (NLCA) block. The NLSA block selects cover-frame areas suitable for hiding by computing correlations within and across cover frames. The NLCA block produces enhanced representations of the secret frames, which improves the robustness of the model and mitigates the influence of different areas in the secret video. Furthermore, DRANet reduces the number of model parameters by recursively applying the same operations to the different frames of an input video. Experimental results show that DRANet achieves better performance with fewer parameters than state-of-the-art competitors.
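Non-local attention blocks weight every position by its similarity to every other position. The sketch below shows that core computation, softmax(QK^T / sqrt(d)) V, in plain Python over a few feature vectors (for example, one descriptor per cover-frame region across frames). It is generic scaled dot-product self-attention, not DRANet's actual NLSA block. The learned query, key, and value projections are omitted by assuming Q = K = V.

```python
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def non_local_self_attention(features):
    """features: list of N vectors of dimension d (e.g. region descriptors across frames).
    Returns (attention_matrix, outputs); outputs[i] is a similarity-weighted mix of
    all feature vectors. Q = K = V = features (no learned projections)."""
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    scores = [[scale * sum(q * k for q, k in zip(fi, fj)) for fj in features]
              for fi in features]
    attn = [softmax(row) for row in scores]
    outputs = [[sum(w * fj[c] for w, fj in zip(row, features)) for c in range(d)]
               for row in attn]
    return attn, outputs
```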
The transmission delay of a real-time video packet depends mainly on the sensing delay (a short-term factor) and the transmission delay of the entire frame (a long-term factor). The optimization problem in the spectrum handoff process should therefore be formulated as a combination of microscopic and macroscopic optimization. In this paper, we focus on combining these two optimization models and propose a novel Evolution Spectrum Handoff (ESH) strategy that minimizes the expected transmission delay of real-time video packets. In the micro-optimized model, we consider the tradeoff between the Primary User's (PU's) allowable collision percentage on each channel and the transmission delay of video packets, and propose a mixed-integer non-linear programming scheme. The scheme achieves the minimum sensing time, termed the optimal stopping time. In the macro-optimized model, the optimal stopping time is used as the reward function within a partially observable Markov decision process (POMDP) framework. The ESH strategy is designed to search for an optimal target channel set and minimize the expected packet delay in long-term real-time video transmission. The minimum expected transmission delay is obtained under practical cognitive radio network conditions, i.e., secondary-user mobility, random PU access, imperfect sensing information, etc. Theoretical analysis and simulation results show that the ESH strategy can effectively reduce the transmission delay of video packets during spectrum handoff.
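In a POMDP formulation, the secondary user keeps a belief that each channel is idle and updates it after every imperfect sensing result. The sketch below shows one common form of that Bayesian update. It assumes a two-state Markov channel with transition probabilities `p_idle_to_idle` and `p_busy_to_idle`, and a detector described by a false-alarm probability `p_fa` and a miss-detection probability `p_md`. These names and the model itself are assumptions; the ESH paper's exact transition and reward models are not reproduced here.

```python
def predict_idle(belief_idle, p_idle_to_idle, p_busy_to_idle):
    """Two-state Markov channel model: prior P(idle) for the next slot."""
    return belief_idle * p_idle_to_idle + (1.0 - belief_idle) * p_busy_to_idle

def update_idle(prior_idle, sensed_idle, p_fa, p_md):
    """Bayes update of P(channel idle) after one imperfect sensing result.
    p_fa: P(sensed busy | idle)  (false alarm)
    p_md: P(sensed idle | busy)  (missed detection of the primary user)"""
    if sensed_idle:
        like_idle, like_busy = 1.0 - p_fa, p_md
    else:
        like_idle, like_busy = p_fa, 1.0 - p_md
    num = like_idle * prior_idle
    return num / (num + like_busy * (1.0 - prior_idle))
```

For example, a handoff policy could rank candidate target channels by their updated idle belief before deciding where to switch.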
Digital media manipulation has grown because tampering software is readily available, and the authenticity of records is now at high risk, especially for video. There is a pressing need to detect such tampering and take appropriate action. In this work, we propose an approach that detects inter-frame video forgery using deep features obtained from a parallel deep neural network model together with thorough analytical computations. The approach uses only the deep features extracted from the CNN model and then applies conventional mathematical analysis to these features to find forgery in the video. It calculates the correlation coefficient between the deep features of adjacent frames rather than directly between the frames. Forgery detection is divided into two phases: video forgery detection and video forgery classification. In the detection phase, the approach determines whether the input video is original or tampered. If the video is not original, it moves on to the classification phase. There, the method checks the forged video for insertion forgery and deletion forgery, and checks again for originality. The proposed work is general and is tested on two different datasets. For forgery detection, our approach achieves an accuracy of 91% on the VIFFD dataset and 90% on the TDTV dataset. For classifying the type of forgery (insertion or deletion), it achieves 82% on VIFFD and 86% on TDTV. This work can help in analyzing original and tampered videos in various domains.
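The detection phase compares deep features of adjacent frames rather than raw pixels. A minimal sketch of that comparison assumes each frame has already been mapped to a feature vector by some CNN (not shown). It computes the Pearson correlation coefficient between consecutive feature vectors and flags transitions where the coefficient drops below a threshold. The threshold of 0.8 and the function names are illustrative assumptions, not the paper's settings.

```python
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0

def adjacent_correlations(frame_features):
    """Correlation between the feature vectors of frames t and t + 1."""
    return [pearson(f0, f1) for f0, f1 in zip(frame_features, frame_features[1:])]

def suspicious_transitions(frame_features, threshold=0.8):
    """Indices t where the t -> t+1 correlation falls below the threshold,
    a possible sign of inserted or deleted frames."""
    return [t for t, r in enumerate(adjacent_correlations(frame_features)) if r < threshold]
```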
During the past decade, feature extraction and knowledge acquisition based on video analysis have been extensively researched. They have been tested in many applications, such as closed-circuit television (CCTV) data analysis, large-scale public event control, and other daily security monitoring and surveillance operations, with various degrees of success. Practical video processing, however, is multi-phased. It draws on a wide range of theories and techniques: fundamental image processing, computational geometry and graphics, and machine vision, through to advanced artificial intelligence, pattern analysis, and even cognitive science. Many important problems therefore remain to be resolved before it can be widely applied, and video event identification and detection are two prominent ones. Most of today's approaches and systems process video frame by frame. In contrast, this project reorganizes video data as a 3D volume structure that holds hybrid spatial and temporal information in a unified space. This paper reports an innovative technique for transforming original video frames into 3D volume structures described by spatial and temporal features. It then highlights the volume array structure in a so-called "pre-suspicion" mechanism for later processing. The focus of this report is an effective and efficient voxel-based segmentation technique that suits the volumetric nature of video events and is ready for deployment in 3D clustering operations. The paper concludes with a performance evaluation of the devised technique and a discussion of future work on accelerating the pre-processing of the original video data.
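Treating video as a spatio-temporal volume means indexing voxels by (t, y, x), so segmentation can group voxels across frames as well as within them. The sketch below stacks frames into such a volume. It marks "active" voxels whose temporal intensity change exceeds a threshold, then labels 6-connected voxel clusters with a breadth-first search. The activity criterion is a stand-in chosen for illustration; it is not the paper's pre-suspicion mechanism.

```python
from collections import deque

def active_voxels(frames, threshold):
    """frames: list of T 2-D lists (H x W). Voxel (t, y, x) is active when
    |I[t][y][x] - I[t-1][y][x]| > threshold (voxels in the first frame are never active)."""
    T, H, W = len(frames), len(frames[0]), len(frames[0][0])
    return [[[t > 0 and abs(frames[t][y][x] - frames[t - 1][y][x]) > threshold
              for x in range(W)] for y in range(H)] for t in range(T)]

def label_voxel_clusters(mask):
    """Label 6-connected components of a boolean (T, H, W) volume.
    Returns (labels, count); label 0 means background."""
    T, H, W = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * W for _ in range(H)] for _ in range(T)]
    count = 0
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                if mask[t][y][x] and not labels[t][y][x]:
                    count += 1
                    labels[t][y][x] = count
                    queue = deque([(t, y, x)])
                    while queue:  # flood-fill across space and time
                        ct, cy, cx = queue.popleft()
                        for dt, dy, dx in steps:
                            nt, ny, nx = ct + dt, cy + dy, cx + dx
                            if (0 <= nt < T and 0 <= ny < H and 0 <= nx < W
                                    and mask[nt][ny][nx] and not labels[nt][ny][nx]):
                                labels[nt][ny][nx] = count
                                queue.append((nt, ny, nx))
    return labels, count
```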
Intensity flicker is a common form of degradation in archived film. Most algorithms for this distortion are complicated and difficult to control. This paper presents a discrete mathematical model of flicker. It designs a block-based method for estimating the model's parameters from their intensity-variation characteristics over large areas, and uses this estimate to construct a compensation model that repairs the current frame. The restoration approach is fully automatic, and repairing the current frame does not require information from subsequent frames. The algorithm was implemented as a simple, adjustable repair system. Experimental results show that the proposed algorithm removes most intensity flicker while preserving the desired effects.
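The abstract does not spell out its discrete flicker model. A widely used simplification treats flicker as a gain-and-offset change, I_obs = alpha * I + beta, that varies slowly across the frame. The sketch below assumes that linear model. For each block, it estimates alpha and beta by matching the block's mean and standard deviation to the co-located block of the previously restored frame, then inverts the model. Like the method described, it uses only past frames. The block size and function names are illustrative.

```python
import math

def _mean_std(values):
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def deflicker(current, reference, block=8, eps=1e-6):
    """current, reference: 2-D lists of intensities with the same shape.
    reference is the previously restored frame (no future frames are used).
    Per block, estimate I_obs = alpha * I_ref + beta and invert it."""
    H, W = len(current), len(current[0])
    out = [row[:] for row in current]
    for by in range(0, H, block):
        for bx in range(0, W, block):
            cells = [(y, x) for y in range(by, min(by + block, H))
                     for x in range(bx, min(bx + block, W))]
            m_obs, s_obs = _mean_std([current[y][x] for y, x in cells])
            m_ref, s_ref = _mean_std([reference[y][x] for y, x in cells])
            # Flat blocks give no gain information: fall back to an offset-only model.
            alpha = s_obs / s_ref if s_ref > eps and s_obs > eps else 1.0
            beta = m_obs - alpha * m_ref
            for y, x in cells:
                out[y][x] = (current[y][x] - beta) / alpha
    return out
```

Scene motion between frames violates the model, so in practice these per-block estimates are only approximate.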
A novel temporal shape error concealment technique is proposed for use in object-based video coding schemes. To reduce the effect of shape variations of a video object, the curvature scale space (CSS) technique is adopted to extract features. These features are then used for boundary matching between the current frame and the previous frame. Because temporal, spatial, and statistical video contour information are all considered, the proposed method can find the optimal match, which is used to replace the damaged contours. Simulation results show that the proposed algorithm achieves better subjective and objective quality and higher efficiency than previously developed methods.
A novel Snake model with region information is proposed to detect and track moving objects. Region-information-based approaches are generally sensitive to illumination changes and small movements in the background, while edge-information-based approaches often give incorrect results for ambiguous images. Both types of information are therefore used in computing the image force. Edge-information-based features make the algorithm fast and robust, and region information lets the active contour energy function obtain correct results for ambiguous images. Furthermore, an automatic contour initialization method using double difference images is given to meet the requirements of video sequence tracking. A simple forecast step is also added to estimate the position of the contour, which improves the convergence speed of the active contour. Experimental results show that the computation time of the algorithm is less than 0.1 s/frame, so it can be applied in real-time systems.
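The double-difference idea behind the automatic contour initialization can be sketched as follows. A pixel is treated as moving in the current frame when it differs from both the previous and the next frame. The bounding box of those pixels then gives a starting rectangle for the snake. The threshold and the bounding-box form of the initial contour are illustrative assumptions.

```python
def double_difference_mask(prev, cur, nxt, threshold):
    """A pixel is 'moving' if |cur - prev| and |nxt - cur| both exceed the threshold."""
    return [[abs(c - p) > threshold and abs(n - c) > threshold
             for p, c, n in zip(rp, rc, rn)]
            for rp, rc, rn in zip(prev, cur, nxt)]

def initial_contour_box(mask):
    """Bounding box (top, left, bottom, right) of the moving pixels, or None."""
    points = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    ys = [y for y, _ in points]
    xs = [x for _, x in points]
    return min(ys), min(xs), max(ys), max(xs)
```

Requiring both differences to fire suppresses the "ghost" left at the object's previous position, which a single frame difference would include.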
Multi-armored target tracking (MATT) plays a crucial role in coordinated tracking and strike. Occlusion and insertion among targets, together with target scale variation, are the key problems in MATT. Most state-of-the-art multi-object tracking (MOT) methods adopt the tracking-by-detection strategy. This strategy relies on compute-intensive sliding-window or anchor schemes in the detection module and neglects target scale variation in the tracking module. In this work, we propose a more efficient and effective spatial-temporal attention scheme to track multiple armored targets on the ground battlefield. By simulating the structure of the retina, a novel visual-attention Gabor filter branch is proposed to enhance detection. By introducing temporal information, online-learned target-specific Convolutional Neural Networks (CNNs) are adopted to address occlusion. More importantly, we built a MOT dataset for armored targets, called the Armored Target Tracking dataset (ATTD), on which several comparative experiments with state-of-the-art methods are conducted. Experimental results show that the proposed method achieves outstanding tracking performance and meets practical application requirements.
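The visual-attention branch builds on Gabor filtering. For reference, a standard real-valued 2-D Gabor kernel (a Gaussian envelope multiplied by a cosine carrier, both rotated by an orientation theta) can be generated as shown below. The parameter names and defaults are our own. The retina-inspired arrangement of filters in the paper is not reproduced.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real 2-D Gabor kernel of shape (size, size), size odd, centred in the middle.
    theta: orientation (radians); wavelength: carrier period in pixels;
    gamma: spatial aspect ratio of the envelope; psi: phase offset."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)    # rotated coordinates
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel
```

A bank of such kernels at several orientations, convolved with the frame, gives orientation-selective responses that an attention branch can pool.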
A real-time pedestrian detection and tracking system using a single video camera was developed to monitor pedestrians. The system contains six modules: video flow capture, pre-processing, movement detection, shadow removal, tracking, and object classification. In the pre-processing module, the image sequence is segmented by the mean-shift technique, and a Gaussian mixture model is then used to extract moving objects from it. Shadow removal was used to alleviate the negative impact of shadows on the detected objects. A model-free method was adopted to identify pedestrians. Maximum and minimum integration methods were developed to integrate multiple cues into the mean-shift algorithm, and the initial tracking iteration uses the competent integrated probability distribution map for object tracking. A simple but effective algorithm was proposed to handle full occlusion. The system was tested on real traffic videos from different sites. The results confirm that the system is reliable, with an overall accuracy of over 85%.
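Two pieces of the tracking module are easy to illustrate. The first is fusing per-pixel likelihood maps from several cues by taking their element-wise maximum or minimum. The second is running mean-shift on the fused map, which moves a window to the weighted centroid under it until the window stops moving. The sketch assumes cue maps are 2-D lists of probabilities and uses a fixed square window. It illustrates those two operations only; it is not the system's actual code.

```python
def integrate_cues(maps, mode="max"):
    """Element-wise max or min over several same-shaped probability maps."""
    pick = max if mode == "max" else min
    return [[pick(vals) for vals in zip(*rows)] for rows in zip(*maps)]

def mean_shift(prob, center, half_window, max_iter=20):
    """Move a square window to the centroid of prob inside it until it stops moving.
    center: (row, col) integers. Returns the converged (row, col)."""
    H, W = len(prob), len(prob[0])
    cy, cx = center
    for _ in range(max_iter):
        total = sy = sx = 0.0
        for y in range(max(0, cy - half_window), min(H, cy + half_window + 1)):
            for x in range(max(0, cx - half_window), min(W, cx + half_window + 1)):
                w = prob[y][x]
                total += w
                sy += w * y
                sx += w * x
        if total == 0:
            break  # no support under the window; keep the last position
        ny, nx = round(sy / total), round(sx / total)
        if (ny, nx) == (cy, cx):
            break  # converged
        cy, cx = ny, nx
    return cy, cx
```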
The COVID-19 pandemic has adversely affected the entire world and created high demand for techniques that manage crowd-related tasks remotely. Video surveillance and crowd management using video analysis techniques have significantly influenced current research, and numerous applications have been developed in this domain. This research proposes an anomaly detection technique, based on sparse crowd analysis, applied to Umrah videos recorded at the Kaaba during the COVID-19 pandemic. Managing the Kaaba rituals is crucial because the crowd gathers from around the world and requires proper analysis during the pandemic. The Umrah videos are analyzed, and a system is devised that can track and monitor the crowd flow at the Kaaba. Because of the pandemic, the crowd in these videos is sparse. We have developed a technique to track the dominant crowd flow and detect any object (person) moving in a direction other than the major flow. Abnormal movement is detected by creating histograms of the vertical and horizontal flows and applying thresholds to identify non-majority flow. Our algorithm aims to analyze the crowd through video surveillance and detect any abnormal activity in time to maintain a smooth crowd flow at the Kaaba during the pandemic.
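The abnormal-movement test can be sketched under the assumption that each tracked person already has a displacement (dx, dy) between frames. The sketch builds histograms of horizontal and vertical flow directions and takes the majority direction on each axis. It then flags anyone moving against that majority. The sign-based binning, the `dead_zone` threshold, and the function names are illustrative assumptions, not the paper's exact thresholds.

```python
from collections import Counter

def direction(v, dead_zone):
    """-1, 0 or +1; motions within the dead zone count as stationary (0)."""
    if abs(v) <= dead_zone:
        return 0
    return 1 if v > 0 else -1

def majority(hist):
    """Dominant non-zero direction in a histogram, or 0 if there is no clear majority."""
    if hist[1] > hist[-1]:
        return 1
    if hist[-1] > hist[1]:
        return -1
    return 0

def flag_against_majority(displacements, dead_zone=1.0):
    """displacements: dict id -> (dx, dy) per tracked person.
    Returns the ids moving against the majority horizontal or vertical flow."""
    hist_x = Counter(direction(dx, dead_zone) for dx, _ in displacements.values())
    hist_y = Counter(direction(dy, dead_zone) for _, dy in displacements.values())
    maj_x, maj_y = majority(hist_x), majority(hist_y)
    flagged = []
    for pid, (dx, dy) in displacements.items():
        against_x = maj_x != 0 and direction(dx, dead_zone) == -maj_x
        against_y = maj_y != 0 and direction(dy, dead_zone) == -maj_y
        if against_x or against_y:
            flagged.append(pid)
    return flagged
```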
The Rate Distortion Optimization (RDO) algorithm in High Efficiency Video Coding (HEVC) involves many iterations and a large amount of computation. A dynamically reconfigurable RDO structure is proposed to reduce calculation time and to support fast switching between RDO algorithms of different scales. First, the Quantization Parameter (QP) and bit-rate values are loaded through an H-tree Configurable Network (HCN), and the execution status of the array is monitored in real time. When a switching request for the RDO algorithm is detected, the corresponding configuration information is delivered. This self-reconfiguration method improves hardware flexibility and utilization. Experimental results show that the designed configuration network increases the number of controllable processing units by a factor of 32 while increasing the control bit width by only 31.25%. Its execution cycle is also 50% lower than that of a comparable design. Compared with previous RDO implementations, the RDO algorithm implemented on the reconfigurable array based on this configuration network achieves an average operating-frequency increase of 12.5% and an area reduction of 56.4%.
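Each RDO evaluation picks, among candidate coding modes, the one that minimizes the Lagrangian cost J = D + lambda * R. Here D is the distortion and R is the number of bits. The sketch below shows what one such evaluation computes. It assumes the lambda-QP relation lambda = c * 2^((QP - 12) / 3), the form used in H.264/HEVC reference encoders. The constant c = 0.85 is a tunable assumption, since encoders adjust it per picture type. The (distortion, bits) values per mode are hypothetical inputs. The sketch says nothing about the reconfigurable hardware itself.

```python
def lagrange_multiplier(qp, c=0.85):
    """lambda = c * 2^((QP - 12) / 3); c is a tunable assumption."""
    return c * 2 ** ((qp - 12) / 3)

def best_mode(candidates, qp):
    """candidates: dict mode -> (distortion, bits).
    Returns (mode, cost) minimising J = D + lambda * R."""
    lam = lagrange_multiplier(qp)
    costs = {mode: d + lam * r for mode, (d, r) in candidates.items()}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]
```

As QP rises, lambda grows exponentially, so cheap-to-code modes (few bits) win even when their distortion is higher.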
This paper describes a dynamically reconfigurable data-flow hardware architecture optimized for image and video computation. It is a scalable, hierarchically organized parallel architecture consisting of data-flow clusters and finite-state machine (FSM) controllers. Each cluster contains various kinds of cells optimized for video processing. To facilitate the design process, we also provide a C-like language for design specification and associated design tools. Several video applications have been implemented on the architecture to demonstrate its applicability and flexibility. Experimental results show that the architecture and its video applications can be used in many real-time video processing tasks.
This paper presents a novel system for violent scene detection that applies machine learning to visual and audio features. Multiple Kernel Learning (MKL) is applied to make the most of the multimodality of videos. The main feature of our system is mid-level concept clustering, which is proposed and implemented to learn mid-level concepts implicitly. With this algorithm, the system does not need manually tagged annotations. The whole system is trained on the dataset from the MediaEval 2013 Affect Task and evaluated with its official metric. The results outperformed the best score in that task.
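Multiple Kernel Learning combines one kernel per modality into a single kernel, typically a non-negative weighted sum K = sum_m beta_m * K_m with the weights summing to 1. The sketch below builds that combination for a visual and an audio RBF kernel with fixed weights. In actual MKL, the weights beta_m are learned jointly with the classifier, which is not shown. The feature vectors, gamma values, and weights are illustrative assumptions.

```python
import math

def rbf(a, b, gamma):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def combined_kernel(samples, weights, gammas):
    """samples: list of dicts {modality: feature vector}.
    weights, gammas: dicts keyed by modality; weights should be >= 0 and sum to 1.
    Returns the Gram matrix of K = sum_m weights[m] * RBF_m."""
    n = len(samples)
    return [[sum(w * rbf(samples[i][m], samples[j][m], gammas[m])
                 for m, w in weights.items())
             for j in range(n)] for i in range(n)]
```

Because each RBF kernel is positive semi-definite and the weights are non-negative, the combined Gram matrix stays a valid kernel for an SVM.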
Developments in neurophysiology focusing on foveal vision have characterized, with increasing precision, the spatiotemporal processing that is well adapted to regularizing visual information within the retina. The work described in this article focuses on a simplified architectural model based on features and mechanisms of adaptation in the retina. The biological retina transforms luminance information into a series of encoded representations of image characteristics that are transmitted to the brain. In the same way, our structural model allows us to reveal more information in the scene. Our modeling of the different functional pathways maps important complementary types of information at abstract levels of image analysis, and thereby allows better exploitation of visual clues. The model is based on a distributed cellular automata network and simulates retinal processing of stationary or moving stimuli. Thanks to its capacity for dynamic adaptation, the model can adapt to different scenes (e.g., bright and dim, stationary and moving). It can also parallelize those processing steps that parallel computers can support.
Although the express delivery industry is large, fully intelligent receiving and sending of parcels is difficult to achieve. This paper proposes an intelligent express delivery system based on OpenCV image and video processing, the Faster R-CNN object detection algorithm, and other technologies. Using a depth camera and an electronic scale, the system identifies the category, volume, and weight of items that the sender places on the scale. It also stores video of the items being packed into the cabinet. The overall framework of the system was constructed, key technologies were applied to realize it, and its functions were tested. Experimental results show that the integrated system, which combines intelligent identification with information traceability, achieves intelligent automation of express delivery and promotes the development of the express delivery industry.
The quality of side information has a large effect on the compression efficiency of a distributed video coding (DVC) system. Based on hierarchical motion estimation (HME), this article proposes a new side-information generation algorithm that is integrated into a DVC system. First, forward motion estimation (FME) and bidirectional motion estimation (BME), based on a variable-block-size HME algorithm, are used to obtain relatively accurate motion vectors. Second, a motion vector filter (MVF) is i...
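The abstract is cut off before it describes its motion vector filter. The following is therefore only a generic example of smoothing applied to block motion vectors during side-information generation. It uses a vector median filter, which replaces each vector with the neighbour that has the smallest summed distance to all others in its 3x3 neighbourhood. The text above does not say whether the paper's MVF works this way.

```python
import math

def vector_median(vectors):
    """The input vector with the smallest total Euclidean distance to all the others."""
    return min(vectors, key=lambda v: sum(math.dist(v, u) for u in vectors))

def filter_motion_field(field):
    """field: 2-D list of (dx, dy) block motion vectors.
    Each vector is replaced by the vector median of its 3x3 neighbourhood
    (clipped at the borders)."""
    H, W = len(field), len(field[0])
    out = []
    for y in range(H):
        row = []
        for x in range(W):
            neigh = [field[ny][nx]
                     for ny in range(max(0, y - 1), min(H, y + 2))
                     for nx in range(max(0, x - 1), min(W, x + 2))]
            row.append(vector_median(neigh))
        out.append(row)
    return out
```

Unlike component-wise medians, the vector median always returns one of the observed vectors, so it cannot invent a motion that no block actually had.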
We present a system that automatically recovers scene geometry and illumination from a video, providing a basis for various applications. Previous image-based illumination estimation methods require either user interaction or external information in the form of a database. We adopt structure-from-motion and multi-view stereo for initial scene reconstruction, and then estimate an environment map represented by spherical harmonics (as these perform better than other bases). We also demonstrate several video editing applications that exploit the recovered geometry and illumination, including object insertion (e.g., for augmented reality), shadow detection, and video relighting.
This paper proposes a new algorithm based on low-rank matrix recovery to remove salt-and-pepper noise from surveillance video. Unlike single-image denoising techniques, noise removal from video sequences aims to use both temporal and spatial information. Neighboring frames are grouped in the temporal domain according to the similarity of the whole images. This lets us formulate the removal of salt-and-pepper noise from a video tracking sequence as a low-rank matrix recovery problem. The resulting nuclear-norm and L1-norm minimization problems can be solved efficiently by many recently developed methods. To determine the low-rank matrix, we use an averaging method based on other similar images. Our method not only removes noise but also preserves edges and details. The proposed approach compares favorably with existing algorithms and gives better PSNR and SSIM results.
In this paper we introduce a video post-processing method that enhances the rhythm of a dance performance, in the sense that the dancing movements become more in time with the beat of the music. The dance performance observed in a video is analyzed and segmented into motion intervals delimited by motion beats. We present an image-space method that extracts the motion beats of a video by detecting frames at which there is a significant change in direction or at which motion stops. The motion beats are then synchronized with the music beats so that as many beats as possible are matched with as little time-warping distortion to the video as possible. We show two applications of this cross-media synchronization. In the first, a given dance performance is enhanced to be better synchronized with its original music. In the second, a given dance video is automatically adapted to be synchronized with different music.
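A minimal sketch of the two steps assumes that the dancer's motion is summarized by one 2-D trajectory point per frame. The paper works in image space and does not state this representation. A frame counts as a motion beat if motion stops after it, or if the direction turns by more than an angle threshold. Each motion beat is then greedily paired with the nearest unused music beat within a tolerance. The thresholds, the greedy matching, and the function names are assumptions. The paper's trade-off between matched beats and time-warping distortion is not reproduced.

```python
import math

def motion_beats(points, stop_speed=0.5, turn_deg=90.0):
    """points: list of (x, y) per frame. Frame t (1 <= t < n - 1) is a motion beat
    when motion stops after it or its direction changes by more than turn_deg."""
    beats = []
    for t in range(1, len(points) - 1):
        v_in = (points[t][0] - points[t - 1][0], points[t][1] - points[t - 1][1])
        v_out = (points[t + 1][0] - points[t][0], points[t + 1][1] - points[t][1])
        s_in, s_out = math.hypot(*v_in), math.hypot(*v_out)
        if s_out < stop_speed <= s_in:
            beats.append(t)  # moving into this frame, (nearly) still afterwards
        elif s_in >= stop_speed and s_out >= stop_speed:
            cos_a = (v_in[0] * v_out[0] + v_in[1] * v_out[1]) / (s_in * s_out)
            angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
            if angle > turn_deg:
                beats.append(t)
    return beats

def match_beats(motion, music, tolerance):
    """Greedily pair each motion-beat frame with the nearest unused music-beat frame."""
    pairs, used = [], set()
    for m in motion:
        options = [b for b in music if b not in used and abs(b - m) <= tolerance]
        if options:
            best = min(options, key=lambda b: abs(b - m))
            used.add(best)
            pairs.append((m, best))
    return pairs
```

The matched pairs would then define anchor points for time-warping the video between beats.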
Funding (gas-bubble stream velocity abstract): financial support from the Brazilian Federal Agency for Support and Evaluation of Graduate Education (Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior, CAPES, scholarship process no. BEX 0506/15-0); the Brazilian National Agency of Petroleum, Natural Gas and Biofuels (Agencia Nacional do Petroleo, Gas Natural e Biocombustiveis, ANP), in cooperation with the Brazilian Financier of Studies and Projects (Financiadora de Estudos e Projetos, FINEP) and the Brazilian Ministry of Science, Technology and Innovation (Ministério da Ciencia, Tecnologia e Inovacao, MCTI), through the ANP's Human Resources Program of the State University of Sao Paulo (Universidade Estadual Paulista, UNESP) for the Oil and Gas Sector, PRH-ANP/MCTI no. 48 (PRH48).
Funding (DRANet video steganography abstract): supported in part by NSFC (62002320, U19B2043, 61672456) and the Key R&D Program of Zhejiang Province, China (2021C01119).
Funding (Evolution Spectrum Handoff abstract): supported by the National Natural Science Foundation of China under Grant No. 61301101.
Abstract: Readily available tampering software has driven a growth in digital media manipulation, and the authenticity of records, especially video, is at high risk. There is a dire need to detect such tampering and take the necessary action. In this work, we propose an approach to detect inter-frame video forgery using deep features obtained from a parallel deep neural network model together with thorough analytical computations. The approach uses only the deep features extracted from the CNN model and applies a conventional mathematical approach to these features to find forgeries in the video. It calculates the correlation coefficient from the deep features of adjacent frames rather than directly from the frames. We divide forgery detection into two phases: video forgery detection and video forgery classification. In the detection phase, the approach determines whether the input video is original or tampered. If the video is not original, it passes to the classification phase. There, the method examines the forged video for insertion and deletion forgery and checks its originality again. The proposed work is generalized and is tested on two different datasets. Experimental results show that the approach detects forgery with an accuracy of 91% on the VIFFD dataset and 90% on the TDTV dataset. It classifies the type of forgery (insertion or deletion) with an accuracy of 82% on VIFFD and 86% on TDTV. This work can help distinguish original from tampered video in various domains.
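The abstract correlates CNN features of adjacent frames instead of raw pixels. Below is a minimal sketch of that step, assuming the per-frame deep features are already available as plain lists of numbers. The flagging rule (a correlation more than `k` standard deviations below the mean) is a hypothetical stand-in for the paper's analytical computations, not its actual decision rule.

```python
import math
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length feature vectors."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    # Treat two constant vectors as perfectly correlated to avoid 0/0.
    return cov / norm if norm else 1.0


def adjacent_correlations(features):
    """Correlation between the features of frame i and frame i + 1."""
    return [pearson(features[i], features[i + 1]) for i in range(len(features) - 1)]


def flag_discontinuities(features, k=2.0):
    """Return indices i where the (i, i + 1) correlation is an unusually low outlier.

    Hypothetical rule: flag correlations below mean - k * std of all correlations.
    Inserted or deleted frames tend to break the smooth frame-to-frame similarity.
    """
    corrs = adjacent_correlations(features)
    mu, sigma = mean(corrs), pstdev(corrs)
    return [i for i, c in enumerate(corrs) if sigma > 0 and c < mu - k * sigma]
```

For example, six frames with one feature pattern followed by six with a very different one yield a single flagged boundary at index 5.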
Abstract: During the past decade, feature extraction and knowledge acquisition based on video analysis have been extensively researched and tested, with various degrees of success, in many applications. These include closed-circuit television (CCTV) data analysis, large-scale public event control, and other daily security monitoring and surveillance operations. However, video processing is a multi-phase process. It draws on theories and techniques ranging from fundamental image processing, computational geometry and graphics, and machine vision to advanced artificial intelligence, pattern analysis, and even cognitive science. As a result, many important problems remain to be resolved before it can be widely applied. Two prominent ones are video event identification and detection. Most of today's approaches and systems process video frame to frame. By contrast, this project reorganizes video data as a 3D volume structure that provides hybrid spatial and temporal information in a unified space. This paper reports an innovative technique to transform original video frames into 3D volume structures described by spatial and temporal features. It then highlights the volume array structure in a so-called "pre-suspicion" mechanism for later processing. The focus of this report is an effective and efficient voxel-based segmentation technique. The technique is suited to the volumetric nature of video events and ready for deployment in 3D clustering operations. The paper concludes with a performance evaluation of the devised technique and a discussion of future work on accelerating the pre-processing of the original video data.
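To make the "3D volume" idea concrete, the sketch below treats a stack of binary frames as a volume indexed `[t][y][x]`. It labels connected foreground voxels using 6-connectivity, so a region that persists across frames becomes a single spatio-temporal segment. This is a generic connected-component labeling used only as an illustration. The paper's own voxel-based segmentation technique is not described in the abstract, and the thresholding that produces the binary volume is assumed.

```python
from collections import deque

# 6-connectivity: neighbours along time, rows and columns.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def label_voxels(volume):
    """volume[t][y][x] is truthy for foreground voxels.

    Returns (labels, count): labels has the same shape, with 0 for background
    and 1..count for each connected spatio-temporal segment.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    labels = [[[0] * W for _ in range(H)] for _ in range(T)]
    current = 0
    for t in range(T):
        for y in range(H):
            for x in range(W):
                if volume[t][y][x] and not labels[t][y][x]:
                    current += 1
                    labels[t][y][x] = current
                    queue = deque([(t, y, x)])
                    # Breadth-first flood fill through the volume.
                    while queue:
                        ct, cy, cx = queue.popleft()
                        for dt, dy, dx in NEIGHBOURS:
                            nt, ny, nx = ct + dt, cy + dy, cx + dx
                            if (0 <= nt < T and 0 <= ny < H and 0 <= nx < W
                                    and volume[nt][ny][nx] and not labels[nt][ny][nx]):
                                labels[nt][ny][nx] = current
                                queue.append((nt, ny, nx))
    return labels, current
```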
Funding: National Natural Science Foundation of China (No. 69905003)
Abstract: Intensity flicker is a common form of degradation in archived film. Most algorithms for this distortion are complicated and difficult to control. This paper presents a discrete mathematical model of flicker. It also designs a block-based method for estimating the model's parameters, based on the characteristics of intensity variation over large areas. Using this estimate, it constructs a compensation model to repair the current frame. The restoration approach is fully automatic, and repairing the current frame does not require information from the frames that follow it. The algorithm was implemented as a simple and adjustable repair system. Experimental results show that the proposed algorithm can remove most intensity flicker while preserving the desired effects.
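The abstract does not state the paper's discrete flicker model. The sketch below therefore assumes the common linear gain/offset model, where an observed block is approximately `alpha * reference + beta`. Here "reference" is the co-located block of the previously restored frame, which keeps the method causal (no later frames needed). The parameters are estimated by matching block mean and spread, and compensation inverts the model.

```python
from statistics import mean, pstdev


def estimate_flicker(block, reference_block):
    """Fit block ~= alpha * reference_block + beta by matching mean and spread.

    Both arguments are flat lists of grey levels for the same block position.
    Assumed model, not necessarily the paper's exact formulation.
    """
    s_blk, s_ref = pstdev(block), pstdev(reference_block)
    # Fall back to a pure offset model for flat blocks to avoid division by zero.
    alpha = s_blk / s_ref if s_blk > 0 and s_ref > 0 else 1.0
    beta = mean(block) - alpha * mean(reference_block)
    return alpha, beta


def compensate(block, alpha, beta):
    """Invert the gain/offset model to restore the block's intensities."""
    return [(v - beta) / alpha for v in block]
```

Working per block rather than per frame lets the estimate follow flicker that varies smoothly across large image areas.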
Funding: the National Natural Science Foundation of China (60532070)
Abstract: A novel temporal shape error concealment technique is proposed that can be used in object-based video coding schemes. To reduce the effect of shape variations of a video object, the curvature scale space (CSS) technique is adopted to extract features. These features are then used for boundary matching between the current frame and the previous frame. Because temporal, spatial, and statistical information about the video contour are all considered, the proposed method can find the optimal match, which is used to replace the damaged contours. Simulation results show that the proposed algorithm achieves better subjective and objective quality and higher efficiency than previously developed methods.
Abstract: A novel Snake model with region information is proposed to detect and track moving objects. Generally, region-based approaches are sensitive to illumination changes and small movements in the background, while edge-based approaches often obtain incorrect results for ambiguous images. Both types of information are introduced into the computation of the image force. Edge-based features make the algorithm fast and robust, and region information lets the active contour energy function obtain correct results for ambiguous images. Furthermore, an automatic contour initialization method using double difference images is given to meet the requirements of video sequence tracking. A simple forecasting step is also added to estimate the position of the contour, which improves the convergence speed of the active contour. Experimental results show that the algorithm takes less than 0.1 s per frame, so it can be applied in real-time systems.
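A double difference image keeps only pixels that changed both between the previous and current frames and between the current and next frames. That isolates the moving object's position in the current frame, and the resulting mask can seed the initial contour. Below is a minimal sketch on frames given as 2-D lists of grey levels. The threshold value is an assumption. Using the mask's bounding box as the initial contour is a hypothetical simplification, not necessarily how the paper places its initial Snake.

```python
def double_difference(prev, curr, nxt, threshold):
    """Binary mask of pixels that changed in BOTH consecutive frame differences."""
    mask = []
    for row_p, row_c, row_n in zip(prev, curr, nxt):
        mask.append([
            1 if abs(c - p) > threshold and abs(n - c) > threshold else 0
            for p, c, n in zip(row_p, row_c, row_n)
        ])
    return mask


def bounding_box(mask):
    """(top, left, bottom, right) of the foreground in the mask, or None if empty.

    Hypothetical initialization: the box outline can serve as the Snake's
    initial contour around the moving object.
    """
    pts = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    ys, xs = zip(*pts)
    return min(ys), min(xs), max(ys), max(xs)
```

With an object moving one pixel to the right per frame, only its position in the middle frame survives, as the single-row example in the test shows.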
Funding: This work was supported by the National Key Research and Development Program of China (No. 2016YFC0802904), the National Natural Science Foundation of China (No. 61671470), the Natural Science Foundation of Jiangsu Province (BK20161470), and the 62nd batch of funded projects of the China Postdoctoral Science Foundation (No. 2017M623423).
Abstract: Multi-armored target tracking (MATT) plays a crucial role in coordinated tracking and strike. Occlusion and insertion among targets, together with target scale variation, are the key problems in MATT. Most state-of-the-art multi-object tracking (MOT) methods adopt the tracking-by-detection strategy. That strategy relies on compute-intensive sliding-window or anchor schemes in the detection module and neglects target scale variation in the tracking module. In this work, we propose a more efficient and effective spatial-temporal attention scheme to track multiple armored targets on the ground battlefield. By simulating the structure of the retina, a novel visual-attention Gabor filter branch is proposed to enhance detection. By introducing temporal information, online-learned target-specific Convolutional Neural Networks (CNNs) are adopted to address occlusion. More importantly, we build a MOT dataset for armored targets, called the Armored Target Tracking dataset (ATTD), and conduct several comparative experiments on it against state-of-the-art methods. Experimental results show that the proposed method achieves outstanding tracking performance and meets actual application requirements.
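The configuration of the paper's retina-inspired Gabor branch is not given in the abstract. As an illustration of the underlying filter, the sketch below builds standard real-valued Gabor kernels: a Gaussian envelope times a cosine carrier, rotated to orientation `theta`. It also builds a small bank at evenly spaced orientations. The parameter values are illustrative assumptions. OpenCV users can get comparable kernels from `cv2.getGaborKernel`.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real Gabor kernel of shape size x size (size should be odd).

    sigma: Gaussian envelope width, theta: orientation (radians),
    lambd: carrier wavelength, gamma: spatial aspect ratio, psi: phase offset.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel


def gabor_bank(size, sigma, lambd, n_orientations=4):
    """Kernels at evenly spaced orientations in [0, pi)."""
    return [gabor_kernel(size, sigma, math.pi * i / n_orientations, lambd)
            for i in range(n_orientations)]
```

Convolving an image with such a bank emphasizes oriented edges and textures, which is the kind of low-level cue an attention branch can reweight.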
Funding: Project (50778015) supported by the National Natural Science Foundation of China; Project (2012CB725403) supported by the Major State Basic Research Development Program of China.
Abstract: A real-time pedestrian detection and tracking system using a single video camera was developed to monitor pedestrians. The system contains six modules: video flow capture, pre-processing, movement detection, shadow removal, tracking, and object classification. In the pre-processing module, the image sequence is segmented by the mean-shift technique, and a Gaussian mixture model is used to extract the moving objects from it. Shadow removal alleviates the negative impact of shadows on the detected objects. A model-free method was adopted to identify pedestrians. Maximum and minimum integration methods were developed to integrate multiple cues into the mean-shift algorithm and into the initial tracking iteration. This yields a suitable integrated probability distribution map for object tracking. A simple but effective algorithm was proposed to handle full occlusion. The system was tested on real traffic videos from different sites, and the results confirm that it is reliable, with an overall accuracy of over 85%.
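At the core of the tracker is the mean-shift update. It repeatedly moves the search window to the centroid of the probability mass it currently covers, until the window stops moving. The sketch below assumes the integrated probability distribution map has already been computed and is given as a 2-D list. The paper's maximum/minimum cue integration is not reproduced. Window centres are rounded half up to avoid Python's banker's rounding stalling the window one pixel short.

```python
import math


def mean_shift(prob, window, max_iter=20):
    """Basic mean-shift window tracking on a probability map.

    prob: 2-D list of per-pixel target probabilities.
    window: (top, left, height, width). Returns the converged window.
    """
    top, left, h, w = window
    H, W = len(prob), len(prob[0])
    for _ in range(max_iter):
        # Zeroth- and first-order moments of the probability inside the window.
        m00 = m10 = m01 = 0.0
        for y in range(max(top, 0), min(top + h, H)):
            for x in range(max(left, 0), min(left + w, W)):
                p = prob[y][x]
                m00 += p
                m10 += x * p
                m01 += y * p
        if m00 == 0:
            break  # no probability mass under the window: target lost
        cx, cy = m10 / m00, m01 / m00
        # Re-centre the window on the centroid (round half up).
        new_top = math.floor(cy - (h - 1) / 2 + 0.5)
        new_left = math.floor(cx - (w - 1) / 2 + 0.5)
        if (new_top, new_left) == (top, left):
            break  # converged
        top, left = new_top, new_left
    return top, left, h, w
```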
Funding: The authors extend their appreciation to the Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia for funding this research work through Project Number QURDO001, project title: Intelligent Real-Time Crowd Monitoring System Using Unmanned Aerial Vehicle (UAV) Video and Global Positioning Systems (GPS) Data.
Abstract: The COVID-19 pandemic has adversely affected the entire world and created high demand for techniques that manage crowd-related tasks remotely. Video surveillance and crowd management using video analysis techniques have significantly influenced current research, and numerous applications have been developed in this domain. This research proposes an anomaly detection technique for Umrah videos of the Kaaba during the COVID-19 pandemic, based on sparse crowd analysis. Managing the Kaaba rituals is crucial, since the crowd gathers from around the world and requires proper analysis during the pandemic. We analyze the Umrah videos and devise a system that can track and monitor the crowd flow at the Kaaba. The crowd in these videos is sparse due to the pandemic. We developed a technique to track the dominant crowd flow and detect any object (person) moving in a direction unlike that of the major flow. Abnormal movement is detected by creating histograms of the vertical and horizontal flows and applying thresholds to identify the non-majority flow. The algorithm aims to analyze the crowd through video surveillance and detect any abnormal activity in time to maintain a smooth crowd flow at the Kaaba during the pandemic.
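A minimal sketch of the histogram idea follows, assuming each tracked person already has a displacement `(dx, dy)` between frames. Horizontal and vertical components are binned by sign (with a small dead zone treated as "not moving on this axis"). The most populated bin on each axis defines the majority flow. Treating anyone moving opposite to the majority on either axis as abnormal is a hypothetical thresholding rule, and the dead-zone value is illustrative. The paper's exact histograms and thresholds are not given.

```python
from collections import Counter


def direction_bin(v, dead_zone):
    """Bin one flow component: +1, -1, or 0 when within the dead zone."""
    if v > dead_zone:
        return 1
    if v < -dead_zone:
        return -1
    return 0


def abnormal_movers(flows, dead_zone=0.5):
    """flows: list of (dx, dy) per tracked person.

    Returns indices of people moving against the majority flow.
    """
    h_hist = Counter(direction_bin(dx, dead_zone) for dx, _ in flows)
    v_hist = Counter(direction_bin(dy, dead_zone) for _, dy in flows)
    h_major = h_hist.most_common(1)[0][0]
    v_major = v_hist.most_common(1)[0][0]
    flagged = []
    for i, (dx, dy) in enumerate(flows):
        h, v = direction_bin(dx, dead_zone), direction_bin(dy, dead_zone)
        # Hypothetical rule: moving opposite to the dominant direction on an axis.
        if (h_major != 0 and h == -h_major) or (v_major != 0 and v == -v_major):
            flagged.append(i)
    return flagged
```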
Funding: Sponsored by the National Natural Science Foundation of China (Grant Nos. 61834005, 61772417, 61802304, 61602377, and 61634004), the Shaanxi Province Coordination Innovation Project of Science and Technology (Grant No. 2016KTZDGY02-04-02), the Shaanxi Provincial Key R&D Plan (Grant No. 2017GY-060), and the Shaanxi International Science and Technology Cooperation Program (Grant No. 2018KW-006).
Abstract: The Rate Distortion Optimization (RDO) algorithm in High Efficiency Video Coding (HEVC) involves many iterations and a large amount of computation. A dynamically reconfigurable RDO structure is proposed to reduce computation time and to support fast switching between RDO algorithms of different scales. First, the Quantization Parameter (QP) and bit-rate values are loaded through an H-tree Configurable Network (HCN), and the execution status of the array is monitored in real time. When a request to switch RDO algorithms is detected, the corresponding configuration information is delivered. This self-reconfiguration method improves the flexibility and utilization of the hardware. In experiments, the designed configuration network increases the number of controllable processing units by a factor of 32 while increasing the control bit width by only 31.25%. Its execution cycle is also 50% lower than that of designs of the same type. Compared with previous RDO implementations, the RDO algorithm implemented on the reconfigurable array achieves a 12.5% average increase in operating frequency and a 56.4% reduction in area.
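What makes RDO expensive is that, for every block, the encoder evaluates a Lagrangian cost J = D + λ·R for each candidate mode and keeps the cheapest. Here D is distortion and R is the bit cost. Below is a minimal sketch of that decision, assuming the distortion and bit counts of each candidate are already measured. In a real encoder λ is derived from the QP; this sketch takes λ as a direct input and does not reproduce the QP-to-λ mapping.

```python
def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits


def best_mode(candidates, lam):
    """candidates: dict mode_name -> (distortion, bits).

    Returns the mode with the minimal rate-distortion cost.
    """
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lam))
```

A larger λ (coarser quantization) shifts the choice toward cheaper modes. With the candidates in the test, λ = 2 favors the balanced "inter" mode, while λ = 10 favors the nearly free "skip" mode.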
Funding: the National Natural Science Foundation of China (No. 61136002), the Key Project of Chinese Ministry of Education (No. 211180), and the Shaanxi Provincial Industrial and Technological Project (No. 2011k06-47).
Abstract: This paper describes a dynamically reconfigurable data-flow hardware architecture optimized for image and video computation. It is a scalable, hierarchically organized parallel architecture consisting of data-flow clusters and finite-state machine (FSM) controllers. Each cluster contains various kinds of cells optimized for video processing. To facilitate the design process, we also provide a C-like language for design specification and associated design tools. Several video applications have been implemented on the architecture to demonstrate its applicability and flexibility. Experimental results show that the architecture, along with its video applications, can be used in many real-time video processing applications.
Abstract: This paper presents a novel system for violent scene detection that applies machine learning to visual and audio features. Multiple Kernel Learning (MKL) is applied to make the most of the multimodality of videos. The main feature of the system is the proposed mid-level concept clustering, which learns mid-level concepts implicitly, so the system does not need manually tagged annotations. The whole system is trained on the dataset of the MediaEval 2013 Affect Task and evaluated with the task's official metric. The obtained results outperformed the task's best score.
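In Multiple Kernel Learning, the similarity between two videos is a weighted combination of per-modality base kernels, K = Σ_m β_m K_m. The weights β_m are non-negative and learned jointly with the classifier. Below is a minimal sketch of evaluating such a combined kernel, assuming RBF base kernels on visual and audio feature vectors and hand-fixed weights. In actual MKL the weights come from the solver, and the paper's choice of base kernels is not stated in the abstract.

```python
import math


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_kernel(sample_a, sample_b, weights, gammas):
    """Weighted sum of per-modality RBF kernels.

    Each sample is a dict modality -> feature vector. weights are assumed
    non-negative and summing to 1 (fixed by hand here; learned in real MKL).
    """
    return sum(weights[m] * rbf(sample_a[m], sample_b[m], gammas[m]) for m in weights)
```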
Abstract: Developments in neurophysiology focusing on foveal vision have characterized ever more precisely the spatiotemporal processing that is well adapted to the regularization of visual information within the retina. The work described in this article focuses on a simplified architectural model based on the features and adaptation mechanisms of the retina. The biological retina transforms luminance information into a series of encoded representations of image characteristics that are transmitted to the brain. Similarly, our structural model allows us to reveal more information in the scene. Our modeling of the different functional pathways maps important complementary types of information at abstract levels of image analysis, and thereby allows better exploitation of visual clues. The model is based on a distributed cellular automata network and simulates the retinal processing of stationary or moving stimuli. Thanks to its capacity for dynamic adaptation, the model can adapt itself to different scenes (e.g., bright and dim, stationary and moving). It can also parallelize the processing steps that can be supported by parallel computers.
Funding: This article is supported by the 2020 Innovation and Entrepreneurship Training Program for College Students in Jiangsu Province (project name: Traceable multi-functional intelligent express cabinet, No. 201911460090P, No. 202011460090T). It is also supported by the Youth Science Foundation project of the National Natural Science Foundation of China (project name: Research on Deep Discriminant Sparse Representation Learning Method for Feature Extraction, No. 61806098). A further source is the Scientific Research Project of Nanjing XiaoZhuang University (project name: Multi-robot collaborative system, No. 2017NXY16).
Abstract: Although the express delivery industry is large in scale, fully intelligent receiving and sending of parcels is difficult to achieve. In this paper, an intelligent express delivery system is proposed based on OpenCV image and video processing, the Faster R-CNN object detection algorithm, and other technologies. Using a depth camera and an electronic scale, the system can identify the category, volume, and weight of items that the sender places on the scale. It also stores video of the items being packed into the cabinet. The overall framework of the system was constructed, key technologies were applied to realize it, and its functions were tested. Experimental results show that the system automates sending and receiving by integrating intelligent identification with information traceability, which promotes the development of the express delivery industry.
Funding: National Natural Science Foundation of China (60702012)
Abstract: The quality of side information has an immense effect on the compression efficiency of a distributed video coding (DVC) system. This article proposes a new side information generation algorithm based on hierarchical motion estimation (HME) and integrates it into the DVC system. First, forward motion estimation (FME) and bidirectional motion estimation (BME) based on a variable-block-size HME algorithm are used to acquire relatively accurate motion vectors. Second, a motion vector filter (MVF) is i...
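Bidirectional motion estimation for side information assumes roughly linear motion between the two key frames. For a block at position p of the frame to be interpolated, it searches for a symmetric vector v so that the block at p − v in the previous frame matches the block at p + v in the next frame. The side-information block is the average of the two. Below is a minimal single-level, fixed-block-size sketch with a full search and the sum of absolute differences (SAD) as the matching cost, assuming frames are 2-D lists of grey levels. The paper's hierarchical, variable-block-size HME and its motion vector filter are not reproduced.

```python
def block(frame, top, left, size):
    """Extract a size x size block as a list of rows."""
    return [frame[y][left:left + size] for y in range(top, top + size)]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def bidirectional_search(prev, nxt, top, left, size, radius):
    """Symmetric full search: match prev at (p - v) against nxt at (p + v).

    Returns (cost, dy, dx) of the best vector within +/- radius.
    """
    H, W = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            pt, pl = top - dy, left - dx
            nt, nl = top + dy, left + dx
            # Skip vectors that would read outside either frame.
            if min(pt, pl, nt, nl) < 0 or max(pt, nt) + size > H or max(pl, nl) + size > W:
                continue
            cost = sad(block(prev, pt, pl, size), block(nxt, nt, nl, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


def interpolate_block(prev, nxt, top, left, size, radius):
    """Side-information block: average of the two motion-compensated blocks."""
    _, dy, dx = bidirectional_search(prev, nxt, top, left, size, radius)
    a = block(prev, top - dy, left - dx, size)
    b = block(nxt, top + dy, left + dx, size)
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

In the test, the content shifts 4 pixels to the right between the key frames, so the search finds the half-way vector (0, 2).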
Funding: This work was supported by the National Natural Science Foundation of China (NSFC) and the Israel Science Foundation (ISF), Joint NSFC-ISF Research Program under Grant No. 61561146393, the National Natural Science Foundation of China under Grant No. 61521002, a research grant from the Beijing Higher Institution Engineering Research Center, and the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.
Abstract: We present a system that automatically recovers scene geometry and illumination from a video, providing a basis for various applications. Previous image-based illumination estimation methods require either user interaction or external information in the form of a database. We adopt structure-from-motion and multi-view stereo for initial scene reconstruction, and then estimate an environment map represented by spherical harmonics (as these perform better than other bases). We also demonstrate several video editing applications that exploit the recovered geometry and illumination, including object insertion (e.g., for augmented reality), shadow detection, and video relighting.
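An environment map in a spherical harmonics (SH) basis is just a short coefficient vector. Radiance from any direction is the dot product of those coefficients with the basis functions evaluated at that direction. The sketch below uses the first nine real SH functions (bands 0 to 2) with their standard normalization constants. It shows both directions: projecting a radiance function onto the basis by numerical integration over the sphere, and reconstructing radiance from coefficients. The Fibonacci-sphere sampling is an assumption for illustration. The paper's estimation of the coefficients from the reconstructed scene is not reproduced.

```python
import math


def sh_basis(x, y, z):
    """First 9 real spherical harmonics at unit direction (x, y, z)."""
    return [
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z, 0.315392 * (3 * z * z - 1),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ]


def fibonacci_sphere(n):
    """n roughly uniform unit directions (deterministic)."""
    golden = math.pi * (3 - math.sqrt(5))
    pts = []
    for k in range(n):
        z = 1 - (2 * k + 1) / n
        r = math.sqrt(1 - z * z)
        phi = k * golden
        pts.append((r * math.cos(phi), r * math.sin(phi), z))
    return pts


def project(radiance_fn, n=2000):
    """SH coefficients c_i ~= (4*pi/n) * sum_k f(w_k) * Y_i(w_k)."""
    coeffs = [0.0] * 9
    for d in fibonacci_sphere(n):
        val = radiance_fn(d)
        for i, b in enumerate(sh_basis(*d)):
            coeffs[i] += val * b
    return [c * 4 * math.pi / n for c in coeffs]


def radiance(coeffs, direction):
    """Reconstruct radiance in a direction from SH coefficients."""
    return sum(c * b for c, b in zip(coeffs, sh_basis(*direction)))
```

Nine coefficients capture the low-frequency lighting that dominates diffuse shading, which is what relighting and object-insertion effects mostly depend on.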
Funding: Supported by the National Natural Science Foundation of China (Nos. 61332015, 61373078, 61272245, and 61272430) and the NSFC Joint Fund with Guangdong (No. U1201258).
Abstract: This paper proposes a new algorithm based on low-rank matrix recovery to remove salt-and-pepper noise from surveillance video. Unlike single-image denoising techniques, noise removal from video sequences aims to use both temporal and spatial information. Neighboring frames are grouped by the similarity of the whole images in the temporal domain. This lets us formulate the removal of salt-and-pepper noise from a video tracking sequence as a low-rank matrix recovery problem. The resulting nuclear-norm and L1-norm minimization problems can be solved efficiently by many recently developed methods. To determine the low-rank matrix, we use an averaging method based on other similar images. The method not only removes noise but also preserves edges and details. It compares favorably with existing algorithms and gives better PSNR and SSIM results.
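Solvers for nuclear-norm plus L1-norm problems (for example, robust PCA via augmented Lagrangian methods) typically alternate two proximal steps. Singular value thresholding handles the low-rank part, and element-wise soft thresholding handles the sparse noise part. The sketch below shows the pieces that need no linear-algebra library:

- stacking a group of similar frames into the data matrix, one vectorized frame per column;
- flagging salt-and-pepper candidates at the extreme grey levels;
- the soft-thresholding operator, which singular value thresholding also applies to the singular values.

Function names and the 0/255 extremes (8-bit video) are assumptions. This is not the paper's solver or its averaging step.

```python
def stack_frames(frames):
    """Vectorize each frame (2-D list) into one column of the data matrix D.

    Returns D as a list of rows: rows = pixels, columns = frames.
    """
    columns = [[p for row in f for p in row] for f in frames]
    return [list(r) for r in zip(*columns)]


def impulse_mask(frame, low=0, high=255):
    """Mark pixels at the extreme grey levels, the usual salt-and-pepper candidates."""
    return [[1 if p in (low, high) else 0 for p in row] for row in frame]


def soft_threshold(values, tau):
    """Proximal operator of tau * ||.||_1: shrink each value toward zero by tau."""
    return [max(abs(v) - tau, 0.0) * (1 if v > 0 else -1) for v in values]
```

Because similar frames share most of their content, the stacked matrix is close to low rank, while impulse noise is sparse. This is the split the nuclear-norm and L1-norm terms are designed to separate.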
Abstract: In this paper we introduce a video post-processing method that enhances the rhythm of a dance performance, in the sense that the dance movements are more in time with the beat of the music. The dance performance observed in a video is analyzed and segmented into motion intervals delimited by motion beats. We present an image-space method to extract the motion beats of a video by detecting frames with a significant change in direction or where motion stops. The motion beats are then synchronized with the music beats, matching as many beats as possible while introducing as little time-warping distortion to the video as possible. We show two applications of this cross-media synchronization. In the first, a given dance performance is enhanced to be better synchronized with its original music. In the second, a given dance video is automatically adapted to be synchronized with different music.
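Below is a minimal sketch of the two steps, assuming a per-frame dominant motion vector `(dx, dy)` has already been measured (for example, from optical flow). A motion beat is reported where the speed drops below a stop threshold, or where the direction turns by more than a given angle. The thresholds are hypothetical. The beat matching is a simplified greedy stand-in: each music beat is paired with the nearest unused motion beat within a maximum shift. The paper instead optimizes the number of matched beats against time-warping distortion.

```python
import math


def motion_beats(motions, stop_speed=0.5, turn_deg=90.0):
    """motions: per-frame (dx, dy) dominant motion. Returns beat frame indices."""
    beats = []
    for i in range(1, len(motions)):
        (px, py), (cx, cy) = motions[i - 1], motions[i]
        prev_speed, speed = math.hypot(px, py), math.hypot(cx, cy)
        if speed < stop_speed <= prev_speed:
            beats.append(i)  # motion stops
        elif speed >= stop_speed and prev_speed >= stop_speed:
            cos_angle = (px * cx + py * cy) / (prev_speed * speed)
            angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
            if angle > turn_deg:
                beats.append(i)  # sharp change of direction
    return beats


def match_beats(motion, music, max_shift):
    """Greedily pair each music beat with the nearest unused motion beat.

    Only motion beats within max_shift frames are eligible.
    Returns a list of (motion_frame, music_frame) pairs.
    """
    pairs, used = [], set()
    for m in music:
        candidates = [b for b in motion if b not in used and abs(b - m) <= max_shift]
        if candidates:
            best = min(candidates, key=lambda b: abs(b - m))
            used.add(best)
            pairs.append((best, m))
    return pairs
```

The matched pairs define the anchor points for time-warping: each motion beat is moved onto its music beat, and the frames in between are stretched or compressed accordingly.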