Video prediction is the problem of generating future frames by exploiting the spatiotemporal correlations in a past frame sequence. It is one of the crucial problems in computer vision and has many real-world applications, mainly focused on predicting future scenarios to avoid undesirable outcomes. However, modeling future image content and objects is challenging due to the dynamic evolution and complexity of the scene, such as occlusions, camera movements, delay, and illumination. Direct frame synthesis and optical-flow estimation are the two common approaches, but prior work has mainly relied on one or the other. Both have limitations: direct frame synthesis usually produces blurry predictions due to complex pixel distributions in the scene, while optical-flow estimation usually produces artifacts under large object displacements or occlusions in the clip. In this paper, we construct a deep neural network, the Frame Prediction Network (FPNet-OF), with multiple-branch inputs (optical flow and the original frame) to predict the future video frame by adaptively fusing future object motion with the future frame generator. The key idea is to jointly optimize direct RGB frame synthesis and dense optical-flow estimation to obtain a superior video prediction network. Using various real-world datasets, we experimentally verify that the proposed framework produces higher-quality video frames than other state-of-the-art frameworks.
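The abstract does not include code; as a rough illustration of the flow branch's core operation, here is a minimal backward-warping sketch (pure Python, nearest-neighbour sampling; the function name and the toy data are illustrative, not the paper's FPNet-OF):

```python
def warp_frame(frame, flow):
    """Backward-warp a grayscale frame (list of rows) by a dense flow field.

    flow[y][x] = (dx, dy): pixel (x, y) in the predicted frame is sampled
    from (x - dx, y - dy) in the previous frame (nearest neighbour, with
    border clamping).
    """
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(int(round(x - dx)), 0), w - 1)
            sy = min(max(int(round(y - dy)), 0), h - 1)
            out[y][x] = frame[sy][sx]
    return out

# A uniform 1-pixel rightward motion shifts the bright column right by one.
prev = [[0, 255, 0, 0]] * 3
flow = [[(1, 0)] * 4 for _ in range(3)]
print(warp_frame(prev, flow))
```

A real predictor would estimate `flow` with a network and fuse this warped frame with a synthesized one; the sketch only shows the warping step itself.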
A two-stage automatic key frame selection method is proposed to enhance stitching speed and quality for UAV aerial videos. In the first stage, to reduce redundancy, the overlapping rate of the UAV aerial video sequence within the sampling period is calculated. Lagrange interpolation is used to fit the overlapping-rate curve of the sequence, and an empirical threshold on the overlapping rate is then applied to filter candidate key frames from the sequence. In the second stage, the principle of minimizing remapping spots is used to dynamically adjust and determine the final key frame close to the candidate key frames. Comparative experiments show that the proposed method improves stitching speed and accuracy by more than 40%.
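The first stage can be sketched in a few lines: fit the sampled overlapping rates with a Lagrange polynomial, then keep frames whose fitted overlap falls below an empirical threshold. The sample points and the threshold value below are made up for illustration, not taken from the paper:

```python
def lagrange_value(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Overlapping rates sampled at frame indices 0, 10, 20, 30 (toy values).
xs, ys = [0, 10, 20, 30], [0.95, 0.80, 0.62, 0.41]
threshold = 0.7  # illustrative empirical overlap threshold
# Frames whose fitted overlap drops below the threshold become candidates.
candidates = [x for x in range(31) if lagrange_value(xs, ys, x) < threshold]
print(candidates[0])
```

The second stage (minimizing remapping spots around each candidate) depends on the stitching geometry and is not reproduced here.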
A popular and challenging task in video research, frame interpolation aims to increase the frame rate of a video. Most existing methods employ a fixed motion model, e.g., linear, quadratic, or cubic, to estimate the intermediate warping field. However, such fixed motion models cannot represent well the complicated non-linear motions of the real world or of rendered animations. Instead, we present an adaptive flow prediction module to better approximate the complex motions in video. Furthermore, interpolating just one intermediate frame between consecutive input frames may be insufficient for complicated non-linear motions. To enable multi-frame interpolation, we introduce time as a control variable when interpolating frames between the original ones in our generic adaptive flow prediction module. Qualitative and quantitative experimental results show that our method produces high-quality results and outperforms existing state-of-the-art methods on popular public datasets.
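For context, the fixed linear motion model that this paper improves upon simply scales the frame-to-frame flow by the target time t; with t as a free parameter, any number of intermediate warping fields can be generated. This sketch shows only that baseline, not the paper's adaptive predictor:

```python
def interpolate_flow(flow_0to1, t):
    """Linear motion model: scale the 0->1 flow to intermediate time t in (0, 1).

    The paper replaces this fixed linear model with an adaptive flow
    predictor; the linear version is the baseline it improves on.
    """
    return [[(dx * t, dy * t) for (dx, dy) in row] for row in flow_0to1]

flow = [[(4.0, 2.0)]]  # one pixel moving 4 px right, 2 px down per frame
# Multi-frame interpolation: three intermediate warping fields at t = 1/4, 1/2, 3/4.
fields = [interpolate_flow(flow, k / 4) for k in (1, 2, 3)]
print(fields[1])
```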
We propose a Rate-Distortion (RD) optimized strategy for frame dropping and scheduling of multi-user conversational and streaming videos. We consider a scenario where conversational and streaming videos share the forwarding resources at a network node. Two buffers are set up on the node to temporarily store the packets of these two types of video application: a large buffer is used for streaming video, as the delay constraint of that application is moderate, while a very small buffer is used for conversational video to ensure that the forwarding delay of every packet stays limited. A scheduler located behind the two buffers dynamically assigns transmission slots on the outgoing link to them. Rate-distortion side information is used to perform RD-optimized frame dropping in case of node overload. The data rate on the outgoing link is shared between the conversational and streaming videos based either on the fullness of the two associated buffers or on the mean incoming rates of the respective videos. Simulation results show that the proposed RD-optimized frame dropping and scheduling approach provides significant performance improvements over the popular priority-based random dropping (PRD) technique.
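The fullness-based sharing policy, one of the two the abstract mentions, can be sketched as a proportional split of transmission slots (the function name and the even split for an idle link are illustrative assumptions, not the paper's exact rule):

```python
def slot_shares(conv_fullness, stream_fullness):
    """Split outgoing-link slots between the two buffers by fullness.

    A fullness-proportional split; when both buffers are empty the link
    is shared evenly (an assumption made for this sketch).
    """
    total = conv_fullness + stream_fullness
    if total == 0:
        return 0.5, 0.5
    return conv_fullness / total, stream_fullness / total

# Conversational buffer holds 200 bytes, streaming buffer 600 bytes.
print(slot_shares(200, 600))
```

The alternative policy would compute the same split from the mean incoming rates instead of the buffer fullness values.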
Identifying inter-frame forgery is a hot topic in video forensics. In this paper, we propose a method based on the assumption that the correlation coefficients of gray values are consistent across an original video, while in forgeries this consistency is destroyed. We first extract the consistency of correlation coefficients of gray values (CCCoGV for short), after normalization and quantization, as the distinguishing feature for identifying inter-frame forgeries. We then test CCCoGV on a large database with the help of a Support Vector Machine (SVM). Experimental results show that the proposed method is effective in classifying original videos and forgeries. Furthermore, the proposed method also performs well in classifying frame-insertion and frame-deletion forgeries.
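The building block of this family of methods is the correlation coefficient between gray values of consecutive frames. A minimal pure-Python version is shown below on toy data; the paper's actual CCCoGV feature additionally normalizes and quantizes these coefficients before feeding them to an SVM:

```python
from math import sqrt

def corrcoef(a, b):
    """Pearson correlation coefficient of two equal-length gray-value lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sqrt(sum((x - ma) ** 2 for x in a))
    vb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

# Gray values of three consecutive frames (toy data). In an untouched
# video the consecutive-frame correlations stay roughly constant; a jump
# in their spread hints at an inserted or deleted frame.
frames = [[10, 20, 30], [11, 21, 31], [12, 22, 32]]
coeffs = [corrcoef(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
print(coeffs)
```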
This paper presents a novel technique for embedding a digital watermark into video frames based on motion vectors and the discrete wavelet transform (DWT). In the proposed scheme, the binary image watermark is divided into blocks, and each watermark block is embedded several times in each selected video frame at different locations. A block-based motion estimation algorithm is used to select the video frame blocks with the greatest motion-vector magnitude. The DWT is applied to the selected frame blocks, and the watermark block is then hidden in these blocks by modifying the coefficients of the horizontal sub-band (HL). Adding the watermark at different locations in the same video frame makes the scheme more robust against different types of attacks. The method was tested on different types of videos. The average peak signal-to-noise ratio (PSNR) and the normalized correlation (NC) are used to measure the performance of the proposed method. Experimental results show that the proposed algorithm does not affect the visual quality of the video frames and that the scheme is robust against a variety of attacks.
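The two quality metrics used here, PSNR (imperceptibility of the embedding) and NC (fidelity of the extracted watermark), are standard and easy to state exactly; the toy inputs below are illustrative:

```python
from math import log10

def psnr(orig, marked, peak=255.0):
    """Peak signal-to-noise ratio between two gray frames (lists of rows)."""
    n = sum(len(r) for r in orig)
    mse = sum((o - m) ** 2 for ro, rm in zip(orig, marked)
              for o, m in zip(ro, rm)) / n
    return float('inf') if mse == 0 else 10 * log10(peak ** 2 / mse)

def nc(wm, extracted):
    """Normalized correlation between embedded and extracted watermark bits."""
    num = sum(a * b for a, b in zip(wm, extracted))
    den = sum(a * a for a in wm)
    return num / den

print(round(psnr([[100, 100]], [[101, 99]]), 2))
print(nc([1, 0, 1, 1], [1, 0, 1, 1]))
```

High PSNR means the watermark is visually imperceptible; NC close to 1 after an attack means the watermark survived it.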
Deepfake technology can be used to replace people’s faces in videos or pictures to show them saying or doing things they never said or did. Deepfake media are often used to extort, defame, and manipulate public opinion. However, despite deepfake technology’s risks, current deepfake detection methods lack generalization and are inconsistent when applied to unknown videos, i.e., videos on which they have not been trained. The purpose of this study is to develop a generalizable deepfake detection model by training convolutional neural networks (CNNs) to classify human facial features in videos. The study formulated the research question: “How effectively does the developed model provide reliable generalizations?” A CNN model was trained to distinguish between real and fake videos using the facial features of human subjects in videos. The model was trained, validated, and tested using the FaceForensics++ dataset, which contains more than 500,000 frames, and subsets of the DFDC dataset, totaling more than 22,000 videos. The study demonstrated high generalizability, as the accuracy on the unknown dataset was only marginally (about 1%) lower than that on the known dataset. The findings of this study indicate that detection systems can be made more generalizable, lighter, and faster by focusing on just a small region (the human face) of an entire video.
Video evidence is usually admissible in courts of law all over the world. However, individuals manipulate these videos to either defame or incriminate innocent people, while others indulge in video tampering to falsely escape the wrath of the law. One way impostors can forge these videos is through inter-frame video forgery, so the integrity of such videos is under threat: these digital forgeries seriously debase the credibility of video content as a definite record of events. This leads to an increasing concern about the trustworthiness of video content, affecting the social and legal system, forensic investigations, intelligence services, and security and surveillance systems as the case may be. The problem of inter-frame video forgery grows as more video-editing software continues to emerge. These editing tools can manipulate videos without leaving obvious traces, and the tampered videos then go viral. Alarmingly, even beginner users of these tools can alter the contents of digital videos in a manner that renders them practically indistinguishable from the original by mere observation. This paper leverages correlation coefficients to produce a more elaborate and reliable inter-frame video forgery detector to aid forensic investigations, especially in Nigeria. The model employs a threshold to efficiently distinguish forged videos from authentic ones. Benchmark and locally manipulated video datasets were used to evaluate the proposed model. Experimentally, our approach performed better than existing methods: the overall score on all evaluation metrics (accuracy, recall, precision, and F1-score) was 100%. The proposed method, implemented in the MATLAB programming language, has proven effective at detecting inter-frame forgeries.
In video surveillance, moving-object tracking faces many interference factors, such as target changes, complex scenes, and target deformation. To address this, and based on a comparative analysis of several common moving-object detection methods, this paper presents a moving-object detection and recognition algorithm that combines frame differencing with background subtraction. In the algorithm, we first compute the per-pixel average gray value over consecutive frames of the dynamic image: N consecutive frames are summed and averaged, and this statistical average of the continuous image sequence serves as the background image. In this way, the weight of object information increases while the static background is suppressed. The resulting motion-detection image contains both the target contour and additional target information inside the contour extracted against the background image, thus separating the moving target from the image. Simulation results show the effectiveness of the proposed algorithm.
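The background-averaging step and the subsequent subtraction can be sketched directly (pure Python on lists of rows; the fixed threshold and toy frames are illustrative, and the frame-difference term the paper combines with this is omitted):

```python
def mean_background(frames):
    """Background image as the per-pixel average of N consecutive frames."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)]
            for y in range(h)]

def moving_mask(frame, background, thresh):
    """Foreground mask from background subtraction with a fixed threshold."""
    return [[1 if abs(p - b) > thresh else 0
             for p, b in zip(row, brow)]
            for row, brow in zip(frame, background)]

# Four frames of a static dark scene, then a frame with one bright pixel.
frames = [[[0, 0, 0]] for _ in range(4)]
bg = mean_background(frames)
mask = moving_mask([[0, 200, 0]], bg, thresh=30)
print(mask)
```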
This paper proposes a mobile video surveillance system consisting of intelligent video analysis and mobile communication networking. This multilevel distillation approach helps mobile users monitor tremendous amounts of surveillance video on demand through video streaming over mobile communication networks. The intelligent video analysis includes moving object detection/tracking and key frame selection, which make it possible to browse useful video clips. The communication networking services, comprising video transcoding, multimedia messaging, and mobile video streaming, transmit surveillance information to mobile appliances. Moving object detection is achieved by background subtraction and particle-filter tracking. Key frame selection, which aims to deliver an alarm to a mobile client via the multimedia messaging service accompanied by an extracted clear frame, is achieved by devising a weighted importance criterion that considers object clarity and face appearance. In addition, a spatial-domain cascaded transcoder is developed to convert the filtered image sequence of detected objects into the mobile video streaming format. Experimental results show that the system can successfully detect all moving-object events in a complex surveillance scene, choose very appropriate key frames for users, and transcode the images with a high peak signal-to-noise ratio (PSNR).
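A weighted importance criterion of the kind described, combining object clarity with face appearance, reduces to a weighted sum; the weights and scores below are invented for illustration, as the paper's exact weighting is not given in the abstract:

```python
def key_frame_score(object_clarity, face_score, w_clarity=0.6, w_face=0.4):
    """Weighted importance of a candidate alarm frame (illustrative weights)."""
    return w_clarity * object_clarity + w_face * face_score

# Candidate frames with (clarity, face-appearance) scores in [0, 1].
frames = {"f1": (0.9, 0.2), "f2": (0.7, 0.9), "f3": (0.3, 0.3)}
best = max(frames, key=lambda k: key_frame_score(*frames[k]))
print(best)  # the frame attached to the multimedia-message alarm
```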
An approach to the detection of moving objects in video sequences, with application to video surveillance, is presented. The algorithm combines two kinds of change points, detected from the region-based frame difference and from adjusted background subtraction. An adaptive threshold technique is employed to automatically choose the threshold value that segments the moving objects from the still background. Experimental results show that the algorithm is effective and efficient in practical situations. Furthermore, the algorithm is robust to changes in lighting conditions and can be applied in video surveillance systems.
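The abstract does not specify the adaptive threshold rule; a common stand-in is to derive the threshold from the statistics of the difference image, e.g. mean plus a multiple of the standard deviation. The rule and data below are therefore an assumption, shown only to make the idea concrete:

```python
from statistics import mean, pstdev

def adaptive_threshold(diff_values, k=1.0):
    """Pick a segmentation threshold from difference-image statistics.

    mean + k * std is one simple adaptive rule; it stands in for the
    paper's unspecified technique.
    """
    return mean(diff_values) + k * pstdev(diff_values)

# Pixel differences: mostly still background, two genuinely moving pixels.
diffs = [0, 1, 2, 1, 0, 1, 90, 95]
t = adaptive_threshold(diffs)
print([1 if d > t else 0 for d in diffs])
```

The benefit over a fixed threshold is that `t` tracks global changes such as lighting shifts, which raise all differences at once.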
This paper introduces a system based on TI’s fifth-generation DSP (Digital Signal Processor) device, the TMS320C50, to construct the simplest system for digitizing underwater video signals. The system collects image data at three different densities, selected in software. Its external data memory can be expanded to 4 gigabytes using a memory page extension technique. Two interface circuits between the C50 and peripherals of different speeds are also designed: one for a high-speed A/D converter, the other for static memory with an access time of 70 ns. The system can digitize an analog video signal and process the gathered data within the time limit.
Error-resilient video communication over lossy packet networks is often designed and operated based on models for the effect of losses on the reconstructed video quality. This paper analyzes the channel distortion for video over lossy packet networks and proposes a new model that, compared to previous models, more accurately estimates the expected mean-squared error distortion for different packet loss patterns by accounting for inter-frame error propagation and the correlation between error frames. The accuracy of the proposed model is validated with JVT/H.264 encoded standard test sequences and previous-frame concealment, where the proposed model provides an obvious accuracy gain over previous models.
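A crude way to see why loss patterns matter is the classic leaky-propagation simplification, where each lost frame injects a distortion that decays geometrically over the following frames. The numbers and the decay factor below are illustrative, and the correlation term that distinguishes the paper's model is deliberately omitted:

```python
def expected_distortion(loss_pattern, d_loss=100.0, alpha=0.85):
    """Mean distortion under geometric inter-frame error propagation.

    Each lost frame (1 in loss_pattern) adds d_loss; the accumulated
    error decays by alpha per subsequent frame. This is the simplified
    model the paper improves on, without the error-frame correlation term.
    """
    d = 0.0
    total = 0.0
    for lost in loss_pattern:
        d = d * alpha + (d_loss if lost else 0.0)
        total += d
    return total / len(loss_pattern)

# Two losses in a burst overlap their propagated errors and hurt more
# than the same two losses spread apart.
print(expected_distortion([1, 1, 0, 0]), expected_distortion([1, 0, 0, 1]))
```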
MPEG-4 AVC encoded video streams have been analyzed using video traces and statistical features have been extracted, in the context of supporting efficient deployment of networked multimedia services. The statistical features include the number of scenes composing the video and the sizes of the different types of frames, within the overall trace and within each scene. Statistical processing has been performed upon the traces, with subsequent fitting to statistical distributions (Pareto and lognormal). Through the construction of a synthetic trace based upon this analysis, our selections of statistical distribution have been verified. In addition, different types of content, in terms of level of activity (quantified as different scene-change ratios), have been considered. Through modelling and fitting, the stability of the main statistical parameters has been verified, along with observations on the dependence of these parameters upon the video activity level.
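Once a frame-size distribution has been fitted, generating a synthetic trace is a matter of sampling from it. The sketch below draws sizes from a lognormal distribution using the standard library; the `mu`/`sigma` values are illustrative, whereas the paper fits separate distributions per frame type and per scene:

```python
import random

def synthetic_trace(n_frames, mu=8.5, sigma=0.6, seed=7):
    """Draw frame sizes (bytes) from a fitted lognormal distribution.

    mu and sigma are parameters of the underlying normal, as in
    random.lognormvariate; the values here are illustrative, not the
    paper's fitted parameters.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_frames)]

trace = synthetic_trace(1000)
print(len(trace), min(trace) > 0)
```

A Pareto variant would use `rng.paretovariate(alpha)` scaled to the fitted minimum frame size.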
Video instance segmentation in street scenes is one of the key problems in autonomous-driving research, providing a basis for environment perception and path planning. Existing methods suffer from insufficient edge-feature extraction, because anchor boxes of multiple aspect ratios are sampled with a single receptive field, and from a lack of fine spatial position information in the high levels of the feature pyramid. This paper proposes the Anchor frame calibration and Spatial position information compensation for Video Instance Segmentation (AS-VIS) network. First, an anchor calibration module is added to the three branches of the prediction head to sample with multiple receptive-field types matched to the anchor aspect ratios, addressing the insufficient extraction of target edges. Second, a multi-receptive-field downsampling module is designed to fuse features sampled under the various receptive fields, addressing the information loss of downsampling. Finally, the multi-receptive-field downsampling module is applied to embed the activation features of low-level target regions of the feature pyramid into the high levels, compensating for the missing spatial position information. A street-scene video dataset was extracted from the Youtube-VIS benchmark, comprising 329 training videos and 53 validation videos. Quantitative comparison with YolactEdge on detection and segmentation precision shows that anchor calibration improves average precision by 8.63% and 5.09% respectively, the spatially compensated feature pyramid improves it by 7.76% and 4.75%, and AS-VIS overall improves it by 9.26% and 6.46%. The method achieves instance-level simultaneous detection, tracking, and segmentation in street-scene video sequences, providing an effective basis for environment perception in autonomous vehicles.
A novel technique for video watermarking based on the discrete wavelet transform (DWT) is presented. The intra frames of the video are first converted into three gray images, and the second-level discrete wavelet decomposition of these gray images is computed; the watermark W is embedded into the wavelet coefficients, and the inverse wavelet transform is applied to obtain gray images that carry the secret information. The intra frames of the video are then modified according to the three gray images so that the intra frames contain the secret information. To extract the secret information, the intra frames are converted into three gray images, the second-level discrete wavelet transform is applied, and the watermark W’ is extracted from the wavelet coefficients of the three gray images. Test results show the superior performance of the technique and its potential for video watermarking.
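Both DWT-based schemes above rest on the same primitive: decompose a block into sub-bands and perturb detail coefficients. A minimal one-level 2-D Haar transform (the simplest DWT; the papers use a two-level decomposition, and the embedding strength here is an arbitrary illustration) can be written in pure Python for even-sized blocks:

```python
def haar2d(img):
    """One-level 2-D Haar DWT of an even-sized grayscale block.

    Returns (LL, HL, LH, HH) quadrant matrices (unnormalized averaging/
    differencing Haar filters).
    """
    def rows_pass(m):
        lo, hi = [], []
        for r in m:
            lo.append([(r[i] + r[i + 1]) / 2 for i in range(0, len(r), 2)])
            hi.append([(r[i] - r[i + 1]) / 2 for i in range(0, len(r), 2)])
        return lo, hi

    def cols_pass(m):
        t = [list(c) for c in zip(*m)]        # transpose
        lo, hi = rows_pass(t)
        return ([list(c) for c in zip(*lo)],  # transpose back
                [list(c) for c in zip(*hi)])

    L, H = rows_pass(img)   # horizontal low / high pass
    LL, LH = cols_pass(L)   # then vertical filtering
    HL, HH = cols_pass(H)
    return LL, HL, LH, HH

# Embed one watermark bit by nudging an HL coefficient (strength 10 is
# illustrative; block selection by motion vectors and the inverse DWT
# needed to write the block back are omitted).
LL, HL, LH, HH = haar2d([[52, 55], [79, 61]])
HL[0][0] += 10 * 1  # bit = 1
print(HL[0][0])
```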
Funding: supported by Incheon National University Research Grant in 2017.
Funding: supported by the Research Grants Council of the Hong Kong Special Administrative Region under the RGC General Research Fund (Project No. CUHK 14201017), the Shenzhen Science and Technology Program (No. JCYJ20180507182410327), and the Science and Technology Plan Project of Guangzhou (No. 201704020141).
Funding: Project (No. STE1093/1-1) supported by the German Research Foundation, Germany.
文摘Identifying inter-frame forgery is a hot topic in video forensics. In this paper, we propose a method based on the assumption that the correlation coefficients of gray values is consistent in an original video, while in forgeries the consistency will be destroyed. We first extract the consistency of correlation coefficients of gray values (CCCoGV for short) after normalization and quantization as distinguishing feature to identify interframe forgeries. Then we test the CCCoGV in a large database with the help of SVM (Support Vector Machine). Experimental results show that the proposed method is efficient in classifying original videos and forgeries. Furthermore, the proposed method performs also pretty well in classifying frame insertion and frame deletion forgeries.
文摘This paper presents a novel technique for embedding a digital watermark into video frames based on motion vectors and discrete wavelet transform (DWT). In the proposed scheme, the binary image watermark is divided into blocks and each watermark block is embedded several times in each selected video frame at different locations. The block-based motion estimation algorithm is used to select the video frame blocks having the greatest motion vectors magnitude. The DWT is applied to the selected frame blocks, and then, the watermark block is hidden into these blocks by modifying the coefficients of the Horizontal sub-bands (HL). Adding the watermark at different locations in the same video frame makes the scheme more robust against different types of attacks. The method was tested on different types of videos. The average peak signal to noise ratio (PSNR) and the normalized correlation (NC) are used to measure the performance of the proposed method. Experimental results show that the proposed algorithm does not affect the visual quality of video frames and the scheme is robust against a variety of attacks.
文摘Deepfake technology can be used to replace people’s faces in videos or pictures to show them saying or doing things they never said or did. Deepfake media are often used to extort, defame, and manipulate public opinion. However, despite deepfake technology’s risks, current deepfake detection methods lack generalization and are inconsistent when applied to unknown videos, i.e., videos on which they have not been trained. The purpose of this study is to develop a generalizable deepfake detection model by training convoluted neural networks (CNNs) to classify human facial features in videos. The study formulated the research questions: “How effectively does the developed model provide reliable generalizations?” A CNN model was trained to distinguish between real and fake videos using the facial features of human subjects in videos. The model was trained, validated, and tested using the FaceForensiq++ dataset, which contains more than 500,000 frames and subsets of the DFDC dataset, totaling more than 22,000 videos. The study demonstrated high generalizability, as the accuracy of the unknown dataset was only marginally (about 1%) lower than that of the known dataset. The findings of this study indicate that detection systems can be more generalizable, lighter, and faster by focusing on just a small region (the human face) of an entire video.
文摘Video shreds of evidence are usually admissible in the court of law all over the world. However, individuals manipulate these videos to either defame or incriminate innocent people. Others indulge in video tampering to falsely escape the wrath of the law against misconducts. One way impostors can forge these videos is through inter-frame video forgery. Thus, the integrity of such videos is under threat. This is because these digital forgeries seriously debase the credibility of video contents as being definite records of events. <span style="font-family:Verdana;">This leads to an increasing concern about the trustworthiness of video contents. Hence, it continues to affect the social and legal system, forensic investigations, intelligence services, and security and surveillance systems as the case may be. The problem of inter-frame video forgery is increasingly spontaneous as more video-editing software continues to emerge. These video editing tools can easily manipulate videos without leaving obvious traces and these tampered videos become viral. Alarmingly, even the beginner users of these editing tools can alter the contents of digital videos in a manner that renders them practically indistinguishable from the original content by mere observations. </span><span style="font-family:Verdana;">This paper, however, leveraged on the concept of correlation coefficients to produce a more elaborate and reliable inter-frame video detection to aid forensic investigations, especially in Nigeria. The model employed the use of the idea of a threshold to efficiently distinguish forged videos from authentic videos. A benchmark and locally manipulated video datasets were used to evaluate the proposed model. Experimentally, our approach performed better than the existing methods. The overall accuracy for all the evaluation metrics such as accuracy, recall, precision and F1-score was 100%. 
The proposed method implemented in the MATLAB programming language has proven to effectively detect inter-frame forgeries.</span>
文摘In video surveillance, there are many interference factors such as target changes, complex scenes, and target deformation in the moving object tracking. In order to resolve this issue, based on the comparative analysis of several common moving object detection methods, a moving object detection and recognition algorithm combined frame difference with background subtraction is presented in this paper. In the algorithm, we first calculate the average of the values of the gray of the continuous multi-frame image in the dynamic image, and then get background image obtained by the statistical average of the continuous image sequence, that is, the continuous interception of the N-frame images are summed, and find the average. In this case, weight of object information has been increasing, and also restrains the static background. Eventually the motion detection image contains both the target contour and more target information of the target contour point from the background image, so as to achieve separating the moving target from the image. The simulation results show the effectiveness of the proposed algorithm.
文摘This paper proposes a mobile video surveillance system consisting of intelligent video analysis and mobile communication networking. This multilevel distillation approach helps mobile users monitor tremendous surveillance videos on demand through video streaming over mobile communication networks. The intelligent video analysis includes moving object detection/tracking and key frame selection which can browse useful video clips. The communication networking services, comprising video transcoding, multimedia messaging, and mobile video streaming, transmit surveillance information into mobile appliances. Moving object detection is achieved by background subtraction and particle filter tracking. Key frame selection, which aims to deliver an alarm to a mobile client using multimedia messaging service accompanied with an extracted clear frame, is reached by devising a weighted importance criterion considering object clarity and face appearance. Besides, a spatial- domain cascaded transcoder is developed to convert the filtered image sequence of detected objects into the mobile video streaming format. Experimental results show that the system can successfully detect all events of moving objects for a complex surveillance scene, choose very appropriate key frames for users, and transcode the images with a high power signal-to-noise ratio (PSNR).
Abstract An approach to the detection of moving objects in video sequences, with application to video surveillance, is presented. The algorithm combines two kinds of change points, detected from a region-based frame difference and an adjusted background subtraction. An adaptive threshold technique is employed to automatically choose the threshold value for segmenting the moving objects from the still background. Experimental results show that the algorithm is effective and efficient in practical situations. Furthermore, the algorithm is robust to changes in lighting conditions and can be applied in video surveillance systems.
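The abstract does not name the adaptive-threshold technique; Otsu's method is one standard way to choose a segmentation threshold on a difference image automatically, and can be sketched as:

```python
import numpy as np

def otsu_threshold(diff_img):
    """Otsu's method on an 8-bit difference image: pick the threshold
    that maximizes the between-class variance of the histogram."""
    hist, _ = np.histogram(diff_img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    w0 = np.cumsum(p)                     # weight of the "background" class
    mu = np.cumsum(p * np.arange(256))    # cumulative first moment
    mu_t = mu[-1]                         # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w0 - mu) ** 2 / (w0 * (1.0 - w0))
    return int(np.argmax(np.nan_to_num(sigma_b)))
```

Pixels of the difference image above the returned value are then labeled as moving-object change points.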
Abstract The ultra-long coherent-aperture observation of video synthetic aperture radar (VideoSAR) makes it extremely difficult to browse the dynamics of a region quickly. To automatically capture, in a machine-vision manner, the full key-frame evolution of ground-object scattering through disappearance, transient persistence, disappearance, transient persistence, and disappearance, a VideoSAR key-frame extractor combining the subaperture energy gradient (SEG) with low-rank plus sparse decomposition (LRSD) is proposed. The extractor is a generic serial architecture, applicable to any combination of SEG-family and LRSD-family methods. The proposed technique primarily addresses limited-information conditions, i.e., simultaneously single-channel, single-band, and single-track data, which helps overcome the practical limitation that multi-channel, multi-band, multi-track, or multi-sensor data are hard to collect in emergency-response scenarios. Comparative validation was carried out on measured data against several state-of-the-art LRSD algorithms; the thorough extraction of representative scattering information can facilitate rapid understanding and condensation of regional dynamics in the future.
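A crude stand-in for the SEG-plus-LRSD pipeline can be sketched with a truncated SVD plus soft thresholding in place of a full robust-PCA solver; all function names and parameters here are illustrative, not the paper's:

```python
import numpy as np

def lrsd_split(D, rank=1, tau=0.1):
    """Crude LRSD: truncated SVD for the low-rank (static) component,
    soft thresholding of the residual for the sparse (transient) part."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    R = D - L
    S = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
    return L, S

def key_frames_by_seg(frames, rank=1, tau=0.1, top_k=2):
    """Rank subaperture frames by the gradient of their sparse energy,
    so frames bracketing a scattering appearance/disappearance win."""
    D = np.stack([f.ravel() for f in frames], axis=1)  # pixels x frames
    _, S = lrsd_split(D, rank, tau)
    energy = (S ** 2).sum(axis=0)        # per-subaperture sparse energy
    grad = np.abs(np.gradient(energy))   # SEG: change of that energy
    return np.argsort(grad)[::-1][:top_k]
```

Any LRSD-family solver could replace `lrsd_split` in this serial structure, which is the generality the abstract claims.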
Abstract This paper introduces a system based on TI's fifth-generation DSP (Digital Signal Processor) device, the TMS320C50, to construct the simplest system for digitizing underwater video signals. The system can collect image data at three different densities through software configuration. Its external data memory can be expanded to 4 GB using a memory page-extension technique. Two different interface circuits between the C50 and peripherals of different speeds are also designed: one for a high-speed A/D converter, and the other for static memory with an access time of 70 ns. The system can digitize the analog video signal and process the gathered data within a limited time.
Fund Project (No. Y2001005) supported by the Natural Science Foundation of Shandong Province, China
Abstract Error-resilient video communication over lossy packet networks is often designed and operated based on models for the effect of losses on the reconstructed video quality. This paper analyzes the channel distortion for video over lossy packet networks and proposes a new model that, compared to previous models, more accurately estimates the expected mean-squared error distortion for different packet loss patterns by accounting for inter-frame error propagation and the correlation between error frames. The accuracy of the proposed model is validated with JVT/H.264 encoded standard test sequences and previous-frame concealment, where the proposed model provides an obvious accuracy gain over previous models.
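The paper's model is not reproduced here; the toy estimator below merely illustrates how the two effects the abstract names, inter-frame error propagation (modeled as geometric attenuation) and correlation between error frames, both enter an expected-distortion calculation. All parameters are hypothetical:

```python
import math

def expected_channel_distortion(num_frames, p_loss, d_conceal,
                                attenuation=0.9, rho=0.5):
    """Toy model: each lost packet injects concealment error d_conceal,
    the error decays geometrically over subsequent frames (intra refresh,
    loop filtering), and overlapping error frames are assumed correlated
    with coefficient rho rather than independent."""
    total = 0.0
    for n in range(num_frames):
        # std-dev contribution of every earlier loss still visible at frame n
        sigmas = [math.sqrt(p_loss * d_conceal) * attenuation ** (n - k)
                  for k in range(n + 1)]
        # E[(sum_k e_k)^2] with pairwise correlation rho between errors
        var = sum(s * s for s in sigmas)
        cross = rho * sum(2 * sigmas[i] * sigmas[j]
                          for i in range(len(sigmas))
                          for j in range(i + 1, len(sigmas)))
        total += var + cross
    return total / num_frames
```

Setting `rho=0` recovers the independent-error assumption of earlier models; any positive correlation strictly increases the predicted distortion, which is the effect the proposed model captures.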
Abstract MPEG-4 AVC encoded video streams have been analyzed using video traces, and statistical features have been extracted, in the context of supporting efficient deployment of networked multimedia services. The statistical features include the number of scenes composing the video and the sizes of the different frame types, within the overall trace and within each scene. Statistical processing has been performed on the traces, followed by fitting to statistical distributions (Pareto and lognormal). The selected statistical distributions have been verified through the construction of a synthetic trace based on this analysis. In addition, different types of content, in terms of activity level (quantified as different scene-change ratios), have been considered. Through modelling and fitting, the stability of the main statistical parameters has been verified, along with observations on the dependence of these parameters on the video activity level.
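For the lognormal case, fitting frame sizes and generating a synthetic trace can be sketched with maximum-likelihood estimates on the log sizes; this is a generic illustration, not the paper's specific fitting procedure:

```python
import math
import random
import statistics

def fit_lognormal(frame_sizes):
    """MLE for a lognormal: mu and sigma are simply the mean and
    standard deviation of log(size)."""
    logs = [math.log(s) for s in frame_sizes]
    return statistics.fmean(logs), statistics.pstdev(logs)

def synthetic_trace(mu, sigma, n, seed=0):
    """Draw n synthetic frame sizes from the fitted distribution, for
    verifying the fit against the original trace statistics."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]
```

Comparing moments (or quantiles) of the synthetic trace against the original trace is one way to verify the distribution choice, as the abstract describes.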
Abstract Video instance segmentation for street scenes is one of the key problems in autonomous-driving research, providing a decision basis for environment perception and path planning of vehicles in street scenes. Existing methods suffer from insufficient edge-feature extraction, because multi-aspect-ratio anchor boxes are sampled with a single receptive field, and from a lack of fine spatial position information in the higher levels of the feature pyramid. This paper proposes the Anchor frame calibration and Spatial position information compensation network for Video Instance Segmentation (AS-VIS). First, an anchor-box calibration module is added to the three branches of the prediction head to perform multi-type receptive-field sampling matched to the anchor aspect ratios, addressing the insufficient extraction of object edges. Second, a multi-receptive-field downsampling module is designed to fuse the features sampled under the various receptive fields, addressing the information loss caused by downsampling. Finally, the multi-receptive-field downsampling module is applied to embed the activated object-region features of the low pyramid levels into the high levels, compensating for spatial position information and addressing the lack of fine spatial detail in high-level features. A street-scene video dataset was extracted from the Youtube-VIS benchmark, comprising 329 training videos and 53 validation videos. Quantitative comparison against the detection and segmentation precision of YolactEdge shows that anchor-box calibration improves average precision by 8.63% and 5.09% respectively, the spatial-position-compensated feature pyramid by 7.76% and 4.75%, and AS-VIS overall by 9.26% and 6.46%. The method achieves synchronized instance-level detection, tracking, and segmentation of street-scene video sequences, providing an effective theoretical basis for the environment perception of autonomous vehicles.
Fund Supported by the Science and Technology Plan Foundation of GuangDong (No. 2004B16001006) and the Science and Technology Plan Foundation of Dongguan (No. 2004D1015)
Abstract A novel technique for video watermarking based on the discrete wavelet transform (DWT) is presented. The intra frames of the video are first converted to three gray images; the 2nd-level discrete wavelet decomposition of the gray images is then computed, the watermark W is embedded into the wavelet coefficients, and the inverse wavelet transform is applied to obtain gray images that contain the secret information. The intra frames of the video are then modified according to the three gray images so that they carry the secret information. To extract the secret information, the intra frames are converted to three gray images, the 2nd-level discrete wavelet transform is applied to them, and the watermark W' is recovered from the wavelet coefficients of the three gray images. Test results show the superior performance of the technique and its potential for video watermarking.
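A minimal sketch of 2nd-level DWT embedding and non-blind extraction on a single gray image, using a Haar wavelet and additive embedding in the level-2 LL band (the abstract specifies neither the wavelet nor the band; all names and the strength `alpha` are illustrative):

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar transform (rows, then columns)."""
    a = (img[0::2] + img[1::2]) / 2.0
    d = (img[0::2] - img[1::2]) / 2.0
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

def haar_idwt2(ll, bands):
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = bands
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    img = np.empty((2 * h, 2 * w))
    img[0::2], img[1::2] = a + d, a - d
    return img

def embed(gray, wm, alpha=5.0):
    """Embed wm additively into the level-2 LL coefficients."""
    ll1, b1 = haar_dwt2(gray)
    ll2, b2 = haar_dwt2(ll1)
    return haar_idwt2(haar_idwt2(ll2 + alpha * wm, b2), b1)

def extract(gray_w, gray, alpha=5.0):
    """Non-blind extraction: compare against the original image."""
    ll2_w, _ = haar_dwt2(haar_dwt2(gray_w)[0])
    ll2, _ = haar_dwt2(haar_dwt2(gray)[0])
    return (ll2_w - ll2) / alpha
```

Applying this to each of the three gray images derived from an intra frame gives the per-frame embedding the abstract describes; a larger `alpha` makes the watermark more robust at the cost of visible distortion.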