Funding: Science and Technology Key Project of Guangzhou (2007Z3-D3101); Production and Research Project of Zhuhai (PC20082002); Technology Innovation Project of Guangdong Province (2008778113).
Abstract: In this paper, based on the Xilinx XC5VLX220 field-programmable gate array (FPGA), an FPGA verification method for application-specific integrated circuit (ASIC) design is introduced. First, the basic principles of FPGA verification are presented. Then, the structure of the FPGA board and the verification methods are illustrated. Finally, the workflow of FPGA verification for an audio video coding standard (AVS) decoder and the method of restoring images are described in detail. The FPGA resource occupancy is reported and analyzed. The results show that FPGA prototyping can verify an ASIC rapidly and effectively, shortening the development cycle.
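The image-restoration step described above amounts to dumping raw decoded frames from the board and checking them against a software reference decoder. Below is a minimal sketch of that comparison, assuming a planar YUV 4:2:0 dump format; the resolution, file names, and dump format are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: reconstruct one 4:2:0 YUV frame dumped from the FPGA
# board and compare it with a golden frame from a software reference decoder.
# Frame size, file names, and the dump format are assumptions, not from the paper.
import numpy as np

WIDTH, HEIGHT = 720, 576  # assumed test-sequence resolution

def read_yuv420_frame(path, width, height):
    """Read one planar YUV 4:2:0 frame from a raw dump file."""
    n_luma = width * height
    n_chroma = n_luma // 4
    raw = np.fromfile(path, dtype=np.uint8, count=n_luma + 2 * n_chroma)
    y = raw[:n_luma].reshape(height, width)
    u = raw[n_luma:n_luma + n_chroma].reshape(height // 2, width // 2)
    v = raw[n_luma + n_chroma:].reshape(height // 2, width // 2)
    return y, u, v

def frames_match(frame_a, frame_b):
    """Bit-exact comparison of two (Y, U, V) frame tuples."""
    return all(np.array_equal(a, b) for a, b in zip(frame_a, frame_b))

fpga_frame = read_yuv420_frame("fpga_dump.yuv", WIDTH, HEIGHT)
golden_frame = read_yuv420_frame("reference.yuv", WIDTH, HEIGHT)
print("match" if frames_match(fpga_frame, golden_frame) else "mismatch")
```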
Funding: Supported in part by the National Natural Science Foundation of China under Grant 61873277, in part by the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2020JQ-758, and in part by the China Postdoctoral Science Foundation under Grant 2020M673446.
Abstract: In video captioning methods based on an encoder-decoder framework, limited visual features are extracted by an encoder, and a natural-language sentence describing the video content is generated by a decoder. However, such methods depend on a single video input source and few visual labels, and they suffer from poor semantic alignment between video contents and the generated sentences, so they are not suitable for accurately comprehending and describing video contents. To address this issue, this paper proposes a video captioning method with semantic topic-guided generation. First, a 3D convolutional neural network is used to extract the spatiotemporal features of videos during encoding. Then, the semantic topics of the video data are extracted using visual labels retrieved from similar video data. During decoding, a decoder is constructed by combining a novel Enhance-TopK sampling algorithm with a Generative Pre-trained Transformer-2 (GPT-2) deep neural network, which decreases the influence of "deviation" in the semantic mapping between videos and texts by jointly decoding a baseline and the semantic topics of the video contents. In this process, the designed Enhance-TopK sampling algorithm alleviates the long-tail problem by dynamically adjusting the probability distribution of the predicted words. Finally, experiments are conducted on two public datasets, Microsoft Research Video Description (MSVD) and Microsoft Research-Video to Text (MSR-VTT). The experimental results demonstrate that the proposed method outperforms several state-of-the-art approaches. Specifically, the Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit ORdering (METEOR), Recall-Oriented Understudy for Gisting Evaluation-longest common subsequence (ROUGE-L), and Consensus-based Image Description Evaluation (CIDEr) scores of the proposed method improve by 1.2%, 0.1%, 0.3%, and 2.4% on the MSVD dataset, and by 0.1%, 1.0%, 0.1%, and 2.8% on the MSR-VTT dataset, respectively, compared with existing video captioning methods. As a result, the proposed method generates video captions that align more closely with natural human language expression habits.
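The abstract describes Enhance-TopK only as dynamically adjusting the probability distribution over the predicted words. The following is a minimal sketch of that idea, assuming the adjustment takes the form of an exponent alpha < 1 that flattens the head of the top-k distribution; the function name, alpha, and its value are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch of a top-k sampling step with a dynamic reweighting of the
# kept probabilities. The exact Enhance-TopK update is not given here; the
# "flattening" exponent alpha and its schedule are illustrative assumptions.
import numpy as np

def enhance_topk_sample(logits, k=10, alpha=0.8, rng=None):
    """Sample one token id from the top-k logits after reweighting.

    alpha < 1 flattens the head of the distribution, giving rarer
    (long-tail) words inside the top-k a better chance of being picked.
    """
    rng = rng or np.random.default_rng()
    top_idx = np.argpartition(logits, -k)[-k:]     # indices of the k largest logits
    top_logits = logits[top_idx]
    probs = np.exp(top_logits - top_logits.max())  # numerically stable softmax
    probs /= probs.sum()
    probs = probs ** alpha                         # dynamic flattening step
    probs /= probs.sum()                           # renormalize after reweighting
    return int(rng.choice(top_idx, p=probs))

vocab_logits = np.random.default_rng(0).normal(size=50257)  # GPT-2-sized vocab
print(enhance_topk_sample(vocab_logits, k=10, alpha=0.8))
```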
Abstract: This paper presents a new solution for the motion compensation module in a high-definition television (HDTV) video decoder. The overall architecture and the design of the major functional units, such as the motion vector decoder, the predictor, and the mixer, are discussed. By exploiting the special characteristics inherent in the motion compensation algorithm, the module and its functional units adopt various novel architectures that allow the module to meet real-time constraints. This solution resolves the problems of high hardware cost, low bus efficiency, and complex control schemes in conventional designs.
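For concreteness, the sketch below shows what a predictor and mixer of this kind compute for one macroblock: the predictor fetches a reference block displaced by a decoded motion vector, and the mixer averages forward and backward predictions. The 16x16 block size, integer-pel vectors, and averaging rule are generic MPEG-style assumptions rather than the paper's exact hardware design.

```python
# Illustrative software model of the predictor and mixer for one 16x16
# macroblock; generic MPEG-style assumptions, not the paper's architecture.
import numpy as np

BLOCK = 16

def predict(ref_frame, x, y, mv):
    """Predictor: fetch the 16x16 reference block displaced by motion vector mv."""
    dx, dy = mv
    return ref_frame[y + dy:y + dy + BLOCK, x + dx:x + dx + BLOCK]

def mix(fwd_block, bwd_block):
    """Mixer: average forward and backward predictions (bidirectional MB)."""
    return ((fwd_block.astype(np.uint16) + bwd_block + 1) // 2).astype(np.uint8)

rng = np.random.default_rng(0)
fwd_ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)  # toy reference frames
bwd_ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
pred = mix(predict(fwd_ref, 16, 16, (2, -1)), predict(bwd_ref, 16, 16, (-3, 0)))
print(pred.shape)  # (16, 16)
```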
Funding: Project supported by the Applied Materials Shanghai Research and Development Foundation (Grant No. 08700741000) and the Foundation of Shanghai Municipal Education Commission (Grant No. 2006AZ068).
Abstract: This paper presents an efficient VLSI architecture for a power-optimized context-based adaptive variable length code (CAVLC) decoder for the H.264/Advanced Video Coding (AVC) standard. In the proposed design, a first-one detector exploits the regularity of the codewords to overcome the low efficiency and high power dissipation of the traditional table-searching method. Considering the correlation of the data used in decoding the run_before syntax element, arithmetic operations are combined with a finite state machine (FSM), which achieves higher decoding efficiency. Following the CAVLC decoding flow, clock gating is applied at both the module level and the register level, which reduces the overall dynamic power dissipation by 43%. The proposed design can decode every syntax element in one clock cycle. Synthesized under a 100 MHz clock constraint, the design costs 11,300 gates in a 0.25 μm CMOS technology, meeting the demand of real-time decoding in the H.264/AVC standard.
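The first-one detector exploits the fact that several CAVLC codes, such as level_prefix in H.264, encode their value directly as a run of leading zeros terminated by a '1', so a priority encoder can replace a table search. A small software stand-in for that detector is sketched below; the bit-list bitstream is a simplification of the hardware interface.

```python
# Sketch of the "first-one detector" idea on the H.264 level_prefix code:
# the value is the number of leading zeros before the first '1' bit, so a
# priority encoder replaces a table search. Bitstream handling here is a
# simplified software stand-in for the hardware detector.
def decode_level_prefix(bits, pos):
    """Count leading zeros starting at pos; return (level_prefix, new_pos)."""
    zeros = 0
    while bits[pos + zeros] == 0:
        zeros += 1
    return zeros, pos + zeros + 1  # skip the terminating '1' bit

bitstream = [0, 0, 0, 1, 1, 0, 1]        # example bits
prefix, nxt = decode_level_prefix(bitstream, 0)
print(prefix, nxt)                       # 3 4
```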
Abstract: This paper presents a formal approach, FSPD (Formal Specifications for Protocols of Decoders), to specify decoder communication protocols. Based on an axiomatic method, FSPD is a precise language with which programmers can use a single suitable driver to handle various types of decoders. FSPD helps programmers achieve high adaptability and reusability of decoder-driver software. Keywords: formalization; digital video security system; protocols of decoders. CLC number: TP 311. Biography: YUAN Meng-ting (1976-), Ph.D. candidate; research directions: software engineering, formal methods.
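FSPD's concrete syntax is not shown in the abstract; the sketch below only illustrates the underlying idea of one generic driver interpreting declarative protocol specifications. The Pelco-D-style frame layout, command codes, and checksum rule are hypothetical stand-ins, not FSPD.

```python
# Sketch of the "one driver, many decoder protocols" idea behind FSPD.
# FSPD itself is an axiomatic specification language; here a protocol is
# reduced to a declarative byte-frame template (a loose stand-in, not FSPD
# syntax), and a single driver interprets whichever spec it is given.
PELCO_D_LIKE = {  # hypothetical spec for a Pelco-D-style decoder
    "sync": 0xFF,
    "commands": {"pan_left": (0x00, 0x04), "pan_right": (0x00, 0x02)},
}

def build_frame(spec, address, command, speed):
    """Generic driver: build a control frame from any spec of this shape."""
    cmd1, cmd2 = spec["commands"][command]
    body = [address, cmd1, cmd2, speed, 0x00]   # addr, cmd1, cmd2, data1, data2
    checksum = sum(body) % 256                  # Pelco-D-style modulo-256 sum
    return bytes([spec["sync"]] + body + [checksum])

print(build_frame(PELCO_D_LIKE, 1, "pan_left", 0x20).hex())
```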
Abstract: Space-time video super-resolution (STVSR) improves video quality along both the temporal and spatial dimensions, so that high-resolution, high-frame-rate video can still be presented in real time when capture devices, transmission, or storage are limited, satisfying the demand for ultra-high-definition picture quality. Compared with two-stage methods, one-stage methods perform frame interpolation at the feature level rather than the pixel level, and are clearly superior in inference speed and computational complexity. Some existing one-stage STVSR methods adopt feature interpolation based on pixel hallucination, which fabricates pixels and therefore struggles to predict fast-moving objects between frames. To address this, a pyramid encoder-decoder network based on optical flow is proposed for temporal feature interpolation, achieving fast bidirectional optical flow estimation and more realistic, natural texture synthesis; it makes the network structure more efficient while compensating for the instability that large motion brings to optical flow estimation. In addition, the spatial module uses sliding-window-based local propagation and recurrent-network-based bidirectional propagation to strengthen frame alignment. The whole network is called the temporal feature refinement network (TFRnet). To further exploit the potential of TFRnet, spatial super-resolution is performed before temporal super-resolution (space-first). Experiments on several widely used benchmarks and evaluation metrics demonstrate the excellent performance of the proposed method, TFRnet-sf: while the overall peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) improve, the PSNR and SSIM of the interpolated intermediate frames also improve, which to some extent alleviates the problem of an excessive PSNR/SSIM gap between interpolated intermediate frames and the original frames.
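The temporal feature interpolation described above can be pictured as warping the features of two neighboring frames toward the intermediate time with scaled bidirectional flows and blending the results. A minimal PyTorch sketch follows; the random stand-in flows and the linear flow-scaling approximation are illustrative, and TFRnet's pyramid encoder-decoder flow estimator is not shown.

```python
# Minimal sketch of flow-based temporal feature interpolation: features of
# two neighboring frames are warped toward the intermediate time t with
# scaled bidirectional flows and blended. The flows here are random
# stand-ins for the output of a pyramid encoder-decoder flow estimator.
import torch
import torch.nn.functional as F

def warp(feat, flow):
    """Backward-warp feature map feat (N,C,H,W) with flow (N,2,H,W) in pixels."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid_x = (xs[None] + flow[:, 0]) / (w - 1) * 2 - 1   # normalize to [-1, 1]
    grid_y = (ys[None] + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1)          # (N,H,W,2), x then y
    return F.grid_sample(feat, grid, align_corners=True)

f0, f1 = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
flow_0to1 = torch.randn(1, 2, 32, 32)                     # stand-in flows
flow_1to0 = torch.randn(1, 2, 32, 32)
t = 0.5                                                   # intermediate time
# Linear-motion approximation: F_{t->0} ~ t*F_{1->0}, F_{t->1} ~ (1-t)*F_{0->1}
feat_t = (1 - t) * warp(f0, t * flow_1to0) + t * warp(f1, (1 - t) * flow_0to1)
print(feat_t.shape)                                       # torch.Size([1, 64, 32, 32])
```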