摘要
针对H.264视频编码标准关键技术52级标量量化的VLSI实现过程中,传统结构的速度和面积不能有效满足H.264在高速高并行编码应用中的实时要求,通过采用部分CSD码无符号压缩移位加法树、参考电平连线、对量化系数和步长重新进行分组分段编码等方法,有效替代了H.264标量量化过程中出现的矩阵乘法、查表、除法等不利于硬件加速的算法,提出了一种非常适合流水加速的基于4×4块并行的VLSI结构,通过控制级联加法器级数就可以有效调节其速度性能,当级数为2时,其块处理速率可以达到121.6MHz,能够满足4096×2304@120Hz视频的实时处理要求。该结构在面积和功耗方面较传统结构也有较大的改进,采用SMIC 0.13μm工艺单元库,综合时钟频率设为100MHz时,等效门和功耗分别节省了38%和30%。
52-level scalar quantization technology plays an important role in H.264/AVC. A novel parallel VLSI architecture is proposed for its hardware implementation, in which the 4×4 matrix multiplications is replaced by 16 unsigned compressed shift-adder-trees using partial CSD code scheme, switching reference wirings substitutes for look-up operation, and division is also avoided effectively, and no ROM or RAM is adopted in the overall quantizer. It can fulfill all the quantization calculations for all H.264 hybrid transform in 4×4 block parallelism. Its block throughput can reach 121.6 MHz, which can meet the real-time requirement for 4096×2304 @ 120 Hz ( 119.43936 M/s) video compression. Compared with the conventional architecture, 38 % cost and 30% power are saved. Considering speed and cost. optimization, this architecture is very suitable for pipeline acceleration, and it is a useful IP for high resolution H .264 encoder VLSI realization.
出处
《北京大学学报(自然科学版)》
EI
CAS
CSCD
北大核心
2008年第4期522-526,共5页
Acta Scientiarum Naturalium Universitatis Pekinensis