A global optimization algorithm (GOA) for the parallel Chien search circuit in a Reed-Solomon (RS) (255,239) decoder is presented. By finding the common modulo-2 additions within groups of Galois field (GF) multipliers and pre-computing the common terms, the GOA reduces the number of XOR gates and thus the circuit area. Unlike local optimization algorithms, the GOA is global: when more than one maximum match exists at a time, it chooses the pair with the smallest relational value rather than a random pair, so that the choice has the least impact on the final result. The results show that the area of parallel Chien search circuits can be reduced by 51% compared with the direct implementation when the group-based GOA is applied to the GF multipliers, and by 26% when the GOA is applied to each GF multiplier separately. This optimization scheme can be used in general parallel architectures that involve many GF multipliers.
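The idea of pre-computing shared modulo-2 additions can be illustrated with a small greedy sketch. This is not the GOA itself: the function names are hypothetical, each output is modeled simply as a set of input bits to be XORed together, and ties between equally frequent pairs are broken lexicographically rather than by the relational value the abstract describes.

```python
from collections import Counter
from itertools import combinations


def xor_count(outputs):
    """XOR gates needed when every output is built directly from its terms."""
    return sum(len(terms) - 1 for terms in outputs if terms)


def share_common_pairs(outputs):
    """Greedily precompute the most frequent pair of terms and reuse it.

    Returns the total XOR count after sharing. Tie-breaking here is
    lexicographic (an assumption for this sketch), not the GOA rule.
    """
    outputs = [set(terms) for terms in outputs]
    shared_gates = 0  # one XOR gate per precomputed pair
    next_id = 0
    while True:
        pairs = Counter()
        for terms in outputs:
            pairs.update(combinations(sorted(terms), 2))
        if not pairs:
            break
        best, uses = min(pairs.items(), key=lambda kv: (-kv[1], kv[0]))
        if uses < 2:
            break  # no pair is shared by two or more outputs
        name = f"t{next_id}"
        next_id += 1
        shared_gates += 1
        for terms in outputs:
            if best[0] in terms and best[1] in terms:
                terms -= set(best)  # in-place: drop the two original terms
                terms.add(name)     # and use the precomputed sum instead
    return shared_gates + xor_count(outputs)


example = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b"}]
direct = xor_count(example)
optimized = share_common_pairs(example)
```

In this toy case the pair `a ^ b` appears in all three outputs, so computing it once saves two gates.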
A novel asynchronous ACS (add-compare-select) processor for a Viterbi decoder is described. It is controlled by local handshake signals instead of a global clock. Circuits for the asynchronous adder unit, asynchronous comparator unit, and asynchronous selector unit are proposed. A full-custom asynchronous 4-bit ACS processor is fabricated in the CSMC-HJ 0.6 μm CMOS 2P2M mixed-mode process. At a supply voltage of 5 V and an operating frequency of 20 MHz, the power consumption is 75.5 mW. The processor has no dynamic power consumption while it waits in sleep mode. Performance tests of the asynchronous 4-bit ACS processor show that the average-case response time of 19.18 ns is only 82% of the worst-case response time of 23.37 ns. A simulation-based comparison of power consumption and performance with a synchronous 4-bit ACS processor shows that the asynchronous ACS processor has some advantages over the synchronous one.
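For readers unfamiliar with the add-compare-select step, a minimal behavioral sketch follows. It models only the arithmetic of one trellis state, not the asynchronous handshake circuits or the 4-bit datapath described above; the function name and the convention that the smaller metric survives are assumptions of this sketch.

```python
def add_compare_select(pm0, bm0, pm1, bm1):
    """One ACS step for a single trellis state.

    pm0/pm1 are the path metrics of the two predecessor states and
    bm0/bm1 the branch metrics of the incoming transitions.
    Returns (surviving path metric, decision bit).
    """
    cand0 = pm0 + bm0          # add
    cand1 = pm1 + bm1
    if cand0 <= cand1:         # compare
        return cand0, 0        # select the upper branch
    return cand1, 1            # select the lower branch


metric, decision = add_compare_select(3, 1, 2, 3)
```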
The enhanced variable rate codec (EVRC) is a standard for the 'Speech Service Option 3 for Wideband Spread Spectrum Digital System,' which has been employed in both IS-95 cellular systems and ANSI J-STD-008 PCS (personal communications systems). This paper concentrates on channel decoders that exploit the residual redundancy inherent in the EVRC bitstream. This residual redundancy is quantified by modeling the parameters as first-order Markov chains and computing the entropy rate from the relative frequencies of transitions. Moreover, this residual redundancy can be exploited by an appropriately 'tuned' channel decoder to provide substantial coding gain compared with decoders that do not exploit it. The channel coding schemes considered include convolutional codes and iteratively decoded parallel concatenated convolutional 'turbo' codes.
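The entropy-rate estimate mentioned above can be sketched as follows, assuming a parameter is observed as a sequence of discrete symbols and that transition probabilities are estimated by relative frequencies. The function name is hypothetical; the formula is the standard first-order Markov entropy rate, H = -Σ p(a,b) log2 p(b|a).

```python
from collections import Counter
from math import log2


def entropy_rate(symbols):
    """Estimate the first-order Markov entropy rate in bits per symbol."""
    pair_counts = Counter(zip(symbols, symbols[1:]))  # transitions a -> b
    from_counts = Counter(symbols[:-1])               # transitions leaving a
    total = len(symbols) - 1
    h = 0.0
    for (a, _b), n_ab in pair_counts.items():
        p_joint = n_ab / total           # relative frequency of the transition
        p_cond = n_ab / from_counts[a]   # estimated P(next = b | current = a)
        h -= p_joint * log2(p_cond)
    return h


# Residual redundancy of a parameter coded with `bits` bits per frame
# would then be roughly `bits - entropy_rate(observed_values)`.
rate = entropy_rate([0, 0, 1, 1, 0, 0, 1, 1, 0])
```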
The layered decoding algorithm has been widely used in the implementation of low-density parity-check (LDPC) decoders because of its high convergence speed. However, the pipeline operation of the layered decoder may introduce memory access conflicts, which severely degrade the decoder throughput. To essentially deal with the issue of memory access conflicts,
In the Davey-MacKay (DM) construction, the inner decoder treats unknown transmitted bits as random independent substitution errors. This limits the synchronization capability of the inner decoder and thus weakens the error-correcting capability of the DM construction. To improve the performance of the DM construction, an iterative decoding scheme is proposed that iteratively uses more accurate estimates of the transmitted codewords. In the proposed scheme, the estimated average bit error rates and the estimated low-density parity-check (LDPC) codewords from the outer decoder are fed back into the inner decoder to update the synchronization process. Simulation results show that the proposed iterative decoding scheme significantly outperforms the traditional DM construction.
In orthogonal frequency division multiplexing (OFDM) based multihop communications, the conventional decode-and-forward (DF) relay scheme suffers severely from error propagation. This drawback is serious in multihop networks because errors made by any relay node are likely to cause the decoder at the destination to fail. In this paper, we propose a bit error rate (BER) modified DF protocol (BMDF) that can be applied to systems using error-correction channel coding and M-ary modulation. By modeling all links except the last one as a binary symmetric channel (BSC), we derive a log-likelihood ratio (LLR) modification function that relies only on the accumulated BER of all previous links and is applied to the output of the soft demapper. Furthermore, to reduce the computational complexity and signaling overhead, the modification function is simplified from its original exponential expression, and fewer BER values are delivered between nodes by making successive subcarriers share the same BER. In addition, when the channel state information (CSI) of the forward link is available, the proposed BMDF can be further enhanced by combining it with subcarrier pairing (SP) and power allocation (PA), for which a sorted-channel-gain SP scheme and a greedy PA algorithm are proposed. The simulation results verify the significant performance improvement over conventional DF.
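The effect of modeling the earlier hops as a BSC can be sketched with the textbook relation for passing an LLR through a binary symmetric channel. This is an illustration under stated assumptions, not necessarily the exact function derived in the paper: `ber` stands for the accumulated crossover probability of the previous links, priors are assumed uniform, and the function name is hypothetical. With crossover probability p, tanh(L'/2) = (1 - 2p) · tanh(L/2).

```python
from math import atanh, tanh


def modify_llr(llr, ber):
    """Scale down a soft-demapper LLR to account for upstream bit errors.

    `ber` is the assumed accumulated BER of the earlier links, in [0, 0.5].
    ber = 0 leaves the LLR unchanged; ber = 0.5 erases it (returns 0).
    """
    if ber == 0.0:
        return llr  # error-free upstream: nothing to correct
    return 2.0 * atanh((1.0 - 2.0 * ber) * tanh(llr / 2.0))


# A very confident LLR saturates at ln((1 - ber) / ber), here ln(9).
capped = modify_llr(50.0, 0.1)
```

The saturation shows why the modification limits error propagation: no matter how confident the last-hop demapper is, the destination cannot trust a bit more than the upstream links allow.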
Polar codes have become increasingly popular because of their capacity-achieving property. In this paper, a memory-efficient stage-combined belief propagation (BP) decoder design for polar codes is presented. First, we briefly review the conventional BP decoding algorithm. Then a stage-combined BP decoding algorithm, which combines two adjacent stages into one, and the corresponding belief message updating rules are introduced. Based on this stage-combined decoding algorithm, a memory-efficient polar BP decoder is designed. The decoder design achieves a 50% reduction in memory and decoding latency at the cost of some combinational logic overhead. The proposed decoder is synthesized in TSMC 45 nm low-power CMOS technology. It achieves 0.96 Gb/s throughput with an area of 14.2 mm^2 at code length N = 2^16, a 51.5% reduction in decoder area compared with the conventional decoder design.
The forward-backward algorithm, used by the watermark decoder to correct non-binary synchronization errors, requires traversing a very large trellis to obtain the proper posterior probabilities, which leads to high computational complexity. To reduce the number of states involved in the computation, an adaptive pruning method for the trellis is proposed. In this scheme, states whose forward-backward quantities fall below a carefully chosen threshold are pruned. The result is a wandering trellis with far fewer states that still contains most of the high-probability states. Simulation results reveal that, with a proper scaling factor, a significant complexity reduction in the forward-backward algorithm is achieved at the expense of slight performance degradation.
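Threshold-based state pruning can be sketched as below. The abstract does not say how the threshold is computed; this sketch assumes it is a scaling factor times the largest forward metric at the current trellis step, and the function name is hypothetical.

```python
def prune_states(alpha, scale):
    """Drop low-probability trellis states at one time step.

    `alpha` maps state -> forward metric. States whose metric is below
    `scale * max(alpha)` are removed (an assumed threshold rule), and
    the survivors are renormalized to sum to 1.
    """
    threshold = scale * max(alpha.values())
    kept = {s: a for s, a in alpha.items() if a >= threshold}
    norm = sum(kept.values())
    return {s: a / norm for s, a in kept.items()}


survivors = prune_states({0: 0.6, 1: 0.3, 2: 0.1}, scale=0.2)
```

A larger `scale` prunes more aggressively, trading accuracy of the posterior for fewer states in the next recursion step.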
Viterbi decoding is widely used in many radio systems. Because of its high computational complexity, it is usually implemented with ASIC chips, FPGA chips, or optimized hardware accelerators. With the rapid development of multicore technology, multicore platforms have become a reasonable choice for software radio (SR) systems. The Cell Broadband Engine processor is a state-of-the-art multicore processor designed by Sony, Toshiba, and IBM. In this paper, we present a 64-state soft-input Viterbi decoder for a WiMAX SR baseband system based on the Cell processor. With one Synergistic Processor Element (SPE) of a Cell processor running at 3.2 GHz, our Viterbi decoder achieves a throughput of up to 30 Mb/s when decoding the tail-biting convolutional code. The performance demonstrates that the proposed Viterbi decoding implementation is very efficient. Moreover, the Viterbi decoder can be easily integrated into the SR system to provide a highly integrated SR solution. The optimization methodology used in this module design can be extended to other modules on the Cell platform.
Steganography based on bit modification of speech frames is a commonly used method that targets RTP payloads and offers covert communication over voice-over-IP (VoIP). However, direct modification of frames is often independent of the inherent speech features, which may greatly degrade speech quality. This work proposes a novel steganography based on frame bitrate changes, which uncovers a new covert channel for VoIP and introduces less distortion. The method exploits a feature of multi-rate speech codecs: the actual bitrate of a speech frame is identified only by the speech decoder at the receiving end. Based on this characteristic, two steganography strategies, bitrate downgrading (BD) and bitrate switching (BS), are provided. The first strategy substitutes high-bitrate speech frames with lower-bitrate ones to embed the secret message; in practice it introduces very low distortion, much less than other bit-modification based methods with the same embedding capacity. The second strategy encodes secret message bits into different types of speech frames and serves as a supplementary alternative. The two strategies are implemented and tested on our covert communication system StegVoIP. The experimental results show that the proposed method is effective and meets the real-time requirement of VoIP communication.
This paper presents a macroblock-level (MB-level) decoding and deblocking method that supports flexible macroblock ordering (FMO) and arbitrary slice ordering (ASO) bit streams in an H.264 decoder, together with its SOC/ASIC implementation. By searching the bit stream for the slice containing the current macroblock and switching slices correctly, MBs can be decoded in raster scan order, and decoding can begin as soon as the slice containing the current MB is available. This architectural modification enables a 3-stage MB-level decoding and deblocking pipeline and saves about 20% of SDRAM bandwidth. Implementation results show that the design achieves real-time decoding of 1080HD (1920×1088@30 fps) at a system clock of 166 MHz.
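The real-time figure above implies a per-macroblock cycle budget, which the following short calculation makes explicit. It assumes the standard 16×16 H.264 macroblock; the function names are hypothetical.

```python
def macroblocks_per_frame(width, height, mb_size=16):
    """Number of macroblocks in one frame (dimensions assumed MB-aligned)."""
    return (width // mb_size) * (height // mb_size)


def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock cycles available to process one macroblock in real time."""
    mbs_per_second = macroblocks_per_frame(width, height, mb_size) * fps
    return clock_hz / mbs_per_second


# 1920x1088 at 30 fps with a 166 MHz clock: 8160 MBs per frame,
# leaving roughly 678 cycles per macroblock.
budget = cycles_per_macroblock(1920, 1088, 30, 166e6)
```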
This paper proposes a real-time, low-complexity implementation of a low-density parity-check (LDPC) decoder for satellite communication on an FPGA platform. Adopting a (2048, 4096) irregular quasi-cyclic (QC) LDPC code, the proposed partly parallel decoding structure balances the complexity between the check node unit (CNU) and the variable node unit (VNU) based on the min-sum (MS) algorithm, thereby using fewer Slice resources and achieving superior clock performance. Moreover, because a lookup table (LUT) is used to locate the node messages stored in the time-shared memory unit, the structure is simple to reuse and saves a large amount of storage. Implementation results on a Xilinx FPGA chip show that, compared with the conventional structure, the proposed scheme achieves at least 28.6% and 8% cost reductions in RAM and Slice resources, respectively. The clock frequency is also increased to 280 MHz without degrading decoding performance or convergence speed.
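The check node update of the min-sum algorithm can be sketched in a few lines. This is the plain textbook rule (sign product times minimum magnitude of the other incoming messages), without any scaling or offset correction and without the partly parallel scheduling of the design above; the function name is hypothetical.

```python
def min_sum_check_node(incoming):
    """Min-sum check node update.

    `incoming` holds the variable-to-check LLR messages on each edge.
    The message sent back on edge i uses every incoming message
    except the one that arrived on edge i.
    """
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1
        for value in others:
            if value < 0:
                sign = -sign
        outgoing.append(sign * min(abs(value) for value in others))
    return outgoing


messages = min_sum_check_node([2.0, -1.0, 3.0])
```

Hardware CNUs usually avoid the inner loop by tracking only the two smallest magnitudes and the overall sign, which yields the same outputs.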
Self-attention networks and the Transformer have dominated machine translation and natural language processing and have shown great potential in vision tasks such as image classification and object detection. Inspired by the progress of the Transformer, we propose a novel, general, and robust voxel feature encoder for 3D object detection based on the traditional Transformer. We first investigate the permutation invariance of self-attention on sequence data and apply it to point cloud processing. Then we construct a voxel feature layer based on self-attention that adaptively learns a local and robust context for each voxel from the spatial relationships and the exchange of context information between all points within the voxel. Lastly, we construct a general voxel feature learning framework for 3D object detection with the voxel feature layer as its core. The voxel feature with Transformer (VFT) can be plugged easily into any other voxel-based 3D object detection framework and serves as the backbone of the voxel feature extractor. Experimental results on the KITTI dataset demonstrate that our method achieves state-of-the-art performance on 3D object detection.
This paper presents an equalization algorithm for continuous phase modulation (CPM) over frequency-selective channels. A specific training sequence is first embedded in each data packet. The channel information parameters are acquired by recursive least-squares (RLS) estimation, and a fractionally spaced equalizer performs joint decoding and equalization. Simulation results show that the proposed algorithm acquires the channel information parameters rapidly and accurately, and that the fractionally spaced equalizer eliminates intersymbol interference (ISI) effectively and is not sensitive to timing inaccuracy, so the algorithm can be used in burst-mode demodulation systems.
An adaptive pipelining scheme for an H.264/AVC context-based adaptive binary arithmetic coding (CABAC) decoder for high-definition (HD) applications is proposed to solve the data hazards arising from data dependencies in the CABAC decoding process. An efficiency model of the CABAC decoding pipeline is derived from the analysis of a common pipeline. Based on this model, several adaptive strategies are provided. With these strategies, the pipelining scheme adapts to different types of syntax elements (SEs), and the pipeline does not stall during decoding. In addition, the proposed decoder fully supports the H.264/AVC High 4:2:2 profile, and the experimental results show that its efficiency is much higher than that of other single-engine architectures. Taking both performance and cost into consideration, our design makes a good tradeoff compared with other work and is sufficient for HD real-time decoding.
In this paper, an FPGA verification method for application-specific integrated circuit (ASIC) design is introduced, based on the Xilinx field-programmable gate array (FPGA) xc5vlx220. First, the basic principles of FPGA verification are introduced. Then, the structure of the FPGA board and the verification methods are illustrated. Finally, the workflow of FPGA verification for an audio video coding standard (AVS) decoder and the method of restoring images are described in detail. The FPGA resource occupancy is shown and analyzed. The result shows that the FPGA can verify the ASIC rapidly and effectively and thereby shorten the development cycle.
The growing number of mobile users and the diversification of service types have increased the demand for wireless network bandwidth in recent years. Although evolving transmission techniques can enlarge network capacity to some degree, they still cannot satisfy the requirements of mobile users. Meanwhile, following Moore's Law, the data processing capabilities of mobile user terminals are continuously improving. In this paper, we explore possible methods of trading the strong computational power of wireless terminals for transmission efficiency. Taking wireless video conversation as a specific scenario, we propose a model-based video coding scheme that learns the structures in multimedia content. Benefiting from both strong computing capability and pre-learned model priors, only low-dimensional parameters need to be transmitted, and the intact multimedia content can be reconstructed at the receiver in real time. Experimental results indicate that, compared with conventional video codecs, the proposed scheme significantly reduces the data rate with the aid of the computational capability of wireless terminals.
To improve the efficiency of embedded software running on a processor core, this paper proposes a hardware/software co-optimization approach for embedded software from the system point of view. The proposed stepwise methods aim to exploit the structure and resources of the processor as much as possible when optimizing the software algorithm. To achieve low memory usage and a low clock frequency requirement for the same performance, this co-optimization approach was used to optimize the embedded software of an MP3 decoder based on a 16-bit fixed-point DSP core. After optimization, decoding 128 kbps, 44.1 kHz stereo MP3 on the DSP evaluation platform requires 45.9 MIPS and 20.4 kbytes of memory. Compared with the compiler-generated result using floating-point computation, the optimization rate reaches 65.6% for memory and 49.6% for frequency. The experimental result demonstrates the effectiveness of the hardware/software co-optimization approach, which depends on both the algorithm and the architecture.
To compress screen image sequences in real-time remote and interactive applications, a novel compression method named CABHG is proposed. CABHG employs a hybrid coding scheme consisting of intra-frame and inter-frame coding modes. The intra-frame coding uses a rate-distortion optimized adaptive block size and can also be used to compress a single screen image. The inter-frame coding uses a hierarchical group of pictures (GOP) structure to improve system performance during random accesses and fast-backward scans. Experimental results demonstrate that CABHG has an approximately 47%-48% higher compression ratio and 46%-53% lower CPU utilization than professional screen image sequence codecs such as the TechSmith Ensharpen codec and the Sorenson 3 codec. Compared with general video codecs such as the H.264 codec, the XviD MPEG-4 codec, and Apple's Animation codec, CABHG shows an 87%-88% higher compression ratio and 64%-81% lower CPU utilization.
Funding (layered LDPC decoder paper): the National Natural Science Foundation of China and the National Key Basic Research Program of China. The authors would like to thank all project partners for their valuable contributions and feedback.
Funding (Davey-MacKay iterative decoding paper): supported in part by the National Natural Science Foundation of China (61671324) and the Director's Funding from Qingdao National Laboratory for Marine Science and Technology.
Funding (OFDM multihop BMDF paper): The authors would like to thank the National Natural Science Foundation of China (No. 61072059).
Funding (polar code BP decoder paper): jointly supported by the National Natural Science Foundation of China under Grants No. 61370040 and 61006018; the Fundamental Research Funds for the Central Universities; the Priority Academic Program Development of Jiangsu Higher Education Institutions; and the Open Project of the State Key Laboratory of ASIC & System (Fudan University), 12KF006.
Funding (trellis pruning for watermark decoder paper): supported in part by the National Natural Science Foundation of China (61101114, 61671324) and the Program for New Century Excellent Talents in University (NCET-12-0401).
Funding (VoIP frame-bitrate steganography paper): Project (2011CB302305) supported by the National Basic Research Program (973 Program) of China; Projects (61232004, 61302094) supported by the National Natural Science Foundation of China; Project (ZQN-PY115) supported by the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University, China; Project (JA13012) supported by the Education Science Research Program for Young and Middle-aged Teacher of Fujian Province of China; Project (2014J01238) supported by the Natural Science Foundation of Fujian Province of China.
Funding: Project (No. 2002AA1Z1190) supported by the National Hi-Tech Research and Development Program (863) of China
Abstract: This paper presents a macroblock-level (MB-level) decoding and deblocking method for supporting flexible macroblock ordering (FMO) and arbitrary slice ordering (ASO) bit streams in an H.264 decoder, together with its SOC/ASIC implementation. By searching the bit stream for the slice containing the current macroblock and switching slices correctly, MBs can be decoded in raster scan order, and decoding can begin as soon as the slice containing the current MB is available. This architectural modification enables a 3-stage MB-level decoding and deblocking pipeline and saves about 20% of SDRAM bandwidth. Implementation results show that the design achieves real-time decoding of 1080HD (1920×1088@30 fps) at a system clock of 166 MHz.
Abstract: This paper proposes a real-time, reduced-complexity implementation of a low-density parity-check (LDPC) decoder for satellite communication on an FPGA platform. By adopting a (2048, 4096) irregular quasi-cyclic (QC) LDPC code, the proposed partly parallel decoding structure balances the complexity between the check node unit (CNU) and the variable node unit (VNU) based on the min-sum (MS) algorithm, thereby using fewer Slice resources and achieving superior clock performance. Moreover, because a lookup table (LUT) is used to locate the node messages stored in the time-share memory unit, the design is simple to reuse and saves a large amount of storage resources. The implementation results on a Xilinx FPGA chip show that, compared with the conventional structure, the proposed scheme achieves at least 28.6% and 8% cost reduction in RAM and Slice resources, respectively. The clock frequency is also increased to 280 MHz without deterioration of decoding performance or reduction of convergence speed.
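As a rough illustration of the check node computation that the min-sum algorithm refers to, the sketch below implements the textbook min-sum rule in software. It is not the paper's FPGA architecture; the function name `min_sum_check_node` and the sample messages are assumptions made for this example only.

```python
def min_sum_check_node(incoming):
    """Textbook min-sum check node update (illustrative, not the paper's CNU).

    incoming: list of log-likelihood-ratio messages arriving at one check
    node, one per connected variable node (at least two are required).
    Returns one outgoing message per edge. Each outgoing message uses only
    the messages from the *other* edges: its sign is the product of their
    signs and its magnitude is the smallest of their magnitudes.
    """
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1.0
        for m in others:
            if m < 0:
                sign = -sign
        outgoing.append(sign * min(abs(m) for m in others))
    return outgoing


# Hypothetical input: three variable-to-check messages.
messages = min_sum_check_node([2.0, -0.5, 1.5])
```

Replacing the exact sum-product rule with a sign product and a minimum is what keeps the check node cheap in hardware, since it needs only comparators and XOR gates rather than nonlinear functions.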
Funding: National Natural Science Foundation of China (No. 61806006); Innovation Program for Graduate of Jiangsu Province (No. KYLX160-781); University Superior Discipline Construction Project of Jiangsu Province
Abstract: Self-attention networks and the Transformer have dominated the fields of machine translation and natural language processing, and have shown great potential in vision tasks such as image classification and object detection. Inspired by the great progress of the Transformer, we propose a novel, general, and robust voxel feature encoder for 3D object detection based on the traditional Transformer. We first investigate the permutation invariance of self-attention over sequence data and apply it to point cloud processing. Then we construct a voxel feature layer based on self-attention to adaptively learn a local and robust context of a voxel according to the spatial relationships and the exchange of context information between all points within the voxel. Lastly, we construct a general voxel feature learning framework with the voxel feature layer as the core for 3D object detection. The voxel feature with Transformer (VFT) can be plugged easily into any other voxel-based 3D object detection framework, where it serves as the backbone of the voxel feature extractor. Experimental results on the KITTI dataset demonstrate that our method achieves state-of-the-art performance on 3D object detection.
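The permutation property mentioned above can be checked with a small sketch. It assumes plain scaled dot-product self-attention with identity projections (Q = K = V = the point coordinates) followed by max pooling; the function names and the sample voxel are hypothetical, and this is not the VFT layer itself.

```python
import math


def self_attention(points):
    """Scaled dot-product self-attention with identity projections.

    points: list of equal-length feature vectors (one per point).
    Returns one attended vector per input point, in the same order.
    """
    d = len(points[0])
    attended = []
    for q in points:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in points]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, points)) / total for j in range(d)]
        )
    return attended


def voxel_feature(points):
    """Max-pool the attended points into one order-independent voxel vector."""
    attended = self_attention(points)
    return [max(row[j] for row in attended) for j in range(len(points[0]))]


# Hypothetical voxel with three points; the second list is a reordering.
voxel = [[0.1, 0.2, 0.3], [0.4, -0.1, 0.0], [-0.2, 0.5, 0.1]]
shuffled = [voxel[2], voxel[0], voxel[1]]
same = all(
    math.isclose(a, b, abs_tol=1e-12)
    for a, b in zip(voxel_feature(voxel), voxel_feature(shuffled))
)
```

Without positional encoding, reordering the input points only reorders the attention outputs, so a symmetric pooling step such as the maximum yields the same voxel feature up to floating-point rounding.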
Abstract: This paper presents an equalization algorithm for continuous phase modulation (CPM) over frequency-selective channels. A specific training sequence is first embedded in each data packet. The channel information parameters are acquired by recursive least-squares (RLS) estimation, and a fractionally spaced equalizer performs joint decoding and equalization. Simulation results show that the proposed algorithm acquires the channel information parameters rapidly and accurately, and that the fractionally spaced equalizer eliminates intersymbol interference (ISI) effectively and is not sensitive to timing inaccuracy, so the algorithm can be used in burst-mode demodulation systems.
Funding: Supported by the National Natural Science Foundation of China (No. 61076021); the National Basic Research Program of China (No. 2009CB320903); China Postdoctoral Science Foundation (No. 2012M511364)
Abstract: An adaptive pipelining scheme for an H.264/AVC context-based adaptive binary arithmetic coding (CABAC) decoder for high definition (HD) applications is proposed to solve the data hazards arising from data dependencies in the CABAC decoding process. An efficiency model of the CABAC decoding pipeline is derived from the analysis of a common pipeline. Based on this model, several adaptive strategies are provided. With these strategies, the pipelining scheme adapts to different types of syntax elements (SEs), and the pipeline does not stall during decoding. In addition, the proposed decoder fully supports the H.264/AVC High 4:2:2 profile, and the experimental results show that its efficiency is much higher than that of other single-engine architectures. Taking both performance and cost into consideration, our design makes a good tradeoff compared with other work and is sufficient for real-time HD decoding.
Funding: Science and Technology Key Project of Guangzhou (2007Z3-D3101); Production and Research Project of Zhuhai (PC20082002); Technology Innovation Project of Guangdong Province (2008778113)
Abstract: In this paper, an FPGA verification method for application specific integrated circuit (ASIC) design is introduced, based on the Xilinx xc5vlx220 field-programmable gate array (FPGA). Firstly, the basic principles of FPGA verification are introduced. Then, the structure of the FPGA board and the verification methods are illustrated. Finally, the workflow of FPGA verification for an audio video coding standard (AVS) decoder and the method of restoring images are described in detail. The FPGA resource occupancy is shown and analyzed. The results show that the FPGA can verify the ASIC rapidly and effectively, shortening the development cycle.
Funding: Supported by the National Basic Research Project of China (973) (2013CB329006); National Natural Science Foundation of China (NSFC, 61101071, 61471220, 61021001); Tsinghua University Initiative Scientific Research Program
Abstract: The growing number of mobile users, as well as the diversification of service types, has resulted in increasing demands for wireless network bandwidth in recent years. Although evolving transmission techniques are able to enlarge the network capacity to some degree, they still cannot satisfy the requirements of mobile users. Meanwhile, following Moore's Law, the data processing capabilities of mobile user terminals are continuously improving. In this paper, we explore possible methods of trading strong computational power at wireless terminals for transmission efficiency. Taking the specific scenario of wireless video conversation, we propose a model-based video coding scheme that learns the structures in multimedia content. Benefiting from both strong computing capability and pre-learned model priors, only low-dimensional parameters need to be transmitted, and the intact multimedia content can be reconstructed at the receivers in real time. Experimental results indicate that, compared to conventional video codecs, the proposed scheme significantly reduces the data rate with the aid of the computational capability of wireless terminals.
Funding: Project supported by the Key-Tech Program of Zhejiang Province, China (No. 021101559), and the Fok Ying Tong Education Foundation (No. 94031), China
Abstract: In order to improve the efficiency of embedded software running on a processor core, this paper proposes a hardware/software co-optimization approach for embedded software from the system point of view. The proposed stepwise methods aim to exploit the structure and resources of the processor as much as possible for software algorithm optimization. To achieve lower memory usage and a lower required clock frequency for the same performance, this co-optimization approach was used to optimize the embedded software of an MP3 decoder based on a 16-bit fixed-point DSP core. After optimization, decoding a 128 kbps, 44.1 kHz stereo MP3 on the DSP evaluation platform needs 45.9 MIPS and 20.4 kbytes of memory. The optimization rate reaches 65.6% for memory and 49.6% for frequency, respectively, compared with the results produced by the compiler using floating-point computation. The experimental results indicate the feasibility of the hardware/software co-optimization approach, which depends on both the algorithm and the architecture.
Funding: Project (60873230) supported by the National Natural Science Foundation of China
Abstract: To compress screen image sequences in real-time remote and interactive applications, a novel compression method named CABHG is proposed. CABHG employs a hybrid coding scheme that consists of intra-frame and inter-frame coding modes. The intra-frame coding uses a rate-distortion optimized adaptive block size and can also be used to compress a single screen image. The inter-frame coding uses a hierarchical group of pictures (GOP) structure to improve system performance during random accesses and fast-backward scans. Experimental results demonstrate that the proposed CABHG method has an approximately 47%-48% higher compression ratio and 46%-53% lower CPU utilization than professional screen image sequence codecs such as the TechSmith Ensharpen codec and the Sorenson 3 codec. Compared with general video codecs such as the H.264 codec, the XviD MPEG-4 codec, and Apple's Animation codec, CABHG also shows an 87%-88% higher compression ratio and 64%-81% lower CPU utilization.