Funding: Supported by the National Natural Science Foundation of China (Nos. 61834005, 61772417, 61802304, 61602377, 61874087, 61634004) and the Shaanxi Province Key R&D Plan (Nos. 2020JM-525, 2021GY-029, 2021KW-16).
Abstract: The introduction of depth modeling mode 4 (DMM-4) in 3D high efficiency video coding (3D-HEVC) sharply increases computational complexity, which degrades the real-time performance of video coding. To reduce the computational complexity of DMM-4, this paper proposes a simplified, hardware-friendly contour prediction algorithm. Exploiting the similarity between the texture and depth maps, the proposed algorithm directly codes depth blocks to calculate edge regions, which reduces the number of reference blocks. Verified with test sequences on HTM16.1, the proposed algorithm reduces coding time by 9.42% compared with the original algorithm. To avoid the time cost of serial coding on HTM, a parallel design of the proposed algorithm based on a reconfigurable array processor (DPR-CODEC) is also proposed. The parallel design reduces storage access time, configuration time, and storage cost. Verified on a Xilinx Virtex 6 FPGA, experimental results show that the parallel design can process HD 1080p video at more than 30 frames per second. Compared with related work, the scheme reduces LUTs by 42.3%, registers (REG) by 85.5%, and hardware resources by 66.7%. The data-loading speedup ratio of the parallel scheme can reach 3.4539. Averaged over templates of different sizes, the serial/parallel speedup ratio of encoding time reaches 2.446.
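As background for the abstract above, the general idea behind contour-style depth partitioning can be sketched as follows. This is a minimal illustration, assuming the commonly described approach of thresholding the co-located texture block at its mean to split a depth block into two regions, each represented by a constant value. It is not the simplified algorithm proposed in the paper, and the function names `contour_partition` and `region_constants` are hypothetical.

```python
import numpy as np


def contour_partition(texture_block):
    """Split a block into two regions by thresholding at the block mean.

    Assumption: the threshold is the mean of the co-located texture block.
    Returns a boolean mask (True = samples above the threshold).
    """
    block = np.asarray(texture_block, dtype=np.float64)
    return block > block.mean()


def region_constants(depth_block, mask):
    """Represent each region of the depth block by its mean depth value."""
    depth = np.asarray(depth_block, dtype=np.float64)
    overall = depth.mean()  # fallback when one region is empty
    inside = depth[mask].mean() if mask.any() else overall
    outside = depth[~mask].mean() if (~mask).any() else overall
    return inside, outside


# Toy 4x4 example: a vertical edge in the texture block.
texture = np.array([[10, 10, 200, 200]] * 4)
depth = np.array([[50, 50, 120, 120]] * 4)

mask = contour_partition(texture)
inside, outside = region_constants(depth, mask)
print(mask.astype(int))
print(inside, outside)
```

Because the partition is derived from already-available texture samples, no partition pattern has to be searched for in the depth block itself; the per-sample comparison is also independent across samples, which is what makes this kind of step a candidate for parallel hardware.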
Funding: This work was supported by the National Natural Science Foundation of China (61802304, 61834005, 61772417, 61634004, 61602377), the Shaanxi Provincial Co-ordination Innovation Project of Science and Technology (2016KTZDGY02-04-02), and the Shaanxi Provincial Key Research and Development Plan (2017GY-060).
Abstract: Object recognition in very high-resolution remote sensing images is a basic problem in aerial and satellite image analysis. With the development of sensor technology and aerospace remote sensing technology, both the quality and the quantity of remote sensing images have improved. Traditional recognition methods are limited in describing higher-level features, whereas object recognition based on a convolutional neural network (CNN) can both handle large-scale images and learn features automatically with high efficiency. It is mainly used for object recognition in remote sensing images. In this paper, an AlexNet CNN model is trained on 2100 remote sensing images, and the correct recognition rate reaches 97.6% after 2000 iterations. Based on the trained model, a parallel design of the CNN for object recognition in remote sensing images is then proposed on a data-driven array processor (DDAP), and the cycles consumed are counted. The proposed architecture is also implemented on a Xilinx V6 development board and synthesized in SMIC 130 nm complementary metal oxide semiconductor (CMOS) technology. The experimental results show that the proposed architecture achieves a degree of parallelism that accelerates the computation.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61802304, 61834005, 61772417, 61634004, and 61602377) and the Shaanxi Provincial Co-ordination Innovation Project of Science and Technology (No. 2016KTZDGY02-04-02).
Abstract: The utilization of computation resources and the reconfiguration time have a large impact on the performance of a reconfigurable system. To improve performance, a dynamic self-reconfigurable mechanism for a data-driven cell array is proposed. A cell fires only when the data it needs has arrived, and the cell array works in two modes: fixed execution and reconfiguration. In reconfiguration mode, cell functions and data flow directions are changed automatically at run time according to contexts. In addition, by using an H-tree interconnection network and pre-storing multiple application mapping contexts in the reconfiguration buffer, multiple applications can execute concurrently and context switching time is minimized. To verify system performance, several algorithms are mapped onto the proposed structure, and the number of configuration contexts and the execution time are recorded for statistical analysis. The results show that the proposed self-reconfigurable mechanism efficiently reduces the number of contexts and has a low computing time.
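The data-driven firing rule and the run-time change of cell function described in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions: the `Cell` class, its `receive` and `configure` methods, and the fixed operand ports are hypothetical names chosen for illustration, and the model ignores the H-tree interconnect and the reconfiguration buffer.

```python
import operator


class Cell:
    """Toy model of a data-driven cell (hypothetical, for illustration only)."""

    def __init__(self, op, arity=2):
        self.op = op          # current cell function
        self.arity = arity    # number of operand ports
        self.operands = {}    # operands received so far, keyed by port

    def configure(self, op):
        """Load a new context: change the cell function at run time."""
        self.op = op
        self.operands.clear()

    def receive(self, port, value):
        """Deliver one operand; fire only when every port holds data."""
        if not 0 <= port < self.arity:
            raise ValueError(f"invalid port {port}")
        self.operands[port] = value
        if len(self.operands) < self.arity:
            return None  # still waiting for data, so the cell does not fire
        args = [self.operands[p] for p in range(self.arity)]
        self.operands.clear()
        return self.op(*args)


cell = Cell(operator.add)
print(cell.receive(0, 3))    # None: second operand has not arrived
print(cell.receive(1, 4))    # 7: both operands present, the cell fires

cell.configure(operator.mul)  # switch context: same cell, new function
print(cell.receive(1, 5))    # None
print(cell.receive(0, 2))    # 10
```

Keying the operands by port rather than by arrival order means the result does not depend on which input arrives first, which matches the idea that execution is triggered by data availability rather than by a program counter.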