Funding: This research was supported by the Natural Science Foundation of Anhui Province (No. 1808085MF203) and the Natural Science Foundation of China (Nos. 61972438 and 61432017).
Abstract: To solve the problem of task mapping on grid coarse-grained reconfigurable arrays under multiple constraints, we propose a Loop Subgraph-Level Greedy Mapping (LSLGM) algorithm that exploits parallelism and processing element fragmentation. Under the constraints of the reconfigurable array, the LSLGM algorithm schedules nodes from a ready queue to the current reconfigurable cell array block. After a node is mapped, the indegree of each of its successors is dynamically updated. If a successor's indegree is zero, it is scheduled directly to the ready queue; otherwise, the predecessor must be dynamically checked. If the predecessor cannot be mapped, it is scheduled to a blocking queue. To dynamically adjust the scheduling order of ready nodes, the scheduling function is constructed from factors such as node number, node level, and node dependency. Experimental results show that, compared with the loop subgraph-level mapping algorithm, the total cycles of the LSLGM algorithm decrease by an average of 33.0% (PEA_(4×4)) and 33.9% (PEA_(7×7)). Compared with the epimorphism map algorithm, the total cycles of the LSLGM algorithm decrease by an average of 38.1% (PEA_(4×4)) and 39.0% (PEA_(7×7)). The feasibility of LSLGM is thus verified.
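The queue handling described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' LSLGM implementation: the function name `greedy_map`, the priority tuple (node level, then number of successors, then node number), and the reading of a "block" as a fixed number of processing elements filled in order are all assumptions made for illustration. The blocking queue is modeled here as the set of successors that still have unmapped predecessors.

```python
import heapq

def greedy_map(num_nodes, edges, pes_per_block):
    """Greedily pack DAG nodes into blocks of at most pes_per_block nodes.

    Illustrative sketch only; names and the priority rule are assumptions.
    """
    succs = [[] for _ in range(num_nodes)]
    indegree = [0] * num_nodes
    for u, v in edges:
        succs[u].append(v)
        indegree[v] += 1

    # Node level = longest distance from any source (Kahn-style pass).
    level = [0] * num_nodes
    remaining = indegree[:]
    order = [n for n in range(num_nodes) if remaining[n] == 0]
    for u in order:  # the list grows while we iterate over it
        for v in succs[u]:
            level[v] = max(level[v], level[u] + 1)
            remaining[v] -= 1
            if remaining[v] == 0:
                order.append(v)
    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle")

    def priority(n):
        # Assumed scheduling function: lower level first, then more
        # successors (dependency), then smaller node number.
        return (level[n], -len(succs[n]), n)

    ready = [(priority(n), n) for n in range(num_nodes) if indegree[n] == 0]
    heapq.heapify(ready)
    blocked = set()  # successors still waiting on unmapped predecessors
    blocks = [[]]
    while ready:
        _, n = heapq.heappop(ready)
        if len(blocks[-1]) == pes_per_block:
            blocks.append([])  # current block is full, open the next one
        blocks[-1].append(n)
        for v in succs[n]:
            indegree[v] -= 1  # dynamic indegree update after mapping n
            if indegree[v] == 0:
                blocked.discard(v)
                heapq.heappush(ready, (priority(v), v))
            else:
                blocked.add(v)
    return blocks

# Hypothetical 5-node task graph mapped onto blocks of 2 PEs.
example_edges = [(0, 2), (1, 2), (2, 3), (1, 4)]
print(greedy_map(5, example_edges, 2))  # [[1, 0], [2, 4], [3]]
```

In this example node 1 is mapped before node 0 because it has more successors, and node 2 stays in the blocking set until both of its predecessors have been mapped. Routing, interconnect, and other array constraints mentioned in the abstract are not modeled.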
Funding: Supported by the National Key Research and Development Program of China (2019YFB1803600), the Key Scientific Research Program of Shaanxi Provincial Department of Education (22JY059), and the China Civil Aviation Airworthiness Center Open Foundation (SH2021111903).
Abstract: To apply quasi-cyclic low-density parity-check (QC-LDPC) codes to different scenarios, a data-stream-driven pipelined macro instruction set and a reconfigurable processor architecture are proposed for typical QC-LDPC algorithms. Data-level parallelism is improved by instructions that dynamically configure the multi-core computing units. In addition, an intelligent adjustment strategy based on a programmable wake-up controller (WuC) is designed so that the computing mode, operating voltage, and frequency of the QC-LDPC algorithm can be adjusted, which improves the computing efficiency of the processor. The QC-LDPC processors are verified on the Xilinx ZCU102 field programmable gate array (FPGA) board, and their computing efficiency is measured. The experimental results indicate that the QC-LDPC processor supports two encoding lengths of three typical QC-LDPC algorithms and 20 adaptive operating modes of operating voltage and frequency. The maximum efficiency reaches 12.18 Gbit/(s·W), and the processor is more flexible than existing state-of-the-art processors for QC-LDPC.