Due to 5G's stringent and uncertainty traffic requirements,open ecosystem would be one inevitable way to develop 5G.On the other hand,GPP based mobile communication becomes appealing recently attributed to its str...Due to 5G's stringent and uncertainty traffic requirements,open ecosystem would be one inevitable way to develop 5G.On the other hand,GPP based mobile communication becomes appealing recently attributed to its striking advantage in flexibility and re-configurability.In this paper,both the advantages and challenges of GPP platform are detailed analyzed.Furthermore,both GPP based software and hardware architectures for open 5G are presented and the performances of real-time signal processing and power consumption are also evaluated.The evaluation results indicate that turbo and power consumption may be another challengeable problem should be further solved to meet the requirements of realistic deployments.展开更多
General-purpose processor (GPP) is an important platform for fast Fourier transform (FFT),due to its flexibility,reliability and practicality.FFT is a representative application intensive in both computation and m...General-purpose processor (GPP) is an important platform for fast Fourier transform (FFT),due to its flexibility,reliability and practicality.FFT is a representative application intensive in both computation and memory access,optimizing the FFT performance of a GPP also benefits the performances of many other applications.To facilitate the analysis of FFT,this paper proposes a theoretical model of the FFT processing.The model gives out a tight lower bound of the runtime of FFT on a GPP,and guides the architecture optimization for GPP as well.Based on the model,two theorems on optimization of architecture parameters are deduced,which refer to the lower bounds of register number and memory bandwidth.Experimental results on different processor architectures (including Intel Core i7 and Godson-3B) validate the performance model.The above investigations were adopted in the development of Godson-3B,which is an industrial GPP.The optimization techniques deduced from our performance model improve the FFT performance by about 40%,while incurring only 0.8% additional area cost.Consequently,Godson-3B solves the 1024-point single-precision complex FFT in 0.368 μs with about 40 Watt power consumption,and has the highest performance-per-watt in complex FFT among processors as far as we know.This work could benefit optimization of other GPPs as well.展开更多
The many-accelerator architecture, mostly composed of general-purpose cores and accelerator-like function units (FUs), becomes a great alternative to homogeneous chip multiprocessors (CMPs) for its superior power-...The many-accelerator architecture, mostly composed of general-purpose cores and accelerator-like function units (FUs), becomes a great alternative to homogeneous chip multiprocessors (CMPs) for its superior power-efficiency. However, the emerging many-accelerator processor shows a much more complicated memory accessing pattern than general purpose processors (GPPs) because the abundant on-chip FUs tend to generate highly-concurrent memory streams with distinct locality and bandwidth demand. The disordered memory streams issued by diverse accelerators exhibit a mutual- interference behavior and cannot be efficiently handled by the orthodox main memory interface that provides an inflexible data fetching mode. Unlike the traditional DRAM memory, our proposed Aggregation Memory System (AMS) can function adaptively to the characterized memory streams from different FUs, because it provides the FUs with different data fetching sizes and protects their locality in memory access by intelligently interleaving their data to memory devices through sub-rank binding. Moreover, AMS can batch the requests without sub-rank conflict into a read burst with our optimized memory scheduling policy. Experimental results from trace-based simulation show both conspicuous performance boost and energy saving brought by AMS.展开更多
基金funded in part by National Natural Science Foundation of China(grant NO.61471347)National S&T Mayor Project of the Ministry of S&T of China(grant NO.2016ZX03001020-003)+1 种基金key program for international S&T Cooperation Program of China(grant NO.2014DFA11640)Shanghai Natural Science Foundation(grant NO.16ZR1435100)
文摘Due to 5G's stringent and uncertainty traffic requirements,open ecosystem would be one inevitable way to develop 5G.On the other hand,GPP based mobile communication becomes appealing recently attributed to its striking advantage in flexibility and re-configurability.In this paper,both the advantages and challenges of GPP platform are detailed analyzed.Furthermore,both GPP based software and hardware architectures for open 5G are presented and the performances of real-time signal processing and power consumption are also evaluated.The evaluation results indicate that turbo and power consumption may be another challengeable problem should be further solved to meet the requirements of realistic deployments.
基金supported by the National Science and Technology Major Project under Grant Nos.2009ZX01028-002-003,2009ZX01029-001-003,2010ZX01036-001-002the National Natural Science Foundation of China under Grant Nos.61050002,61003064,60921002
文摘General-purpose processor (GPP) is an important platform for fast Fourier transform (FFT),due to its flexibility,reliability and practicality.FFT is a representative application intensive in both computation and memory access,optimizing the FFT performance of a GPP also benefits the performances of many other applications.To facilitate the analysis of FFT,this paper proposes a theoretical model of the FFT processing.The model gives out a tight lower bound of the runtime of FFT on a GPP,and guides the architecture optimization for GPP as well.Based on the model,two theorems on optimization of architecture parameters are deduced,which refer to the lower bounds of register number and memory bandwidth.Experimental results on different processor architectures (including Intel Core i7 and Godson-3B) validate the performance model.The above investigations were adopted in the development of Godson-3B,which is an industrial GPP.The optimization techniques deduced from our performance model improve the FFT performance by about 40%,while incurring only 0.8% additional area cost.Consequently,Godson-3B solves the 1024-point single-precision complex FFT in 0.368 μs with about 40 Watt power consumption,and has the highest performance-per-watt in complex FFT among processors as far as we know.This work could benefit optimization of other GPPs as well.
基金Supported by the National Natural Science Foundation of China under Grant Nos.61173006,60921002the National BasicResearch 973 Program of China under Grant No.2011CB302503the Strategic Priority Research Program of the Chinese Academyof Sciences under Grant No.XDA06010403
文摘The many-accelerator architecture, mostly composed of general-purpose cores and accelerator-like function units (FUs), becomes a great alternative to homogeneous chip multiprocessors (CMPs) for its superior power-efficiency. However, the emerging many-accelerator processor shows a much more complicated memory accessing pattern than general purpose processors (GPPs) because the abundant on-chip FUs tend to generate highly-concurrent memory streams with distinct locality and bandwidth demand. The disordered memory streams issued by diverse accelerators exhibit a mutual- interference behavior and cannot be efficiently handled by the orthodox main memory interface that provides an inflexible data fetching mode. Unlike the traditional DRAM memory, our proposed Aggregation Memory System (AMS) can function adaptively to the characterized memory streams from different FUs, because it provides the FUs with different data fetching sizes and protects their locality in memory access by intelligently interleaving their data to memory devices through sub-rank binding. Moreover, AMS can batch the requests without sub-rank conflict into a read burst with our optimized memory scheduling policy. Experimental results from trace-based simulation show both conspicuous performance boost and energy saving brought by AMS.