Abstract: In theory, increasingly complex branch prediction algorithms and larger storage structures keep improving prediction accuracy, but the prediction latency caused by today's complex algorithms and large data structures can no longer meet the single-cycle requirement of the pipeline. To resolve this conflict between prediction accuracy and latency, an ahead branch prediction architecture (ABPA) is proposed. ABPA provides the fetch unit at the pipeline front-end with a simple branch prediction table to enable fast prediction, while the complex prediction algorithms and large storage structures are moved to the pipeline back-end, preserving prediction accuracy. For multi-target indirect branch instructions, which have long been difficult to predict accurately, an indirect branch prediction algorithm based on branch history and target path (BHTP algorithm) is proposed. The ahead branch prediction algorithm is a hybrid of an improved high-accuracy branch prediction algorithm and the BHTP algorithm. A branch prediction engine embedding this algorithm performs branch speculation and target prediction at the pipeline back-end, as well as updates of the branch prediction table at the front-end. Experimental results show that the prediction system built on ABPA and the BHTP algorithm reaches an average accuracy of 94.27%. The design not only achieves fast, high-accuracy branch prediction but also lays the groundwork for further research on branch prediction.
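A minimal sketch of the BHTP indexing idea described in this abstract, assuming a direct-mapped target table indexed by a hash of the program counter, the global direction history, and a folded "target path" built from recent indirect-branch targets. The table size, hash function, and field names are illustrative assumptions, not the paper's exact design.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Sketch of an indirect-branch predictor indexed by branch history and target path.
struct BHTPPredictor {
    static constexpr std::size_t kEntries = 4096;        // assumed table size
    std::array<uint64_t, kEntries> target_table{};       // predicted indirect targets
    uint64_t branch_history = 0;                          // global taken/not-taken history
    uint64_t target_path = 0;                             // folded history of recent targets

    std::size_t index(uint64_t pc) const {
        // Mix PC, direction history, and target path into one table index.
        uint64_t h = pc ^ (branch_history * 0x9E3779B97F4A7C15ULL) ^ (target_path << 1);
        return static_cast<std::size_t>(h) % kEntries;
    }

    uint64_t predict(uint64_t pc) const { return target_table[index(pc)]; }

    void update(uint64_t pc, uint64_t actual_target, bool taken) {
        target_table[index(pc)] = actual_target;           // train on the resolved target
        branch_history = (branch_history << 1) | (taken ? 1 : 0);
        target_path = (target_path << 4) ^ (actual_target >> 2); // fold target bits in
    }
};
```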
Abstract: Energy efficiency has become the first design metric in chip development. To pursue higher energy efficiency, processor architects should reduce or eliminate unnecessary energy dissipation. Indirect-branch prediction has become a performance bottleneck, especially for applications written in object-oriented languages. Previous hardware-based indirect-branch predictors are generally inefficient: they either require significant hardware storage or predict indirect-branch targets slowly. In this paper, we propose an energy-efficient indirect-branch prediction technique called TAP (target address pointer) prediction. Its key idea has two parts: using specific hardware pointers to accelerate the indirect-branch prediction flow, and reusing existing processor components to reduce additional hardware cost and power consumption. When fetching an indirect branch, TAP prediction first obtains the target address pointers from the conditional branch predictor and then uses them to generate virtual addresses that index the indirect-branch targets. The technique takes about as long as dedicated-storage techniques without requiring large amounts of additional storage. Our evaluation shows that TAP prediction combined with representative state-of-the-art branch predictors improves performance significantly over the baseline processor. Compared with prior hardware-based indirect-branch predictors, the TAP-Perceptron scheme achieves a performance improvement equivalent to that of an 8K-entry TTC predictor and also outperforms the VPC predictor.
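A minimal sketch of the TAP lookup flow under stated assumptions: the conditional-branch predictor is modeled as a stub that returns a small target address pointer, which is combined with an assumed base address to form the virtual address that indexes a software stand-in for the target store. All names, widths, and the base-address scheme are illustrative assumptions rather than the paper's microarchitecture.

```cpp
#include <cstdint>
#include <unordered_map>

// Sketch of TAP prediction: pointer from the direction predictor -> virtual address -> target.
struct TapPredictor {
    uint64_t target_region_base = 0x100000;               // assumed base of the target store
    std::unordered_map<uint64_t, uint64_t> target_store;  // virtual address -> predicted target

    // The conditional-branch predictor would supply this pointer; here it is a stub.
    uint8_t read_target_address_pointer(uint64_t pc) const {
        return static_cast<uint8_t>((pc >> 2) & 0xF);      // placeholder TAP value
    }

    uint64_t predict(uint64_t pc) const {
        uint8_t tap = read_target_address_pointer(pc);
        uint64_t vaddr = target_region_base + (static_cast<uint64_t>(tap) << 3);
        auto it = target_store.find(vaddr);
        return it != target_store.end() ? it->second : 0;  // 0 = no prediction available
    }

    void update(uint64_t pc, uint64_t resolved_target) {
        uint8_t tap = read_target_address_pointer(pc);
        target_store[target_region_base + (static_cast<uint64_t>(tap) << 3)] = resolved_target;
    }
};
```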
Abstract: In theory, branch predictors with more complicated algorithms and larger data structures provide more accurate predictions. Unfortunately, overly large structures and excessively complicated algorithms cannot be implemented because of their long access delay. To date, many strategies have been proposed to balance delay with accuracy, but none has completely solved the issue. The architecture for ahead branch prediction (A2BP) separates traditional predictors into two parts. The first is a small table located at the front-end of the pipeline, which keeps the prediction fast enough even for aggressive processors. Second, the operations on complicated algorithms and large data structures needed for accurate prediction are all moved to the back-end of the pipeline. An effective mechanism is introduced for ahead branch prediction in the back-end and small-table update in the front-end. To substantially improve prediction accuracy, an indirect branch prediction algorithm based on branch history and target path (BHTP) is implemented in A2BP. Experiments with the Standard Performance Evaluation Corporation (SPEC) benchmarks on the gem5/SimpleScalar simulators demonstrate that A2BP improves average performance by 2.92% compared with a commonly used branch target buffer-based predictor. In addition, indirect-branch misses with the BHTP algorithm are reduced by an average of 28.98% compared with the traditional algorithm.
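A minimal sketch of the two-part split described above, assuming a tiny front-end table that the fetch stage can read in one cycle and a back-end engine that runs the accurate prediction ahead of time and queues entries for installation into the front-end table. The sizes and the queue-based update path are illustrative assumptions.

```cpp
#include <array>
#include <cstdint>
#include <queue>

// Sketch of A2BP's split: small fast front-end table, accurate ahead prediction in the back-end.
struct FrontEndTable {
    struct Entry { uint64_t pc = 0; uint64_t target = 0; bool taken = false; };
    std::array<Entry, 64> entries{};                       // assumed small, fast table

    const Entry* lookup(uint64_t pc) const {
        const Entry& e = entries[pc % entries.size()];
        return e.pc == pc ? &e : nullptr;                  // hit only on a tag match
    }
    void install(const Entry& e) { entries[e.pc % entries.size()] = e; }
};

struct BackEndEngine {
    std::queue<FrontEndTable::Entry> pending_updates;      // back-end -> front-end traffic

    // Run the accurate (slow) prediction ahead of time and queue the result.
    void predict_ahead(uint64_t pc, uint64_t predicted_target, bool predicted_taken) {
        pending_updates.push({pc, predicted_target, predicted_taken});
    }
    // Drain queued predictions into the front-end table for fast lookup at fetch.
    void drain_into(FrontEndTable& fe) {
        while (!pending_updates.empty()) {
            fe.install(pending_updates.front());
            pending_updates.pop();
        }
    }
};
```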
Funding: supported by the "HGJ" National Science and Technology Major Project of China under Grant No. 2009ZX01029-001-002
Abstract: Predicting indirect-branch targets has become a performance bottleneck for many applications. Previous high-performance indirect-branch predictors usually require significant hardware storage or additional compiler support, which increases the complexity of the processor front-end or the compilers. This paper proposes a complexity-effective indirect-branch prediction mechanism called Set-Way Index Pointing (SWIP) prediction. It stores multiple indirect-branch targets in different branch target buffer (BTB) entries, whose set indices and way locations are treated as set-way index pointers. These pointers are kept in the existing branch-direction predictor. SWIP prediction reuses the branch-direction predictor to provide the pointers and then accesses the pointed BTB entries for the predicted indirect-branch target. Our evaluation shows that SWIP prediction achieves an attractive performance improvement without requiring large dedicated storage or additional compiler support. It improves indirect-branch prediction accuracy by 36.5% compared with a commonly used BTB, yielding an average performance improvement of 18.56%, and reduces energy consumption by 14.34% relative to the baseline.
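A minimal sketch of the SWIP idea, assuming a set-associative BTB and a direction-predictor entry extended to carry a (set, way) pointer into it; prediction simply follows that pointer to read the stored target. The BTB geometry and field packing are illustrative assumptions rather than the paper's exact layout.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Sketch of SWIP: the direction-predictor entry points at the BTB (set, way) holding the target.
struct SwipPredictor {
    static constexpr std::size_t kSets = 512, kWays = 4;   // assumed BTB geometry
    std::array<std::array<uint64_t, kWays>, kSets> btb{};  // stored target addresses

    struct DirectionEntry { uint16_t set = 0; uint8_t way = 0; bool valid = false; };
    std::array<DirectionEntry, 4096> direction_table{};    // reused direction-predictor storage

    uint64_t predict(uint64_t pc) const {
        const DirectionEntry& d = direction_table[pc % direction_table.size()];
        if (!d.valid) return 0;                             // no set-way index pointer yet
        return btb[d.set % kSets][d.way % kWays];           // follow the pointer into the BTB
    }

    void update(uint64_t pc, uint16_t set, uint8_t way, uint64_t target) {
        btb[set % kSets][way % kWays] = target;             // place the resolved target
        direction_table[pc % direction_table.size()] = {set, way, true};
    }
};
```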