Nowadays energy-efficiency becomes the first design metric in chip development. To pursue higher energy efficiency, the processor architects should reduce or eliminate those unnecessary energy dissipations. Indirect-b...Nowadays energy-efficiency becomes the first design metric in chip development. To pursue higher energy efficiency, the processor architects should reduce or eliminate those unnecessary energy dissipations. Indirect-branch pre- diction has become a performance bottleneck, especially for the applications written in object-oriented languages. Previous hardware-based indirect-branch predictors are generally inefficient, for they either require significant hardware storage or predict indirect-branch targets slowly. In this paper, we propose an energy-efficient indirect-branch prediction technique called TAP (target address pointer) prediction. Its key idea includes two parts: utilizing specific hardware pointers to accelerate the indirect branch prediction flow and reusing the existing processor components to reduce additional hardware costs and power consumption. When fetching an indirect branch, TAP prediction first gets the specific pointers called target address pointers from the conditional branch predictor, and then uses such pointers to generate virtual addresses which index the indirect-branch targets. This technique spends similar time compared to the dedicated storage techniques without requiring additional large amounts of storage. Our evaluation shows that TAP prediction with some representative state-of-the-art branch predictors can improve performance significantly over the baseline processor. Compared with those hardware-based indirect-branch predictors, the TAP-Perceptron scheme achieves performance improvement equivalent to that provided by an 8 K-entry TTC predictor, and also outperforms the VPC predictor.展开更多
Predicting indirect-branch targets has become a performance bottleneck for many applications. Previous high- performance indirect-branch predictors usually require significant hardware storage or additional compiler s...Predicting indirect-branch targets has become a performance bottleneck for many applications. Previous high- performance indirect-branch predictors usually require significant hardware storage or additional compiler support, which increases the complexity of the processor front-end or the compilers. This paper proposes a complexity-effective indirect- branch prediction mechanism, called the Set-Way Index Pointing (SWIP) prediction. It stores multiple indirect-branch targets in different branch target buffer (BTB) entries, whose set indices and way locations are treated as set-way index pointers. These pointers are stored in the existing branch-direction predictor. SWIP prediction reuses the branch direction predictor to provide such pointers, and then accesses the pointed BTB entries for the predicted indirect-branch target. Our evaluation shows that SWIP prediction could achieve attractive performance improvement without requiring large dedicated storage or additional compiler support. It improves the indirect-branch prediction accuracy by 36.5% compared to that of a commonly-used BTB, resulting in average performance improvement of 18.56%. Its energy consumption is also reduced by 14.34% over that of the baseline.展开更多
文摘Nowadays energy-efficiency becomes the first design metric in chip development. To pursue higher energy efficiency, the processor architects should reduce or eliminate those unnecessary energy dissipations. Indirect-branch pre- diction has become a performance bottleneck, especially for the applications written in object-oriented languages. Previous hardware-based indirect-branch predictors are generally inefficient, for they either require significant hardware storage or predict indirect-branch targets slowly. In this paper, we propose an energy-efficient indirect-branch prediction technique called TAP (target address pointer) prediction. Its key idea includes two parts: utilizing specific hardware pointers to accelerate the indirect branch prediction flow and reusing the existing processor components to reduce additional hardware costs and power consumption. When fetching an indirect branch, TAP prediction first gets the specific pointers called target address pointers from the conditional branch predictor, and then uses such pointers to generate virtual addresses which index the indirect-branch targets. This technique spends similar time compared to the dedicated storage techniques without requiring additional large amounts of storage. Our evaluation shows that TAP prediction with some representative state-of-the-art branch predictors can improve performance significantly over the baseline processor. Compared with those hardware-based indirect-branch predictors, the TAP-Perceptron scheme achieves performance improvement equivalent to that provided by an 8 K-entry TTC predictor, and also outperforms the VPC predictor.
基金supported by the "HGJ" National Science and Technology Major Project of China under Grant No. 2009ZX01029-001-002
文摘Predicting indirect-branch targets has become a performance bottleneck for many applications. Previous high- performance indirect-branch predictors usually require significant hardware storage or additional compiler support, which increases the complexity of the processor front-end or the compilers. This paper proposes a complexity-effective indirect- branch prediction mechanism, called the Set-Way Index Pointing (SWIP) prediction. It stores multiple indirect-branch targets in different branch target buffer (BTB) entries, whose set indices and way locations are treated as set-way index pointers. These pointers are stored in the existing branch-direction predictor. SWIP prediction reuses the branch direction predictor to provide such pointers, and then accesses the pointed BTB entries for the predicted indirect-branch target. Our evaluation shows that SWIP prediction could achieve attractive performance improvement without requiring large dedicated storage or additional compiler support. It improves the indirect-branch prediction accuracy by 36.5% compared to that of a commonly-used BTB, resulting in average performance improvement of 18.56%. Its energy consumption is also reduced by 14.34% over that of the baseline.