Mainstream processors implement the instruction scheduler using a monolithic CAM-based issue queue (IQ), which consumes increasingly high energy as its size scales. In particular, its instruction wakeup logic accoun...Mainstream processors implement the instruction scheduler using a monolithic CAM-based issue queue (IQ), which consumes increasingly high energy as its size scales. In particular, its instruction wakeup logic accounts for a major portion of the consumed energy. Our study shows that instructions with 2 non-ready operands (called 2OP instructions) are in small percentage, but tend to spend long latencies in the IQ. They can be effectively shelved in a small RAM-based waiting instruction buffer (WIB) and steered into the IQ at appropriate time. With this two-level shelving ability, half of the CAM tag comparators are eliminated in the IQ, which significantly reduces the energy of wakeup operation. In addition, we propose an adaptive banking scheme to downsize the IQ and reduce the bit-width of tag comparators. Experiments indicate that for an 8-wide issue superscalar or SMT proeessor,the energy consumption of the instruction scheduler can be reduced by 67%. Furthermore, the new design has potentially faster scheduler clock speed while maintaining close IPC to the monolithic scheduler design. Compared with the previous work on eliminating tags through prediction, our design is superior in terms of both energy reduction and SMT support.展开更多
文摘Mainstream processors implement the instruction scheduler using a monolithic CAM-based issue queue (IQ), which consumes increasingly high energy as its size scales. In particular, its instruction wakeup logic accounts for a major portion of the consumed energy. Our study shows that instructions with 2 non-ready operands (called 2OP instructions) are in small percentage, but tend to spend long latencies in the IQ. They can be effectively shelved in a small RAM-based waiting instruction buffer (WIB) and steered into the IQ at appropriate time. With this two-level shelving ability, half of the CAM tag comparators are eliminated in the IQ, which significantly reduces the energy of wakeup operation. In addition, we propose an adaptive banking scheme to downsize the IQ and reduce the bit-width of tag comparators. Experiments indicate that for an 8-wide issue superscalar or SMT proeessor,the energy consumption of the instruction scheduler can be reduced by 67%. Furthermore, the new design has potentially faster scheduler clock speed while maintaining close IPC to the monolithic scheduler design. Compared with the previous work on eliminating tags through prediction, our design is superior in terms of both energy reduction and SMT support.