摘要
With the increasing amount of data,there is an urgent need for efficient sorting algorithms to process large data sets.Hardware sorting algorithms have attracted much attention because they can take advantage of different hardware's parallelism.But the traditional hardware sort accelerators suffer“memory wall”problems since their multiple rounds of data transmission between the memory and the processor.In this paper,we utilize the in-situ processing ability of the ReRAM crossbar to design a new ReCAM array that can process the matrix-vector multiplication operation and the vector-scalar comparison in the same array simultaneously.Using this designed ReCAM array,we present ReCSA,which is the first dedicated ReCAM-based sort accelerator.Besides hardware designs,we also develop algorithms to maximize memory utilization and minimize memory exchanges to improve sorting performance.The sorting algorithm in ReCSA can process various data types,such as integer,float,double,and strings.We also present experiments to evaluate the performance and energy efficiency against the state-of-the-art sort accelerators.The experimental results show that ReCSA has 90.92×,46.13×,27.38×,84.57×,and 3.36×speedups against CPU-,GPU-,FPGA-,NDP-,and PIM-based platforms when processing numeric data sets.ReCSA also has 24.82×,32.94×,and 18.22×performance improvement when processing string data sets compared with CPU-,GPU-,and FPGA-based platforms.
基金
supported by the National Natural Science Foundation of China(Grant Nos.61832006,62072195,and 61825202).