A new file assignment strategy of parallel I/O, which is named heuristic file sorted assignment algorithm was proposed on cluster computing system. Based on the load balancing, it assigns the files to the same disk ac...A new file assignment strategy of parallel I/O, which is named heuristic file sorted assignment algorithm was proposed on cluster computing system. Based on the load balancing, it assigns the files to the same disk according to the similar service time. Firstly, the files were sorted and stored at the set I in descending order in terms of their service time, then one disk of cluster node was selected randomly when the files were to be assigned, and at last the continuous files were taken orderly from the set I to the disk until the disk reached its load maximum. The experimental results show that the new strategy improves the performance by 20.2% when the load of the system is light and by 31.6% when the load is heavy. And the higher the data access rate, the more evident the improvement of the performance obtained by the heuristic file sorted assignment algorithm.展开更多
Due to the fact that the register files seriously affect the performance and area of coarse-grained reconfigurable cryptographic processors, an efficient structure of the distributed cross-domain register file is prop...Due to the fact that the register files seriously affect the performance and area of coarse-grained reconfigurable cryptographic processors, an efficient structure of the distributed cross-domain register file is proposed to realize a cryptographic processor with a high performance and a lowarea cost. In order to meet the demands of high performance and high flexibility at a lowarea cost, a union structure with the multi-ports access structure, i, e., a distributed crossdomain register file, is designed by analyzing the algorithm features of different ciphers. Considering different algorithm requirements of the global register files and local register files,the circuit design is realized by adopting different design parameters under TSMC( Taiwan Semiconductor Manufacturing Company) 40 nm CMOS( complementary metal oxide semiconductor) technology and compared with other similar works. The experimental results showthat the proposed distributed cross-domain register structure can effectively improve the performance of the unit area, of which the total performance of block per cycle is improved by17. 79% and performance of block per cycle per area is improved by 117%.展开更多
Driven by the increasing requirements of high-performance computing applications,supercomputers are prone to containing more and more computing nodes.Applications running on such a large-scale computing system are lik...Driven by the increasing requirements of high-performance computing applications,supercomputers are prone to containing more and more computing nodes.Applications running on such a large-scale computing system are likely to spawn millions of parallel processes,which usually generate a burst of I/O requests,introducing a great challenge into the metadata management of underlying parallel file systems.The traditional method used to overcome such a challenge is adopting multiple metadata servers in the scale-out manner,which will inevitably confront with serious network and consistence problems.This work instead pursues to enhance the metadata performance in the scale-up manner.Specifically,we propose to improve the performance of each individual metadata server by employing GPU to handle metadata requests in parallel.Our proposal designs a novel metadata server architecture,which employs CPU to interact with file system clients,while offloading the computing tasks about metadata into GPU.To take full advantages of the parallelism existing in GPU,we redesign the in-memory data structure for the name space of file systems.The new data structure can perfectly fit to the memory architecture of GPU,and thus helps to exploit the large number of parallel threads within GPU to serve the bursty metadata requests concurrently.We implement a prototype based on BeeGFS and conduct extensive experiments to evaluate our proposal,and the experimental results demonstrate that our GPU-based solution outperforms the CPU-based scheme by more than 50%under typical metadata operations.The superiority is strengthened further on high concurrent scenarios,e.g.,the high-performance computing systems supporting millions of parallel threads.展开更多
文摘A new file assignment strategy of parallel I/O, which is named heuristic file sorted assignment algorithm was proposed on cluster computing system. Based on the load balancing, it assigns the files to the same disk according to the similar service time. Firstly, the files were sorted and stored at the set I in descending order in terms of their service time, then one disk of cluster node was selected randomly when the files were to be assigned, and at last the continuous files were taken orderly from the set I to the disk until the disk reached its load maximum. The experimental results show that the new strategy improves the performance by 20.2% when the load of the system is light and by 31.6% when the load is heavy. And the higher the data access rate, the more evident the improvement of the performance obtained by the heuristic file sorted assignment algorithm.
基金The National Natural Science Foundation of China(No.61176024)
文摘Due to the fact that the register files seriously affect the performance and area of coarse-grained reconfigurable cryptographic processors, an efficient structure of the distributed cross-domain register file is proposed to realize a cryptographic processor with a high performance and a lowarea cost. In order to meet the demands of high performance and high flexibility at a lowarea cost, a union structure with the multi-ports access structure, i, e., a distributed crossdomain register file, is designed by analyzing the algorithm features of different ciphers. Considering different algorithm requirements of the global register files and local register files,the circuit design is realized by adopting different design parameters under TSMC( Taiwan Semiconductor Manufacturing Company) 40 nm CMOS( complementary metal oxide semiconductor) technology and compared with other similar works. The experimental results showthat the proposed distributed cross-domain register structure can effectively improve the performance of the unit area, of which the total performance of block per cycle is improved by17. 79% and performance of block per cycle per area is improved by 117%.
基金Supported by the National Key Research and Development Program of China under Grant No. 2018YFB0203904the National Natural Science Foundation of China under Grant Nos. 61872392, U1811461 and 61832020+4 种基金the Pearl River Science and Technology Nova Program of Guangzhou under Grant No. 201906010008Guangdong Natural Science Foundation under Grant No. 2018B030312002the Major Program of Guangdong Basic and Applied Research under Grant No. 2019B030302002the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant No. 2016ZT06D211the Key-Area Research and Development Program of Guang Dong Province of China under Grant No. 2019B010107001.
文摘Driven by the increasing requirements of high-performance computing applications,supercomputers are prone to containing more and more computing nodes.Applications running on such a large-scale computing system are likely to spawn millions of parallel processes,which usually generate a burst of I/O requests,introducing a great challenge into the metadata management of underlying parallel file systems.The traditional method used to overcome such a challenge is adopting multiple metadata servers in the scale-out manner,which will inevitably confront with serious network and consistence problems.This work instead pursues to enhance the metadata performance in the scale-up manner.Specifically,we propose to improve the performance of each individual metadata server by employing GPU to handle metadata requests in parallel.Our proposal designs a novel metadata server architecture,which employs CPU to interact with file system clients,while offloading the computing tasks about metadata into GPU.To take full advantages of the parallelism existing in GPU,we redesign the in-memory data structure for the name space of file systems.The new data structure can perfectly fit to the memory architecture of GPU,and thus helps to exploit the large number of parallel threads within GPU to serve the bursty metadata requests concurrently.We implement a prototype based on BeeGFS and conduct extensive experiments to evaluate our proposal,and the experimental results demonstrate that our GPU-based solution outperforms the CPU-based scheme by more than 50%under typical metadata operations.The superiority is strengthened further on high concurrent scenarios,e.g.,the high-performance computing systems supporting millions of parallel threads.