Abstract
Because of the asymmetric I/O characteristics and high price of solid-state drives (SSDs), they cannot completely replace traditional hard disk drives (HDDs), so a hybrid storage system composed of SSDs and HDDs is an effective solution for the Hadoop distributed system. HDFS, however, was designed for homogeneous clusters and does not distinguish between types of storage media: when it allocates storage space for data blocks and saves them, it ignores the performance differences between media, so the advantages of newer storage media cannot be fully exploited. To address this, a dynamic data distribution algorithm is proposed that stores data blocks with a high write frequency on storage media with better write performance and data blocks with a high read frequency on storage media with better read performance, in order to improve data access speed in Hadoop. Test results show that the data scheduling threads can adaptively select the storage location of data blocks according to their read and write activity in the system, which improves the system's data access speed.
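The placement rule stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Medium`, `BlockStats`, `choose_medium`, and `plan_migrations`, the performance scores, and the simple "writes greater than reads" comparison are all assumptions made for illustration, since the abstract does not specify how access frequency is measured or compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Medium:
    """A storage medium with hypothetical relative performance scores."""
    name: str
    read_score: float   # assumed relative read performance (higher is better)
    write_score: float  # assumed relative write performance (higher is better)


@dataclass
class BlockStats:
    """Access counters for one data block (assumed to be tracked elsewhere)."""
    reads: int = 0
    writes: int = 0


def choose_medium(stats, media):
    """Pick the medium whose strength matches the block's dominant access type."""
    if stats.writes > stats.reads:
        # Write-dominated block: prefer the medium with the best write score.
        return max(media, key=lambda m: m.write_score)
    # Read-dominated (or balanced) block: prefer the best read score.
    return max(media, key=lambda m: m.read_score)


def plan_migrations(block_stats, placement, media):
    """Return {block_id: target_medium_name} for blocks not on their preferred medium."""
    moves = {}
    for block_id, stats in block_stats.items():
        target = choose_medium(stats, media).name
        if placement.get(block_id) != target:
            moves[block_id] = target
    return moves


# Hypothetical media and scores, chosen only to exercise the rule.
media = [Medium("A", read_score=10, write_score=2),
         Medium("B", read_score=3, write_score=5)]
block_stats = {"b1": BlockStats(reads=100, writes=1),
               "b2": BlockStats(reads=1, writes=50)}
placement = {"b1": "B", "b2": "B"}
moves = plan_migrations(block_stats, placement, media)
```

In this sketch a scheduling thread would call `plan_migrations` periodically and move only the blocks it returns. A real scheduler would presumably also weigh migration cost and free capacity, which the abstract does not describe.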
Authors
Cai Yuang (蔡宇昂); Zhang Xinyan (张鑫䶮)
Hubei University of Police, Wuhan, Hubei 430034, China; Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
Source
Journal of Green Science and Technology (《绿色科技》)
2020, Issue 6, pp. 222-225 (4 pages)