Abstract
This paper proposes a method for optimizing the access performance of massive small meteorological files in Ceph, addressing the storage performance degradation caused by double writes of cluster data when Ceph handles such files. The method analyzes historical file access logs to obtain the association probability between small meteorological files, then uses a merging algorithm designed around that probability to merge associated small files before storing them in the Ceph cluster. When files are accessed, the correlation between merged small files is measured by the utilization rate and correlation rate of the file blocks, and files are prefetched according to that correlation, which reduces interactions between the user and the cluster and improves small-file access efficiency. Experiments show that, compared with existing methods, the proposed method significantly improves the storage efficiency and access efficiency of massive small meteorological files in Ceph.
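The abstract does not give the formulas behind the association probability or the merging algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the access log has already been split into sessions (lists of file names accessed together), estimates the association probability as a conditional co-occurrence frequency, and packs files greedily under a size cap. The function names, the `threshold` value, and the `max_bytes` limit are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_probabilities(sessions):
    """Estimate P(b accessed | a accessed) from co-occurrence within sessions.

    Assumption: a "session" is one list of file names accessed together;
    the paper's own definition of association probability may differ.
    """
    single = Counter()  # number of sessions containing each file
    pair = Counter()    # number of sessions containing both files of a pair
    for session in sessions:
        files = sorted(set(session))
        single.update(files)
        pair.update(combinations(files, 2))
    probs = {}
    for (a, b), n in pair.items():
        probs[(a, b)] = n / single[a]
        probs[(b, a)] = n / single[b]
    return probs


def merge_groups(sizes, probs, threshold=0.5, max_bytes=4 * 1024 * 1024):
    """Greedily group files whose association with a seed file meets the threshold.

    sizes maps file name -> size in bytes. Each returned group stands for one
    merged object that would be written to the cluster (hypothetical policy).
    """
    groups = []
    placed = set()
    for seed in sorted(sizes):
        if seed in placed:
            continue
        group, total = [seed], sizes[seed]
        placed.add(seed)
        # Unplaced files, most strongly associated with the seed first.
        candidates = sorted(
            (f for f in sizes if f not in placed),
            key=lambda f: (-probs.get((seed, f), 0.0), f),
        )
        for f in candidates:
            if probs.get((seed, f), 0.0) < threshold:
                break  # remaining candidates are even more weakly associated
            if total + sizes[f] <= max_bytes:
                group.append(f)
                total += sizes[f]
                placed.add(f)
        groups.append(group)
    return groups
```

In this sketch, a file that is nearly always read together with the seed file lands in the same merged object, so a single read of that object could also serve as a prefetch of its neighbours. The block-utilization and correlation-rate measures that the abstract mentions for prefetching are not modelled here.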
Authors
LU Xiaoxia; WANG Yong; LEI Xiaochun (School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China; School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China; Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin 541004, China)
Source
Journal of Guilin University of Electronic Technology
2019, No. 1, pp. 61-66 (6 pages)
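Funding
National Natural Science Foundation of China (61662018, 61661015)
China Postdoctoral Science Foundation (2016M602922XB)
Guangxi Cooperative Innovation Center of Cloud Computing and Big Data Project (YDQ17001)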