在对HDFS进行分析和研究的基础上,在HDFS文件分布式系统中应用File System API进行文件存储和访问,并通过改进的蚁群算法对副本选择进行优化。HDFS API能够有效完成海量数据的存储和管理,提高海量数据存储的效率。通过改进的蚁群算法提...在对HDFS进行分析和研究的基础上,在HDFS文件分布式系统中应用File System API进行文件存储和访问,并通过改进的蚁群算法对副本选择进行优化。HDFS API能够有效完成海量数据的存储和管理,提高海量数据存储的效率。通过改进的蚁群算法提升了文件读取时副本选择的效率,进一步提高了系统效率并使负载均衡。展开更多
Big data are always processed repeatedly with small changes, which is a major form of big data processing. The feature of incremental change of big data shows that incremental computing mode can improve the performanc...Big data are always processed repeatedly with small changes, which is a major form of big data processing. The feature of incremental change of big data shows that incremental computing mode can improve the performance greatly. HDFS is a distributed file system on Hadoop which is the most popular platform for big data analytics. And HDFS adopts fixed-size chunking policy, which is inefficient facing incremental computing. Therefore, in this paper, we proposed iHDFS (incremental HDFS), a distributed file system, which can provide basic guarantee for big data parallel processing. The iHDFS is implemented as an extension to HDFS. In iHDFS, Rabin fingerprint algorithm is applied to achieve content defined chunking. This policy make data chunking has much higher stability, and the intermediate processing results can be reused efficiently, so the performance of incremental data processing can be improved significantly. The effectiveness and efficiency of iHDFS have been demonstrated by the experimental results.展开更多
文摘在对HDFS进行分析和研究的基础上,在HDFS文件分布式系统中应用File System API进行文件存储和访问,并通过改进的蚁群算法对副本选择进行优化。HDFS API能够有效完成海量数据的存储和管理,提高海量数据存储的效率。通过改进的蚁群算法提升了文件读取时副本选择的效率,进一步提高了系统效率并使负载均衡。
文摘Big data are always processed repeatedly with small changes, which is a major form of big data processing. The feature of incremental change of big data shows that incremental computing mode can improve the performance greatly. HDFS is a distributed file system on Hadoop which is the most popular platform for big data analytics. And HDFS adopts fixed-size chunking policy, which is inefficient facing incremental computing. Therefore, in this paper, we proposed iHDFS (incremental HDFS), a distributed file system, which can provide basic guarantee for big data parallel processing. The iHDFS is implemented as an extension to HDFS. In iHDFS, Rabin fingerprint algorithm is applied to achieve content defined chunking. This policy make data chunking has much higher stability, and the intermediate processing results can be reused efficiently, so the performance of incremental data processing can be improved significantly. The effectiveness and efficiency of iHDFS have been demonstrated by the experimental results.