Abstract
Medical tourism is an emerging industry. As its data volume keeps growing, storing that data efficiently and giving users fast access to it have become pressing problems. Hadoop addresses much of this need, but it is poorly suited to handling large numbers of small files. HDFS (Hadoop Distributed File System) uses a master-slave architecture, so storing many small files makes the metadata grow rapidly, consumes a large share of NameNode memory, and degrades read performance, which in turn reduces the scalability and efficiency of the whole system. Combining the strengths of an RDBMS and Hadoop, this paper proposes an improved optimisation scheme for small-file storage. Based on the characteristics of electronic health record data, it also proposes transmitting and storing data by replica group, and it uses a data prefetching mechanism to improve access efficiency. Experiments show that the method effectively improves the storage and read performance of small files in electronic health records and relieves the NameNode memory bottleneck to a certain extent.
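The abstract does not describe how the RDBMS and Hadoop are combined, so the following is only a minimal sketch of one common approach to the small-file problem, not the paper's actual scheme: many small files are appended to a single large container, and a relational table records each file's offset and length so that the file system itself tracks only one object. Here an in-memory `io.BytesIO` stands in for a large HDFS file and SQLite stands in for the RDBMS; the names `pack_small_files`, `read_small_file`, and `file_index`, as well as the sample file names, are hypothetical.

```python
import io
import sqlite3


def pack_small_files(files, container, db):
    """Append each small file to one container and index it in the RDBMS."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS file_index ("
        "name TEXT PRIMARY KEY, offset INTEGER, length INTEGER)"
    )
    for name, data in files.items():
        # Seek to the end of the container; the returned position is the offset.
        offset = container.seek(0, io.SEEK_END)
        container.write(data)
        db.execute(
            "INSERT INTO file_index VALUES (?, ?, ?)",
            (name, offset, len(data)),
        )
    db.commit()


def read_small_file(name, container, db):
    """Look up offset and length in the RDBMS, then read from the container."""
    row = db.execute(
        "SELECT offset, length FROM file_index WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    offset, length = row
    container.seek(offset)
    return container.read(length)


# Assumed stand-ins: BytesIO for one large HDFS file, SQLite for the RDBMS.
db = sqlite3.connect(":memory:")
container = io.BytesIO()
pack_small_files(
    {"patient_001/ecg.xml": b"<ecg/>", "patient_001/lab.json": b"{}"},
    container,
    db,
)
record = read_small_file("patient_001/lab.json", container, db)
```

In a layout like this, the per-file metadata lives in the relational index rather than in NameNode memory, which is the kind of saving the abstract refers to. Replica-group placement and prefetching are not modelled here.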
Source
《计算机应用与软件》
CSCD
2016, No. 1, pp. 21-23, 41 (4 pages in total)
Computer Applications and Software
Funding
Key Natural Science Project of the Hainan Provincial Department of Education (Hj kj2013-03)