Funding: This research was supported by Universiti Sains Malaysia (USM) and the Ministry of Higher Education Malaysia through the Fundamental Research Grant Scheme (FRGS, Grant No. FRGS/1/2020/TK0/USM/02/1).
Abstract: In the Big Data era, numerous sources and environments generate massive amounts of data. Data at this scale requires specialized tools and procedures that can evaluate the information effectively and anticipate decisions about future changes. Hadoop is used to process this kind of data. It is known to handle vast volumes of data more efficiently than small amounts, so workloads made up of many small files make the framework inefficient. This study proposes a novel solution to the problem: the Enhanced Best Fit Merging (EBFM) algorithm, which merges files according to predefined parameters (type and size). The algorithm ensures that the size of each generated file falls in the same range as the maximum block size. Its primary goal is to merge files dynamically, by file type and under the stated criteria, so that the resulting system is both effective and efficient. Merging takes place before the files are made available to the Hadoop framework. Additionally, the files generated by the system are named with specific keywords to ensure that no data is lost through file overwrites. The proposed approach generates the fewest possible large files, which reduces the input/output memory burden and suits the conditions under which the Hadoop framework is effective. The findings show that the proposed technique enhances the framework's performance by approximately 64% when compared across all other potential performance-impairing variables. The approach can be implemented in any environment that uses the Hadoop framework, including but not limited to smart cities and real-time data analysis.
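The abstract does not give the details of EBFM, so the following is only a minimal sketch of the general idea it describes: files are grouped by type and packed, best fit first, into merged files no larger than one block. It uses the textbook best-fit-decreasing heuristic, not the authors' algorithm. The function name `plan_merges`, the tuple layout, and the 128 MB block size are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed block size in bytes; 128 MB is a common HDFS setting, not a value from the abstract.
BLOCK_SIZE = 128 * 1024 * 1024


def plan_merges(files, block_size=BLOCK_SIZE):
    """Plan which small files to merge together, per file type.

    `files` is an iterable of (name, file_type, size_in_bytes) tuples.
    Generic best-fit decreasing: files are visited from largest to smallest,
    and each file goes into the bin of its own type that has the least free
    space while still being able to hold it. A new bin is opened when none fits.
    """
    bins_by_type = defaultdict(list)  # file_type -> list of {"free", "members"}
    for name, ftype, size in sorted(files, key=lambda f: f[2], reverse=True):
        bins = bins_by_type[ftype]
        candidates = [b for b in bins if b["free"] >= size]
        if candidates:
            # Best fit: the tightest bin that still has room.
            target = min(candidates, key=lambda b: b["free"])
        else:
            # No bin fits (or the file exceeds one block): open a new bin.
            target = {"free": block_size, "members": []}
            bins.append(target)
        target["members"].append(name)
        target["free"] -= size
    return bins_by_type


if __name__ == "__main__":
    demo = [("a.txt", "txt", 60), ("b.txt", "txt", 50),
            ("c.txt", "txt", 40), ("d.log", "log", 30)]
    for ftype, bins in plan_merges(demo, block_size=100).items():
        for index, b in enumerate(bins):
            # Hypothetical naming scheme: type plus index keeps merged outputs distinct.
            print(f"{ftype}_merged_{index:04d}", b["members"])
```

Each bin corresponds to one merged output file. Giving each output a name that carries the type and a running index is one simple way to avoid the overwrite problem the abstract mentions.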
Funding: Supported by the Natural Science Foundation of Hubei Province (2012FFC034, 2014CFC1100).
Abstract: In this paper, we present a distributed multi-level cache system based on cloud storage that addresses the low access efficiency of small spatio-temporal data files in Smart City information service systems. The cache system selects cache content according to the classification attributes of these files and applies a different cache pool management strategy at each cache level. Experiments on a prototype system indicate that the multi-level cache effectively increases the access bandwidth for small spatio-temporal files in Smart City and greatly improves service quality under multiple concurrent accesses.
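The abstract names two mechanisms without specifying them: admission by file classification, and a different pool management strategy per cache level. The sketch below shows one way those two ideas can fit together. The two-level layout, the LRU and FIFO policies, and every name in it (`TwoLevelCache`, `cacheable_classes`, the class labels) are assumptions, not the paper's design.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Two cache levels with different (assumed) eviction strategies.

    Level 1 is a small LRU pool; entries evicted from it are demoted to
    level 2, a larger FIFO pool. Only files whose class label is listed in
    `cacheable_classes` are admitted at all.
    """

    def __init__(self, l1_capacity, l2_capacity, cacheable_classes):
        self.l1 = OrderedDict()  # least recently used entry first
        self.l2 = OrderedDict()  # oldest inserted entry first
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity
        self.cacheable = set(cacheable_classes)

    def put(self, key, value, file_class):
        """Admit a file if its class is cacheable; return whether it was cached."""
        if file_class not in self.cacheable:
            return False
        self.l2.pop(key, None)  # keep each key in at most one level
        self._put_l1(key, value)
        return True

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key in self.l1:
            self.l1.move_to_end(key)  # refresh LRU position
            return self.l1[key]
        if key in self.l2:
            value = self.l2.pop(key)
            self._put_l1(key, value)  # promote on a level-2 hit
            return value
        return None

    def _put_l1(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_capacity:
            old_key, old_value = self.l1.popitem(last=False)
            self._put_l2(old_key, old_value)  # demote the LRU entry

    def _put_l2(self, key, value):
        self.l2[key] = value
        if len(self.l2) > self.l2_capacity:
            self.l2.popitem(last=False)  # FIFO: drop the oldest entry
```

In a distributed deployment the two levels would typically live on different nodes or storage tiers; here they are two in-memory dictionaries so that the policy difference stays visible.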
Funding: Supported by the National Key R&D Program of China (2018YFB0203901), the National Natural Science Foundation of China (Grant No. 61772053), the Science Challenge Project (No. TZ2016002), and the fund of the State Key Laboratory of Software Development Environment (SKLSDE-2017ZX-10).
Abstract: With the advent of new computing paradigms, parallel file systems serve not only traditional scientific computing applications but also non-scientific ones, such as financial computing, business, and public administration. Because a parallel file system provides storage services for multiple applications, it must meet a variety of requirements. However, parallel file systems usually provide a unified storage solution, which cannot meet specific application needs. In this paper, an extended file handle scheme is proposed to deal with this problem. The original file handle is extended to record I/O optimization information, which allows the file system to specify optimizations for a file or directory based on workload characteristics. Fine-grained management of I/O optimizations can therefore be achieved. On the basis of the extended file handle scheme, data prefetching and small file optimization mechanisms are proposed for parallel file systems. The experimental results show that the proposed approach improves the aggregate throughput of the overall system by up to 189.75%.
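To make the idea of a handle that records I/O optimization information concrete, here is a minimal sketch. The abstract does not describe the handle layout, so the flag names, the `ExtendedHandle` class, and the rule that a file without its own setting inherits from its parent directory are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Optional


class IOOpt(Flag):
    """Hypothetical optimization flags stored in a handle."""
    NONE = 0
    PREFETCH = auto()    # enable data prefetching
    SMALL_FILE = auto()  # enable small-file optimization


@dataclass
class ExtendedHandle:
    """A file or directory handle extended with optimization hints."""
    object_id: int
    opts: Optional[IOOpt] = None  # None means "no setting of its own"
    parent: Optional["ExtendedHandle"] = None

    def effective_opts(self) -> IOOpt:
        """Walk up the directory chain until an explicit setting is found."""
        node = self
        while node is not None:
            if node.opts is not None:
                return node.opts
            node = node.parent
        return IOOpt.NONE


if __name__ == "__main__":
    logs_dir = ExtendedHandle(object_id=1, opts=IOOpt.SMALL_FILE)
    log_file = ExtendedHandle(object_id=2, parent=logs_dir)
    print(log_file.effective_opts())  # inherits the directory's setting
```

Storing the hints per handle, rather than as one file-system-wide setting, is what lets different files or directories receive different optimizations.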