Journal Articles
7 articles found
Research on Small File Storage Technology Based on HDFS
1
Authors: Gao Chaoyan, Lu Hong, Huang Juan, Zhang Yi. 《电信技术研究》 (Telecommunications Technology Research), 2020, Issue 3, pp. 10-15 (6 pages)
HDFS (Hadoop Distributed File System), a core file system of big data platforms, offers strong generality, good stability, and a mature ecosystem. Based on a study of HDFS and an analysis of the size, distribution, and application characteristics of massive data files, a model for merging and managing small files on HDFS was developed for high-volume information processing. Since the system already used HDFS, and to ensure technical maturity and control costs, the approach keeps large files under HDFS management while handling small files through carefully designed file storage sizes and optimized small-file metadata management. On a 6-node HDFS cluster, this achieved a peak small-file write rate of 2 GB/s and millisecond-level file reads under mixed read/write workloads, realizing classified storage of massive large and small files on HDFS.
Keywords: HDFS (Hadoop Distributed File System); NameNode (the master server that manages the file system namespace and regulates client access to files)
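As a rough illustration of the merge-based approach this abstract describes, the sketch below packs many small local files into one HDFS SequenceFile, a common container format for this purpose. The hdfs://namenode:9000 URI, the target path, and the use of SequenceFile (rather than the paper's own storage model) are assumptions.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFileMerger {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical target path; one container file replaces many small ones.
        Path target = new Path("hdfs://namenode:9000/merged/batch-0001.seq");
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(target),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (String name : args) {
                byte[] content = Files.readAllBytes(Paths.get(name));
                // Key = original file name, value = raw bytes; the NameNode
                // then tracks one file's metadata instead of thousands.
                writer.append(new Text(name), new BytesWritable(content));
            }
        }
    }
}
```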
Performance Improvement through Novel Adaptive Node and Container Aware Scheduler with Resource Availability Control in Hadoop YARN
2
Authors: J. S. Manjaly, T. Subbulakshmi. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 12, pp. 3083-3108 (26 pages)
The default scheduler of Apache Hadoop demonstrates operational inefficiencies when connecting external sources and processing transformation jobs. This paper proposes a novel scheduler, the Adaptive Node and Container Aware Scheduler (ANACRAC), that enhances the performance of the Hadoop Yet Another Resource Negotiator (YARN) scheduler by aligning cluster resources with the demands of real-world applications. The approach leverages user-provided configurations to apportion nodes, or containers within nodes, according to application thresholds. It also gives applications the flexibility to select which node's resources they want to use, and enforces limits that prevent threshold breaches as additional jobs are added. Node and container awareness can be applied individually or in combination to increase efficiency, and the resource availability within nodes and containers can also be inspected. The paper further addresses container elasticity and self-adaptiveness depending on the job type. The results showed that ANACRAC's node and container awareness features delivered a 15%-20% performance improvement, and it has been validated that the ANACRAC scheduler achieves a 70%-90% performance improvement over the default Fair scheduler. Experimental results also demonstrated improvements in the range of 60% to 200% when applications were connected to external interfaces under high workloads.
Keywords: big data; Hadoop; YARN; Hadoop Distributed File System (HDFS); MapReduce; scheduling; Fair Scheduler
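The ANACRAC implementation itself is not reproduced here, but node-aware container allocation in YARN can be sketched with the stock AMRMClient API, as below. The hostname worker-node-01 and the 2 GB/1 vcore capability are illustrative assumptions, and the snippet belongs inside an ApplicationMaster that has registered with the ResourceManager.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeAwareRequest {
    public static void main(String[] args) throws Exception {
        // Normally runs inside an ApplicationMaster registered with the RM.
        AMRMClient<ContainerRequest> client = AMRMClient.createAMRMClient();
        client.init(new YarnConfiguration());
        client.start();

        // Per-container capability: 2 GB memory, 1 vcore (illustrative threshold).
        Resource capability = Resource.newInstance(2048, 1);

        // Pin the request to a specific node; relaxLocality=false tells the
        // scheduler to honour the node list rather than fall back to any node.
        String[] nodes = {"worker-node-01"}; // hypothetical hostname
        ContainerRequest request = new ContainerRequest(
                capability, nodes, null, Priority.newInstance(1), false);
        client.addContainerRequest(request);
    }
}
```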
New Spam Filtering Method with Hadoop Tuning-Based MapReduce Naïve Bayes
3
Authors: Keungyeup Ji, Youngmi Kwon. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 4, pp. 201-214 (14 pages)
As the importance of email increases, the amount of malicious email is also increasing, so the need for malicious email filtering is growing. Since it is more economical to combine commodity hardware consisting of medium servers or PCs with a virtual environment into a single server resource and filter malicious email using machine learning techniques, we used the Hadoop MapReduce framework with Naïve Bayes for malicious email filtering. Naïve Bayes was selected because it ranks among the top machine learning methods (Support Vector Machine (SVM), Naïve Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in terms of execution time and accuracy. Malicious email was filtered in two ways: with MapReduce programs applying the supervised Naïve Bayes technique on a performance-tuned Hadoop framework, and with a Python program applying the same Naïve Bayes technique on a bare-metal server without Hadoop. Comparing the accuracy and prediction error rates of the two methods, the Hadoop MapReduce Naïve Bayes method improved the accuracy of spam and ham email identification by a factor of 1.11 and the prediction error rate by a factor of 14.13 relative to the non-Hadoop Python Naïve Bayes method.
Keywords: Hadoop; Hadoop Distributed File System (HDFS); MapReduce; configuration parameters; malicious email filtering; Naïve Bayes
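A minimal sketch of the training side of MapReduce Naïve Bayes: a mapper that emits (label:token, 1) pairs so a standard summing reducer can build the per-class word-frequency table from which class-conditional probabilities are estimated. The tab-separated "label<TAB>body" input layout is an assumption, not the paper's dataset format.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Assumed input: one email per line, formatted "<label>\t<body>", label = spam | ham.
public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] parts = value.toString().split("\t", 2);
        if (parts.length < 2) return; // skip malformed lines
        String label = parts[0];
        StringTokenizer tokens = new StringTokenizer(parts[1].toLowerCase());
        while (tokens.hasMoreTokens()) {
            // Emit (label:token, 1); a summing reducer yields the frequency table.
            outKey.set(label + ":" + tokens.nextToken());
            context.write(outKey, ONE);
        }
    }
}
```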
A Forensic Method for Efficient File Extraction in HDFS Based on Three-Level Mapping (Cited: 2)
4
Authors: GAO Yuanzhao, LI Binglong. Wuhan University Journal of Natural Sciences (CAS, CSCD), 2017, Issue 2, pp. 114-126 (13 pages)
The large scale and distribution of cloud computing storage have become the major challenges for file extraction in cloud forensics. Current disk forensic methods do not adapt well to cloud computing, and forensic research on distributed file systems is inadequate. To address these problems, this paper uses the Hadoop Distributed File System (HDFS) as a case study and proposes a forensic method for efficient file extraction based on three-level (3L) mapping. First, HDFS is analyzed from the overall architecture down to the local file system. Second, the 3L mapping of an HDFS file, from the HDFS namespace to data blocks on the local file system, is established, and a recovery method for deleted files based on 3L mapping is presented. Third, a multi-node Hadoop framework is set up on the Xen virtualization platform to test the performance of the method. The results indicate that the proposed method can efficiently locate large files stored across data nodes, make selective images of disk data, and achieve a high recovery rate for deleted files.
Keywords: Hadoop Distributed File System (HDFS); HDFS forensics; cloud forensics; three-level (3L) mapping; metadata; file extraction; file recovery; Ext4
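The first level of such a mapping, from an HDFS path to its blocks and the datanodes holding them, is exposed by the standard FileSystem API, as sketched below. The namenode URI is hypothetical, and the paper's further mapping from blocks down to Ext4 files on the datanodes' local disks is not shown here.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockMapper {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; args[0] is the HDFS file to map.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        FileStatus status = fs.getFileStatus(new Path(args[0]));
        // Namespace-level mapping: file -> block list -> datanode hosts.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    b.getOffset(), b.getLength(), String.join(",", b.getHosts()));
        }
        fs.close();
    }
}
```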
BlockHDFS: Blockchain-integrated Hadoop Distributed File System for secure provenance traceability (Cited: 2)
5
Authors: Viraaji Mothukuri, Sai S. Cheerla, Reza M. Parizi, Qi Zhang, Kim-Kwang Raymond Choo. Blockchain: Research and Applications, 2021, Issue 4, pp. 30-36 (7 pages)
The Hadoop Distributed File System (HDFS) is one of the most widely used distributed file systems in big data analysis for frameworks such as Hadoop. HDFS allows one to manage large volumes of data using low-cost commodity hardware. However, vulnerabilities in HDFS can be exploited for nefarious activities. This reinforces the importance of ensuring robust security to facilitate file sharing in Hadoop, as well as having a trusted mechanism to check the authenticity of shared files. That is the focus of this paper, where we aim to improve the security of HDFS using a blockchain-enabled approach (hereafter referred to as BlockHDFS). Specifically, the proposed BlockHDFS uses the enterprise-level Hyperledger Fabric platform to capitalize on files' metadata for building trusted data security and traceability in HDFS.
Keywords: big data; Hadoop; blockchain; Hyperledger Fabric; Hadoop Distributed File System (HDFS); traceability; security; privacy
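A hedged sketch of the metadata-anchoring idea: read an HDFS file's metadata and checksum, hash them, and hand the digest to a Fabric client. The field layout, the SHA-256 choice, and the RecordFile chaincode function named in the comment are assumptions, not the BlockHDFS design.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ProvenanceRecord {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path(args[0]);
        FileStatus st = fs.getFileStatus(file);
        // HDFS-computed checksum of the file contents (MD5-of-CRC by default).
        FileChecksum checksum = fs.getFileChecksum(file);

        // Fingerprint the metadata that would be anchored on the ledger;
        // the field layout here is an assumption.
        String metadata = st.getPath() + "|" + st.getLen() + "|"
                + st.getModificationTime() + "|" + checksum;
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(metadata.getBytes(StandardCharsets.UTF_8));

        // A Fabric client (e.g., fabric-gateway-java) would then submit the hex
        // digest to chaincode via something like
        // contract.submitTransaction("RecordFile", path, digestHex)
        // -- "RecordFile" is a hypothetical chaincode function name.
        System.out.println(bytesToHex(digest));
        fs.close();
    }

    private static String bytesToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```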
MIX-RS: A Multi-Indexing System Based on HDFS for Remote Sensing Data Storage (Cited: 3)
6
Authors: Jiashu Wu, Jingpan Xiong, Hao Dai, Yang Wang, Chengzhong Xu. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, Issue 6, pp. 881-893 (13 pages)
A large volume of Remote Sensing (RS) data has been generated with the deployment of satellite technologies. The data facilitate research in ecological monitoring, land management, desertification, etc. The characteristics of RS data (e.g., enormous volume, large single-file size, and demanding fault-tolerance requirements) make the Hadoop Distributed File System (HDFS) an ideal choice for RS data storage, as it is efficient, scalable, and equipped with a data replication mechanism for failure resilience. To use RS data, one of the most important techniques is geospatial indexing. However, the large data volume makes geospatial indices time-consuming to construct and leverage efficiently. Considering that most modern geospatial data centres are equipped with HDFS-based big data processing infrastructures, deploying multiple geospatial indices becomes a natural way to optimise efficacy. Moreover, because of the reliability introduced by high-quality hardware and the infrequently modified nature of RS data, multi-indexing does not cause large overhead. Therefore, we design a framework called Multi-IndeXing-RS (MIX-RS) that unifies the multi-indexing mechanism on top of HDFS, with data replication enabled, for both fault tolerance and geospatial indexing efficiency. Given the fault tolerance provided by HDFS, RS data are stored in a structured form for faster geospatial indexing, and multi-indexing further enhances efficiency. The proposed technique naturally sits on top of HDFS to form a holistic framework without incurring severe overhead or sophisticated system implementation efforts. The MIX-RS framework is implemented and evaluated using real remote sensing data provided by the Chinese Academy of Sciences, demonstrating excellent geospatial indexing performance.
Keywords: Remote Sensing (RS) data; geospatial indexing; multi-indexing mechanism; Hadoop Distributed File System (HDFS); Multi-IndeXing-RS (MIX-RS)
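As a toy illustration of one geospatial index of the kind MIX-RS combines, the sketch below keys scenes by the 1-degree grid cell containing their centre point and returns candidate HDFS paths for a query point. The cell size and paths are assumptions; the real system maintains multiple such indices alongside HDFS replicas.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy grid index: scene -> 1-degree cell; lookups return candidate HDFS paths.
public class GridIndex {
    private final Map<String, List<String>> cells = new HashMap<>();

    private static String cellKey(double lat, double lon) {
        return (int) Math.floor(lat) + ":" + (int) Math.floor(lon);
    }

    public void add(double lat, double lon, String hdfsPath) {
        cells.computeIfAbsent(cellKey(lat, lon), k -> new ArrayList<>()).add(hdfsPath);
    }

    public List<String> query(double lat, double lon) {
        return cells.getOrDefault(cellKey(lat, lon), List.of());
    }

    public static void main(String[] args) {
        GridIndex index = new GridIndex();
        index.add(39.9, 116.4, "hdfs://nn/rs/scene_0001.tif"); // hypothetical path
        System.out.println(index.query(39.9, 116.4));
    }
}
```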
Mobile Internet Big Data Platform in China Unicom (Cited: 6)
7
Authors: Wenliang Huang, Zhen Chen, Wenyu Dong, Hang Li, Bin Cao, Junwei Cao. Tsinghua Science and Technology (SCIE, EI, CAS), 2014, Issue 1, pp. 95-101 (7 pages)
China Unicom, the largest WCDMA 3G operator in China, meets the requirements of the historic Mobile Internet explosion, i.e., the surge of mobile Internet traffic from mobile terminals. According to China Unicom's internal statistics, mobile user traffic has increased rapidly at a Compound Annual Growth Rate (CAGR) of 135%. Currently, China Unicom stores more than 2 trillion records monthly, the data volume is over 525 TB, and the highest data volume has reached a peak of 5 PB. Since October 2009, China Unicom has been developing a home-brewed big data storage and analysis platform based on the open-source Hadoop Distributed File System (HDFS), as part of its long-term strategy to make full use of this big data. All mobile Internet traffic is served by this big data platform. Currently, the writing speed has reached 1,390,000 records per second, and the record retrieval time in tables containing trillions of records is less than 100 ms. To take advantage of this opportunity to become a big data operator, China Unicom has developed new functions and multiple innovations to solve the space and time constraint challenges in data processing. In this paper, we introduce our big data platform in detail. Based on this platform, China Unicom is building an industry ecosystem around mobile Internet big data, and considers that a telecom-operator-centric ecosystem can be formed that is critical to reaching prosperity in the modern communications business.
Keywords: big data platform; China Unicom; 3G wireless network; Hadoop Distributed File System (HDFS); mobile Internet; network forensics; data warehouse; HBase
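The abstract's sub-100 ms retrieval from trillion-record tables hinges on HBase rowkey design; the sketch below shows one conventional pattern (subscriber id plus reversed timestamp) so a subscriber's records cluster together with the most recent first. The table name, column family, and rowkey layout are assumptions, not China Unicom's actual schema.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TrafficWriter {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("traffic_records"))) { // hypothetical table
            // Rowkey = subscriber id + reversed timestamp: rows for one
            // subscriber are contiguous, newest records sort first.
            long ts = System.currentTimeMillis();
            byte[] rowKey = Bytes.toBytes("subscriber-0001|" + (Long.MAX_VALUE - ts));
            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("url"),
                    Bytes.toBytes("http://example.com/"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("bytes"), Bytes.toBytes(4096L));
            table.put(put);
        }
    }
}
```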