11 articles found
1. BlockHDFS: Blockchain-integrated Hadoop distributed file system for secure provenance traceability (cited by: 2)
Authors: Viraaji Mothukuri, Sai S. Cheerla, Reza M. Parizi, Qi Zhang, Kim-Kwang Raymond Choo. Source: Blockchain: Research and Applications, 2021, Issue 4, pp. 30-36 (7 pages)
Hadoop Distributed File System (HDFS) is one of the widely used distributed file systems in big data analysis for frameworks such as Hadoop. HDFS allows one to manage large volumes of data using low-cost commodity hardware. However, vulnerabilities in HDFS can be exploited for nefarious activities. This reinforces the importance of ensuring robust security to facilitate file sharing in Hadoop, as well as having a trusted mechanism to check the authenticity of shared files. This is the focus of this paper, where we aim to improve the security of HDFS using a blockchain-enabled approach (hereafter referred to as BlockHDFS). Specifically, the proposed BlockHDFS uses the enterprise-level Hyperledger Fabric platform to capitalize on files' metadata for building trusted data security and traceability in HDFS.
Keywords: Big data; Hadoop; Blockchain; Hyperledger Fabric; Hadoop distributed file system (HDFS); Traceability; Security; Privacy
2. An adaptive dynamic feedback load balancing algorithm based on QoS in distributed file system (cited by: 1)
Authors: Ming Wang, Jianfeng Guan. Source: Journal of Communications and Information Networks, 2017, Issue 3, pp. 30-40 (11 pages)
An adaptive dynamic load balancing algorithm based on QoS is proposed to improve the performance of load balancing in distributed file systems, combining the advantages of a variety of load balancing algorithms. The new algorithm uses a tuple containing the number of files and the total file size as the QoS measure for the requested task. The master node sets a threshold for the requested task based on the QoS to filter storage nodes that meet the requirements of the task. To guarantee the reliability of the new algorithm, we consider the impact of CPU utilization, memory usage, disk I/O occupancy rate, network bandwidth usage, and hard disk usage on load balancing performance when calculating the real-time load of storage nodes. The heterogeneity of the network is considered when the master node schedules task assignments, to ensure the fairness of the algorithm. The comprehensive evaluation value is determined based on the performance load ratio, which is calculated from the real-time load value of the storage node and a normalized performance value. The master node assigns tasks to the storage node with the highest comprehensive evaluation value. The storage nodes provide adaptive feedback based on changes in the degree of connectivity, rather than periodic updates of the load information. A real distributed file system environment is set up on a server cluster, and the performance of the new algorithm is tested through a comparative experiment. The experimental results show that the new algorithm can effectively reduce the average response time of the system, improve throughput, and bring the system load into good balance.
Keywords: distributed file system; load balancing; QoS; performance load ratio; adaptive dynamic feedback
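The selection step described in this abstract (filter storage nodes by a QoS-derived threshold, then pick the node with the highest performance load ratio) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the load weights, the threshold formula, the `max_files`/`max_bytes` normalizers, and all names (`StorageNode`, `qos_threshold`, `pick_node`) are assumptions, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    name: str
    performance: float  # normalized static capability in (0, 1] (assumed scale)
    cpu: float          # each utilization value is assumed to lie in [0, 1]
    memory: float
    disk_io: float
    bandwidth: float
    disk_usage: float


# Hypothetical weights for the five load factors named in the abstract.
WEIGHTS = {"cpu": 0.3, "memory": 0.2, "disk_io": 0.2,
           "bandwidth": 0.2, "disk_usage": 0.1}


def real_time_load(node):
    """Weighted sum of the node's current utilizations."""
    return sum(weight * getattr(node, key) for key, weight in WEIGHTS.items())


def qos_threshold(num_files, total_bytes, max_files=1000, max_bytes=10 * 2**30):
    """Map the QoS tuple (file count, total size) to a maximum admissible load.

    Assumed rule: heavier tasks tolerate less load on the target node.
    """
    demand = (0.5 * min(num_files / max_files, 1.0)
              + 0.5 * min(total_bytes / max_bytes, 1.0))
    return 1.0 - 0.5 * demand


def evaluation(node):
    """Performance load ratio: normalized performance over real-time load."""
    return node.performance / max(real_time_load(node), 1e-9)


def pick_node(nodes, num_files, total_bytes):
    """Filter nodes by the QoS threshold, then take the best-scoring one."""
    limit = qos_threshold(num_files, total_bytes)
    candidates = [n for n in nodes if real_time_load(n) <= limit]
    if not candidates:
        return None
    return max(candidates, key=evaluation)
```

The adaptive feedback part of the algorithm (nodes reporting on connectivity changes rather than periodically) is not modelled here; the sketch only covers a single scheduling decision over a snapshot of node loads.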
3. Research on Real-Time High Reliable Network File Distribution Technology
Authors: Chenglong Li, Peipeng Liu, Hewei Yu, Mengmeng Ge, Xiangzhan Yu, Yi Xin, Yuhang Wang, Dongyu Zhang. Source: Computers, Materials & Continua (SCIE, EI), 2020, Issue 11, pp. 1739-1752 (14 pages)
The rapid development of Internet of Things (IoT) technology has made previously unavailable data available, and applications can take advantage of device data for people to visualize, explore, and build complex analyses. As the size of the network and the number of network users continue to increase, network requests tend to aggregate on a small number of network resources, which results in uneven load on network requests. Real-time, highly reliable network file distribution technology is of great importance in the Internet of Things. This paper studies real-time and highly reliable file distribution technology for large-scale networks. In response to this topic, this paper studies the current file distribution technology, proposes a file distribution model, and proposes a corresponding load balancing method based on the file distribution model. Experiments show that the system has achieved real-time and high reliability of network transmission.
Keywords: high reliable network; file distribution; load balancing
4. Performance Improvement through Novel Adaptive Node and Container Aware Scheduler with Resource Availability Control in Hadoop YARN
Authors: J. S. Manjaly, T. Subbulakshmi. Source: Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 12, pp. 3083-3108 (26 pages)
The default scheduler of Apache Hadoop demonstrates operational inefficiencies when connecting external sources and processing transformation jobs. This paper proposes a novel scheduler to enhance the performance of the Hadoop Yet Another Resource Negotiator (YARN) scheduler, called the Adaptive Node and Container Aware Scheduler (ANACRAC), that aligns cluster resources to the demands of real-world applications. The approach leverages user-provided configurations as a unique design to apportion nodes, or containers within the nodes, to application thresholds. Additionally, it gives applications the flexibility to select which node's resources they want to manage, and adds limits to prevent threshold breaches by adding additional jobs as needed. Node or container awareness can be utilized individually or in combination to increase efficiency. On top of this, the resource availability within the node and containers can also be investigated. This paper also focuses on the elasticity of the containers and self-adaptiveness depending on the job type. The results showed that a 15%–20% performance improvement was achieved compared with the node and container awareness feature of the ANACRAC. It has been validated that the ANACRAC scheduler demonstrates a 70%–90% performance improvement compared with the default Fair scheduler. Experimental results also demonstrated the success of the enhancement and a performance improvement in the range of 60% to 200% when applications were connected with external interfaces and high workloads.
Keywords: Big data; Hadoop; YARN; Hadoop distributed file system (HDFS); MapReduce; scheduling; Fair scheduler
5. New Spam Filtering Method with Hadoop Tuning-Based MapReduce Naïve Bayes
Authors: Keungyeup Ji, Youngmi Kwon. Source: Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 4, pp. 201-214 (14 pages)
As the importance of email increases, the amount of malicious email is also increasing, so the need for malicious email filtering is growing. Since it is more economical to combine commodity hardware consisting of a medium server or PC with a virtual environment to use as a single server resource and filter malicious email using machine learning techniques, we used a Hadoop MapReduce framework and Naïve Bayes among machine learning methods for malicious email filtering. Naïve Bayes was selected because it is one of the top machine learning methods (Support Vector Machine (SVM), Naïve Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in terms of execution time and accuracy. Malicious email was filtered with MapReduce programming using the Naïve Bayes technique, which is a supervised machine learning method, in a Hadoop framework with optimized performance, and also with a Python program applying the Naïve Bayes technique in a bare metal server environment without Hadoop. According to the results of a comparison of the accuracy and predictive error rates of the two methods, the Hadoop MapReduce Naïve Bayes method improved the accuracy of spam and ham email identification 1.11 times and the prediction error rate 14.13 times compared to the non-Hadoop Python Naïve Bayes method.
Keywords: Hadoop; Hadoop distributed file system (HDFS); MapReduce; configuration parameter; malicious email filtering; Naïve Bayes
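To make the MapReduce formulation of Naïve Bayes concrete, the sketch below imitates the map and reduce phases in plain Python: the mapper emits one count per (label, word) pair, the reducer sums them, and classification uses the summed counts with add-one smoothing. It is an illustrative single-process sketch under assumed names (`map_phase`, `reduce_phase`, `train`, `classify`) and whitespace tokenization; it does not use Hadoop and is not the configuration or code evaluated in the paper.

```python
import math
from collections import Counter


def map_phase(label, text):
    """Mapper: emit ((label, word), 1) for every token in one email."""
    for word in text.lower().split():
        yield (label, word), 1


def reduce_phase(pairs):
    """Reducer: sum the emitted counts for each (label, word) key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def train(docs):
    """Run the map and reduce phases over (label, text) training pairs."""
    pairs = [kv for label, text in docs for kv in map_phase(label, text)]
    word_counts = reduce_phase(pairs)
    label_counts = Counter(label for label, _ in docs)
    vocab = {word for _, word in word_counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_words = sum(c for (lbl, _), c in word_counts.items() if lbl == label)
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.lower().split():
            # Laplace (add-one) smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[(label, word)] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In a real Hadoop job the mapper and reducer would run as distributed tasks over HDFS input splits; here they are ordinary functions so the counting logic can be followed end to end.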
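6. Research on Small-File Storage Technology Based on HDFS
Authors: 高朝艳, 鹿虹, 黄娟, 张一. Source: 《电信技术研究》, 2020, Issue 3, pp. 10-15 (6 pages)
The Hadoop Distributed File System (HDFS) used in big data platforms is versatile and stable, and has a mature ecosystem. Based on a study of HDFS and an analysis of the size, distribution, and usage characteristics of massive data files, this paper develops a model for merged storage and management of small files on HDFS, aimed at large-capacity information processing. For systems that already use HDFS, and in order to preserve technical maturity and save cost, HDFS continues to manage large files while file storage sizes are designed appropriately and the management of small-file information is optimized. On a 6-node HDFS cluster, this achieves a peak small-file write rate of 2 GB/s and millisecond-level file reads under mixed read/write load, realizing classified storage of massive large files and small files on HDFS.
Keywords: HDFS (Hadoop Distributed File System); NameNode (the master server that manages the file namespace and regulates client access to files)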
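The general idea behind merged storage of small files, which this abstract relies on, is to pack many small files into one large container and keep an index of where each one sits, so the NameNode tracks one large file instead of many small ones. The sketch below shows that idea in memory only; the function names `pack` and `read` and the (offset, length) index layout are assumptions for illustration, and the paper's actual model is not described in enough detail here to reproduce.

```python
def pack(small_files):
    """Concatenate small files into one blob and index each by (offset, length).

    small_files maps a file name to its bytes content.
    """
    blob = bytearray()
    index = {}
    for name, data in small_files.items():
        index[name] = (len(blob), len(data))  # where this file starts, and its size
        blob.extend(data)
    return bytes(blob), index


def read(blob, index, name):
    """Fetch one small file back out of the merged blob via the index."""
    offset, length = index[name]
    return blob[offset:offset + length]
```

A lookup costs one index access plus one ranged read, which is the property that makes fast reads of individual small files plausible once they are merged.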
7. MIX-RS: A Multi-Indexing System Based on HDFS for Remote Sensing Data Storage (cited by: 3)
Authors: Jiashu Wu, Jingpan Xiong, Hao Dai, Yang Wang, Chengzhong Xu. Source: Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, Issue 6, pp. 881-893 (13 pages)
A large volume of Remote Sensing (RS) data has been generated with the deployment of satellite technologies. The data facilitate research in ecological monitoring, land management and desertification, etc. The characteristics of RS data (e.g., enormous volume, large single-file size, and demanding requirement of fault tolerance) make the Hadoop Distributed File System (HDFS) an ideal choice for RS data storage, as it is efficient, scalable, and equipped with a data replication mechanism for failure resilience. To use RS data, one of the most important techniques is geospatial indexing. However, the large data volume makes it time-consuming to efficiently construct and leverage. Considering that most modern geospatial data centres are equipped with HDFS-based big data processing infrastructures, deploying multiple geospatial indices becomes natural to optimise the efficacy. Moreover, because of the reliability introduced by high-quality hardware and the infrequently modified property of the RS data, the use of multi-indexing will not cause large overhead. Therefore, we design a framework called Multi-IndeXing-RS (MIX-RS) that unifies the multi-indexing mechanism on top of the HDFS with data replication enabled for both fault tolerance and geospatial indexing efficiency. Given the fault tolerance provided by the HDFS, RS data are structurally stored inside for faster geospatial indexing. Additionally, multi-indexing enhances efficiency. The proposed technique naturally sits on top of the HDFS to form a holistic framework without incurring severe overhead or sophisticated system implementation efforts. The MIX-RS framework is implemented and evaluated using real remote sensing data provided by the Chinese Academy of Sciences, demonstrating excellent geospatial indexing performance.
Keywords: Remote Sensing (RS) data; geospatial indexing; multi-indexing mechanism; Hadoop distributed file system (HDFS); Multi-IndeXing-RS (MIX-RS)
8. A Forensic Method for Efficient File Extraction in HDFS Based on Three-Level Mapping (cited by: 2)
Authors: GAO Yuanzhao, LI Binglong. Source: Wuhan University Journal of Natural Sciences (CAS, CSCD), 2017, Issue 2, pp. 114-126 (13 pages)
The large scale and distributed nature of cloud computing storage have become the major challenges for file extraction in cloud forensics. Current disk forensic methods do not adapt well to cloud computing, and forensic research on distributed file systems is inadequate. To address these forensic problems, this paper uses the Hadoop distributed file system (HDFS) as a case study and proposes a forensic method for efficient file extraction based on three-level (3L) mapping. First, HDFS is analyzed from the overall architecture down to the local file system. Second, the 3L mapping of an HDFS file from the HDFS namespace to data blocks on the local file system is established, and a recovery method for deleted files based on 3L mapping is presented. Third, a multi-node Hadoop framework on the Xen virtualization platform is set up to test the performance of the method. The results indicate that the proposed method can efficiently locate large files stored across data nodes, create selective images of disk data, and achieve a high recovery rate for deleted files.
Keywords: Hadoop distributed file system (HDFS); forensics; cloud forensics; three-level (3L) mapping; metadata; file extraction; file recovery; Ext4
9. TIFAflow: Enhancing Traffic Archiving System with Flow Granularity for Forensic Analysis in Network Security (cited by: 3)
Authors: Zhen Chen, Linyun Ruan, Junwei Cao, Yifan Yu, Xin Jiang. Source: Tsinghua Science and Technology (SCIE, EI, CAS), 2013, Issue 4, pp. 406-417 (12 pages)
The archiving of Internet traffic is an essential function for retrospective network event analysis and forensic computer communication. The state-of-the-art approach for network monitoring and analysis involves storage and analysis of network flow statistics. However, this approach loses much valuable information within the Internet traffic. With the advancement of commodity hardware, in particular the volume of storage devices and the speed of interconnect technologies used in network adapter cards and multi-core processors, it is now possible to capture real-time network traffic at 10 Gbps and beyond using a commodity computer, such as n2disk. Also, with the advancement of distributed file systems (such as Hadoop, ZFS, etc.) and open cloud computing platforms (such as OpenStack, CloudStack, and Eucalyptus, etc.), it is practical to store such a large volume of traffic data and analyse the communication in depth within an acceptable latency. In this paper, based on the well-known TimeMachine, we present TIFAflow, the design and implementation of a novel system for archiving and querying network flows. Firstly, we enhance the traffic archiving system named TImemachine+FAstbit (TIFA) with flow granularity, i.e., supply the system with a flow table and a flow module. Secondly, based on real network traces, we conduct performance comparison experiments of TIFAflow against other implementations such as a common database solution, TimeMachine, and the TIFA system. Finally, based on the comparison results, we demonstrate that TIFAflow achieves better storing and querying performance than TimeMachine and TIFA, in both time and space metrics.
Keywords: network security; traffic archival; forensic analysis; phishing attack; bitmap database; Hadoop distributed file system; cloud computing; NoSQL
10. Mobile Internet Big Data Platform in China Unicom (cited by: 6)
Authors: Wenliang Huang, Zhen Chen, Wenyu Dong, Hang Li, Bin Cao, Junwei Cao. Source: Tsinghua Science and Technology (SCIE, EI, CAS), 2014, Issue 1, pp. 95-101 (7 pages)
China Unicom, the largest WCDMA 3G operator in China, meets the requirements of the historical Mobile Internet Explosion, or the surging of Mobile Internet Traffic from mobile terminals. According to the internal statistics of China Unicom, mobile user traffic has increased rapidly with a Compound Annual Growth Rate (CAGR) of 135%. Currently, China Unicom stores more than 2 trillion records per month, with a data volume of over 525 TB, and the highest data volume has reached a peak of 5 PB. Since October 2009, China Unicom has been developing a home-brewed big data storage and analysis platform based on the open source Hadoop Distributed File System (HDFS), as it has a long-term strategy to make full use of this Big Data. All Mobile Internet Traffic is well served using this big data platform. Currently, the writing speed has reached 1 390 000 records per second, and the record retrieval time in a table that contains trillions of records is less than 100 ms. To take advantage of this opportunity to be a Big Data Operator, China Unicom has developed new functions and multiple innovations to solve the space and time constraint challenges presented in data processing. In this paper, we introduce our big data platform in detail. Based on this big data platform, China Unicom is building an industry ecosystem based on Mobile Internet Big Data, and considers that a telecom-operator-centric ecosystem can be formed that is critical to reach prosperity in the modern communications business.
Keywords: big data platform; China Unicom; 3G wireless network; Hadoop distributed file system (HDFS); mobile Internet; network forensic; data warehouse; HBase
11. MobSafe: Cloud Computing Based Forensic Analysis for Massive Mobile Applications Using Data Mining (cited by: 2)
Authors: Jianlin Xu, Yifan Yu, Zhen Chen, Bin Cao, Wenyu Dong, Yu Guo, Junwei Cao. Source: Tsinghua Science and Technology (SCIE, EI, CAS), 2013, Issue 4, pp. 418-427 (10 pages)
With the explosive increase in mobile apps, more and more threats migrate from the traditional PC client to mobile devices. Compared with the traditional Win+Intel alliance in PCs, the Android+ARM alliance dominates the Mobile Internet, and apps replace PC client software as the major target of malicious usage. In this paper, to improve the security status of current mobile apps, we propose a methodology to evaluate mobile apps based on a cloud computing platform and data mining. We also present a prototype system named MobSafe to identify a mobile app's virulence or benignancy. Compared with traditional methods, such as the permission-pattern-based method, MobSafe combines dynamic and static analysis methods to comprehensively evaluate an Android app. In the implementation, we adopt the Android Security Evaluation Framework (ASEF) and the Static Android Analysis Framework (SAAF), two representative dynamic and static analysis methods, to evaluate Android apps and estimate the total time needed to evaluate all the apps stored in one mobile app market. Based on a real trace from a commercial mobile app market called AppChina, we collect statistics on the number of active Android apps, the average number of apps installed on one Android device, and the expanding ratio of mobile apps. As the mobile app market serves as the main line of defence against mobile malware, our evaluation results show that it is practical to use a cloud computing platform and data mining to verify all stored apps routinely and filter out malware apps from mobile app markets. As future work, MobSafe can extensively use machine learning to conduct automated forensic analysis of mobile apps based on the multifaceted data generated in this stage.
Keywords: Android platform; mobile malware detection; cloud computing; forensic analysis; machine learning; Redis key-value store; big data; Hadoop distributed file system; data mining