12 articles found
Research on Job Scheduling Methods Based on Hadoop MapReduce
1
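Author: 王玉芹. 《电脑知识与技术》, 2018, No. 9X, pp. 10-11 (2 pages)
With the rapid development of information technology and of the Internet, cloud computing has become the most popular computing model, and it has matched the traditional Internet service model well. Job scheduling is key to cluster performance and to whether the cluster's system resources are used effectively to meet users' needs. As application environments change, however, Hadoop's scheduling algorithms struggle to adapt to users' varied requirements, so improving the existing job scheduling algorithm is important. To address the shortcomings of the original job scheduling algorithm with respect to data locality, this paper introduces prefetching and designs a Hadoop MapReduce job scheduling algorithm based on resource prefetching.
Keywords: cloud computing; job scheduling; data locality; resource prefetching; Hadoop MapReduce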
A Content-Based Image Retrieval Method on the Hadoop/MapReduce Architecture
2
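Author: 蔡丽娟. 《福建广播电视大学学报》, 2014, No. 5, pp. 41-45 (5 pages)
The Hadoop/MapReduce parallel framework for massive image processing is used for content-based retrieval over massive image collections. The image data are stored in a distributed fashion across many nodes, and an optimized ACCC algorithm runs on each node to carry out content-based image search and analysis in an integrated way. Experimental comparison with a traditional parallel computing method and a single-node method demonstrates the advantages of this method in storage capacity and retrieval performance.
Keywords: Hadoop/MapReduce; parallel processing; image processing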
A K-NN Algorithm Based on Hadoop/MapReduce
3
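Author: 艾树宇. 《科技传播》, 2013, No. 1, pp. 203-204, 200 (3 pages)
With the distributed framework Hadoop/MapReduce growing in popularity, this paper describes how the K-nearest-neighbor (K-NN) machine learning algorithm is implemented on Hadoop/MapReduce. Text similarity is computed with the cosine measure, so that K-NN running in a Hadoop/MapReduce environment can classify texts of unknown category. Implementing machine learning algorithms on Hadoop/MapReduce improves both the scalability of the algorithm and its capacity to analyze text, meeting today's demands for large-scale data processing.
Keywords: K-NN; Hadoop/MapReduce; text classification; cosine similarity; machine learning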
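The abstract above pairs cosine similarity with K-NN in a map/reduce arrangement. The following is a minimal sketch of that idea in plain Python, not the paper's implementation and not Hadoop API code: the function names (`map_phase`, `reduce_phase`, `classify`), the whitespace tokenizer, the raw term-frequency vectors, and the toy `TRAIN` data are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_phase(train, query_id, query_vec):
    """Map: emit (query_id, (similarity, label)) for every training document."""
    for label, text in train:
        yield query_id, (cosine(Counter(text.split()), query_vec), label)


def reduce_phase(pairs, k):
    """Reduce: keep the k most similar neighbours and take a majority vote."""
    top_k = sorted(pairs, reverse=True)[:k]
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0]


def classify(train, text, k=3):
    """Run the map and reduce steps in-process for a single query text."""
    query_vec = Counter(text.split())
    emitted = [value for _, value in map_phase(train, "q0", query_vec)]
    return reduce_phase(emitted, k)


# Hypothetical toy training set: (label, text) pairs.
TRAIN = [
    ("sports", "football match goal team"),
    ("sports", "team wins football cup"),
    ("tech", "hadoop cluster mapreduce job"),
    ("tech", "mapreduce job scheduler hadoop"),
]
```

In a real cluster the map step would run over partitions of the training set on different nodes, and the reducer would receive all similarity pairs for one query; here both steps run in a single process so the data flow is easy to follow.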
Job schedulers for Big data processing in Hadoop environment: testing real-life schedulers using benchmark programs (cited by: 2)
4
Authors: Mohd Usama, Mengchen Liu, Min Chen. 《Digital Communications and Networks》 (SCIE), 2017, No. 4, pp. 260-273 (14 pages)
At present, big data is very popular because it has proved very successful in many fields, such as social media and e-commerce transactions. Big data describes the tools and technologies needed to capture, manage, store, distribute, and analyze petabyte or larger-sized datasets of differing structure at high speed. Big data can be structured, unstructured, or semi-structured. Hadoop is an open-source framework used to process large amounts of data in an inexpensive and efficient way, and job scheduling is a key factor in achieving high performance in big data processing. This paper gives an overview of big data and highlights its problems and challenges. It then highlights the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and various components that affect the performance of job scheduling algorithms in big data, such as Job Tracker, Task Tracker, Name Node, and Data Node. The primary purpose of this paper is to present a comparative study of job scheduling algorithms along with their experimental results in a Hadoop environment. In addition, this paper describes the advantages, disadvantages, and features of various Hadoop job schedulers such as FIFO, Fair, Capacity, Deadline Constraints, Delay, LATE, and Resource Aware, and provides a comparative study among these schedulers.
Keywords: Big Data; Hadoop MapReduce; HDFS; Scheduler; Classification; Locality; Benchmark
Research on a Hadoop-Based Matrix Multiplication Algorithm in a Distributed Network Environment
5
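Author: 杨博. 《信息通信》, 2016, No. 4, pp. 18-19 (2 pages)
The Internet era has arrived, and traditional computing techniques can no longer keep pace with large-scale data processing. This article introduces Hadoop, an open-source cloud computing system (a distributed computing platform), uses the MapReduce programming model to study the algorithmic theory of the large-scale matrix multiplication that frequently arises on the Internet, and outlines prospects for applying Hadoop-related technologies.
Keywords: matrix multiplication; Hadoop MapReduce; parallel computing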
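One common way to express matrix multiplication in the MapReduce model is the single-pass formulation, in which every element is routed to each output cell that needs it. The sketch below simulates that formulation in plain Python; it is an assumption about the general technique, not the algorithm studied in the article, and the names `map_phase`, `shuffle`, `reduce_phase`, and `matmul` are hypothetical.

```python
from collections import defaultdict


def map_phase(a, b):
    """Map: route every element to each output cell (i, j) that needs it."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    for i in range(n_rows):
        for k in range(n_inner):
            for j in range(n_cols):
                yield (i, j), ("A", k, a[i][k])
    for k in range(n_inner):
        for j in range(n_cols):
            for i in range(n_rows):
                yield (i, j), ("B", k, b[k][j])


def shuffle(pairs):
    """Shuffle: group emitted values by their output-cell key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Reduce: pair A and B entries sharing the inner index k, then sum."""
    a_vals = {k: v for tag, k, v in values if tag == "A"}
    b_vals = {k: v for tag, k, v in values if tag == "B"}
    return sum(a_vals[k] * b_vals[k] for k in a_vals)


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    groups = shuffle(map_phase(a, b))
    return [[reduce_phase(groups[(i, j)]) for j in range(len(b[0]))]
            for i in range(len(a))]
```

The single-pass form replicates each element once per output cell, so shuffle traffic grows quickly with matrix size; block-partitioned or two-pass variants trade an extra job for less data movement.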
MapReduce in the Cloud: Data-Location-Aware VM Scheduling
6
Authors: Tung Nguyen, Weisong Shi. 《ZTE Communications》, 2013, No. 4, pp. 18-26 (9 pages)
We have witnessed the fast-growing deployment of Hadoop, an open-source implementation of the MapReduce programming model, for the purpose of data-intensive computing in the cloud. However, Hadoop was not originally designed to run transient jobs in which users need to move data back and forth between storage and computing facilities. As a result, Hadoop is inefficient and wastes resources when operating in the cloud. This paper discusses the inefficiency of MapReduce in the cloud. We study the causes of this inefficiency and propose a solution. Inefficiency mainly occurs during data movement. Transferring large data to computing nodes is very time-consuming and also violates the rationale of Hadoop, which is to move computation to the data. To address this issue, we developed a distributed cache system and virtual machine scheduler. We show that our prototype can improve performance significantly when running different applications.
Keywords: cloud; MapReduce; VM scheduling; data location; Hadoop
Personalized Recommendation System on Hadoop and HBase
7
Authors: Shufen Zhang, Yanyan Dong, Xuebin Chen, Shi Wang. 《国际计算机前沿大会会议论文集》, 2015, No. B12, pp. 10-11 (2 pages)
Existing recommendation systems for Big Data have two shortcomings: poor scalability of data storage and poor scalability of the recommendation algorithm. Based on research into and analysis of the IBCF algorithm and the working principles of the Hadoop and HBase platforms, an optimized design for a personalized recommendation system based on Hadoop and HBase is proposed. The experimental results show that using the HBase database effectively solves the problem of mass data storage, and that parallelizing the recommendation computation with the MapReduce programming model of the Hadoop platform significantly improves the efficiency of the algorithm, thereby further improving the performance of the personalized recommendation system.
Keywords: Hadoop; HBase; MapReduce; personalized recommendation
Research on a Dynamic Data Information Allocation Method for Scalable Storage Cyberspace
8
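Author: 李英. 《周口师范学院学报》 (CAS), 2018, No. 5, pp. 125-128 (4 pages)
Existing information allocation techniques cannot build a file index list from the actual routes taken by data packets, which frequently leads to low allocation efficiency. To solve this problem, a dynamic data information allocation method based on a scalable storage cyberspace environment is proposed. The scalable storage cyberspace environment is built in three steps: fixing the Hadoop/MapReduce storage framework, adding a file storage merge module, and completing the cyberspace index file list. On this basis, the new dynamic allocation method is constructed in three further steps: computing the dynamic priority of data information, judging the allocation status, and correcting the parameters. Comparative experiments show that, with the proposed method, the routes of data packets are better controlled, information index lists are built promptly and effectively, and the incidence of low allocation efficiency is reduced to some extent.
Keywords: scalable storage; cyberspace; dynamic information allocation; Hadoop/MapReduce; merge module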
A comprehensive review from sequential association computing to Hadoop-MapReduce parallel computing in a retail scenario (cited by: 4)
9
Authors: Neha Verma, Jatinder Singh. 《Journal of Management Analytics》 (EI), 2017, No. 4, pp. 359-392 (34 pages)
Today, customers' requirements have been entirely transformed. Many big retail organizations are facing a sudden decline in sales and revenues caused by the indecisive and erratic purchasing habits of the recent generation of users: because they get abundant preferred information such as cheaper rates, amazing offers, discounts, and comparisons of similar products on their smartphones or laptops, they place orders straightaway instead of walking down to the showroom. As a result, large companies such as Tesco, Wal-Mart, and Target have realized that it is requisite to shake hands with startup firms that already support platforms to retain customers, either via deep exploration of transactional data or by offering lucrative offers for the benefit of the customer, and to promote the market basket. The data generated from consumer purchase patterns, Big Data, are a concern for companies; as a result, various big retail organizations are applying advanced and scalable data mining algorithms to precisely store and evaluate data in real time to boost market basket analysis. This research work discusses various improved association rule mining (ARM) algorithms. The objective of this study is to identify gaps, providing opportunities for new research, and to recognize the expansion of Big Data analytics within the retail environment and its future directions. This paper assimilates various aspects of parallel ARM algorithms for market basket analysis against their sequential and distributed nature, which are further escalated to the Hadoop and MapReduce computing platform. Further, various use cases highlighting the need for 'Big Data Retail Analytics' are discussed for emerging trends to promote sales and revenues, to keep a check on competitors' websites, to compare various brands, and to entice new customers.
Keywords: Big Data; Big Data retail analytics; Hadoop and MapReduce; Apriori algorithm; association rule mining; market basket analysis
Exploring the Application of Big Data Technology in a Highway Overload Control Management Platform
10
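Authors: 贺丽, 张哲, 黄林竹. 《电脑知识与技术》, 2022, No. 19, pp. 20-21 (2 pages)
China's road freight industry is growing rapidly, and cases of oversized and overloaded vehicles are increasingly common; the traditional management model for highway overload control can no longer meet current needs. As advanced data and information technologies are applied to overload control management, big data can be used to improve the accuracy and efficiency of off-site overload enforcement. After analyzing the necessity of highway overload control and the current state of the technology, the article explores an architecture for a big-data-based management platform for highway oversize and overload control, and proposes making full use of big data and other technologies to advance overload control technology.
Keywords: big data; highway overload control; Hadoop MapReduce; oversize and overload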
Accelerating Iterative Big Data Computing Through MPI (cited by: 5)
11
Authors: 梁帆, 鲁小亿. 《Journal of Computer Science & Technology》 (SCIE, EI, CSCD), 2015, No. 2, pp. 283-294 (12 pages)
Current popular systems, Hadoop and Spark, cannot achieve satisfactory performance because of the inefficient overlapping of computation and communication when running iterative big data applications. The pipeline of computing, data movement, and data management plays a key role in current distributed data computing systems. In this paper, we first analyze the overhead of the shuffle operation in Hadoop and Spark when running the PageRank workload, and then propose an event-driven pipeline and in-memory shuffle design with better overlapping of computation and communication as DataMPI-Iteration, an MPI-based library for iterative big data computing. Our performance evaluation shows DataMPI-Iteration can achieve 9X-21X speedup over Apache Hadoop, and 2X-3X speedup over Apache Spark for PageRank and K-means.
Keywords: iterative computation; DataMPI; Spark; Hadoop MapReduce
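To make the shuffle cost in the abstract above concrete, the sketch below writes one PageRank round as explicit map, shuffle, and reduce steps in plain Python. It is an illustration under stated assumptions, not DataMPI-Iteration, Hadoop, or Spark code: the function names, the damping factor of 0.85, and the requirement that every page has at least one out-link are all assumptions.

```python
from collections import defaultdict


def pagerank_iteration(graph, ranks, damping=0.85):
    """One map/shuffle/reduce round over an adjacency-list graph.

    Assumes every page has at least one out-link (no dangling nodes).
    """
    n = len(graph)
    # Map: each page sends an equal share of its rank to every out-link.
    emitted = []
    for page, links in graph.items():
        for target in links:
            emitted.append((target, ranks[page] / len(links)))
    # Shuffle: group the contributions by destination page.
    inbox = defaultdict(list)
    for target, share in emitted:
        inbox[target].append(share)
    # Reduce: combine incoming shares with the teleport term.
    return {page: (1 - damping) / n + damping * sum(inbox[page])
            for page in graph}


def pagerank(graph, iterations=20):
    """Repeat the round from a uniform start; each round needs a full shuffle."""
    ranks = {page: 1.0 / len(graph) for page in graph}
    for _ in range(iterations):
        ranks = pagerank_iteration(graph, ranks)
    return ranks
```

Every iteration regroups one message per edge, which is why the shuffle stage dominates iterative workloads of this kind when it is written to disk or cannot overlap with computation.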
A general-purpose framework for parallel processing of large-scale LiDAR data
12
Authors: Zhenlong Li, Michael E. Hodgson, Wenwen Li. 《International Journal of Digital Earth》 (SCIE, EI), 2018, No. 1, pp. 26-47 (22 pages)
Light detection and ranging (LiDAR) data are essential for scientific discoveries such as Earth and ecological sciences, environmental applications, and responding to natural disasters. While collecting LiDAR data over large areas is quite possible, the subsequent processing steps typically involve large computational demands. Efficiently storing, managing, and processing LiDAR data are the prerequisite steps for enabling these LiDAR-based applications. However, handling LiDAR data poses grand geoprocessing challenges due to data and computational intensity. To tackle such challenges, we developed a general-purpose scalable framework coupled with a sophisticated data decomposition and parallelization strategy to efficiently handle 'big' LiDAR data collections. The contributions of this research were (1) a tile-based spatial index to manage big LiDAR data in the scalable and fault-tolerant Hadoop distributed file system, (2) two spatial decomposition techniques to enable efficient parallelization of different types of LiDAR processing tasks, and (3) by coupling existing LiDAR processing tools with Hadoop, a variety of LiDAR data processing tasks can be conducted in parallel in a highly scalable distributed computing environment using an online geoprocessing application. A proof-of-concept prototype is presented here to demonstrate the feasibility, performance, and scalability of the proposed framework.
Keywords: Big data; online geoprocessing; Hadoop MapReduce; spatial decomposition; LAStools; parallel