Abstract
Distributed computing on video data differs substantially from distributed computing on text data. Video data is unstructured, and videos of the same size can take different amounts of time to process depending on their content. For simple structured data, the default HDFS load balancer handles load balancing adequately. Video files, however, exhibit hot-spot access patterns and vary in complexity, so the default HDFS data distribution mechanism does not balance the computational load well. This paper therefore proposes an HDFS-based redistribution algorithm for massive video data. First, the access count of each video file and the time that past video analysis tasks spent accessing it are recorded. Next, these measurements are quantified and combined by weighting to give the load degree of the video file. Finally, files are swapped, exchanging high-load videos with low-load ones, until the load on every node is balanced. Experimental results show that the proposed data redistribution algorithm reduces the processing time of massive video data.
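The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the quantification method, the weights, or the stopping rule, so the min-max scaling, the equal weights `w_access` and `w_time`, the `tolerance` threshold, and the names `Video`, `load_scores`, and `rebalance` are all assumptions made here for illustration. The sketch works on an in-memory placement map and makes no HDFS calls.

```python
from dataclasses import dataclass


@dataclass
class Video:
    name: str
    access_count: int      # recorded number of accesses
    analysis_time: float   # recorded historical analysis time, in seconds


def load_scores(videos, w_access=0.5, w_time=0.5):
    """Weighted load per video. Min-max scaling and the weights are assumptions."""
    def scaled(values):
        lo, hi = min(values), max(values)
        return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

    acc = scaled([v.access_count for v in videos])
    tim = scaled([v.analysis_time for v in videos])
    return {v.name: w_access * a + w_time * t for v, a, t in zip(videos, acc, tim)}


def rebalance(placement, load, tolerance=0.05, max_swaps=1000):
    """Swap videos between the most and least loaded nodes until the gap is small.

    placement maps node name -> list of video names and is modified in place.
    Returns the number of swaps performed.
    """
    swaps = 0
    while swaps < max_swaps:
        totals = {n: sum(load[v] for v in vs) for n, vs in placement.items()}
        hot = max(totals, key=totals.get)
        cold = min(totals, key=totals.get)
        gap = totals[hot] - totals[cold]
        if gap <= tolerance:
            break
        # Pick the pair whose exchange shrinks the gap the most.
        best = None
        for a in placement[hot]:
            for b in placement[cold]:
                d = load[a] - load[b]
                new_gap = abs(gap - 2 * d)
                if d > 0 and new_gap < gap and (best is None or new_gap < best[0]):
                    best = (new_gap, a, b)
        if best is None:
            break  # no swap improves the balance
        _, a, b = best
        placement[hot].remove(a)
        placement[cold].remove(b)
        placement[hot].append(b)
        placement[cold].append(a)
        swaps += 1
    return swaps
```

Swapping files rather than moving them keeps the number of files on each node unchanged, which matches the replacement step the abstract describes. Choosing the best pair greedily is only one possible selection rule.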
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2016, Issue S1, pp. 480-484 (5 pages)
Computer Science