An Improved Memory Cache Management Study Based on Spark (cited by: 2)

Abstract: Spark is a fast, unified analytics engine for big data and machine learning, in which memory is a crucial resource. Resilient Distributed Datasets (RDDs) are parallel data structures that allow users to explicitly persist intermediate results in memory or on disk, and each RDD can be divided into several partitions. During task execution, Spark automatically monitors cache usage on each node. When an RDD needs to be stored in a cache that has insufficient space, the system drops old data partitions in a least recently used (LRU) fashion to free space. However, Spark has no mechanism specifically for caching RDDs, and LRU takes into account neither the dependencies between RDDs nor whether a partition will be needed by future stages. In this paper, we propose an optimization approach for RDD caching and LRU based on the features of partitions. It consists of three parts: a prediction mechanism for persistence, a weight model built with the entropy method, and an update mechanism for weights and memory based on RDD partition features. Finally, verification on the Spark platform shows that the strategy can effectively reduce execution time and improve memory utilization.
Source: Computers, Materials & Continua (SCIE, EI; English-language journal), 2018, No. 9, pp. 415-431, 17 pages
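
The abstract contrasts plain LRU eviction with a weight-based policy whose weights come from the entropy method, but it does not give the model itself. The following is only a minimal sketch of the general idea, not the authors' implementation: it uses the standard entropy weight method, and the two partition features (future reference count and recomputation cost), the partition names, and the function names `entropy_weights`, `lru_victim`, and `weighted_victim` are all hypothetical choices made for illustration.

```python
import math
from collections import OrderedDict


def entropy_weights(rows):
    """Standard entropy weight method.

    rows: list of equal-length, non-negative feature vectors, one per partition.
    Returns one weight per feature; the weights sum to 1.
    """
    n = len(rows)
    m = len(rows[0])
    raw = []
    for j in range(m):
        col = [row[j] for row in rows]
        total = sum(col)
        if total == 0 or n < 2:
            # An all-zero column (or a single sample) carries no information.
            raw.append(0.0)
            continue
        probs = [v / total for v in col]
        # Normalized Shannon entropy of the column, in [0, 1].
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Lower entropy means the feature discriminates more, so it weighs more.
        raw.append(max(0.0, 1.0 - entropy))
    s = sum(raw)
    if s == 0:
        return [1.0 / m] * m  # fall back to equal weights
    return [w / s for w in raw]


def lru_victim(cache):
    """cache: OrderedDict ordered from least to most recently used."""
    return next(iter(cache))


def weighted_victim(features, weights):
    """Evict the partition with the lowest weighted feature score."""
    def score(pid):
        return sum(w * f for w, f in zip(weights, features[pid]))
    return min(features, key=score)


# Hypothetical features per cached partition:
# [future reference count, recomputation cost]
features = OrderedDict([
    ("rdd1_p0", [3.0, 10.0]),  # least recently used, but needed again soon
    ("rdd2_p0", [0.0, 2.0]),   # never needed again and cheap to recompute
    ("rdd3_p0", [1.0, 8.0]),   # most recently used
])

weights = entropy_weights(list(features.values()))
print("weights:", weights)
print("LRU evicts:", lru_victim(features))
print("weighted policy evicts:", weighted_victim(features, weights))
```

In this toy setup LRU would drop the partition that later stages still depend on, while the weighted score drops the one that is never reused. A real policy would also need to normalize features with different scales and update the weights as stages complete, which the sketch leaves out.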
