An Improved Memory Cache Management Study Based on Spark (times cited: 2)

Abstract: Spark is a fast, unified analytics engine for big data and machine learning, in which memory is a crucial resource. Resilient Distributed Datasets (RDDs) are parallel data structures that allow users to explicitly persist intermediate results in memory or on disk, and each RDD can be divided into several partitions. During task execution, Spark automatically monitors cache usage on each node. When an RDD needs to be stored in the cache but there is not enough space, the system evicts old data partitions in a least recently used (LRU) fashion to free space. However, Spark has no mechanism designed specifically for caching RDDs, and LRU takes into account neither the dependencies between RDDs nor the needs of future stages. In this paper, we propose an optimization approach for RDD caching and LRU eviction based on partition features. It consists of three parts: a prediction mechanism for persistence, a weight model built with the entropy method, and an update mechanism for weights and memory based on RDD partition features. Finally, verification on the Spark platform shows that our strategy can effectively reduce execution time and improve memory utilization.
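
The abstract contrasts Spark's LRU eviction with a weight model built using the entropy method, but it does not give the paper's actual indicators or formulas. The Python sketch below is therefore only an illustration under stated assumptions: it pairs a simplified LRU victim selection with the textbook entropy weight method and a weighted eviction ranking. The function names `lru_victims`, `entropy_weights`, and `weighted_eviction_order`, and the three per-partition indicators (future stage references, recomputation cost, past access count), are hypothetical; this is neither the authors' implementation nor Spark's internal code.

```python
# Illustrative sketch only: not Spark's API and not the paper's exact model.
import math
from collections import OrderedDict


def lru_victims(cache, capacity, new_size):
    """Simplified LRU baseline: pick the least recently used partitions to drop
    until a new partition of `new_size` fits into `capacity`.

    `cache` is an OrderedDict of partition_id -> size, least recently used first.
    If every entry is returned and the new partition still does not fit, it
    cannot be cached at all.
    """
    used = sum(cache.values())
    victims = []
    for pid, size in cache.items():  # oldest -> newest
        if used + new_size <= capacity:
            break
        victims.append(pid)
        used -= size
    return victims


def entropy_weights(matrix):
    """Textbook entropy weight method over non-negative indicators.

    matrix[i][j] is the value of indicator j for partition i (at least 2 rows).
    Returns one weight per indicator; the weights sum to 1.
    """
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0:
            divergences.append(0.0)  # an all-zero indicator carries no information
            continue
        entropy = -sum((x / total) * math.log(x / total) for x in column if x > 0)
        # Low entropy (values concentrated on few partitions) -> high weight.
        divergences.append(max(0.0, 1.0 - k * entropy))
    s = sum(divergences)
    return [d / s for d in divergences] if s > 0 else [1.0 / m] * m


def weighted_eviction_order(features, weights):
    """Rank partitions from least to most valuable by a weighted score.

    `features` maps partition_id -> list of benefit-type indicators
    (larger means more worth keeping). Each indicator is scaled by its maximum.
    """
    ids = list(features)
    maxima = [max(features[p][j] for p in ids) or 1.0 for j in range(len(weights))]

    def score(pid):
        return sum(w * features[pid][j] / maxima[j] for j, w in enumerate(weights))

    return sorted(ids, key=score)


if __name__ == "__main__":
    # Hypothetical indicators per partition:
    # [future stages that reference it, recomputation cost (s), past access count]
    features = {
        "p0": [3, 10.0, 5],
        "p1": [0, 2.0, 5],
        "p2": [1, 4.0, 5],
    }
    weights = entropy_weights(list(features.values()))
    print([round(w, 3) for w in weights])
    print(weighted_eviction_order(features, weights))  # ['p1', 'p2', 'p0']

    # Same partitions in LRU order (p0 least recently used), sizes in MB.
    cache = OrderedDict([("p0", 40), ("p1", 30), ("p2", 20)])
    print(lru_victims(cache, capacity=100, new_size=25))  # ['p0']
```

With these made-up numbers, LRU evicts p0 only because it is the oldest entry, while the weighted ranking keeps p0 (most future references, highest recomputation cost) and would drop p1 first. The constant access-count column receives a weight of zero, which is how the entropy method behaves: an indicator that does not vary across partitions cannot help distinguish them.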

Authors: Suzhen Wang, Yanpiao Zhang, Lu Zhang, Ning Cao, Chaoyi Pang

Affiliations: Hebei University of Economics and Business; University College Dublin; The Australian e-Health Research Centre

Source: Computers, Materials & Continua (English-language journal; indexed in SCIE and EI), 2018, Issue 9, pp. 415-431 (17 pages)

Keywords: Resilient Distributed Datasets; update mechanism; weight model

Classification number: TP3 [Automation and Computer Technology: Computer Science and Technology]
