Abstract
As the number of tasks running on cloud platforms increases sharply, so does the probability of task failure, and data loss is the main cause of task failure. If it can be determined before a task runs whether data loss is likely and which type of loss it would be, measures can be taken in advance to avoid or reduce the loss. Using the latest cloud cluster data released by Google in 2019, this study investigates task data loss in depth and examines how different task attributes correlate with data loss. The GMM (Gaussian Mixture Model) algorithm was selected and improved to build a data loss prediction model. In experimental comparisons with several clustering algorithms, the improved GMM model showed excellent adaptability and accuracy, and it can quickly and accurately determine, before a task runs, both the likelihood of data loss and the type of loss. Finally, recommendations are given for each predicted type of data loss.
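The abstract does not describe how the GMM was improved, so the following is only a minimal sketch of a plain, unimproved Gaussian mixture model fitted with expectation-maximization (EM), assuming each task is represented as a numeric attribute vector (for example, hypothetical features such as requested CPU and memory). The diagonal covariances, the farthest-point initialization, and the names `fit_gmm` and `predict` are illustrative assumptions, not the authors' method.

```python
import math
import random


def _log_gauss_diag(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance diag(var)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def _farthest_point_init(data, k, rng):
    """Assumed initialization: one random point, then repeatedly the point
    farthest (squared Euclidean distance) from all means chosen so far."""
    means = [list(data[rng.randrange(len(data))])]
    while len(means) < k:
        far = max(
            data,
            key=lambda p: min(sum((a - b) ** 2 for a, b in zip(p, m)) for m in means),
        )
        means.append(list(far))
    return means


def fit_gmm(data, k, iters=100, seed=0, eps=1e-6):
    """Fit a k-component diagonal-covariance GMM with plain EM (no improvements).

    Returns (weights, means, variances)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    means = _farthest_point_init(data, k, rng)
    # Every component starts from the global per-feature variance.
    mu = [sum(p[j] for p in data) / n for j in range(d)]
    gvar = [sum((p[j] - mu[j]) ** 2 for p in data) / n + eps for j in range(d)]
    variances = [list(gvar) for _ in range(k)]
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibilities r[i][c] = P(component c | x_i),
        # computed in log space and normalized with the max trick.
        resp = []
        for x in data:
            logs = [
                math.log(weights[c]) + _log_gauss_diag(x, means[c], variances[c])
                for c in range(k)
            ]
            top = max(logs)
            ps = [math.exp(l - top) for l in logs]
            total = sum(ps)
            resp.append([p / total for p in ps])

        # M-step: re-estimate weights, means and variances from the responsibilities.
        for c in range(k):
            nc = max(sum(r[c] for r in resp), eps)  # guard against empty components
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc for j in range(d)]
            variances[c] = [
                sum(r[c] * (x[j] - means[c][j]) ** 2 for r, x in zip(resp, data)) / nc + eps
                for j in range(d)
            ]
    return weights, means, variances


def predict(x, model):
    """Index of the component with the highest posterior probability for sample x."""
    weights, means, variances = model
    scores = [
        math.log(w) + _log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch each mixture component is simply a cluster of tasks with similar attributes. Mapping components to concrete data-loss types (or to "no loss") would require labelled historical tasks, and that mapping step is an assumption, not something the abstract specifies.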
Authors
WANG Hui (王晖); JIANG Chunmao (姜春茂)
College of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang 150025, China; College of Computer Science and Mathematics, Fujian University of Technology, Fuzhou, Fujian 350118, China
Source
Changjiang Information & Communications (长江信息通信), 2023, No. 3, pp. 28-34 (7 pages)
Funding
Supported by the Natural Science Foundation of Heilongjiang Province (project no. LH2020F031).