Abstract
As big data technology develops, data analysis is attracting growing attention. Spark, a fast, general-purpose engine for large-scale data processing, is used in production by many companies because of its speed. This paper applies a hidden Markov model (HMM) to the exceptions that arise when Spark analyzes massive data sets in production. It takes three exceptions seen during task execution as indicators: memory overflow, garbage collection exceptions, and serialization exceptions. From the messages reported when an exception occurs, it determines the HMM state space, defines the corresponding observations, and computes the model parameters. The resulting HMM for exceptions in running Spark jobs reveals the type of exception that was triggered, giving a reliable type diagnosis when such problems occur in production.
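The modeling steps named in the abstract (state space, observations, parameters, then diagnosis) can be sketched as a small Viterbi decoder. This is a minimal illustration, not the paper's model. The state names, the observation symbols, and every probability below are hypothetical placeholders chosen for the example, and the input sequence is assumed to be non-empty.

```python
# Hypothetical HMM for Spark exception diagnosis.
# All names and numbers are illustrative assumptions, not values from the paper.
STATES = ["memory_overflow", "gc_anomaly", "serialization_anomaly"]

# Initial state distribution (assumed).
START = {"memory_overflow": 0.4, "gc_anomaly": 0.4, "serialization_anomaly": 0.2}

# State transition probabilities (assumed); each row sums to 1.
TRANS = {
    "memory_overflow": {"memory_overflow": 0.6, "gc_anomaly": 0.3, "serialization_anomaly": 0.1},
    "gc_anomaly": {"memory_overflow": 0.3, "gc_anomaly": 0.6, "serialization_anomaly": 0.1},
    "serialization_anomaly": {"memory_overflow": 0.1, "gc_anomaly": 0.1, "serialization_anomaly": 0.8},
}

# Emission probabilities over coarse message categories (assumed); each row sums to 1.
EMIT = {
    "memory_overflow": {"oom_message": 0.7, "gc_message": 0.2, "serialization_message": 0.1},
    "gc_anomaly": {"oom_message": 0.2, "gc_message": 0.7, "serialization_message": 0.1},
    "serialization_anomaly": {"oom_message": 0.05, "gc_message": 0.05, "serialization_message": 0.9},
}


def viterbi(observations):
    """Return the most likely hidden state sequence for a non-empty observation list."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (START[s] * EMIT[s][observations[0]], [s]) for s in STATES}
    for obs in observations[1:]:
        step = {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path probability into s.
            prob, path = max(
                ((best[p][0] * TRANS[p][s], best[p][1]) for p in STATES),
                key=lambda item: item[0],
            )
            step[s] = (prob * EMIT[s][obs], path + [s])
        best = step
    # Return the path of the highest-probability final state.
    return max(best.values(), key=lambda item: item[0])[1]


decoded = viterbi(["gc_message", "gc_message", "oom_message"])
print(decoded)
```

In practice the transition and emission tables would be estimated from labeled or observed job logs rather than written by hand, and log probabilities would be preferable for long sequences to avoid numerical underflow.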
Source
《电脑知识与技术》
2018, Issue 4Z, pp. 198-200 (3 pages)
Computer Knowledge and Technology