Abstract
To address the unreliability of hard disks on cloud computing platforms, this paper proposes predicting faulty hard disks from Self-Monitoring, Analysis and Reporting Technology (SMART) logs using the Classification using lOcal clusterinG with Over-Sampling (COG-OS) framework. First, the samples of healthy (non-faulty) hard disks are divided into multiple disjoint subsets with the DBSCAN or K-means clustering algorithm. These subsets are then combined with the samples of faulty hard disks, and the Synthetic Minority Over-sampling TEchnique (SMOTE) is applied so that the overall sample set becomes more balanced. Finally, the LIBSVM classification algorithm is used to predict faulty hard disks. With the parameters tuned, the prediction performance of COG-OS is compared with that of SMOTE + Support Vector Machine (SVM), and the experimental results show that the method is feasible. When K-means is used to partition the healthy-disk samples and LIBSVM with a Radial Basis Function (RBF) kernel is used for prediction, COG-OS improves on SMOTE + SVM in both recall for faulty hard disks and overall performance.
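The three steps described in the abstract (local clustering of the healthy samples, SMOTE on the faulty samples, then training a classifier) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names, the made-up two-dimensional data standing in for SMART attributes, the choice of K-means with `k=2`, and the balancing target (oversampling the faulty class up to the size of the largest healthy sub-cluster). The final classifier is left out; in the paper that role is played by LIBSVM.

```python
import math
import random

FAULTY = -1  # label for the minority (faulty-disk) class in this sketch


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def smote(minority, n_new, k_neighbors=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its nearest minority neighbours (needs >= 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k_neighbors]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def build_cog_os_training_set(healthy, faulty, k=2, seed=0):
    """Split healthy samples into k sub-classes, then oversample faulty ones.

    Assumption of this sketch: the faulty class is grown to the size of the
    largest healthy sub-class.
    """
    cluster_ids = kmeans(healthy, k, seed=seed)
    largest = max(cluster_ids.count(c) for c in range(k))
    n_new = max(0, largest - len(faulty))
    faulty_all = list(faulty) + smote(faulty, n_new, seed=seed)
    X = list(healthy) + faulty_all
    y = cluster_ids + [FAULTY] * len(faulty_all)
    return X, y


def to_binary(label):
    """Map a multi-class prediction back to the original two classes."""
    return "faulty" if label == FAULTY else "healthy"


# Made-up 2-D data standing in for SMART attribute vectors.
rng = random.Random(42)
healthy = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)]
           + [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(30)])
faulty = [(rng.gauss(2.5, 0.3), rng.gauss(8, 0.3)) for _ in range(5)]

X, y = build_cog_os_training_set(healthy, faulty, k=2)
print({label: y.count(label) for label in sorted(set(y))})
```

The resulting `X` and `y` form a multi-class training set: one label per healthy sub-cluster plus the `FAULTY` label. A classifier trained on it would predict one of those labels, and `to_binary` collapses any healthy sub-cluster label back to "healthy" when evaluating recall on faulty disks.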
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2014, Issue 1, pp. 31-35, 188 (6 pages in total)
Journal of Computer Applications
Funding
National High Technology Research and Development Program of China (863 Program) (2011AA01A202)
Keywords
Classification using lOcal clusterinG with Over-Sampling (COG-OS) framework
Self-Monitoring, Analysis and Reporting Technology (SMART)
K-means
Synthetic Minority Over-sampling TEchnique (SMOTE)
LIBSVM
Support Vector Machine (SVM)