An interactive model based on regression tree and K-nearest neighbor for storage device performance prediction
Abstract Storage device performance prediction plays an important role in the automated management of storage systems and in planning tasks such as data assignment. Traditional approaches rely on analytic models or simulation, which require extensive domain expertise and cannot keep up with increasingly high-end, complex storage systems. Machine learning methods instead treat the storage device as a black box and need no knowledge of its internal components or scheduling algorithms, which makes them better suited to current trends in storage device development; their drawback is that prediction accuracy is not high enough. Modelling storage devices with classification and regression trees (CART) is simple. This paper proposes an interactive model that combines regression trees and K-nearest neighbors (KNN), two methods with potentially complementary strengths and weaknesses, to obtain higher prediction accuracy. Experiments show that the hybrid model is more stable and more accurate than either single model (regression tree or KNN). In addition, the workload characterization takes into account an important feature, the caching effect, which substantially improves the prediction accuracy of the model.
Source Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), indexed in CSCD and the Peking University Core Journals list, 2012, No. 2, pp. 123-132 (10 pages)
Funding Fundamental Research Funds for the Central Universities
Keywords regression model, regression tree, K-nearest neighbors, feature weighting, storage device performance prediction
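
The abstract does not say how the regression tree and KNN interact, so the following is only a minimal sketch of one plausible coupling, suggested by the "feature weighting" keyword: the variance reduction that a tree-style split achieves on each feature is used as that feature's weight in the KNN distance. All function names, the two workload features, and the sample values are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: NOT the paper's algorithm.
# Assumption: tree-style split quality (variance reduction) weights the KNN distance.
import math


def variance(ys):
    """Population variance of a list of target values (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def best_split_gain(xs, ys):
    """Largest variance reduction from one threshold split on a single feature."""
    n = len(ys)
    total = variance(ys)
    best = 0.0
    # The largest value is skipped as a threshold: it would leave the right side empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        remaining = (len(left) * variance(left) + len(right) * variance(right)) / n
        best = max(best, total - remaining)
    return best


def feature_weights(X, y):
    """Normalise per-feature split gains so that the weights sum to 1."""
    n_features = len(X[0])
    gains = [best_split_gain([row[j] for row in X], y) for j in range(n_features)]
    total = sum(gains)
    if total == 0:
        return [1.0 / n_features] * n_features  # no informative split: uniform weights
    return [g / total for g in gains]


def weighted_knn_predict(X, y, weights, query, k=3):
    """Average the targets of the k training rows closest in weighted Euclidean distance."""
    scored = sorted(
        (math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, query))), target)
        for row, target in zip(X, y)
    )
    nearest = scored[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical workload samples: [request size in KB, an uninformative feature]
X = [[4, 7], [8, 1], [16, 9], [32, 3], [64, 8], [128, 2]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]  # hypothetical response times in ms

weights = feature_weights(X, y)
prediction = weighted_knn_predict(X, y, weights, query=[100, 7], k=2)
```

In this toy data the first feature fully determines the target, so it receives most of the weight and the uninformative second feature has little influence on which neighbors are chosen. A fuller treatment would also normalise feature scales before computing distances.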
