
MapReduce based preprocessing on vibration data of high speed rail (基于MapReduce的高铁振动数据预处理)
Cited by: 5
Abstract Analyzing high-speed rail data to determine a train's operating condition is vital to railway safety. Vibration data is one such data source. It is collected by multiple sensors sampling at a fixed frequency, such as 2500 Hz, and a test lasting 1 to 2 days produces gigabytes of data or more. Preprocessing is therefore indispensable before the vibration data can be analyzed; it includes outlier handling and linear trend removal, among other steps. Outlier handling first locates outliers in the data file using common rules and then restores each outlier's value from its 4 neighboring data points. Linear trend removal eliminates a linear offset that the test equipment introduces into the raw data; if the offset is left untreated, the error continues to accumulate. Traditional preprocessing methods handle the data files one by one, serially, so processing takes unacceptably long, and memory limits prevent them from handling large files. They are then forced to randomly sample the raw data and analyze only a small part of it, which may lose important information in the vibration data.
This paper aims to improve the efficiency of vibration data preprocessing. Cloud computing has received much attention for its idea of sharing computing capabilities and working cooperatively. Based on an analysis of existing preprocessing methods for high-speed rail vibration data and of the MapReduce mechanism in cloud computing, parallel versions of the preprocessing methods, including outlier handling and linear trend removal, are developed and implemented on the Hadoop platform. Experiments are designed to verify the effectiveness of the methods and the consistency of the parallel results. The experiments run on a Hadoop cluster with 6 nodes (1 Master and 5 Slaves). The results show that the proposed methods can handle large data files and improve processing efficiency. The results for three parallel performance metrics, Speedup, Scaleup, and Sizeup, also demonstrate the advantage of the methods.
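The two preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Hadoop implementation. The abstract does not state the "common rules" used to detect outliers, so the 3-sigma rule here is an assumption, as is reading "4 neighbor data values" as two points on each side. All function names are hypothetical. The trend fit is split into a map step (per-chunk partial sums) and a reduce step (adding the sums), which is one way a least-squares line could be computed over a file too large for memory.

```python
from functools import reduce
from statistics import mean, pstdev


def repair_outliers(values, k=3.0):
    """Replace outliers with the mean of up to 4 neighbours (2 on each side).

    Assumption: a point is an outlier if it lies more than k standard
    deviations from the mean (3-sigma rule); the paper's rule may differ.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)
    is_outlier = [abs(v - mu) > k * sigma for v in values]
    repaired = list(values)
    for i, bad in enumerate(is_outlier):
        if not bad:
            continue
        # Use only in-range neighbours that are not outliers themselves.
        neighbours = [values[j] for j in (i - 2, i - 1, i + 1, i + 2)
                      if 0 <= j < len(values) and not is_outlier[j]]
        if neighbours:
            repaired[i] = mean(neighbours)
    return repaired


def map_partial_sums(chunk, start_index):
    """Map step: sufficient statistics for a least-squares line on one chunk."""
    n = len(chunk)
    st = sy = stt = sty = 0.0
    for offset, y in enumerate(chunk):
        t = start_index + offset  # global sample index of this point
        st += t
        sy += y
        stt += t * t
        sty += t * y
    return (n, st, sy, stt, sty)


def reduce_partial_sums(a, b):
    """Reduce step: partial sums from two chunks simply add up."""
    return tuple(x + y for x, y in zip(a, b))


def fit_line(sums):
    """Closed-form least-squares slope and intercept from the combined sums."""
    n, st, sy, stt, sty = sums
    slope = (n * sty - st * sy) / (n * stt - st * st)
    intercept = (sy - slope * st) / n
    return slope, intercept


def detrend(chunk, start_index, slope, intercept):
    """Second map step: subtract the fitted line from each sample."""
    return [y - (slope * (start_index + i) + intercept)
            for i, y in enumerate(chunk)]
```

For example, a signal split into chunks `[(0, first_part), (4, second_part)]` is fitted with `fit_line(reduce(reduce_partial_sums, [map_partial_sums(c, s) for s, c in chunks]))`, and each chunk is then detrended independently with the shared slope and intercept. Because the sums are additive, the fitted line does not depend on how the file is split. In this sketch the outlier statistics are computed per call, so a chunked version would also need shared global statistics and overlap at chunk boundaries.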
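Source Journal of Nanjing University (Natural Science), 2012, No. 4, pp. 390-396 (7 pages). Indexed in CAS, CSCD, and the Peking University core journal list (北大核心).
Funding National Key Technology R&D Program of China during the 11th Five-Year Plan (2009BAG12A01-E08-1)
Keywords parallelization, MapReduce, high speed rail, vibration, preprocessing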


