MapReduce based preprocessing on vibration data of high speed rail

Chinese title: 基于MapReduce的高铁振动数据预处理 (cited by: 5)

Abstract: Analyzing high-speed rail data to determine the operational state of trains is vital to the safety of rail transportation. Vibration data is one such data source. It is collected by multiple sensors sampling at a fixed frequency, for example 2500 Hz, so a test run lasting 1 to 2 days produces gigabytes of data. Preprocessing is therefore indispensable before analysis, and it includes outlier handling and linear trend removal, among other steps. Outlier handling first locates outliers in the data file using common rules and then restores each outlier from its 4 neighboring data values. Linear trend removal eliminates the linear offset that the test equipment introduces into the raw data; if the offset is left untreated, the error accumulates further. Traditional preprocessing methods handle the data files serially, one by one, so the processing time is unacceptably long, and memory limits prevent them from handling large files. They are then forced to randomly sample the raw data and analyze only a small part of it, which may lose important information in the vibration data. This paper aims to improve the efficiency of vibration data preprocessing. Cloud computing has received much attention for its idea of sharing computing capabilities and working cooperatively. Based on a study of existing preprocessing methods for high-speed rail vibration data and of the MapReduce architecture in cloud computing, parallel versions of outlier handling and linear trend removal are developed and implemented on the Hadoop platform. Experiments are designed to verify the effectiveness of the methods and the consistency of the parallel results. The performance experiments run on a Hadoop cluster of 6 nodes (1 Master and 5 Slaves). The results show that the proposed methods can handle large data files and improve processing efficiency. Experimental results on three parallel performance indexes, Speedup, Scaleup, and Sizeup, also demonstrate the advantage of the methods.
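The two preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Hadoop implementation. The abstract does not say which "common rules" detect outliers, so a 3-sigma rule is assumed here, and "4 neighbor data values" is read as two points on each side. All function names are hypothetical. The trend fit is split into a map step that emits per-chunk least-squares sums and a reduce step that adds them, which is one plausible way to make a chunked fit agree with a serial one.

```python
from functools import reduce
from statistics import mean, pstdev


def repair_outliers(values, k=3.0):
    """Replace points more than k standard deviations from the mean
    (assumed rule) with the average of up to 4 clean neighbours."""
    mu, sigma = mean(values), pstdev(values)
    is_outlier = [sigma > 0 and abs(v - mu) > k * sigma for v in values]
    repaired = list(values)
    for i, flagged in enumerate(is_outlier):
        if not flagged:
            continue
        neighbours = [values[j]
                      for j in (i - 2, i - 1, i + 1, i + 2)
                      if 0 <= j < len(values) and not is_outlier[j]]
        if neighbours:  # leave the point as-is if no clean neighbour exists
            repaired[i] = mean(neighbours)
    return repaired


def map_partial_sums(chunk):
    """Map step: chunk is (start_index, values). Emit the sums
    (n, sum t, sum y, sum t^2, sum t*y) needed for a least-squares line."""
    start, values = chunk
    n = len(values)
    st = sum(start + i for i in range(n))
    sy = sum(values)
    stt = sum((start + i) ** 2 for i in range(n))
    sty = sum((start + i) * y for i, y in enumerate(values))
    return (n, st, sy, stt, sty)


def reduce_sums(a, b):
    """Reduce step: the partial sums of two chunks simply add."""
    return tuple(x + y for x, y in zip(a, b))


def fit_line(sums):
    """Closed-form least-squares slope and intercept from global sums."""
    n, st, sy, stt, sty = sums
    slope = (n * sty - st * sy) / (n * stt - st ** 2)
    intercept = (sy - slope * st) / n
    return slope, intercept


def detrend_chunk(chunk, slope, intercept):
    """Second map step: subtract the fitted line from one chunk."""
    start, values = chunk
    return [y - (slope * (start + i) + intercept)
            for i, y in enumerate(values)]


if __name__ == "__main__":
    signal = [0.5 * t + 2.0 for t in range(10)]      # pure linear drift
    chunks = [(0, signal[:4]), (4, signal[4:])]      # two input splits
    sums = reduce(reduce_sums, map(map_partial_sums, chunks))
    slope, intercept = fit_line(sums)
    print(slope, intercept)
    print([detrend_chunk(c, slope, intercept) for c in chunks])
```

Because the five sums are additive, the fitted line does not depend on how the signal is split across chunks. The outlier step, by contrast, uses per-chunk statistics in this sketch and would need overlapping chunk borders or global statistics to match a serial run exactly.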

Authors: Zhao Chengbing (赵成兵), Li Tianrui (李天瑞), Wang Zhonggang (王仲刚), Gao Zizhe (高子喆)

Affiliation: School of Information Science and Technology, Southwest Jiaotong University

Source: Journal of Nanjing University (Natural Science), 2012, No. 4, pp. 390-396 (7 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

Funding: National Key Technology R&D Program of the 11th Five-Year Plan (2009BAG12A01-E08-1)

Keywords: parallel, MapReduce, high speed rail, vibration, preprocessing

Classification number: TP274.2 [Automation and Computer Technology: Detection Technology and Automatic Devices]
