Abstract
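Analyzing high-speed rail data to determine the operating condition of high-speed trains is essential for ensuring high-speed rail safety. Train vibration data is one such source; it is collected by multiple sensors at a fixed sampling frequency. A test run lasting 1 to 2 days produces gigabytes of data, so preprocessing is indispensable before the vibration data is analyzed. Preprocessing includes outlier handling and linear trend removal, among other steps. Outlier handling first detects outliers using general rules and then restores each outlier's value from its neighboring data points. A linear trend is a linear offset in the collected data caused by the test equipment; if the offset is not removed, the error accumulates further. Traditional vibration data preprocessing handles files sequentially, one at a time. This takes too long to meet requirements, and memory limits prevent it from handling large files. This paper aims to improve the efficiency of vibration data preprocessing. Building on a study of existing high-speed rail vibration preprocessing methods and the MapReduce mechanism, we parallelize outlier handling and linear trend removal and implement them on the Hadoop platform. We also design experiments to verify the effectiveness of the methods and the consistency of the parallel results. The experiments, run on a 6-node cluster (1 Master, 5 Slaves), show that the proposed methods can process large data files and improve processing efficiency. In addition, results on three parallel performance metrics, Speedup, Scaleup, and Sizeup, demonstrate the advantages of the method.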
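The two preprocessing steps described above can be sketched serially in Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The outlier rule (a sample deviating from the series mean by more than k standard deviations, with k = 3) is a hypothetical stand-in for the unspecified "general rules". The "neighboring data points" are taken to be the four nearest samples, two on each side. The trend is fitted by ordinary least squares against the sample index. The function names `erase_outliers` and `remove_linear_trend` are illustrative.

```python
import statistics


def erase_outliers(x, k=3.0):
    """Replace outliers with the mean of their four neighbours (i-2, i-1, i+1, i+2).

    Hypothetical rule: a sample is an outlier if it deviates from the mean of
    the whole series by more than k population standard deviations.
    """
    mu = statistics.fmean(x)
    sigma = statistics.pstdev(x)
    flagged = {i for i, v in enumerate(x) if sigma > 0 and abs(v - mu) > k * sigma}
    y = list(x)
    for i in sorted(flagged):
        # Use only neighbours that exist and are not outliers themselves.
        nbrs = [x[j] for j in (i - 2, i - 1, i + 1, i + 2)
                if 0 <= j < len(x) and j not in flagged]
        if nbrs:  # if every neighbour is also an outlier, keep the value as-is
            y[i] = sum(nbrs) / len(nbrs)
    return y


def remove_linear_trend(x):
    """Subtract the least-squares line a + b*t, with t = 0, 1, ..., n-1."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    b = sxy / sxx            # slope
    a = x_mean - b * t_mean  # intercept
    return [v - (a + b * t) for t, v in enumerate(x)]


# Usage: a constant signal with one spike, then a superimposed linear drift.
raw = [1.0] * 20
raw[10] = 100.0
cleaned = erase_outliers(raw)
drifted = [v + 0.5 * t for t, v in enumerate(cleaned)]
residual = remove_linear_trend(drifted)
```

The order mirrors the text: outliers are erased first, then the trend is removed. For a single spike among n samples, the spike lies sqrt(n - 1) standard deviations from the mean, so with k = 3 the rule only fires on series longer than 10 samples.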
Analyzing high-speed rail data to obtain the operational state of trains is vital to the safety of rail transportation. Vibration data is one such kind of data. It is obtained by sampling with multiple sensors at a fixed frequency, such as 2500 Hz. A testing experiment lasting 1 or 2 days produces gigabytes of vibration data. Before data analysis, preprocessing the vibration data is indispensable. Preprocessing includes erasing outliers and linear trend removal, among other steps. Erasing outliers means that we first detect and locate the outliers in the data file using common rules, and then replace each outlier using its four neighboring data values. Linear trend removal means removing the linear offset that the test equipment introduces into the raw data. Traditional methods for processing vibration data are inefficient because they process the data files serially, one by one, and the resulting processing time is unacceptably long. Moreover, they cannot handle large files because of memory limitations, so they are forced to randomly sample the raw data and analyze only a small part of it, which may lose important information in the vibration data. This paper aims to improve the efficiency of preprocessing vibration data. Cloud computing has received much attention for its idea of sharing computing capabilities and working cooperatively. Based on an analysis of vibration data preprocessing methods and the MapReduce architecture in cloud computing, we develop parallel methods for preprocessing vibration data, including erasing outliers and linear trend removal. These methods are implemented on the Hadoop platform. Experiments are designed to verify their effectiveness and parallel consistency. We conduct performance experiments on a Hadoop cluster with six nodes (1 Master and 5 Slaves). The results show that the proposed methods can handle large files and improve processing efficiency. Moreover, the experimental results on three parallel performance indexes, Speedup, Scaleup, and Sizeup, demonstrate the advantage of our methods.
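The abstract does not spell out how the parallel linear trend removal is organized. One plausible MapReduce decomposition is shown here as a plain-Python simulation rather than Hadoop code. It relies on the fact that least-squares sums are additive:

1. Each map task computes partial sums (n, Σt, Σx, Σt², Σtx) over its chunk.
2. A reduce step adds the partial sums.
3. The global line is solved once from the combined sums.
4. A second map pass subtracts that line from every chunk.

Each chunk carries global sample indices t, so chunks can be processed in any order. The function names, and the commonly used definitions of Speedup, Scaleup and Sizeup at the end, are assumptions for illustration, not the authors' code or formulas.

```python
from functools import reduce


def map_partial_sums(chunk):
    """Map step: sums (n, sum_t, sum_x, sum_tt, sum_tx) over one chunk.

    `chunk` is a list of (t, x) pairs, where t is the global sample index.
    """
    n = st = sx = stt = stx = 0.0
    for t, x in chunk:
        n += 1
        st += t
        sx += x
        stt += t * t
        stx += t * x
    return (n, st, sx, stt, stx)


def reduce_sums(a, b):
    """Reduce step: partial sums are additive, so add them element-wise."""
    return tuple(p + q for p, q in zip(a, b))


def fit_line(sums):
    """Solve the least-squares normal equations for x = a + b*t."""
    n, st, sx, stt, stx = sums
    b = (n * stx - st * sx) / (n * stt - st * st)
    a = (sx - b * st) / n
    return a, b


def detrend_chunk(chunk, a, b):
    """Second map pass: subtract the global trend from one chunk."""
    return [(t, x - (a + b * t)) for t, x in chunk]


# Parallel performance metrics in their commonly used form (T = running time).
def speedup(t_1_node, t_p_nodes):
    """Same data on 1 node vs. p nodes (ideal value p)."""
    return t_1_node / t_p_nodes


def scaleup(t_1_node_data_d, t_p_nodes_data_pd):
    """Data D on 1 node vs. data p*D on p nodes (ideal value 1)."""
    return t_1_node_data_d / t_p_nodes_data_pd


def sizeup(t_data_md, t_data_d):
    """Fixed cluster: data m*D vs. data D."""
    return t_data_md / t_data_d


# Usage: three chunks stand in for input splits handled by separate map tasks.
data = [(t, 3.0 + 0.25 * t) for t in range(12)]
chunks = [data[0:5], data[5:9], data[9:]]
a, b = fit_line(reduce(reduce_sums, map(map_partial_sums, chunks)))
detrended = [p for c in chunks for p in detrend_chunk(c, a, b)]
```

Because the sums are combined exactly, the chunked fit matches a single-pass fit over the whole series. This is the kind of parallel consistency the experiments check.

Summing t² and t·x directly can lose floating-point precision once t reaches the billions of samples in a multi-gigabyte file. Centring t within each chunk before summing is a safer variant. Outlier erasing could be parallelized in a similar map-only style. However, each chunk would then need the two boundary samples from each adjacent chunk, so that its four-neighbour replacements match the serial result.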
Source
Journal of Nanjing University (Natural Science)
CAS
CSCD
Peking University Core Journal
2012, No. 4, pp. 390-396 (7 pages)
Funding
National Key Technology R&D Program of the 11th Five-Year Plan (2009BAG12A01-E08-1)