Abstract
Data quality control is one of the key technologies in building intelligent transportation system applications. Based on an analysis of the characteristics of radio frequency identification (RFID) data, redundant RFID data are divided into two types, duplicate data and similar data, and both are detected by analyzing the adjacent passing times of the same vehicle. For similar data, the redundancy rate curve and the redundancy time point are defined, which solves the problem of identifying redundant records in RFID traffic data. Because the two types differ in character, their redundancy rates are computed separately. The redundancy rate is then analyzed from two perspectives, the individual RFID station and the shape of the redundancy rate curve, and a cleansing method for redundant data is given. As a case study, raw data from 21 RFID stations on main roads in urban Nanjing are used to validate the proposed methods. The results show that, across the 21 stations, the average redundancy rate is 0.0062% for duplicate data and 0.92% for similar data, indicating that data collected by RFID are highly reliable. At every station, similar data far outnumber duplicate data. Stations whose redundancy rate curves level off or rise at the tail have higher redundancy rates, whereas stations whose curves rise in a straight line have lower redundancy rates. Based on the analysis, corresponding quality control measures are proposed to limit the generation of redundant RFID data.
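The detection idea summarized in the abstract (comparing adjacent passing times of the same vehicle) can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so the following are assumptions made only for illustration: a record is a hypothetical `(station_id, vehicle_id, timestamp_s)` tuple; a "duplicate" is a record whose timestamp equals that of the previous record of the same vehicle at the same station; a "similar" record is one that follows within a hypothetical threshold `threshold_s`; and the redundancy rate is the redundant count divided by the total record count. The function names are likewise illustrative, not the authors'.

```python
from collections import defaultdict


def classify_redundancy(records, threshold_s):
    """Count duplicate and similar records.

    records: iterable of (station_id, vehicle_id, timestamp_s) tuples
             (an assumed layout, not taken from the paper).
    Returns (duplicates, similar, total).
    """
    times_by_key = defaultdict(list)
    total = 0
    for station, vehicle, ts in records:
        times_by_key[(station, vehicle)].append(ts)
        total += 1

    duplicates = similar = 0
    for times in times_by_key.values():
        times.sort()
        # Compare each passing time with the previous one for the same vehicle.
        for prev, cur in zip(times, times[1:]):
            gap = cur - prev
            if gap == 0:
                duplicates += 1      # assumed: identical timestamp
            elif gap <= threshold_s:
                similar += 1         # assumed: within the threshold
    return duplicates, similar, total


def redundancy_rates(records, threshold_s):
    """Return (duplicate_rate, similar_rate) as fractions of all records."""
    dup, sim, total = classify_redundancy(records, threshold_s)
    if total == 0:
        return 0.0, 0.0
    return dup / total, sim / total


def redundancy_curve(records, thresholds):
    """Similar-data rate at each threshold: a rough stand-in for a
    redundancy rate curve under the assumptions stated above."""
    records = list(records)
    return [(t, redundancy_rates(records, t)[1]) for t in thresholds]


sample = [
    ("S1", "A", 100), ("S1", "A", 100), ("S1", "A", 103),
    ("S1", "A", 400), ("S1", "B", 50), ("S2", "A", 101),
]
print(classify_redundancy(sample, threshold_s=5))  # (1, 1, 6)
```

Sweeping `thresholds` in `redundancy_curve` gives one curve per station when the input is filtered by station, which is one way to inspect whether a curve levels off, rises at the tail, or rises in a straight line.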
Source
Journal of Transport Information and Safety (《交通信息与安全》)
2016, No. 3, pp. 72-80 (9 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61573106)
Keywords
intelligent transportation system
RFID data
detection of redundant data
redundancy rate