Abstract
Imperfect data acquisition equipment and unreliable data transmission frequently cause missing and erroneous values in gas station vehicle fueling data. This reduces the completeness of the data and seriously hampers subsequent analysis. Many existing algorithms can repair missing values in continuous numerical data, but they are not suited to discrete categorical data such as vehicle license plates. This paper proposes a missing-value filling framework based on an improved TruthFinder algorithm. Building on truth discovery, the framework takes into account how similarity is computed for discrete data and revises the original algorithm's model for computing the support of a data value. In experiments on a real gas station vehicle dataset, accuracy improved by 7% and 23% over the original algorithm and the more general Voting algorithm, respectively. The method can partially solve the missing-value filling problem for multi-source discrete data such as gas station vehicle fueling data, and it greatly improves the usability of such data.
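The abstract does not give the revised support formula, so the following is only a minimal sketch of the general idea: a majority Voting baseline next to a textbook-style TruthFinder iteration in which the support between conflicting plate readings comes from a string similarity. The position-wise similarity, the parameters `rho`, `gamma`, `base_sim`, and `t0`, and the source names and plate strings are all assumptions made for illustration, not the authors' model or data.

```python
import math
from collections import defaultdict


def plate_similarity(a, b):
    """Fraction of character positions on which two plate strings agree.

    Hypothetical similarity for discrete values; the paper's own measure
    is not stated in the abstract.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def vote(claims):
    """Voting baseline: for each object, pick the value most sources report.

    claims has the shape {object_id: {source_id: value}}.
    """
    result = {}
    for obj, by_source in claims.items():
        counts = defaultdict(int)
        for value in by_source.values():
            counts[value] += 1
        result[obj] = max(counts, key=counts.get)
    return result


def truth_finder(claims, rho=0.5, gamma=0.3, base_sim=0.5, t0=0.9,
                 max_iter=20, tol=1e-4):
    """TruthFinder-style iteration with similarity-based support (sketch)."""
    sources = {s for by_source in claims.values() for s in by_source}
    trust = {s: t0 for s in sources}
    conf = {}
    for _ in range(max_iter):
        conf = {}
        for obj, by_source in claims.items():
            # Raw score of a value: sum of -ln(1 - trust) over its sources.
            sigma = defaultdict(float)
            for s, value in by_source.items():
                sigma[value] += -math.log(1.0 - trust[s])
            for v in sigma:
                # Similar competing values add support, dissimilar ones subtract.
                adjusted = sigma[v] + rho * sum(
                    sigma[u] * (plate_similarity(u, v) - base_sim)
                    for u in sigma if u != v
                )
                conf[(obj, v)] = 1.0 / (1.0 + math.exp(-gamma * adjusted))
        # A source's trust is the mean confidence of the values it reported.
        new_trust = {}
        for s in sources:
            scores = [conf[(obj, by_source[s])]
                      for obj, by_source in claims.items() if s in by_source]
            new_trust[s] = min(sum(scores) / len(scores), 1.0 - 1e-6)
        delta = max(abs(new_trust[s] - trust[s]) for s in sources)
        trust = new_trust
        if delta < tol:
            break
    truths = {
        obj: max(set(by_source.values()), key=lambda v: conf[(obj, v)])
        for obj, by_source in claims.items()
    }
    return truths, trust


# Made-up readings of one fueling event from three hypothetical sources.
example_claims = {
    "event1": {"cam1": "A12345", "cam2": "A12345", "cam3": "B99999"},
}
example_truths, example_trust = truth_finder(example_claims)
```

Unlike voting, the iterative version also returns a trust score per source, so a source that keeps reporting values nobody else supports ends up weighing less when a later missing plate is filled.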
Authors
Peng Xinliang (彭新亮)
Cheng Li (程力)
Wang Yi (王轶)
Ma Bo (马博)
Zhao Fan (赵凡)
Zhou Xi (周喜)
Affiliations: The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, Xinjiang, China; University of the Chinese Academy of Sciences, Beijing 100049, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, Xinjiang, China
Source
《计算机应用与软件》
Peking University Core Journal
2019, No. 8, pp. 41-46, 74 (7 pages in total)
Computer Applications and Software
Funding
2017 "Tianshan Cedar Program" project (2017XS05)
Xinjiang Uygur Autonomous Region 13th Five-Year Plan Major Special Project (2016A03007-2)