Abstract
Data cleaning is an important part of data preprocessing, but current data-cleaning techniques suffer from problems such as missed outliers and reference values contaminated by outliers. A dynamic fine-identification algorithm for outliers based on a regression model is proposed. After potential outliers are removed, the regression values of the two data segments before and after the current point are used as reference values, and these are combined with the limits on the rate of change of the sampled parameters to decide whether the point is an outlier. A regression-model-based data-cleaning procedure is also presented, in which coarse identification, fine identification and regression estimation are applied in sequence to improve the efficiency and effectiveness of data cleaning. Finally, the method is validated on a set of real sampled aeronautical data, and the processing results show that the regression-model-based data-cleaning technique can accurately identify outliers and estimate their values.
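The abstract describes the cleaning procedure only at a high level; the paper's actual regression model, window lengths and thresholds are not given here. The Python sketch below illustrates the three steps under explicitly assumed choices: the function names `clean` and `fit_predict` and the parameters `max_rate`, `window` and `deg` are hypothetical; the coarse rule (a point that jumps more than `max_rate`·Δt away from both neighbours) is an assumption; the regression is an ordinary least-squares polynomial fit via `numpy.polyfit` over `window` points on each side; the reference value is the mean of the two segment predictions; and the detected outlier is replaced by that reference. It is not the authors' implementation.

```python
import numpy as np


def fit_predict(t, x, usable, idx, t0, deg):
    """Least-squares polynomial fit on the usable points of one segment,
    evaluated at time t0. Returns None when the segment has too few points."""
    sel = idx[usable[idx]]
    if len(sel) <= deg:
        return None
    coeffs = np.polyfit(t[sel], x[sel], deg)
    return float(np.polyval(coeffs, t0))


def clean(t, x, max_rate, window=5, deg=1):
    """Hypothetical three-step cleaner: coarse identification, fine
    identification against regression references, regression estimation."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Largest change allowed between consecutive samples (assumed rate limit).
    limit = max_rate * np.median(np.diff(t))

    # Step 1, coarse identification (assumed rule): a point that jumps more
    # than `limit` away from both neighbours is a potential outlier and is
    # kept out of the regression windows.
    jump = np.abs(np.diff(x))
    usable = np.ones(n, dtype=bool)
    usable[1:-1] = ~((jump[:-1] > limit) & (jump[1:] > limit))

    outlier = np.zeros(n, dtype=bool)
    cleaned = x.copy()
    for i in range(n):
        # Step 2, fine identification: fit the segments before and after i,
        # skipping points currently marked as unusable.
        before = np.arange(max(0, i - window), i)
        after = np.arange(i + 1, min(n, i + 1 + window))
        preds = [fit_predict(t, x, usable, seg, t[i], deg) for seg in (before, after)]
        refs = [p for p in preds if p is not None]
        if not refs:
            continue
        ref = float(np.mean(refs))
        if abs(x[i] - ref) > limit:
            # Step 3, regression estimation: replace the outlier by the reference.
            outlier[i] = True
            cleaned[i] = ref
            usable[i] = False
        else:
            # Dynamic update: a point confirmed as normal may serve later windows.
            usable[i] = True
    return cleaned, outlier


if __name__ == "__main__":
    t = np.arange(50) * 0.1
    x = 2.0 * t + 1.0
    x[20] += 10.0  # injected spike
    cleaned, flags = clean(t, x, max_rate=5.0)
    print(np.flatnonzero(flags).tolist())  # prints [20]
    print(round(cleaned[20], 6))  # prints 5.0
```

Excluding coarse suspects from the regression windows is what keeps a spike from dragging its neighbours' reference values, which is the "influenced by outliers" problem the abstract mentions. The assumed coarse rule only catches isolated spikes; runs of consecutive outliers would need a different first pass.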
Authors
LI Honglie (李洪烈)
XIA Dong (夏栋)
WANG Qian (王倩)
Qingdao Campus, Naval Aeronautics University, Qingdao 266000, China
Source
Electronics Optics & Control (《电光与控制》)
2022, No. 4, pp. 117-120 (4 pages)
Indexed in CSCD and the Peking University core journals list (北大核心)
Keywords
data cleaning
sampled data
data preprocessing
regression smoothing