Abstract
Statistical monitoring models based on multi-way principal component analysis (MPCA), including principal component analysis (PCA), are easily distorted by outliers in the modeling data. Using the k-nearest neighbor (k-NN) distance d^k of a data point as its outlier score effectively detects outliers in nonlinear data sets. However, existing robust outlier detection algorithms based on this definition are very sensitive to centering and standardization methods that use different scales, and they must compute d^k for every data point, which incurs a large computational cost. This paper proposes an efficient robust outlier detection algorithm, modified scale neighborhood pruning (MSNHP), for preprocessing the data sets used to build statistical monitoring models. The algorithm uses a modified scale to obtain the mean and standard deviation of the normal off-line modeling data and then centers and standardizes the data. During each d^k query it computes upper bounds on d^k for the other points and uses them to prune non-outliers directly, which reduces the number of d^k queries. It also optimizes the search order to improve pruning and to lower the cost of each d^k query. MSNHP was applied to outlier detection in a β-mannanase fermentation batch process. Compared with other robust outlier detection algorithms, it markedly reduced the computational cost and scaled better with both the number of data points and the algorithm parameters.
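The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MSNHP implementation, and several parts are assumptions: the abstract does not define the modified scale, so median and MAD are used as a stand-in robust scale; outliers are taken to be the `n_out` points with the largest d^k; points are scanned in index order rather than in the optimized search order. The names `robust_standardize`, `top_n_knn_outliers`, and `n_out` are hypothetical.

```python
import heapq
import numpy as np

def robust_standardize(X):
    """Center and scale each column with median and MAD.
    Assumption: a stand-in for the paper's modified scale, which the abstract does not define."""
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    mad = np.where(mad > 0, mad, 1.0)  # guard against constant columns
    return (X - med) / mad

def top_n_knn_outliers(X, k, n_out):
    """Return (indices of the n_out points with the largest d^k, number of d^k queries).
    Requires more than k points."""
    m = X.shape[0]
    # seen[q] holds the k smallest distances from q to other points found so far,
    # sorted ascending; its last entry is an upper bound on d^k(q).
    seen = np.full((m, k), np.inf)
    pruned = np.zeros(m, dtype=bool)
    top = []      # min-heap of (d^k, index) for the current top-n candidates
    cutoff = 0.0  # smallest d^k in a full heap; stays 0 until the heap fills
    queries = 0
    for p in range(m):  # index order; the paper optimizes this order
        if pruned[p]:
            continue    # d^k(p) is provably below the cutoff, so skip the query
        d = np.linalg.norm(X - X[p], axis=1)
        d[p] = np.inf   # a point is not its own neighbor
        queries += 1
        dk = np.partition(d, k - 1)[k - 1]  # exact k-NN distance of p
        if len(top) < n_out:
            heapq.heappush(top, (dk, p))
        elif dk > top[0][0]:
            heapq.heapreplace(top, (dk, p))
        if len(top) == n_out:
            cutoff = top[0][0]
        # Reuse this query's distances to tighten the bounds of all other points.
        closer = d < seen[:, -1]
        seen[closer, -1] = d[closer]
        seen[closer] = np.sort(seen[closer], axis=1)
        # Any point whose upper bound is below the cutoff cannot be a top-n outlier.
        pruned |= seen[:, -1] < cutoff
    ranked = sorted(top, reverse=True)
    return [i for _, i in ranked], queries
```

Calling `top_n_knn_outliers(robust_standardize(X), k=3, n_out=5)` on an array `X` of shape `(m, features)` returns the candidate indices and the number of full d^k queries performed. Because every pruned point has an upper bound strictly below the current cutoff, the result matches a brute-force ranking while typically issuing fewer queries.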
Source
Chinese Journal of Scientific Instrument (《仪器仪表学报》)
EI
CAS
CSCD
PKU Core
2012, No. 12, pp. 2742-2746 (5 pages)
Funding
National Natural Science Foundation of China (61174123)
Natural Science Foundation of Guangdong Province (9151063101000043)
Keywords
modified scale neighborhood pruning (MSNHP)
high performance robust outlier detection algorithm
statistical monitoring modeling
data preprocessing