Funding: Supported by the Aeronautical Science Foundation of China (20111052010), the Jiangsu Graduates Innovation Project (CXZZ120163), the "333" Project of Jiangsu Province, and the Qing Lan Project of Jiangsu Province.
Abstract: With the development of the Global Positioning System (GPS), wireless technology, and location-aware services, it has become possible to collect large quantities of trajectory data. Anomaly detection is a hot topic in data mining for moving objects. Building on prior work on anomalous trajectory detection for moving objects, this paper introduces the classical trajectory outlier detection (TRAOD) algorithm and then proposes a density-based trajectory outlier detection (DBTOD) algorithm, which compensates for a weakness of TRAOD: its inability to detect anomalies when trajectories are local and dense. Results of applying the proposed algorithm to the Elk1993 and Deer1995 datasets are also presented and show the effectiveness of the algorithm.
Funding: Supported by State Grid Liaoning Electric Power Supply Co., Ltd., with financial support from the project "Key Technology and Application Research of the Self-Service Grid Big Data Governance" (No. SGLNXT00YJJS1800110).
Abstract: In the data age, data quality has become a widely recognized concern. Outlier detection, a branch of data mining, is closely related to data quality. The isolation forest (iForest) algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As the algorithm keeps generating isolation trees, the differences between trees gradually shrink or vanish, which wastes memory and reduces the efficiency of outlier detection. In addition, some of the constructed isolation trees cannot detect outliers. This paper proposes GA-iForest, an improved iForest-based method. It optimizes the forest by selecting better isolation trees according to their detection accuracy and the differences between them, thereby removing duplicate, similar, and poorly performing trees and improving the accuracy and stability of outlier detection. The experimental environment is built on Ubuntu and the Spark platform, and the outlier datasets provided by ODDS are used for testing. Performance is evaluated using accuracy, recall, ROC curves, AUC, and execution time. Experimental results show that, compared with the original iForest method, the proposed method not only improves the accuracy and stability of outlier detection but also reduces the number of isolation trees by 20% to 40%.
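For readers unfamiliar with the baseline that GA-iForest builds on, the following is a minimal sketch of the standard isolation forest anomaly score, s(x) = 2^(-E[h(x)] / c(n)), in plain Python. It does not implement the tree selection step of GA-iForest, which the abstract does not specify in detail. The function names (`c_factor`, `build_tree`, `path_length`, `iforest_scores`) and the default parameters are illustrative assumptions, not names from the paper.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random dimension at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:  # cannot split on a constant dimension
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x lands in a leaf, plus the c(size) adjustment for unsplit leaves."""
    while "size" not in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def iforest_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1] for every row; higher means more anomalous."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    m = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(rows, m), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(m)
    return [
        2.0 ** (-sum(path_length(x, t) for t in trees) / (n_trees * norm))
        for x in rows
    ]


# Usage: a tight grid of inliers plus one distant point.
data = [(i * 0.1, j * 0.1) for i in range(7) for j in range(7)] + [(5.0, 5.0)]
scores = iforest_scores(data, n_trees=50, sample_size=len(data), seed=1)
```

A selection scheme such as the one the abstract describes would operate on the list of trees built inside `iforest_scores`, keeping only a subset before scoring.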
Funding: Project (61473298) supported by the National Natural Science Foundation of China; Project (2015QNA65) supported by the Fundamental Research Funds for the Central Universities, China.
Abstract: A novel approach for constructing a robust Mamdani fuzzy system is proposed. It uses an efficient robust estimator, partial robust M-regression (PRM), in the parameter learning phase of the initial fuzzy system, and an improved subtractive clustering algorithm in the fuzzy rule selection phase. The weights obtained in PRM, which protect against noise and outliers, are incorporated into the potential measure of the subtractive clustering algorithm to make the fuzzy rule clustering process more robust. A compact Mamdani-type fuzzy system is established after the parameters in the consequent parts of the rules are re-estimated by partial least squares (PLS). The main characteristics of the new approach are its simplicity and its ability to construct a fuzzy system quickly and robustly. Simulation and experimental results show that the proposed approach achieves satisfactory results on various kinds of data with noise and outliers. Compared with D-SVD and ARRBFN, the proposed approach yields far fewer rules and lower RMSE values.
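One plausible reading of "incorporating the weights into the potential measure" can be sketched as follows. The classic subtractive clustering potential of a point is a sum of Gaussian-like contributions from all points. Here each contribution is multiplied by a per-sample robustness weight. The multiplicative form, the function name `weighted_potentials`, and the example weights are assumptions for illustration only; the abstract does not give the exact formula, and the weights below are hand-picked rather than produced by PRM.

```python
import math


def weighted_potentials(points, weights, radius):
    """Potential of each point: sum_j w_j * exp(-alpha * ||x_i - x_j||^2).

    alpha = 4 / radius^2 follows the usual subtractive clustering convention.
    With all weights equal to 1 this reduces to the unweighted potential.
    ASSUMPTION: weights enter multiplicatively; the paper may differ.
    """
    alpha = 4.0 / radius ** 2
    potentials = []
    for p in points:
        total = 0.0
        for q, w in zip(points, weights):
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += w * math.exp(-alpha * sq_dist)
        potentials.append(total)
    return potentials


# Usage: three nearby points and one outlier that a robust estimator
# would down-weight (the weight 0.1 is a made-up example value).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
weights = [1.0, 1.0, 1.0, 0.1]
potentials = weighted_potentials(points, weights, radius=1.0)
first_center = max(range(len(points)), key=lambda i: potentials[i])
```

Under this assumption a down-weighted sample contributes little to any potential, including its own, so it is unlikely to be chosen as a rule center.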
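Abstract: The isolation forest algorithm based on random subsampling does not account for the relative density among sample points drawn from different regions of the subsample. To address this, a kernel-function-based isolation forest algorithm, K-iForest, is proposed, which resamples according to the probability density function to improve the performance of the isolation forest algorithm. The effectiveness and efficiency of K-iForest are verified on the Annthyroid, ForestCover, Mulcross, Shuttle, Http (KDD Cup 1999), Smtp (KDD Cup 1999), and KDD CUP 99 datasets from the outlier detection database (ODDS), and the algorithm is compared with the iForest, EIF, RRCF, GIF, and HIF algorithms. Experimental results show that the AUC of K-iForest is 0.1% to 100.2% higher than that of the other algorithms.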
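The idea of resampling according to an estimated density can be sketched as below: a Gaussian kernel density estimate is computed for each point, and a subsample is drawn without replacement using weights derived from that density. The abstract does not state how the density maps to sampling weights, so the inverse-density weighting here is purely an assumption, as are the function names and the bandwidth value. The example uses 1-D data to stay short.

```python
import math
import random


def gaussian_kde(points, bandwidth):
    """Gaussian kernel density estimate evaluated at every 1-D point."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in points) / norm
        for x in points
    ]


def density_weighted_subsample(points, size, bandwidth, rng):
    """Draw `size` distinct points with weights based on estimated density.

    ASSUMPTION: weight = 1 / density, so sparse regions are not crowded out
    by dense ones. K-iForest may use a different mapping.
    Sampling uses the Efraimidis-Spirakis scheme: each item gets the key
    u ** (1 / w) with u uniform in [0, 1), and the largest keys are kept.
    """
    densities = gaussian_kde(points, bandwidth)
    weights = [1.0 / d for d in densities]  # densities are always > 0
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keyed.sort(reverse=True)
    chosen = sorted(i for _, i in keyed[:size])
    return [points[i] for i in chosen]


# Usage: five clustered values and one isolated value.
data = [0.0, 0.1, 0.2, 0.3, 0.4, 10.0]
densities = gaussian_kde(data, bandwidth=0.5)
sample = density_weighted_subsample(data, 3, bandwidth=0.5, rng=random.Random(0))
```

The resulting subsample would then be passed to the tree-building step of an isolation forest in place of a uniformly random subsample.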
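Abstract: To effectively identify anomalies in bridge health monitoring data, reduce false and missed warnings, and ensure the quality and validity of bridge monitoring data, an anomaly identification algorithm based on time-series compression and segmentation is proposed for three types of anomalous data (missing values, outliers, and drift) in the long-term monitoring data of long-span cable-stayed bridges. The algorithm first divides the original monitoring time series into several subsequences using a piecewise linear representation (PLR) algorithm based on series importance points (SIP), denoted PLR_SIP. It then analyzes the similarity of the subsequences using Euclidean distance and computes the local outlier factor of each subsequence with an improved local outlier factor (LOF) algorithm. Finally, each factor is compared with a set threshold to identify anomalies in the monitoring data. To verify the accuracy and engineering practicality of the algorithm, anomaly identification was performed on the health monitoring data of a long-span highway cable-stayed bridge. The results show that the subsequences obtained by compressing and segmenting the original time series with PLR_SIP accurately reflect the trend and range of the original series. The improved LOF algorithm overcomes the limitation of the traditional LOF algorithm, which can identify only outliers (anomalies with no duration); it excludes noise interference and identifies all three anomaly types: outliers, missing data, and drift. The algorithm requires no training set, takes the raw monitoring data directly as input, and adaptively adjusts its threshold parameter. It offers good scalability, real-time performance, accuracy, and efficiency, and is suitable for processing large volumes of real-time bridge health monitoring data.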
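The last two steps of the pipeline described above (a Euclidean-distance LOF per subsequence, then a threshold comparison) can be sketched as follows. This is the classical LOF, not the improved variant from the paper, and it omits the PLR_SIP segmentation entirely. Each subsequence is assumed to be already summarized as a small feature vector; the feature choice (slope, mean), the value of `k`, and the fixed threshold of 1.5 are illustrative assumptions, whereas the paper adapts its threshold automatically.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lof_scores(points, k):
    """Classical local outlier factor for each point (values near 1 are normal)."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]

    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbor
        neighbors.append([j for j in order if dist[i][j] <= kd])  # keep ties
        k_dist.append(kd)

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicates

    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Usage: hypothetical (slope, mean) features for six subsequences,
# the last of which behaves very differently from the rest.
segments = [(0.0, 1.0), (0.1, 1.1), (0.0, 1.2), (0.1, 0.9), (0.05, 1.05), (5.0, 8.0)]
scores = lof_scores(segments, k=3)
threshold = 1.5  # fixed here for illustration only
anomalous = [i for i, s in enumerate(scores) if s > threshold]
```

Scoring subsequences rather than individual readings is what would let a pipeline of this shape flag anomalies that persist over time, such as drift.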