With the development of global position system(GPS),wireless technology and location aware services,it is possible to collect a large quantity of trajectory data.In the field of data mining for moving objects,the pr...With the development of global position system(GPS),wireless technology and location aware services,it is possible to collect a large quantity of trajectory data.In the field of data mining for moving objects,the problem of anomaly detection is a hot topic.Based on the development of anomalous trajectory detection of moving objects,this paper introduces the classical trajectory outlier detection(TRAOD) algorithm,and then proposes a density-based trajectory outlier detection(DBTOD) algorithm,which compensates the disadvantages of the TRAOD algorithm that it is unable to detect anomalous defects when the trajectory is local and dense.The results of employing the proposed algorithm to Elk1993 and Deer1995 datasets are also presented,which show the effectiveness of the algorithm.展开更多
Since data services are penetrating into our daily life rapidly, the mobile network becomes more complicated, and the amount of data transmission is more and more increasing. In this case, the traditional statistical ...Since data services are penetrating into our daily life rapidly, the mobile network becomes more complicated, and the amount of data transmission is more and more increasing. In this case, the traditional statistical methods for anomalous cell detection cannot adapt to the evolution of networks, and data mining becomes the mainstream. In this paper, we propose a novel kernel density-based local outlier factor(KLOF) to assign a degree of being an outlier to each object. Firstly, the notion of KLOF is introduced, which captures exactly the relative degree of isolation. Then, by analyzing its properties, including the tightness of upper and lower bounds, sensitivity of density perturbation, we find that KLOF is much greater than 1 for outliers. Lastly, KLOFis applied on a real-world dataset to detect anomalous cells with abnormal key performance indicators(KPIs) to verify its reliability. The experiment shows that KLOF can find outliers efficiently. It can be a guideline for the operators to perform faster and more efficient trouble shooting.展开更多
Outlier detection is an important task in data mining. In fact, it is difficult to find the clustering centers in some sophisticated multidimensional datasets and to measure the deviation degree of each potential outl...Outlier detection is an important task in data mining. In fact, it is difficult to find the clustering centers in some sophisticated multidimensional datasets and to measure the deviation degree of each potential outlier. In this work, an effective outlier detection method based on multi-dimensional clustering and local density(ODBMCLD) is proposed. ODBMCLD firstly identifies the center objects by the local density peak of data objects, and clusters the whole dataset based on the center objects. Then, outlier objects belonging to different clusters will be marked as candidates of abnormal data. Finally, the top N points among these abnormal candidates are chosen as final anomaly objects with high outlier factors. The feasibility and effectiveness of the method are verified by experiments.展开更多
The paper puts forward a new method of density-based anomaly data mining, the method is used to design the engine of network intrusion detection system (NIDS), thus a new NIDS is constructed based on the engine. The N...The paper puts forward a new method of density-based anomaly data mining, the method is used to design the engine of network intrusion detection system (NIDS), thus a new NIDS is constructed based on the engine. The NIDS can find new unknown intrusion behaviors, which are used to updated the intrusion rule-base, based on which intrusion detections can be carried out online by the BM pattern match algorithm. Finally all modules of the NIDS are described by formalized language.展开更多
Focusing on controlling the press-assembly quality of high-precision servo mechanism,an intelligent early warning method based on outlier data detection and linear regression is proposed.Linear regression is used to d...Focusing on controlling the press-assembly quality of high-precision servo mechanism,an intelligent early warning method based on outlier data detection and linear regression is proposed.Linear regression is used to deal with the relationship between assembly quality and press-assembly process,then the mathematical model of displacement-force in press-assembly process is established and a qualified press-assembly force range is defined for assembly quality control.To preprocess the raw dataset of displacement-force in the press-assembly process,an improved local outlier factor based on area density and P weight(LAOPW)is designed to eliminate the outliers which will result in inaccuracy of the mathematical model.A weighted distance based on information entropy is used to measure distance,and the reachable distance is replaced with P weight.Experiments show that the detection efficiency of the algorithm is improved by 5.6 ms compared with the traditional local outlier factor(LOF)algorithm,and the detection accuracy is improved by about 2%compared with the local outlier factor based on area density(LAOF)algorithm.The application of LAOPW algorithm and the linear regression model shows that it can effectively carry out intelligent early warning of press-assembly quality of high precision servo mechanism.展开更多
离群点检测任务是指检测与正常数据在特征属性上存在显著差异的异常数据。大多数基于聚类的离群点检测方法主要从全局角度对数据集中的离群点进行检测,而对局部离群点的检测性能较弱。基于此,本文通过引入快速搜索和发现密度峰值方法改...离群点检测任务是指检测与正常数据在特征属性上存在显著差异的异常数据。大多数基于聚类的离群点检测方法主要从全局角度对数据集中的离群点进行检测,而对局部离群点的检测性能较弱。基于此,本文通过引入快速搜索和发现密度峰值方法改进K-means聚类算法,提出了一种名为KLOD(local outlier detection based on improved K-means and least-squares methods)的局部离群点检测方法,以实现对局部离群点的精确检测。首先,利用快速搜索和发现密度峰值方法计算数据点的局部密度和相对距离,并将二者相乘得到γ值。其次,将γ值降序排序,利用肘部法则选择γ值最大的k个数据点作为K-means聚类算法的初始聚类中心。然后,通过K-means聚类算法将数据集聚类成k个簇,计算数据点在每个维度上的目标函数值并进行升序排列。接着,确定数据点的每个维度的离散程度并选择适当的拟合函数和拟合点,通过最小二乘法对升序排列的每个簇的每1维目标函数值进行函数拟合并求导,以获取变化率。最后,结合信息熵,将每个数据点的每个维度目标函数值乘以相应的变化率进行加权,得到最终的异常得分,并将异常值得分较高的top-n个数据点视为离群点。通过人工数据集和UCI数据集,对KLOD、LOF和KNN方法在准确度上进行仿真实验对比。结果表明KLOD方法相较于KNN和LOF方法具有更高的准确度。本文提出的KLOD方法能够有效改善K-means聚类算法的聚类效果,并且在局部离群点检测方面具有较好的精度和性能。展开更多
基于随机子采样的隔离森林算法没有考虑到子采样中来自不同区域样本点之间的相对密度,为此提出基于核函数的隔离森林算法K-iForest,根据概率密度函数重新采样来提高隔离森林算法的性能。在离群点检测数据库(ODDS)的Annthyroid、ForestCo...基于随机子采样的隔离森林算法没有考虑到子采样中来自不同区域样本点之间的相对密度,为此提出基于核函数的隔离森林算法K-iForest,根据概率密度函数重新采样来提高隔离森林算法的性能。在离群点检测数据库(ODDS)的Annthyroid、ForestCover、Mulcross、Shuttle和Http(KDD Cup 1999)、Smtp(KDD Cup 1999)、KDD CUP 99数据集上验证K-iForest算法的有效性和效率,并与iForest算法、EIF算法、RRCF算法、GIF算法以及HIF算法进行比较。实验结果表明,K-iForest算法的AUC值高出其他算法0.1%~100.2%。展开更多
With the advancement of society and science and technology, the demand for detecting small objects in practical scenarios becomes stronger. Such objects are only represented by relatively small coverage of pixels, and...With the advancement of society and science and technology, the demand for detecting small objects in practical scenarios becomes stronger. Such objects are only represented by relatively small coverage of pixels, and the features are degraded severely after being extracted by a deep convolutional neural network, which is detrimental to the detection performance for small objects. Therefore, an intuitive solution is to increase the resolution of small objects by cropping the original image. In this paper, we propose a simple but effective object density map guided region localization module (DMGRL) to locate and crop the regions of interest where small objects may exist. Firstly, the density map of the objects is estimated by object density map estimation network, and then the coordinates of the small object regions are calculated;Secondly, the continuous differentiable affine transformation is utilized to crop these regions so that the detector with DMGRL can be trained end-to-end instead of two-stage training. Finally, the all prediction results of input image and cropped region images are merged together to output the final detection results by non maximum suppression (NMS). Extensive experiments demonstrate the superior performance of the detector incorporated DMGRL.展开更多
近年来,混合型数据的聚类问题受到广泛关注。作为处理混合型数据的一种有效方法,K-prototype聚类算法在初始化聚类中心时通常采用随机选取的策略,然而这种策略在很多实际应用中难以保证聚类结果的质量。针对上述问题,采用基于离群点检...近年来,混合型数据的聚类问题受到广泛关注。作为处理混合型数据的一种有效方法,K-prototype聚类算法在初始化聚类中心时通常采用随机选取的策略,然而这种策略在很多实际应用中难以保证聚类结果的质量。针对上述问题,采用基于离群点检测的策略来为K-prototype算法选择初始中心,并提出一种新的混合型数据聚类初始化算法(initialization of K-prototype clustering based on outlier detection and density,IKP-ODD)。给定一个候选对象,IKP-ODD通过计算其距离离群因子、加权密度以及与已有初始中心之间的加权距离来判断候选对象是否是一个初始中心。IKP-ODD通过采用距离离群因子和加权密度,防止选择离群点作为初始中心。在计算对象的加权密度以及对象之间的加权距离时,采用邻域粗糙集中的粒度邻域熵来计算每一个属性的重要性,并根据属性重要性的大小为不同属性赋予不同的权重,有效地反映不同属性之间的差异性。在多个UCI数据集上的实验表明,相对于现有的初始化方法,IKP-ODD能够更好地解决K-prototype聚类的初始化问题。展开更多
基金supported by the Aeronautical Science Foundation of China(20111052010)the Jiangsu Graduates Innovation Project (CXZZ120163)+1 种基金the "333" Project of Jiangsu Provincethe Qing Lan Project of Jiangsu Province
文摘With the development of global position system(GPS),wireless technology and location aware services,it is possible to collect a large quantity of trajectory data.In the field of data mining for moving objects,the problem of anomaly detection is a hot topic.Based on the development of anomalous trajectory detection of moving objects,this paper introduces the classical trajectory outlier detection(TRAOD) algorithm,and then proposes a density-based trajectory outlier detection(DBTOD) algorithm,which compensates the disadvantages of the TRAOD algorithm that it is unable to detect anomalous defects when the trajectory is local and dense.The results of employing the proposed algorithm to Elk1993 and Deer1995 datasets are also presented,which show the effectiveness of the algorithm.
基金supported by the National Basic Research Program of China (973 Program: 2013CB329004)
文摘Since data services are penetrating into our daily life rapidly, the mobile network becomes more complicated, and the amount of data transmission is more and more increasing. In this case, the traditional statistical methods for anomalous cell detection cannot adapt to the evolution of networks, and data mining becomes the mainstream. In this paper, we propose a novel kernel density-based local outlier factor(KLOF) to assign a degree of being an outlier to each object. Firstly, the notion of KLOF is introduced, which captures exactly the relative degree of isolation. Then, by analyzing its properties, including the tightness of upper and lower bounds, sensitivity of density perturbation, we find that KLOF is much greater than 1 for outliers. Lastly, KLOFis applied on a real-world dataset to detect anomalous cells with abnormal key performance indicators(KPIs) to verify its reliability. The experiment shows that KLOF can find outliers efficiently. It can be a guideline for the operators to perform faster and more efficient trouble shooting.
基金Project(61362021)supported by the National Natural Science Foundation of ChinaProject(2016GXNSFAA380149)supported by Natural Science Foundation of Guangxi Province,China+1 种基金Projects(2016YJCXB02,2017YJCX34)supported by Innovation Project of GUET Graduate Education,ChinaProject(2011KF11)supported by the Key Laboratory of Cognitive Radio and Information Processing,Ministry of Education,China
文摘Outlier detection is an important task in data mining. In fact, it is difficult to find the clustering centers in some sophisticated multidimensional datasets and to measure the deviation degree of each potential outlier. In this work, an effective outlier detection method based on multi-dimensional clustering and local density(ODBMCLD) is proposed. ODBMCLD firstly identifies the center objects by the local density peak of data objects, and clusters the whole dataset based on the center objects. Then, outlier objects belonging to different clusters will be marked as candidates of abnormal data. Finally, the top N points among these abnormal candidates are chosen as final anomaly objects with high outlier factors. The feasibility and effectiveness of the method are verified by experiments.
基金Funded by Shaanxi Natural Science Foundation(2002G07)
文摘The paper puts forward a new method of density-based anomaly data mining, the method is used to design the engine of network intrusion detection system (NIDS), thus a new NIDS is constructed based on the engine. The NIDS can find new unknown intrusion behaviors, which are used to updated the intrusion rule-base, based on which intrusion detections can be carried out online by the BM pattern match algorithm. Finally all modules of the NIDS are described by formalized language.
文摘Focusing on controlling the press-assembly quality of high-precision servo mechanism,an intelligent early warning method based on outlier data detection and linear regression is proposed.Linear regression is used to deal with the relationship between assembly quality and press-assembly process,then the mathematical model of displacement-force in press-assembly process is established and a qualified press-assembly force range is defined for assembly quality control.To preprocess the raw dataset of displacement-force in the press-assembly process,an improved local outlier factor based on area density and P weight(LAOPW)is designed to eliminate the outliers which will result in inaccuracy of the mathematical model.A weighted distance based on information entropy is used to measure distance,and the reachable distance is replaced with P weight.Experiments show that the detection efficiency of the algorithm is improved by 5.6 ms compared with the traditional local outlier factor(LOF)algorithm,and the detection accuracy is improved by about 2%compared with the local outlier factor based on area density(LAOF)algorithm.The application of LAOPW algorithm and the linear regression model shows that it can effectively carry out intelligent early warning of press-assembly quality of high precision servo mechanism.
文摘离群点检测任务是指检测与正常数据在特征属性上存在显著差异的异常数据。大多数基于聚类的离群点检测方法主要从全局角度对数据集中的离群点进行检测,而对局部离群点的检测性能较弱。基于此,本文通过引入快速搜索和发现密度峰值方法改进K-means聚类算法,提出了一种名为KLOD(local outlier detection based on improved K-means and least-squares methods)的局部离群点检测方法,以实现对局部离群点的精确检测。首先,利用快速搜索和发现密度峰值方法计算数据点的局部密度和相对距离,并将二者相乘得到γ值。其次,将γ值降序排序,利用肘部法则选择γ值最大的k个数据点作为K-means聚类算法的初始聚类中心。然后,通过K-means聚类算法将数据集聚类成k个簇,计算数据点在每个维度上的目标函数值并进行升序排列。接着,确定数据点的每个维度的离散程度并选择适当的拟合函数和拟合点,通过最小二乘法对升序排列的每个簇的每1维目标函数值进行函数拟合并求导,以获取变化率。最后,结合信息熵,将每个数据点的每个维度目标函数值乘以相应的变化率进行加权,得到最终的异常得分,并将异常值得分较高的top-n个数据点视为离群点。通过人工数据集和UCI数据集,对KLOD、LOF和KNN方法在准确度上进行仿真实验对比。结果表明KLOD方法相较于KNN和LOF方法具有更高的准确度。本文提出的KLOD方法能够有效改善K-means聚类算法的聚类效果,并且在局部离群点检测方面具有较好的精度和性能。
文摘基于随机子采样的隔离森林算法没有考虑到子采样中来自不同区域样本点之间的相对密度,为此提出基于核函数的隔离森林算法K-iForest,根据概率密度函数重新采样来提高隔离森林算法的性能。在离群点检测数据库(ODDS)的Annthyroid、ForestCover、Mulcross、Shuttle和Http(KDD Cup 1999)、Smtp(KDD Cup 1999)、KDD CUP 99数据集上验证K-iForest算法的有效性和效率,并与iForest算法、EIF算法、RRCF算法、GIF算法以及HIF算法进行比较。实验结果表明,K-iForest算法的AUC值高出其他算法0.1%~100.2%。
基金Supported by the National Center ATC Surveillance and Communication System Engineering Research。
文摘With the advancement of society and science and technology, the demand for detecting small objects in practical scenarios becomes stronger. Such objects are only represented by relatively small coverage of pixels, and the features are degraded severely after being extracted by a deep convolutional neural network, which is detrimental to the detection performance for small objects. Therefore, an intuitive solution is to increase the resolution of small objects by cropping the original image. In this paper, we propose a simple but effective object density map guided region localization module (DMGRL) to locate and crop the regions of interest where small objects may exist. Firstly, the density map of the objects is estimated by object density map estimation network, and then the coordinates of the small object regions are calculated;Secondly, the continuous differentiable affine transformation is utilized to crop these regions so that the detector with DMGRL can be trained end-to-end instead of two-stage training. Finally, the all prediction results of input image and cropped region images are merged together to output the final detection results by non maximum suppression (NMS). Extensive experiments demonstrate the superior performance of the detector incorporated DMGRL.
文摘近年来,混合型数据的聚类问题受到广泛关注。作为处理混合型数据的一种有效方法,K-prototype聚类算法在初始化聚类中心时通常采用随机选取的策略,然而这种策略在很多实际应用中难以保证聚类结果的质量。针对上述问题,采用基于离群点检测的策略来为K-prototype算法选择初始中心,并提出一种新的混合型数据聚类初始化算法(initialization of K-prototype clustering based on outlier detection and density,IKP-ODD)。给定一个候选对象,IKP-ODD通过计算其距离离群因子、加权密度以及与已有初始中心之间的加权距离来判断候选对象是否是一个初始中心。IKP-ODD通过采用距离离群因子和加权密度,防止选择离群点作为初始中心。在计算对象的加权密度以及对象之间的加权距离时,采用邻域粗糙集中的粒度邻域熵来计算每一个属性的重要性,并根据属性重要性的大小为不同属性赋予不同的权重,有效地反映不同属性之间的差异性。在多个UCI数据集上的实验表明,相对于现有的初始化方法,IKP-ODD能够更好地解决K-prototype聚类的初始化问题。