Abstract
Cell-based outlier mining is a typical class of outlier mining methods. Although it can identify outliers quickly and prune non-outlier data, it prunes non-outliers only at the level of whole cells, so in some cells the outliers may not be determined accurately. This paper presents an outlier mining algorithm based on grid cells and P weights. The algorithm first divides each dimension of the data set into equal intervals to form grid cells, then identifies the grid cells that hold outlier data and the grid cells that hold normal data. For grid cells that contain both outliers and normal data, it uses P weights to measure and determine the outliers, which further improves the accuracy of outlier mining. Finally, experiments on UCI data sets verify the effectiveness and feasibility of the algorithm.
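The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the P weight or the rule used to classify cells, so everything specific here is an assumption made for illustration: cells are classified by point-count thresholds (`sparse`, `dense`), and the weight is stood in for by the sum of distances to the `k` nearest neighbours. The function names `grid_cells`, `knn_weight`, and `find_outliers` are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from math import dist


def grid_cells(points, bins):
    """Split every dimension into `bins` equal-width intervals and
    group point indices by the grid cell they fall into."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = []
        for d in range(dims):
            if hi[d] > lo[d]:
                idx = int((p[d] - lo[d]) / (hi[d] - lo[d]) * bins)
                key.append(min(idx, bins - 1))  # clamp the maximum value
            else:
                key.append(0)  # constant dimension: single interval
        cells[tuple(key)].append(i)
    return cells


def knn_weight(i, points, k):
    """ASSUMED stand-in for the paper's P weight: the sum of the
    distances from point i to its k nearest neighbours."""
    dists = sorted(dist(points[i], q) for j, q in enumerate(points) if j != i)
    return sum(dists[:k])


def find_outliers(points, bins=4, sparse=1, dense=4, k=2, top_n=1):
    """Stage 1: classify cells by point count (assumed criterion).
    Stage 2: rank points in ambiguous cells by the stand-in weight."""
    outliers, candidates = [], []
    for members in grid_cells(points, bins).values():
        if len(members) <= sparse:
            outliers.extend(members)    # sparse cell: report as outliers
        elif len(members) < dense:
            candidates.extend(members)  # mixed cell: needs a closer look
        # dense cells are pruned as normal data
    ranked = sorted(candidates, key=lambda i: knn_weight(i, points, k),
                    reverse=True)
    return sorted(outliers + ranked[:top_n])
```

Pruning dense cells first keeps the more expensive per-point weight computation confined to the ambiguous cells, which is the efficiency argument the abstract makes for cell-based methods.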
Source
Journal of Taiyuan University of Science and Technology (《太原科技大学学报》)
2016, No. 5, pp. 359-364 (6 pages)