Abstract
When density-based spatial clustering of applications with noise (DBSCAN) is used for interactive data mining, users frequently adjust the algorithm's parameters to discover knowledge of interest, while the data set itself remains relatively stable. Based on these characteristics, this paper proposes S-DBSCAN, a DBSCAN-based algorithm for finding high-density clusters. The parameters to be adjusted are identified as ε (Eps), the neighborhood radius of an object, and MinPts, the minimum number of objects that the ε-neighborhood must contain for the object to be a core object. Three kinds of changes in ε and MinPts that suit the S-DBSCAN algorithm are described and proved correct, and the time complexity of the algorithm is analyzed. In tests on real and synthetic data sets, S-DBSCAN was more efficient than DBSCAN.
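To make the two parameters concrete, the following minimal Python sketch shows the standard DBSCAN core-object condition. It is not the S-DBSCAN algorithm, whose details the abstract does not give. The function names `eps_neighborhood` and `core_objects`, the sample points, and the choice to count an object as part of its own neighborhood are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def core_objects(points, eps, min_pts):
    """Indices of points whose eps-neighborhood holds at least min_pts points."""
    return {
        i
        for i in range(len(points))
        if len(eps_neighborhood(points, i, eps)) >= min_pts
    }


# Hypothetical data: four corners of a unit square, its center, and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (5.0, 5.0)]

loose = core_objects(points, eps=1.5, min_pts=4)  # larger neighborhood radius
tight = core_objects(points, eps=0.8, min_pts=4)  # smaller radius, same MinPts

print(sorted(loose))   # [0, 1, 2, 3, 4]
print(sorted(tight))   # [4]
print(tight <= loose)  # True
```

In this example, shrinking ε while keeping MinPts fixed leaves a subset of the earlier core objects. That follows directly from the definition, since a smaller neighborhood cannot contain more points. Whether S-DBSCAN exploits this property in the same way is not stated in the abstract.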
Source
《高技术通讯》
CAS
CSCD
Peking University Core Journal
2012, Issue 6, pp. 589-595 (7 pages)
Chinese High Technology Letters
Funding
Supported by the 863 Program (2009AA122220, 2009AA122226)