Abstract
In Rough K-Means (RKM) and its derivative algorithms, data objects in the boundary region lie at nearly equal distances from several cluster centers, so they are difficult to distinguish by distance or density alone. This paper proposes a new rough K-Means algorithm that, when partitioning the data, combines the local density of each data object with its neighborhood membership information to measure the similarity between data points and clusters. The relationship between a boundary object and the clusters is thus determined by its local spatial distribution, which makes the differences among fuzzy, uncertain objects more pronounced. Experimental results on artificial datasets and UCI standard datasets show that the proposed algorithm partitions boundary-region data with higher accuracy.
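The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the common difference-threshold form of rough K-Means assignment (a point is a boundary point when more than one center lies within `eps` of its nearest-center distance) and then resolves boundary points with a simple k-nearest-neighbor vote as a stand-in for "neighborhood membership information". The names `rough_assign`, `neighborhood_vote`, `eps`, and `k` are hypothetical, and the local-density term is omitted because the abstract does not define it.

```python
import math
from collections import Counter


def rough_assign(points, centers, eps=0.5):
    """Split points into certain members and boundary points.

    Assumption: a point is certain (lower approximation) when exactly one
    center lies within eps of its nearest-center distance; otherwise it is
    a boundary point shared by every such center.
    """
    lower = {}     # point index -> cluster index
    boundary = {}  # point index -> list of candidate cluster indices
    for i, p in enumerate(points):
        dists = [math.dist(p, c) for c in centers]
        d_min = min(dists)
        close = [j for j, d in enumerate(dists) if d - d_min <= eps]
        if len(close) == 1:
            lower[i] = close[0]
        else:
            boundary[i] = close
    return lower, boundary


def neighborhood_vote(points, lower, boundary, k=3):
    """Hypothetical tie-break for boundary points.

    Each boundary point goes to the candidate cluster that owns most of
    its k nearest certain neighbors; with no usable votes it falls back
    to its first candidate.
    """
    resolved = {}
    for i, candidates in boundary.items():
        nearest = sorted(lower, key=lambda j: math.dist(points[i], points[j]))[:k]
        votes = Counter(lower[j] for j in nearest if lower[j] in candidates)
        resolved[i] = votes.most_common(1)[0][0] if votes else candidates[0]
    return resolved
```

In this sketch a point midway between two centers is no longer decided by its nearly equal center distances but by which cluster dominates its immediate surroundings, which is the intuition the abstract describes.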
Authors
孙静勇
马福民
SUN Jingyong; MA Fumin (College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China)
Source
《计算机工程》
CAS
CSCD
PKU Core Journals (北大核心)
2021, No. 3, pp. 109-116 (8 pages)
Computer Engineering
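Funding
National Natural Science Foundation of China (61973151)
Natural Science Foundation of Jiangsu Province (BK20191376)
Major Project of Natural Science Research of Jiangsu Higher Education Institutions (17KJA120001)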
Keywords
rough set
K-Means algorithm
local density
neighborhood information
intra-cluster similarity