
Semi-Supervised Density-Based Clustering Method Based on Self-Learning of Metric Parameters
Abstract: Density-Based Spatial Clustering of Applications with Noise (DBSCAN) does not reflect the differing contributions of individual dimensions to clustering, and its accuracy depends on manually set parameters such as the distance threshold. To address these problems, this paper proposes SMP-SDBSCAN, a semi-supervised DBSCAN based on self-learning of metric parameters. A distance-parameter training method based on the logistic regression model uses a small amount of labeled data to learn each dimension's clustering contribution weight. A mechanism for computing cluster parameters then sets the average inter-cluster distance and the neighborhood density of the labeled data clusters as the clustering parameters, improving the adaptability of density clustering to the data set. Experiments show that the proposed method selects reasonable clustering parameters and effectively improves the clustering accuracy of density-based clustering.
Authors: YUAN Guo-quan; ZHAO Xin-jian; ZHANG Song; CHEN Shi; XU Chen-wei (Information & Telecommunication Branch, State Grid Jiangsu Electric Power Co., Ltd., Nanjing 210000, China)
Source: Information Technology (《信息技术》), 2024, No. 11, pp. 77-83, 91 (8 pages)
Funding: Science and Technology Project of State Grid Jiangsu Electric Power Co., Ltd. (J2022109).
Keywords: density clustering; distance measurement; logistic regression; semi-supervised learning; self-learning
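
The abstract describes two steps: learning per-dimension weights from a few labeled points with logistic regression, then deriving the DBSCAN parameters from the labeled clusters. The following is a minimal illustrative sketch of that idea in plain Python, not the authors' implementation. Everything specific in it is an assumption: the function names are hypothetical, the logistic model is fitted on squared per-dimension gaps between labeled pairs, `eps` is taken as the mean weighted distance between labeled points that share a cluster (a stand-in, since the abstract does not define how the average distance is computed), and `min_pts` is the mean count of same-cluster labeled neighbors within `eps`.

```python
import math
from itertools import combinations

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))

def learn_weights(points, labels, epochs=500, lr=0.1):
    """Assumed model: p(pair in different clusters) = sigmoid(w . d + b),
    where d holds the squared per-dimension gaps of the pair."""
    dim = len(points[0])
    pairs = [([(a - b) ** 2 for a, b in zip(points[i], points[j])],
              0.0 if labels[i] == labels[j] else 1.0)
             for i, j in combinations(range(len(points)), 2)]
    w, b, n = [1.0] * dim, 0.0, len(pairs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for d, y in pairs:
            err = sigmoid(sum(wk * dk for wk, dk in zip(w, d)) + b) - y
            grad_b += err
            for k in range(dim):
                grad_w[k] += err * d[k]
        # Projected gradient step: clip at zero so the weighted distance stays valid.
        w = [max(0.0, wk - lr * g / n) for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w

def weighted_distance(x, y, w):
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y)))

def estimate_params(points, labels, w):
    """Assumed rule: eps = mean same-label pairwise distance,
    min_pts = mean number of same-label points (self included) within eps."""
    idx = range(len(points))
    same = [weighted_distance(points[i], points[j], w)
            for i, j in combinations(idx, 2) if labels[i] == labels[j]]
    eps = sum(same) / len(same)
    counts = [sum(1 for j in idx if labels[i] == labels[j]
                  and weighted_distance(points[i], points[j], w) <= eps)
              for i in idx]
    return eps, max(2, round(sum(counts) / len(counts)))

def dbscan(points, eps, min_pts, w):
    """Plain DBSCAN under the weighted metric; returns cluster ids, -1 for noise."""
    n = len(points)
    neigh = [[j for j in range(n)
              if weighted_distance(points[i], points[j], w) <= eps]
             for i in range(n)]
    result, cluster = [None] * n, -1
    for i in range(n):
        if result[i] is not None:
            continue
        if len(neigh[i]) < min_pts:
            result[i] = -1              # provisional noise, may become a border point
            continue
        cluster += 1
        result[i] = cluster
        queue = list(neigh[i])
        while queue:
            j = queue.pop()
            if result[j] == -1:
                result[j] = cluster     # noise reached from a core point becomes border
            if result[j] is not None:
                continue
            result[j] = cluster
            if len(neigh[j]) >= min_pts:
                queue.extend(neigh[j])  # only core points expand the cluster
    return result

# Toy data (made up): dimension 0 separates the groups, dimension 1 is noise.
labeled_points = [(0.0, 0.0), (0.2, 9.0), (0.1, 4.0),
                  (5.0, 1.0), (5.2, 8.0), (5.1, 5.0)]
labels = [0, 0, 0, 1, 1, 1]
unlabeled = [(0.05, 7.0), (0.15, 2.0), (5.05, 3.0), (5.15, 9.5)]

w = learn_weights(labeled_points, labels)
eps, min_pts = estimate_params(labeled_points, labels, w)
result = dbscan(labeled_points + unlabeled, eps, min_pts, w)
print(w, eps, min_pts, result)
```

On this toy data the learned weight for the noisy dimension shrinks relative to the informative one, so the weighted metric groups points by dimension 0. A non-negative projection is used here only to keep the distance well defined; the paper may constrain or normalize its weights differently.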