Abstract
To address the sensitivity of some clustering algorithms to the data input order, a non-interference sequence index is defined, and a method is proposed that uses this index to sort categorical data by weight. Based on this method, CABOSFV_C, an efficient clustering algorithm for categorical data that is affected by the data input order, is improved into a clustering algorithm that takes weighted sorting into account (CABOSFV_CSW), which eliminates the algorithm's sensitivity to the input order. Experiments on UCI benchmark data sets show that, when processing categorical data, CABOSFV_CSW with weighted ascending sorting achieves clustering quality that is improved in accuracy and significantly improved in stability compared with the original CABOSFV_C algorithm and other algorithms affected by the data input order.
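The order sensitivity described in the abstract, and the way a deterministic pre-sort removes it, can be illustrated with a minimal sketch. The abstract does not give the definition of the non-interference sequence index or the internals of CABOSFV_C, so everything below is an assumption made for illustration: `leader_clustering` is a generic one-pass clustering stand-in (not CABOSFV_C), and `frequency_weight` is a hypothetical weight (not the non-interference sequence index).

```python
from collections import Counter
from itertools import permutations


def leader_clustering(records, threshold):
    """Generic one-pass clustering (a stand-in, NOT CABOSFV_C).

    Each record joins the first cluster whose leader (first member) differs
    from it in at most `threshold` attributes; otherwise it starts a cluster.
    """
    clusters = []
    for rec in records:
        for cluster in clusters:
            leader = cluster[0]
            if sum(a != b for a, b in zip(rec, leader)) <= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def as_partition(clusters):
    """Order-free view of a clustering, so two results can be compared."""
    return frozenset(frozenset(c) for c in clusters)


def frequency_weight(rec, counts):
    """Hypothetical weight (NOT the paper's index): the sum, over attributes,
    of how often the record's value occurs in that attribute column."""
    return sum(counts[j][v] for j, v in enumerate(rec))


def weighted_ascending_sort(records):
    """Sort ascending by weight; ties are broken by the record itself,
    so the resulting order does not depend on the input order."""
    counts = [Counter(col) for col in zip(*records)]
    return sorted(records, key=lambda r: (frequency_weight(r, counts), r))


# Toy categorical data set with two attributes (assumed for illustration).
data = [("a", "x"), ("a", "y"), ("b", "y")]

# Cluster every input order, without and with the pre-sort.
raw = {as_partition(leader_clustering(list(p), 1)) for p in permutations(data)}
presorted = {
    as_partition(leader_clustering(weighted_ascending_sort(list(p)), 1))
    for p in permutations(data)
}

print("distinct partitions without sorting:", len(raw))
print("distinct partitions with sorting:", len(presorted))
```

Without the pre-sort, the toy data produces more than one partition across input orders; with it, every input order is mapped to the same sequence before clustering, so only one partition can result. The sketch shows only why a fixed sort order yields stability; it says nothing about the accuracy gains reported for the weighted ascending sort in the paper.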
Source
Journal of University of Science and Technology Beijing (《北京科技大学学报》)
EI
CAS
CSCD
PKU Core
2013, No. 8, pp. 1093-1098 (6 pages)
Funding
National Natural Science Foundation of China (71271027)
Fundamental Research Funds for the Central Universities (FRF-TP-10-006B)
Keywords
data mining
clustering algorithm
sorting
categorical data