摘要
混合条件属性参数间的距离值存在较大的差异,导致仅聚合距离数量级较大、较规律的数值条件属性对象,而忽视数量级较小、混沌,但类别特征更加明显的分类条件属性对象。提出了一种基于平均互信息的聚类算法。通过熵量化参数类别特性的大小,再根据熵的平均互信息计算方法衡量数据对象间类别的相同、相异特征量,统一数值和分类条件属性参数间距离的数量级,最后通过优化迭代自适应过程得到最终聚类结果。实验结果表明,该算法具有良好的聚类质量和自适应性。
There is a great difference between the distances of mixed condition attributes parameter.The numeric condition attributes object with larger and law magnitude tends to be clustered only.With small and chaos magnitude,the categorical condition attributes object which has obvious category characteristics will be ignored.A clustering algorithm based on average mutual information was proposed.First,the size of parameter category characteristics is quantified through entropy.Then,the similarity and the difference between category characteristics are measured according average mutual information of entropy.The magnitude between distances of numeric and categorical condition attributes parameter is unified.At last,the final clustering result is got by optimizing iterative adaptive process.The experimental results show that the proposed algorithm was high clustering quality and good adaptability.
出处
《计算机科学》
CSCD
北大核心
2015年第3期261-265,共5页
Computer Science
基金
广东省教育部产学研结合项目(2011A090200088)
广东省茂名市科技计划项目(2012B009)
广东省石化装备故障诊断重点实验室资助
关键词
混合条件属性
平均互信息
聚类
Mixed condition attributes
Average mutual information
Clustering