Abstract
Many clustering methods have been proposed for large databases, but these algorithms tend to be computationally expensive and to demand a large amount of main memory. Moreover, clustering quality suffers when the data are unevenly distributed, that is, when cluster densities and the distances between clusters vary. To improve the efficiency and accuracy of clustering, this paper applies data partitioning as a preprocessing step before clustering. After partitioning, each partition contains fewer data points and has a more even distribution.
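The partition-then-cluster idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the partitioning scheme or the clustering algorithm, so everything here is an assumption made for illustration: the grid-based partition, the toy threshold clustering, and the names `partition_by_grid`, `threshold_cluster`, `partition_then_cluster`, `cell_size`, and `eps` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import dist


def partition_by_grid(points, cell_size):
    """Group 2-D points into square grid cells (an assumed partition scheme)."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return dict(cells)


def threshold_cluster(points, eps):
    """Toy clustering: a point joins the first cluster whose seed point
    lies within eps; otherwise it starts a new cluster."""
    clusters = []
    for p in points:
        for c in clusters:
            if dist(p, c[0]) <= eps:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def partition_then_cluster(points, cell_size, eps):
    """Cluster each partition independently. Each partition is smaller than
    the full data set, and partitions could be processed in parallel."""
    partitions = partition_by_grid(points, cell_size)
    return {cell: threshold_cluster(pts, eps) for cell, pts in partitions.items()}


points = [(0.1, 0.2), (0.3, 0.1), (5.2, 5.1), (5.4, 5.3), (9.0, 0.5)]
result = partition_then_cluster(points, cell_size=5.0, eps=1.0)
for cell, clusters in sorted(result.items()):
    print(cell, clusters)
```

This sketch does not merge clusters that straddle a partition boundary; a real implementation would need some step for that, and how the paper handles it is not stated in the abstract.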
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2007, No. 2, pp. 203-205 (3 pages)
Application Research of Computers
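Funding
Major Program of the National Natural Science Foundation of China (60271019)
Ministry of Education of China fund project (20020611007)
Natural Science Foundation of Chongqing (8509)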
Keywords
Data Mining
Clustering
Data Partition
Parallel Clustering