Abstract
Large-scale data processing is an unavoidable challenge in Internet of Things (IoT) research and applications. To address the shortcomings of common clustering methods on large data, a new partitional clustering method is designed. The method applies sampling to the large data set: it clusters sufficiently large samples, drawn repeatedly, to determine the initial positions of the natural cluster centroids. It then uses the data remaining after sampling to update these positions, correcting initial centroids that deviate from their ideal locations. The resulting partitional clustering algorithm has linear space and time complexity. Experimental results show that the new algorithm not only yields better clustering results than common clustering algorithms but also runs fast, making it suitable for clustering large-scale data.
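The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details the abstract does not give. The sketch assumes that the repeated draws are pooled into one sample, that plain Lloyd-style k-means is run on that sample, and that each remaining point then shifts its nearest centroid by a running-mean update in a single pass. The names `sample_then_update`, `sample_size`, and `n_draws` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd iterations; returns centroids and the counts
    from the last assignment pass."""
    centroids = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        counts = [0] * k
        for p in points:
            j = nearest(p, centroids)
            counts[j] += 1
            for d, v in enumerate(p):
                sums[j][d] += v
        for j in range(k):
            if counts[j]:  # an empty cluster keeps its previous centroid
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids, counts


def sample_then_update(data, k, sample_size, n_draws=3, seed=0):
    """Hypothetical two-stage sketch: cluster a pooled sample, then
    refine the centroids with the points that were not sampled."""
    rng = random.Random(seed)

    # Stage 1 (assumption): pool the indices picked over several draws
    # and cluster the pooled sample to get initial centroids.
    picked = set()
    for _ in range(n_draws):
        picked.update(rng.sample(range(len(data)), sample_size))
    sample = [data[i] for i in sorted(picked)]
    centroids, counts = kmeans(sample, k, rng)

    # Stage 2 (assumption): one pass over the remaining points; each
    # point moves its nearest centroid via a running-mean update.
    for i, p in enumerate(data):
        if i in picked:
            continue
        j = nearest(p, centroids)
        counts[j] += 1
        for d, v in enumerate(p):
            centroids[j][d] += (v - centroids[j][d]) / counts[j]
    return centroids
```

In this sketch the second stage reads each remaining point once and stores only the `k` centroids and their counts, which is one way to keep the cost linear in the data size. Whether the paper reaches its linear complexity the same way cannot be determined from the abstract.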
Source
Systems Engineering and Electronics (《系统工程与电子技术》)
EI
CSCD
PKU Core (北大核心)
2014, No. 5, pp. 1010-1015 (6 pages)
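Funding
National Natural Science Foundation of China (60975042)
Science and Technology Project of the Heilongjiang Provincial Department of Education (12511166)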
Keywords
large data
internet of things
partitional clustering
sampling
centroid