摘要
针对基于安全多方计算聚类算法的低效问题,提出了基于聚类特征树结构的隐私保护的层次k-means聚类算法。算法基于半诚信模型,在第三方内存中保留对各记录的索引信息及聚类特征树的当前层信息,减少了I/O次数和通信量,克服了难以适应多数据方和因过于信赖第三方导致隐私泄漏等缺陷。算法通过基于安全多方计算的标准化协议、距离计算协议和聚类中心计算协议,实现了数据的有效保护,综合层次和k-means聚类算法的优点,提高了计算精度和算法的可伸缩性。理论证明了算法的安全性和高效性,实验结果表明所提算法优于同类算法。
To improve the efficiency of a clustering algorithm based on secure multi party computation, a privacy preserving hierarchical k means algorithm based on clustering feature tree and semi-honest model is proposed. The algorithm stores the index information of every record and current hierarchical information of the clustering feature tree in the third party's memory and reduces the I/O times and communication cost, it also overcomes the drawbacks of applying to multi-party difficultly and leaking privacy due to depending on the third party excessively. The algorithm introduces three secure protocols such as distance computation, clustering center computation and standardization to protect data privacy effectively and accurately, and improves the precision and flexibility by combining the merits of hierarchical and k-means clustering algorithms. Theoretic argument demonstrates that the algorithm is secure and completes with good efficiency. The experimental results show that the proposed algorithm outperforms the other existing algorithms in communication cost and computation cost.
出处
《系统工程与电子技术》
EI
CSCD
北大核心
2009年第10期2521-2526,共6页
Systems Engineering and Electronics
基金
国家自然科学基金(60773049
60603041)
江苏大学高级人才启动基金(09JDG041)资助课题
关键词
隐私保护
数据挖掘
安全多方计算
聚类特征树
相异矩阵
privacy preserving
data mining
secure multi-party computation
clustering feature tree
dissimilarity matrix