Abstract
The finite mixture of Bernoulli distributions model, proposed to address the high dimensionality and sparsity of binary data spaces, provides a framework for finding projected clusters quickly. The EM algorithm is an important method for iteratively estimating model parameters, and its performance depends heavily on the initial parameters. Clustering with the finite mixture of Bernoulli distributions model fitted by EM has been widely studied, and the choice of initial parameters directly affects clustering performance. This paper introduces the Binning method and changes both the similarity measure between data points and the way center points are selected during initialization, greatly reducing the dependence of the clustering result on the initial parameters. Experiments show that the algorithm is efficient and correct.
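To make the role of the initial parameters concrete, the following is a minimal sketch of standard EM for a finite mixture of Bernoulli distributions. It is not the paper's algorithm: the Binning-based initialization, the modified similarity measure, and the center-point selection are not specified in this abstract, so plain random initialization is assumed here instead. The function name `em_bernoulli_mixture` and its parameters (`k`, `n_iter`, `seed`, `eps`) are hypothetical and chosen only for illustration.

```python
import math
import random


def em_bernoulli_mixture(data, k, n_iter=50, seed=0, eps=1e-6):
    """Fit a k-component mixture of multivariate Bernoulli distributions by EM.

    data: list of equal-length lists of 0/1 values.
    Returns (weights, probs, resp): mixing weights, per-component
    Bernoulli parameters, and per-point responsibilities.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    weights = [1.0 / k] * k
    # ASSUMPTION: random initialization, not the paper's Binning-based scheme.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for x in data:
            log_p = []
            for j in range(k):
                lp = math.log(weights[j])
                for xi, p in zip(x, probs[j]):
                    lp += math.log(p) if xi else math.log(1.0 - p)
                log_p.append(lp)
            m = max(log_p)
            unnorm = [math.exp(lp - m) for lp in log_p]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate mixing weights and Bernoulli parameters.
        counts = [sum(r[j] for r in resp) for j in range(k)]
        raw = [max(c / n, eps) for c in counts]
        norm = sum(raw)
        weights = [w / norm for w in raw]
        for j in range(k):
            denom = max(counts[j], eps)
            for t in range(d):
                p = sum(r[j] * x[t] for r, x in zip(resp, data)) / denom
                # Clamp away from 0 and 1 so the logarithms stay finite.
                probs[j][t] = min(max(p, eps), 1.0 - eps)
    return weights, probs, resp


if __name__ == "__main__":
    # Toy binary data with two obvious groups.
    toy = [[1, 1, 1, 0, 0, 0]] * 5 + [[0, 0, 0, 1, 1, 1]] * 5
    w, p, r = em_bernoulli_mixture(toy, k=2, n_iter=30)
    print([max(range(2), key=lambda j: row[j]) for row in r])
```

Changing `seed` changes only the starting values of `probs`; on harder data than this toy set, different seeds can lead EM to different local optima, which is the kind of dependence on initial parameters that the abstract says the proposed initialization is meant to reduce.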
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal
2009, No. 1, pp. 47-49 (3 pages)
Funding
National "863" Program of China (2007AA12Z238)
Keywords
subspace clustering
binary data
finite mixture of Bernoulli distributions model
EM algorithm