Abstract
The method rests on the assumption that, under a given partition of the sample space, each data set is a sample drawn from an underlying multinomial distribution. Using the maximum likelihood estimate (MLE) of the model parameters, the underlying distribution of a data set is approximated by a discretized empirical distribution. Under the generalized Fisher metric on the multinomial manifold with boundary, the information divergence between underlying distributions can be approximated by the corresponding divergence between the estimated distributions, which provides the basis for unsupervised learning on the information manifold obtained from the MLE embedding. When the dimension of the reduced space is two or three, the natural separability of the original data sets can be visualized through the dimension-reduced data. Experimental results show that the method can be applied to the visualization of large-sample data sets and of high-dimensional structured data such as color images.
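The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions that the abstract does not confirm: the partition is a fixed set of histogram bins, the divergence is the standard closed-form Fisher-Rao geodesic distance between multinomial distributions, 2·arccos(Σ√(p_i·q_i)), rather than the paper's generalized metric, and the low-dimensional embedding is classical multidimensional scaling. All function names are hypothetical.

```python
import numpy as np

def mle_multinomial(samples, bin_edges):
    """MLE of multinomial parameters under a fixed partition: normalized bin counts."""
    counts, _ = np.histogram(samples, bins=bin_edges)
    return counts / counts.sum()

def fisher_rao_distance(p, q):
    """Closed-form geodesic distance on the multinomial simplex (assumed stand-in)."""
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient, in [0, 1]
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))

def classical_mds(D, dim=2):
    """Classical MDS: embed points so Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]    # keep the largest `dim`
    lam = np.clip(eigvals[order], 0.0, None)   # guard against small negatives
    return eigvecs[:, order] * np.sqrt(lam)

def embed_datasets(datasets, bin_edges, dim=2):
    """Estimate one distribution per data set, then embed the data sets as points."""
    P = np.array([mle_multinomial(x, bin_edges) for x in datasets])
    n = len(P)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = fisher_rao_distance(P[i], P[j])
    return P, D, classical_mds(D, dim)

# Synthetic illustration: two data sets centered at -2 and two centered at +2.
rng = np.random.default_rng(0)
datasets = [rng.normal(loc=m, scale=1.0, size=2000) for m in (-2, -2, 2, 2)]
bin_edges = np.linspace(-8, 8, 33)             # 32 equal-width cells (assumed partition)
P, D, Y = embed_datasets(datasets, bin_edges)
```

In this synthetic example each row of `Y` is the two-dimensional coordinate of one data set. Data sets drawn from the same distribution should land close together, while those drawn from different distributions should be clearly separated.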
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, PKU Core (北大核心)
2011, Issue 1, pp. 4-6 (3 pages)
Funding
National Natural Science Foundation of China (9082004)
National "863" Program of China (2006AA04Z238)
Anhui Natural Science Foundation (KJ2007B056)
Keywords
multinomial distribution
maximum likelihood estimation
manifold learning
data visualization