Abstract
Stochastic neighbor embedding (SNE) is a dimensionality reduction method based on the similarity between data points. It uses Euclidean distance to define, for each object, a probability distribution over its potential neighbors. In high-dimensional spaces, however, Euclidean distance does not provide a large relative difference between the distances of different pairs of points, so it cannot clearly express how high-dimensional data objects differ from one another. To address this problem, this paper proposes a stochastic neighbor embedding algorithm based on Manhattan distance (Manhattan-SNE). The algorithm measures the dissimilarity between high-dimensional data objects with Manhattan distance and derives from it conditional probabilities that express the similarity between data objects in the high-dimensional and in the low-dimensional space. The goal of the embedding is to make the two distributions match as closely as possible. The algorithm therefore uses the Kullback-Leibler (KL) divergence as its objective function and minimizes it by gradient descent, which yields the low-dimensional embedding. Experimental results show that Manhattan-SNE achieves a clearly higher average classification accuracy, which demonstrates the effectiveness and practicality of the improved algorithm; it can be applied to feature extraction from fault data.
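The abstract does not state the exact formulas, so the following Python code is only a minimal sketch of how a Manhattan-SNE pipeline could look, built on the standard SNE formulation (Hinton and Roweis). The assumptions are not taken from the paper. First, the Manhattan distance d_ij replaces the Euclidean distance inside the Gaussian kernel exp(-d_ij^2 / (2 sigma_i^2)) in the high-dimensional space. Second, the low-dimensional space keeps the usual exp(-||y_i - y_j||^2) kernel. Third, each sigma_i is chosen by a perplexity search. Fourth, plain gradient descent minimizes the summed KL divergence using the standard SNE gradient. All function names, the perplexity, the learning rate, and the step count are hypothetical choices made for illustration.

```python
import math
import random


def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sq_euclidean(a, b):
    """Squared Euclidean distance (assumed kernel for the low-dimensional map)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def row_probs(sq_row, i, beta):
    """p_{j|i} proportional to exp(-beta * sq_row[j]) for j != i, with p_{i|i} = 0.

    The largest logit is subtracted before exponentiating to avoid underflow."""
    logits = [-beta * s for s in sq_row]
    top = max(v for j, v in enumerate(logits) if j != i)
    w = [0.0 if j == i else math.exp(v - top) for j, v in enumerate(logits)]
    total = sum(w)
    return [x / total for x in w]


def perplexity(p):
    """2 ** H(p), where H is the Shannon entropy in bits."""
    return 2.0 ** -sum(x * math.log2(x) for x in p if x > 0)


def high_dim_probs(X, target_perp=5.0, iters=64):
    """High-dimensional p_{j|i} from Manhattan distances (assumption: d_ij
    replaces the Euclidean distance in exp(-d_ij^2 / (2 sigma_i^2))).
    beta_i = 1 / (2 sigma_i^2) is found by geometric bisection so that
    every row reaches the target perplexity, as in standard SNE."""
    n = len(X)
    D2 = [[manhattan(X[i], X[j]) ** 2 for j in range(n)] for i in range(n)]
    P = []
    for i in range(n):
        lo, hi = 1e-12, 1e12
        for _ in range(iters):
            beta = math.sqrt(lo * hi)
            if perplexity(row_probs(D2[i], i, beta)) > target_perp:
                lo = beta  # distribution too flat: increase beta (shrink sigma_i)
            else:
                hi = beta
        P.append(row_probs(D2[i], i, math.sqrt(lo * hi)))
    return P


def low_dim_probs(Y):
    """q_{j|i} proportional to exp(-||y_i - y_j||^2), as in standard SNE."""
    n = len(Y)
    return [row_probs([sq_euclidean(Y[i], Y[j]) for j in range(n)], i, 1.0)
            for i in range(n)]


def kl_cost(P, Q):
    """C = sum_i KL(P_i || Q_i) = sum_i sum_j p_{j|i} * log(p_{j|i} / q_{j|i})."""
    return sum(p * math.log(p / max(q, 1e-300))
               for Pi, Qi in zip(P, Q) for p, q in zip(Pi, Qi) if p > 0)


def manhattan_sne(X, dim=2, perp=5.0, lr=0.02, steps=300, seed=0):
    """Plain gradient descent on C; returns the embedding and the cost history.
    dC/dy_i = 2 * sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) * (y_i - y_j)."""
    rng = random.Random(seed)
    n = len(X)
    P = high_dim_probs(X, perp)
    Y = [[rng.gauss(0.0, 1e-2) for _ in range(dim)] for _ in range(n)]
    history = []
    for _ in range(steps):
        Q = low_dim_probs(Y)
        history.append(kl_cost(P, Q))
        grads = []
        for i in range(n):
            g = [0.0] * dim
            for j in range(n):
                if j != i:
                    c = 2.0 * (P[i][j] - Q[i][j] + P[j][i] - Q[j][i])
                    for k in range(dim):
                        g[k] += c * (Y[i][k] - Y[j][k])
            grads.append(g)
        Y = [[y - lr * gk for y, gk in zip(Yi, gi)] for Yi, gi in zip(Y, grads)]
    history.append(kl_cost(P, low_dim_probs(Y)))
    return Y, history


if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic 10-D clusters (illustrative data, not the paper's fault data).
    X = [[rng.gauss(c, 0.3) for _ in range(10)] for c in (0.0, 3.0) for _ in range(6)]
    Y, hist = manhattan_sne(X, perp=3.0)
    print(f"KL cost: {hist[0]:.4f} -> {hist[-1]:.4f}")
```

The sketch keeps the low-dimensional kernel Euclidean because the absolute values in a Manhattan distance would make the gradient non-smooth at coincident coordinates. Whether the paper also uses Manhattan distance in the low-dimensional space is not stated in the abstract.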
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal
2015, Issue 10, pp. 2992-2995 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (51075069)