Abstract
As data volumes keep growing, fast and accurate indexing algorithms have become essential for information retrieval. To address this need, this paper proposes a hashing-based indexing method built on subspace learning. First, a subspace is learned from a labeled subset of the data. To keep samples with the same semantic label local after indexing, intra-class compactness is measured by the distances between each sample and its nearest neighbors. To make samples with different semantic labels more discriminable after indexing, inter-class scatter is measured by the distances between the centers of the different semantic classes. After the constraints of the resulting formulation are relaxed, the subspace is learned with a procedure similar to linear discriminant analysis, and the learned subspace supplies the projection vectors of the hash functions. The biases (offsets) are then computed from the learned projections, which completes the hash functions. Code discriminability and locality preservation experiments on the MNIST and CIFAR-10 datasets compare the proposed method with related state-of-the-art methods. The experimental results indicate that the method performs well and is effective at retrieving semantically similar neighbors.
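The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the intra-class matrix is built from differences between each sample and its `k` nearest same-class neighbors, the inter-class matrix from pairwise differences between class centers, the relaxed objective is solved as an LDA-style generalized eigenproblem, and the bias is taken as the negative mean of the projected training data. The function names `learn_hash_functions` and `encode` and the parameters `k` and `reg` are hypothetical and do not come from the paper.

```python
import numpy as np


def learn_hash_functions(X, y, n_bits, k=5, reg=1e-3):
    """Sketch of LDA-like subspace hashing (assumed formulation, not the paper's).

    X: (n, d) labeled samples, y: (n,) integer labels.
    Returns projections W of shape (d, n_bits) and biases b of shape (n_bits,).
    """
    n, d = X.shape
    classes = np.unique(y)

    # Intra-class matrix: differences between each sample and its k nearest
    # neighbors that share its label (assumed locality-preserving term).
    S_w = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        kk = min(k, len(Xc) - 1)
        if kk < 1:
            continue
        sq = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=-1)
        nn = np.argsort(sq, axis=1)[:, 1:kk + 1]  # column 0 is the sample itself
        diffs = (Xc[:, None, :] - Xc[nn]).reshape(-1, d)
        S_w += diffs.T @ diffs

    # Inter-class matrix: pairwise differences between class centers.
    centers = np.stack([X[y == c].mean(axis=0) for c in classes])
    cd = (centers[:, None, :] - centers[None, :, :]).reshape(-1, d)
    S_b = cd.T @ cd

    # Ridge term keeps S_w positive definite so Cholesky succeeds.
    S_w += (reg * np.trace(S_w) / d + 1e-12) * np.eye(d)

    # Solve S_b w = lambda * S_w w by whitening with S_w = L L^T.
    L = np.linalg.cholesky(S_w)
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_b @ L_inv.T
    _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_bits]      # keep the largest n_bits directions
    W = L_inv.T @ top

    # Assumed bias: center the projections so each bit thresholds at the mean.
    b = -(X @ W).mean(axis=0)
    return W, b


def encode(X, W, b):
    """Binary codes h(x) = [w^T x + b > 0], one column per bit."""
    return ((X @ W + b) > 0).astype(np.uint8)
```

Because the inter-class matrix is built from class centers only, its rank is at most the number of classes minus one, so in this sketch only that many bits carry discriminative directions; how the paper obtains longer codes is not stated in the abstract.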
Source
《软件学报》
EI
CSCD
Peking University Core Journals (北大核心)
2014, Issue 8, pp. 1781-1793 (13 pages)
Journal of Software
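Funding
National Natural Science Foundation of China (61273257, 61321491, 61035003)
National Basic Research Program of China (973 Program) (2010CB327903)
Program for New Century Excellent Talents in University, Ministry of Education (NCET-11-0213)
Six Talent Peaks Project of Jiangsu Province (2013-XXRJ-018)
Natural Science Foundation of Jiangsu Province (BK2011005)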