Abstract
With the rapid development of the Internet and multimedia technology, the number of images on the Internet is growing explosively, and finding the images a user needs quickly and effectively in massive image collections has become an active research topic. Traditional image retrieval systems are built on a single-node architecture and suffer from low speed, poor parallelism, and insufficient memory when handling massive image data. This paper proposes a Spark-based method for massive image retrieval that combines image retrieval techniques with the Spark computing framework. The image set is stored in a distributed manner in HDFS, so that feature extraction, model training, and online retrieval can all be carried out in a distributed fashion. Compared with a single-node retrieval system, the method is faster and more scalable on large-scale image retrieval, and it can handle image collections that are too large for a single machine to process. Experimental results on the Holiday dataset show that the method effectively improves the running speed of the algorithm.
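The distributed retrieval step described above can be illustrated with a minimal sketch. It is plain, single-process Python standing in for Spark, not the authors' implementation: the image set is split into partitions, each partition extracts features and keeps its local top-k matches, and the partial results are merged into a global top-k. The intensity-histogram feature, the squared Euclidean distance, and all function names (`extract_feature`, `local_top_k`, `retrieve`) are assumptions made for illustration; the abstract does not state which features or models the paper uses.

```python
import heapq
from typing import List, Sequence, Tuple

# (image id, flat list of grayscale pixel values in 0..255); a toy stand-in
# for image files that the paper stores in HDFS.
Image = Tuple[str, Sequence[int]]


def extract_feature(pixels: Sequence[int], bins: int = 4) -> List[float]:
    """Hypothetical feature: a normalized intensity histogram."""
    hist = [0.0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(items: Sequence[Image], n: int) -> List[List[Image]]:
    """Round-robin split into n partitions (stand-in for distributed storage)."""
    return [list(items[i::n]) for i in range(n)]


def local_top_k(partition: Sequence[Image], query_feat: Sequence[float],
                k: int) -> List[Tuple[float, str]]:
    """Per-partition work: extract features, score, keep the k best."""
    scored = ((distance(extract_feature(px), query_feat), img_id)
              for img_id, px in partition)
    return heapq.nsmallest(k, scored)


def retrieve(images: Sequence[Image], query_pixels: Sequence[int],
             k: int = 2, num_partitions: int = 3) -> List[str]:
    """Merge the per-partition candidates into a global top-k list of ids."""
    query_feat = extract_feature(query_pixels)
    partials = [local_top_k(p, query_feat, k)
                for p in split(images, num_partitions)]
    merged = heapq.nsmallest(k, (c for part in partials for c in part))
    return [img_id for _, img_id in merged]


if __name__ == "__main__":
    images = [
        ("dark", [0, 10, 20, 30]),
        ("bright", [250, 240, 230, 220]),
        ("mid", [100, 120, 130, 140]),
        ("mixed", [0, 255, 0, 255]),
    ]
    print(retrieve(images, [5, 15, 25, 35], k=2))
```

Because every partition returns at most k candidates, the merge step only handles k times the number of partitions items, regardless of how large the image set is. In a real cluster the per-partition function would run on the workers and only these small candidate lists would travel back to the driver.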
Source
Microcomputer Applications (《微型电脑应用》)
2015, Issue 11, pp. 11-13, 17 (4 pages in total)
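Funding
National Key Technology R&D Program of China (2013BAH09F01)
Science and Technology Innovation Action Plan of the Science and Technology Commission of Shanghai Municipality (14511106900)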