
Study on Heterogeneous Multimodal Data Retrieval Based on Hash Algorithm (cited by: 11)
Abstract: With the development of the big data era, heterogeneous multimodal data on the Internet, including text, images, video, and audio, is growing exponentially, and retrieval over such data has become an active research direction. Heterogeneous multimodal data retrieval faces two major challenges: 1) a "semantic gap" exists in the data, which raises the question of how to express the similarity between heterogeneous multimodal data; 2) how to retrieve accurately and efficiently from massive data. To address the problem that hash retrieval algorithms ignore the semantic consistency between heterogeneous multimodal data, this paper proposes a hash retrieval algorithm based on CCA (canonical correlation analysis) semantic consistency, named CCA-SCH. To preserve semantic consistency within each modality, the algorithm generates separate semantic models for text and image data. To preserve semantic consistency between modalities, it uses the CCA algorithm to fuse text and image semantics and generate a maximum correlation matrix. It also introduces the 2,ρ norm to reduce noise and redundant information in the original dataset, which makes the hash function more robust. Experimental results show that the mean average precision (MAP) of CCA-SCH on the experimental datasets is more than 10% higher than that of the baseline algorithms, indicating better retrieval performance.
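The general idea of combining CCA with hashing can be illustrated with a minimal sketch. This is not the CCA-SCH algorithm from the paper: it omits the per-modality semantic models and the 2,ρ norm, and simply thresholds regularized linear CCA projections at zero to obtain binary codes. The function names (`fit_cca`, `hash_codes`, `hamming_rank`), the regularization value, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_cca(X, Y, n_bits, reg=1e-3):
    """Regularized linear CCA; returns means, projections, and correlations."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both views with Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Lx, Cxy)        # Lx^{-1} Cxy
    T = np.linalg.solve(Ly, T.T).T      # Lx^{-1} Cxy Ly^{-T}
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the leading singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :n_bits])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_bits])
    return mx, my, Wx, Wy, s[:n_bits]

def hash_codes(Z, mean, W):
    """Binary codes: one bit per canonical direction, thresholded at zero."""
    return ((Z - mean) @ W) > 0

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to a single query code."""
    dist = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dist, kind="stable"), dist

# Synthetic paired "text" and "image" features sharing one latent signal.
rng = np.random.default_rng(0)
n, k = 500, 8
latent = rng.standard_normal((n, k))
text = latent @ rng.standard_normal((k, 20)) + 0.1 * rng.standard_normal((n, 20))
image = latent @ rng.standard_normal((k, 30)) + 0.1 * rng.standard_normal((n, 30))

mx, my, Wx, Wy, corr = fit_cca(text, image, n_bits=8)
text_codes = hash_codes(text, mx, Wx)
image_codes = hash_codes(image, my, Wy)

# Cross-modal query: use the code of text item 0 to rank all image items.
order, dist = hamming_rank(text_codes[0], image_codes)
print("canonical correlations:", np.round(corr, 3))
print("Hamming distance to the paired image:", int(dist[0]))
```

Because both modalities are projected into the same correlated subspace before binarization, a text query can be compared directly with image codes by Hamming distance, which is what makes hash-based cross-modal retrieval efficient on large collections.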
Authors: CHEN Feng (陈凤); MENG Zu-qiang (蒙祖强) (College of Computer and Electronics Information, Guangxi University, Nanning 530000, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2019, No. 10, pp. 49-54 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (61762009)
Keywords: Hash function; Semantic consistency; Canonical correlation analysis (CCA) algorithm; Heterogeneous multimodal
