Funding: This work was partially supported by the Science and Technology Project of Chongqing Education Commission of China (KJZD-K202200513), the National Natural Science Foundation of China (61370205), the Chongqing Normal University Fund (22XLB003), and the Chongqing Education Science Planning Project (2021-GX-320).
Abstract: In recent years, the development of deep learning has further improved hash retrieval technology. Most existing hashing methods use Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to process image and text information, respectively. This subjects images and texts to local constraints, and inherent label matching cannot capture fine-grained information, often leading to suboptimal results. Driven by the development of the transformer model, we propose a framework called ViT2CMH, based mainly on the Vision Transformer rather than CNNs or RNNs, to handle deep cross-modal hashing tasks. Specifically, we use a BERT network to extract text features and the Vision Transformer as the image network of the model. Finally, the features are transformed into hash codes for efficient and fast retrieval. We conduct extensive experiments on Microsoft COCO (MS-COCO) and Flickr30K, comparing against baseline hashing methods and image-text matching methods, and show that our method achieves better performance.
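The final step the abstract describes, transforming continuous features into hash codes, is commonly implemented by binarizing the network output with a sign function and comparing codes by Hamming distance. A minimal sketch of that step (the paper's exact hashing head is not specified here; `binarize` and `hamming_distance` are illustrative names):

```python
import numpy as np

def binarize(features):
    """Map continuous fused features to {-1, +1} hash codes via sign().
    A common final step in deep hashing; illustrative, not the paper's head."""
    codes = np.sign(features)
    codes[codes == 0] = 1  # break exact-zero ties deterministically
    return codes.astype(np.int8)

def hamming_distance(a, b):
    # For +/-1 codes of length k: d_H = (k - a . b) / 2
    return (a.shape[-1] - a @ b) // 2
```

At query time, ranking the database by this Hamming distance is what makes hash retrieval fast: the distance reduces to XOR-and-popcount on packed bit codes.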
Funding: This work was partially supported by the Chongqing Natural Science Foundation of China (Grant No. CSTB2022NSCQ-MSX1417), the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJZD-K202200513), the Chongqing Normal University Fund (Grant No. 22XLB003), the Chongqing Education Science Planning Project (Grant No. 2021-GX-320), and the Humanities and Social Sciences Project of Chongqing Education Commission of China (Grant No. 22SKGH100).
Abstract: In recent years, cross-modal hash retrieval has become a popular research field because of its high efficiency and low storage cost. Cross-modal retrieval technology can be applied to search engines, cross-modal medical processing, etc. The existing mainstream approach uses a multi-label matching paradigm to perform retrieval. However, such methods do not exploit the fine-grained information in multi-modal data, which may lead to suboptimal results. To prevent cross-modal matching from degenerating into label matching, this paper proposes an end-to-end fine-grained cross-modal hash retrieval method that focuses on the fine-grained semantic information of multi-modal data. First, the method refines the image features and, instead of representing text with multiple labels, processes it with BERT. Second, it uses the inference capability of the transformer encoder to generate global fine-grained features. Finally, to better assess the fine-grained model, this paper uses datasets from the image-text matching field instead of the traditional label-matching datasets. We experiment on the Microsoft COCO (MS-COCO) and Flickr30K datasets and compare against previous classical methods. The experimental results show that this method achieves more advanced results in the cross-modal hash retrieval field.
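The step of generating one global fine-grained feature with a transformer encoder can be caricatured as self-attention over token features followed by pooling. The sketch below is a toy stand-in, not the paper's encoder: single-head, weightless attention and mean pooling are assumptions made to keep it self-contained.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_pool(tokens):
    """tokens: (n, d) region/word features.
    One round of scaled dot-product self-attention, then mean-pool
    to a single global vector -- a toy version of encoder fusion."""
    d = tokens.shape[-1]
    attn = softmax(tokens @ tokens.T / np.sqrt(d))  # (n, n) attention weights
    return (attn @ tokens).mean(axis=0)             # (d,) global feature
```

In a real transformer encoder the attention uses learned query/key/value projections and is stacked with feed-forward layers; this sketch only shows why attention lets every token condition on every other token before pooling.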
Abstract: In the era of big data, rich in We-Media content, single-modal retrieval systems can no longer meet people's demand for information retrieval. This paper proposes a new solution to the problem of feature extraction and unified mapping across modalities: a Cross-Modal Hashing Retrieval algorithm based on a Deep Residual Network (CMHR-DRN). Model construction is divided into two stages. The first stage extracts features from the different modal data: a Deep Residual Network (DRN) extracts the image features, while TF-IDF combined with a fully connected network extracts the text features; the resulting image and text features serve as input to the second stage. In the second stage, the image and text features are mapped by supervised learning into hash functions that project them into a common binary Hamming space. During the mapping, the distance measurements in the original space and in the common feature space are kept as consistent as possible to improve cross-modal retrieval accuracy. In training, adaptive moment estimation (Adam) computes an adaptive learning rate for each parameter, and stochastic gradient descent (SGD) is used to minimize the loss function. The whole training process is carried out on the Caffe deep learning framework. Experiments show that the proposed CMHR-DRN algorithm achieves better retrieval performance than other cross-modal algorithms such as CMFH, CMDN, and CMSSH.
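The text branch combines TF-IDF with a fully connected network; the TF-IDF half can be sketched in a few lines. This toy version uses whitespace tokenization and a smoothed IDF (both assumptions) and omits the fully connected network entirely:

```python
import math
from collections import Counter

def tfidf(docs):
    """Toy TF-IDF over a list of documents (strings).
    Returns (vocab, vectors); idf uses log(N/df) + 1 smoothing."""
    vocab = sorted({w for d in docs for w in d.split()})
    n = len(docs)
    df = {w: sum(w in d.split() for d in docs) for w in vocab}
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}
    vecs = []
    for d in docs:
        counts = Counter(d.split())
        total = sum(counts.values())
        # term frequency (normalized count) times inverse document frequency
        vecs.append([counts[w] / total * idf[w] for w in vocab])
    return vocab, vecs
```

In the paper's pipeline a vector like this would then pass through fully connected layers before the supervised hash mapping; in practice a library implementation (e.g. scikit-learn's `TfidfVectorizer`) would replace this sketch.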
Funding: This work was supported by the National Natural Science Foundation of China (No. 61862041).
Abstract: Existing speech retrieval systems are frequently confronted with expanding volumes of speech data. A dynamic updating strategy applied to index construction can add or remove speech data in a timely manner to meet users' real-time retrieval requirements. This study proposes an efficient method for retrieving encrypted speech, using unsupervised deep hashing and a dynamic B+ tree index, which avoids privacy leakage of the speech data and enhances the accuracy and efficiency of retrieval. The encrypted speech library in the cloud is constructed by encrypting the original speech with the multi-threaded Dijk-Gentry-Halevi-Vaikuntanathan (DGHV) Fully Homomorphic Encryption (FHE) technique. In addition, this research employs a Residual Neural Network 18-Gated Recurrent Unit (ResNet18-GRU) to learn compact binary hash codes, stores the binary hash codes in the designed B+ tree index table, and creates a one-to-one mapping between the binary hash codes and the corresponding encrypted speech. External B+ tree index technology is applied to update the B+ tree index table dynamically, thereby satisfying users' needs for real-time retrieval. Experimental results on THCHS-30 and TIMIT show that the retrieval accuracy of the proposed method exceeds 95.84%, outperforming existing unsupervised hashing methods. Compared with the method using hash index tables, retrieval efficiency is greatly improved, and the security of the speech data is effectively guaranteed.
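The B+ tree index table maps binary hash codes one-to-one to encrypted speech entries and supports dynamic insertion and removal. A minimal stand-in for that interface, using a sorted list via Python's `bisect` instead of a real B+ tree (class and method names are illustrative, not the paper's):

```python
import bisect

class HashIndex:
    """Sorted-key index from binary hash codes (as ints) to
    encrypted-speech IDs. A toy stand-in for a B+ tree index table:
    same insert/remove/lookup operations, but O(n) updates rather
    than the B+ tree's O(log n) with sequential leaf access."""

    def __init__(self):
        self.keys, self.vals = [], []

    def insert(self, code, speech_id):
        i = bisect.bisect_left(self.keys, code)
        self.keys.insert(i, code)
        self.vals.insert(i, speech_id)

    def remove(self, code):
        i = bisect.bisect_left(self.keys, code)
        if i < len(self.keys) and self.keys[i] == code:
            self.keys.pop(i)
            self.vals.pop(i)

    def lookup(self, code):
        i = bisect.bisect_left(self.keys, code)
        if i < len(self.keys) and self.keys[i] == code:
            return self.vals[i]
        return None
```

The dynamic-updating property the abstract emphasizes is exactly this insert/remove interface: new recordings are hashed and inserted, obsolete ones removed, without rebuilding the whole index.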
Funding: This work was supported by the Postdoctoral Science Foundation of China (Grant No. 2020M682348), the Key Research Foundation of Henan Higher Education Institutions (Grant No. 21A520002), the National Key Research and Development Program of China (Grant No. 2018AAA0100203), and the Joint Research Fund in Astronomy (Grant No. U1531242) under a cooperative agreement between the National Natural Science Foundation of China and the Chinese Academy of Sciences (CAS).
Abstract: Searching for rare astronomical objects based on spectral data resembles finding needles in a haystack, owing to their rarity and the immense data volume gathered by large astronomical spectroscopic surveys. In this paper, we propose a novel automated approximate nearest neighbor search method based on unsupervised hashing learning for rare-spectra retrieval. The proposed method employs a multilayer neural network that uses autoencoders as local compact feature extractors. The autoencoders are trained with a non-gradient learning algorithm with graph Laplace regularization, which also simplifies the tuning of the network architecture hyperparameters and the learning-control hyperparameters. Meanwhile, the graph Laplace regularization enhances robustness by reducing sensitivity to noise. The proposed model is data-driven and can thus be viewed as a general-purpose retrieval model. It is evaluated in experiments and real-world applications in which rare O-type stars and their subclass are retrieved from the dataset obtained by the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (Guo Shoujing Telescope). The experimental and application results show that the proposed model outperforms the baseline methods, demonstrating its effectiveness in rare-spectra retrieval tasks.
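Graph Laplace regularization penalizes representations that differ for samples the similarity graph links strongly, which is how it damps the effect of noise. A minimal sketch of the regularizer itself (the weight matrix `W` and code matrix `H` are illustrative; the paper's non-gradient training procedure is not reproduced):

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, where D is the
    diagonal degree matrix of the similarity graph W."""
    return np.diag(W.sum(axis=1)) - W

def laplace_penalty(H, W):
    """Smoothness term tr(H^T L H) = (1/2) * sum_ij W_ij ||h_i - h_j||^2.
    Large when strongly connected samples get dissimilar codes H."""
    L = graph_laplacian(W)
    return np.trace(H.T @ L @ H)
```

Adding this term to a reconstruction loss pushes the autoencoder's codes of similar spectra together, so a single noisy spectrum cannot drift far from its neighbors.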
Abstract: Existing unsupervised hash retrieval algorithms focus on the information loss in the hash mapping process and on the quality of the generated hash codes, while ignoring the impact of the image features themselves on retrieval accuracy. To further improve retrieval accuracy, this paper proposes an improved Unsupervised Hash retrieval algorithm based on Feature Co-occurrence (UHFC). The algorithm has two stages: deep feature extraction and unsupervised hash generation. To improve the quality of the image features, UHFC introduces a co-occurrence layer after the last convolutional layer of the Convolutional Neural Network (CNN) to capture dependencies between features, and represents the degree of co-occurrence by the mean of the co-occurrence activations, which resolves the inconsistency of the original co-occurrence operation, where the co-occurrence values of the same two channels could differ. Next, in the feature-fusion stage, UHFC designs an Attention Feature Fusion method based on Spatial attention (AFF-S) suited to fusing co-occurrence features: the attention mechanism learns the fusion weights of the co-occurrence features and the deep features automatically, reducing interference from background factors during fusion and improving the expressiveness of the final image features. Finally, following an optimal transport strategy, UHFC supervises the mapping from image features to hash codes with bi-half distribution hash coding, and adds a classification layer after the hash layer to further enrich the image information carried by the hash codes through a KL loss. The whole training process requires no dataset annotation, achieving unsupervised hash generation. Experiments show that UHFC improves hash code quality considerably: its mean Average Precision (mAP) reaches 87.8% on Flickr25k and 82.8% on Nus-wide, improvements of 2.1% and 1.2% over the baseline method, respectively.
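The co-occurrence layer described above scores how strongly pairs of feature channels activate together, using the mean of the co-occurrence activations so that the score is symmetric in the two channels. A toy version of that computation (the real layer sits inside a CNN; the function name and the (C, H, W) shape convention are assumptions):

```python
import numpy as np

def cooccurrence(fmap):
    """fmap: (C, H, W) feature map from a conv layer.
    Returns a C x C matrix whose (i, j) entry is the mean of the
    elementwise product of channels i and j. Averaging over all
    spatial positions makes the matrix symmetric, so
    cooccurrence(i, j) == cooccurrence(j, i) by construction."""
    c = fmap.shape[0]
    flat = fmap.reshape(c, -1)          # (C, H*W)
    return (flat @ flat.T) / flat.shape[1]
```

A symmetric score is what fixes the inconsistency the abstract mentions: the relation between two channels no longer depends on which of the pair is treated as the reference.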