Current encoder-decoder video captioning models rely mainly on a single video input source. Few studies have used external corpus information to guide caption generation, which limits caption content and hinders accurate description and understanding of the video. To address this issue, this paper proposes a video captioning method guided by a sentence retrieval generation network (ED-SRG). First, a ResNeXt network model, an efficient convolutional network for online video understanding (ECO) model, and a long short-term memory (LSTM) network model are integrated into an encoder-decoder, which is used to extract the 2D features, 3D features, and object features of the video data, respectively. These features are decoded into textual sentences that conform to the video content and are used for sentence retrieval. Then, a sentence-transformer network model retrieves sentences from an external corpus that are semantically similar to these textual sentences, and candidate sentences are screened by similarity measurement. Finally, a novel network model is built on the GPT-2 network structure. It introduces a random selector that randomly picks high-probability predicted words from the corpus, guiding generation toward sentences closer to natural human language. The proposed method is compared experimentally with several existing works. BLEU-4, CIDEr, ROUGE_L, and METEOR improve by 3.1%, 1.3%, 0.3%, and 1.5% on the public dataset MSVD, and by 1.3%, 0.5%, 0.2%, and 1.9% on the public dataset MSR-VTT, respectively. The results indicate that the proposed method generates video captions with richer semantics than several state-of-the-art approaches.
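The random selector described in the abstract above can be illustrated with a small sketch. The abstract does not give the selector's design, so this assumes plain top-k sampling: keep the k most probable next words and draw one of them in proportion to its probability. The function name `top_k_random_select` and the toy probabilities are hypothetical.

```python
import random

def top_k_random_select(word_probs, k, rng):
    """Pick one word at random from the k most probable candidates.

    Assumed behaviour (top-k sampling); the paper's selector may differ.
    """
    # Keep the k highest-probability words; ties are broken alphabetically.
    top = sorted(word_probs.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    words = [w for w, _ in top]
    weights = [p for _, p in top]
    # random.choices treats the weights as relative, so no renormalisation is needed.
    return rng.choices(words, weights=weights, k=1)[0]

# Toy next-word distribution (made-up numbers, not taken from the paper).
probs = {"dog": 0.40, "puppy": 0.30, "cat": 0.15, "car": 0.10, "the": 0.05}
rng = random.Random(0)
samples = [top_k_random_select(probs, k=2, rng=rng) for _ in range(20)]
```

With `k=2` only "dog" and "puppy" can ever be chosen, while `k=1` reduces to greedy decoding.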
The implementation of content-based image retrieval (CBIR) depends mainly on two key technologies: image feature extraction and image feature matching. In this paper, we extract color features based on the Global Color Histogram (GCH) and texture features based on the Gray Level Co-occurrence Matrix (GLCM). To obtain effective and representative image features, we apply a fuzzy mathematical algorithm during both color and texture feature extraction, and we then combine the fuzzy color feature vector with the fuzzy texture feature vector to form a comprehensive fuzzy feature vector of the image. Image feature matching depends mainly on the similarity between two image feature vectors. We propose a novel similarity measure based on k-Nearest Neighbors (kNN) and a fuzzy mathematical algorithm (SBkNNF). First, the k nearest neighbor images of the query image are found in the image data set using an appropriate similarity measure. Second, the k similarity values between the query image and its k neighbor images form a new k-dimensional fuzzy feature vector for the query image. Third, the k similarity values between the retrieved image and the k neighbor images of the query image form a new k-dimensional fuzzy feature vector for the retrieved image. Finally, the similarity between the two k-dimensional fuzzy feature vectors is computed with a fuzzy similarity algorithm to measure the similarity between the query image and the retrieved image. Extensive experiments are carried out on three data sets: the WANG data set, the Corel-5k data set, and the Corel-10k data set. The experimental results show that the proposed CBIR system outperforms the other CBIR systems in retrieval performance.
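The four matching steps in the abstract above can be sketched as follows. The abstract does not name the base similarity or the fuzzy similarity, so both are assumptions here: histogram intersection serves as the base measure and a sum-of-minima over sum-of-maxima ratio serves as the fuzzy measure. All function names are hypothetical.

```python
def hist_intersection(a, b):
    """Assumed base similarity of two normalised histograms, in [0, 1]."""
    return sum(min(x, y) for x, y in zip(a, b))

def fuzzy_similarity(u, v):
    """Assumed fuzzy measure: sum of minima divided by sum of maxima."""
    denom = sum(max(x, y) for x, y in zip(u, v))
    return sum(min(x, y) for x, y in zip(u, v)) / denom if denom else 1.0

def knn_fuzzy_scores(query, dataset, k):
    """Score every dataset image against the query via the query's k neighbours."""
    # Step 1: the k nearest neighbours of the query under the base similarity.
    order = sorted(range(len(dataset)),
                   key=lambda i: -hist_intersection(query, dataset[i]))
    neighbours = [dataset[i] for i in order[:k]]
    # Step 2: k-dimensional vector for the query image.
    q_vec = [hist_intersection(query, n) for n in neighbours]
    # Steps 3 and 4: k-dimensional vector per candidate, then fuzzy comparison.
    return [fuzzy_similarity(q_vec, [hist_intersection(img, n) for n in neighbours])
            for img in dataset]

# Toy three-bin colour histograms (made-up data).
dataset = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.1, 0.9], [0.6, 0.4, 0.0]]
scores = knn_fuzzy_scores([0.5, 0.5, 0.0], dataset, k=2)
```

In this toy data the third image, whose histogram differs most from the query, receives the lowest score.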
AIM: To present a content-based image retrieval (CBIR) system that supports the classification of breast tissue density and can be used in the processing chain to adapt parameters for lesion segmentation and classification. METHODS: Breast density is characterized by image texture using singular value decomposition (SVD) and histograms. Pattern similarity is computed by a support vector machine (SVM) to separate the four BI-RADS tissue categories. The crucial number of remaining singular values is varied (SVD), and linear, radial, and polynomial kernels are investigated (SVM). The system is supported by a large reference database for training and evaluation. Experiments are based on 5-fold cross validation. RESULTS: Drawn from the DDSM, MIAS, LLNL, and RWTH datasets, the reference database comprises over 10000 mammograms with unified and reliable ground truth. An average precision of 82.14% is obtained using 25 singular values (SVD), a polynomial kernel, and the one-against-one approach (SVM). CONCLUSION: Breast density characterization using SVD combined with SVM for image retrieval enables the development of a CBIR system that can effectively aid radiologists in their diagnosis.
Digital image collections have grown rapidly along with the development of computer networks. Image retrieval systems were developed to provide an efficient tool for finding, in a database, the set of images that matches the user's requirements in similarity evaluations such as image content similarity, edge similarity, and color similarity. Retrieving images based on their content (color, texture, and shape) is called content-based image retrieval (CBIR). The content is the set of features of an image; these features are extracted and used as the basis for a similarity check between images, and algorithms are used to calculate the similarity between the extracted features. There are two kinds of content-based image retrieval: general image retrieval and application-specific image retrieval. In general image retrieval, the goal of the query is to obtain images containing the same object as the query; such CBIR imitates web search engines, for images rather than text. Application-specific retrieval tries to match a query image against a collection of images of a specific type, such as fingerprint or X-ray images. This paper discusses the general architecture, the functional components, and the techniques of CBIR systems. The techniques are categorized as CBIR using color, CBIR using texture, and CBIR using shape features. The paper also compares color features, texture features, shape features, and combined features (hybrid techniques) in terms of precision, recall, and response time.
In medical research and clinical diagnosis, automated or computer-assisted classification and retrieval methods are highly desirable to offset the high cost of manual classification and manipulation by medical experts. To facilitate decision-making in health care and related areas, this paper proposes a two-step content-based medical image retrieval algorithm. First, in the preprocessing step, image segmentation is performed to distinguish image objects, and on the basis of the ...
An Android-based lace image retrieval system built on the content-based image retrieval (CBIR) technique is presented. The system uses shape and texture features of lace images, and a hierarchical multi-feature scheme is proposed to support coarse-to-fine matching for efficient lace image retrieval in a large database. Experimental results demonstrate the feasibility and effectiveness of the proposed system, which meets real-time requirements.
The problem considered in this paper is how to detect the degree of similarity in the content of digital images for use in image retrieval, i.e., to what extent the content of a query image is similar to the content of other images. The solution results from the detection of subsets that are rough sets contained in covers of digital images determined by perceptual tolerance relations (PTRs). Such relations are defined within the context of perceptual representative spaces that hearken back to work by J. H. Poincaré on representative spaces as models of physical continua. Classes determined by a PTR provide content useful in content-based image retrieval (CBIR). In addition, tolerance classes provide a means of determining when subsets of image covers are tolerance rough sets (TRSs). It is the nearness of TRSs present in image tolerance spaces that provides a promising approach to CBIR, especially in cases such as satellite images or aircraft identification, where subtle differences between pairs of digital images make it difficult to quantify their similarities. The contribution of this article is the introduction of the nearness of tolerance rough sets as an effective means of measuring digital image similarities and, as a significant consequence, successfully carrying out CBIR.
An image retrieval system provides an efficient tool for finding, in a large database, the set of images that matches the user's requirements in similarity evaluations such as image content similarity, edge similarity, and colour similarity. Retrieving images based on their contents (colour, texture, and shape) is called content-based image retrieval (CBIR). This paper discusses colour feature techniques for image retrieval systems. Several colour feature techniques and algorithms from previous research are used to calculate the similarity between extracted features. The paper also describes specific techniques based on colour features and on combined features (hybrid techniques) that pair colour with shape features.
Clothing image retrieval in e-commerce is a challenging task. In this paper, we propose an effective approach to this problem. Our work uses three features for retrieval: (1) description, (2) category, and (3) color features. It can handle clothes with multiple colors, complex backgrounds, and model disturbances. To evaluate the proposed method, we collect a set of women's clothing images from Amazon.com. The reported results demonstrate the robustness and effectiveness of our retrieval method.
Deep convolutional neural networks (DCNNs) are widely used in content-based image retrieval (CBIR) because of their advantages in image feature extraction. However, training deep neural networks requires a large amount of labeled data, which limits their application. Self-supervised learning is a more general approach in unlabeled scenarios. This paper proposes a method for fine-tuning feature extraction networks based on masked learning: masked autoencoders (MAE) are used to fine-tune the vision transformer (ViT) model. The scheme for extracting image descriptors is also discussed. The MAE encoder uses the ViT to extract global features and performs self-supervised fine-tuning by reconstructing the pixels of masked areas. The method works well on category-level image retrieval datasets and brings marked improvements on instance-level datasets. On the instance-level datasets Oxford5k and Paris6k, the retrieval accuracy of the base model improves by 7% and 17%, respectively, compared with the original model.
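The masking step behind the self-supervised fine-tuning described above can be sketched in a few lines. This shows only how patch indices might be split into visible and masked sets; the encoder, decoder, and reconstruction loss are omitted. The 75% mask ratio and 16x16 patch size are assumptions (the abstract states neither), and `random_patch_mask` is a hypothetical name.

```python
import random

def random_patch_mask(num_patches, mask_ratio, rng):
    """Split patch indices at random into (visible, masked) lists."""
    num_masked = int(num_patches * mask_ratio)
    shuffled = list(range(num_patches))
    rng.shuffle(shuffled)
    # The first num_masked shuffled indices are hidden from the encoder.
    masked = sorted(shuffled[:num_masked])
    visible = sorted(shuffled[num_masked:])
    return visible, masked

# A 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
visible, masked = random_patch_mask(196, 0.75, random.Random(0))
```

Under this setup the encoder would process only the `visible` patches, and the training target would be the pixels of the `masked` ones.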
Existing ciphertext-domain image retrieval systems struggle to balance security, retrieval efficiency, and retrieval accuracy. To address this problem, this research proposes a secure image retrieval technique based on searchable encryption and deep hashing that extracts more expressive image features and constructs a secure, searchable encryption scheme. First, a deep learning framework based on a residual network and a transfer learning model is designed to extract more representative deep image features. Second, central similarity is used to quantify the features and construct their deep hash sequence. Paillier homomorphic encryption then encrypts the deep hash sequence to build a high-security, low-complexity searchable index. Finally, based on the additive homomorphic property of Paillier encryption, a similarity measurement method suited to computation in the encrypted domain is designed, which ensures the security of the retrieval system. Experimental results on the Web Image Database from the National University of Singapore (NUS-WIDE), Microsoft Common Objects in Context (MS COCO), and ImageNet data sets demonstrate the system's robust security and precise retrieval: the proposed scheme achieves efficient image retrieval without revealing user privacy. Retrieval accuracy improves by at least 37% compared with traditional hashing schemes, and retrieval time is reduced by at least 9.7% compared with the latest deep hashing schemes.
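The additive homomorphic property mentioned above means that multiplying two Paillier ciphertexts yields a ciphertext of the sum of their plaintexts. The toy sketch below demonstrates only that property; it is not the paper's index construction or similarity measure. The primes are tiny and insecure, chosen for readability, and the function names are hypothetical. It needs Python 3.8 or later for `pow(x, -1, n)`.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key from two primes; returns (n, (lam, mu))."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt 0 <= m < n as (n+1)^m * r^n mod n^2 with random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

n, priv = keygen(61, 53)            # toy primes, far too small for real use
rng = random.Random(0)
ca, cb = encrypt(n, 17, rng), encrypt(n, 25, rng)
c_sum = (ca * cb) % (n * n)         # multiply ciphertexts to add plaintexts
total = decrypt(n, priv, c_sum)
```

A server holding only `ca` and `cb` can form `c_sum` without learning 17 or 25, which is the kind of operation an encrypted-domain similarity computation can build on.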
Content-based copy detection (CBCD) is widely used in copyright control to prevent unauthorized use of digital video. Its key issue is extracting a fingerprint that is robust across different attacked versions of the same video. In this paper, the “natural parts” (coarse scales) of the Shearlet coefficients are used to generate robust video fingerprints for content-based video copy detection. The proposed Shearlet-based video fingerprint (SBVF) is constructed from the Shearlet coefficients in Scale 1 (the lowest coarse scale), which reveal spatial features, and Scale 2 (the second lowest coarse scale), which reveal directional features. To capture spatiotemporal information, the SBVF is applied to the Temporal Informative Representative Image (TIRI) of the video sequences to generate the final fingerprints. A TIRI-SBVF based CBCD system is built with an Invert Index File (IIF) hash searching approach and is evaluated and compared on the TRECVID 2010 dataset. Common attacks are imposed on the queries: luminance attacks (luminance change, salt-and-pepper noise, Gaussian noise, text insertion), geometry attacks (letterbox and rotation), and temporal attacks (frame dropping, time shifting). The experimental results demonstrate that the proposed TIRI-SBVF fingerprinting algorithm is robust to most of these attacks in CBCD applications. It achieves an average F1 score of about 0.99, a false positive rate (FPR) below 0.01%, and a localization accuracy of 97%.
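A TIRI condenses a short run of frames into one image by taking a weighted per-pixel average, so that a spatial fingerprint computed on it also reflects temporal content. The sketch below assumes exponential weights `gamma ** k` normalised to sum to one; the abstract does not state the weighting, and the default `gamma` value is arbitrary. Frames are plain nested lists of grayscale values, and the Shearlet step is not shown.

```python
def tiri(frames, gamma=0.65):
    """Weighted per-pixel average of equally sized grayscale frames."""
    # Assumed weighting: frame k gets weight gamma ** k, then normalise.
    weights = [gamma ** k for k in range(len(frames))]
    total = sum(weights)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(w * f[r][c] for w, f in zip(weights, frames)) / total
             for c in range(cols)]
            for r in range(rows)]

# Two tiny 1x2 frames; with gamma=1 the result is the plain average.
average = tiri([[[0.0, 4.0]], [[2.0, 8.0]]], gamma=1.0)
```

With `gamma` below 1 earlier frames contribute more than later ones, and with `gamma=1` every frame contributes equally.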
Medical video repositories play important roles in many health-related areas, such as medical imaging, medical research and education, medical diagnostics, and the training of medical professionals. With the increasing availability of digital video data, indexing, annotating, and retrieving the information are crucial. Because these processes are both computationally expensive and time-consuming, automated systems are needed. In this paper, we present a medical video segmentation and retrieval research initiative. We describe the key components of the system, including the video segmentation engine, the image retrieval engine, and the image quality assessment module. The aim of this research is to provide an online tool for indexing, browsing, and retrieving neurosurgical videotapes. The tool will let people retrieve the specific information they are interested in from a long videotape instead of looking through the entire content.
Digital data are growing tremendously owing to the rapid progress of the digital devices that capture them. Digital data include images, text, and video. Video is a rich source of information, so there is an urgent need to retrieve and organize videos and to automate these tasks. Video retrieval is a vital process in multimedia applications such as video search engines, digital museums, and video-on-demand broadcasting. In this paper, the different approaches to video retrieval are outlined and briefly categorized. Moreover, the methods that bridge the semantic gap in video retrieval are discussed in more detail.
In this paper, we present machine learning algorithms and systems for similar video retrieval. Here, the query is itself a video. For the similarity measurement, exemplars, or representative frames in each video, are extracted by unsupervised learning. For this learning, we chose order-aware competitive learning. After a set of exemplars is obtained for each video, the similarity is computed. Because the numbers and positions of the exemplars differ from video to video, we use a similarity computing method called M-distance, which generalizes existing global and local alignment methods using followers to the exemplars. To represent each frame in the video, this paper emphasizes the Frame Signature of the ISO/IEC standard so that the total system, along with its graphical user interface, becomes practical. Experiments on the detection of inserted plagiaristic scenes showed excellent precision-recall curves, with precision values very close to 1. Thus, the proposed system can work as a plagiarism detector for videos. In addition, this method can be regarded as the structuring of unstructured data via numerical labeling by exemplars. Finally, further sophistication of this labeling is discussed.
Recently, 3D display technology and content creation tools have undergone rigorous development and, as a result, have been widely adopted by home and professional users. 3D digital repositories are growing and becoming ubiquitously available. However, searching and visualizing 3D content remains a great challenge. In this paper, we propose and present the development of a novel approach for creating hypervideos, which ease 3D content search and retrieval. It is called the dynamic hyperlinker for the 3D content search and retrieval process. It advances 3D multimedia navigability and searchability by creating dynamic links for selectable and clickable objects in the video scene while the user watches the 3D video clip. The proposed system involves 3D video processing, such as detecting and tracking clickable objects, annotating objects, and metadata engineering, including a 3D content descriptive protocol. Such a system attracts attention from both home and professional users, and more specifically from broadcasters and digital content providers. The experiment is conducted on full-parallax holoscopic 3D videos (also known as integral images).
Funding (ED-SRG video captioning study): Supported in part by the National Natural Science Foundation of China under Grants 62273272 and 61873277, in part by the Chinese Postdoctoral Science Foundation under Grant 2020M673446, in part by the Key Research and Development Program of Shaanxi Province under Grant 2023-YBGY-243, and in part by the Youth Innovation Team of Shaanxi Universities.
Funding (fuzzy kNN CBIR study): Supported by the National Natural Science Foundation of China (Grant Numbers 61702310 and 61401260).
Funding (breast density CBIR study): Supported by CNPq-Brazil, Grants 306193/2007-8, 471518/2007-7, 307373/2006-1, and 484893/2007-6; by FAPEMIG, Grant PPM 347/08; and by CAPES. The IRMA project is funded by the German Research Foundation (DFG), Le 1108/4 and Le 1108/9.
Funding (lace image retrieval study): Supported by the Innovation Fund Projects of Cooperation among Industries, Universities & Research Institutes of Jiangsu Province, China (Nos. BY2015019-11, BY2015019-20); the Fundamental Research Funds for the Central Universities, China (Nos. JUSRP51404A, JUSRP211A38); and the Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), China (No. [2014].37).
Funding (tolerance rough sets study): Supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) research grants 194376 and 185986, a Manitoba Centre of Excellence Fund (MCEF) grant, and Canadian Network Centre of Excellence (NCE) and Canadian Arthritis Network (CAN) grant SRI-BIO-05.
Abstract: An image retrieval system provides an efficient tool for retrieving, from a large image database, the set of images that match the user's requirements under similarity evaluations such as image content similarity, edge similarity, and colour similarity. Retrieving images based on their contents, namely colour, texture, and shape, is called content-based image retrieval (CBIR). This paper discusses and describes colour feature techniques for image retrieval systems. Several colour feature techniques and algorithms produced by previous researchers are used to calculate the similarity between extracted features. This paper also describes specific techniques for colour-based features and for combined features (hybrid techniques) that merge colour and shape features.
Abstract: Clothing image retrieval in e-commerce is a challenging task. In this paper, we propose an effective approach to solve this problem. Our work chooses three features for retrieval: (1) description, (2) category, and (3) color features. It can handle clothes with multiple colors, complex backgrounds, and model disturbances. To evaluate the proposed method, we collect a set of women's clothing images from Amazon.com. The results reported here demonstrate the robustness and effectiveness of our retrieval method.
Funding: The Project of Introducing Urgently Needed Talents in Key Supporting Regions of Shandong Province, China (No. SDJQP20221805).
Abstract: Deep convolutional neural networks (DCNNs) are widely used in content-based image retrieval (CBIR) because of their advantages in image feature extraction. However, training deep neural networks requires a large amount of labeled data, which limits their application. Self-supervised learning is a more general approach in unlabeled scenarios. A method of fine-tuning feature extraction networks based on masked learning is proposed. Masked autoencoders (MAE) are used to fine-tune the vision transformer (ViT) model. In addition, the scheme for extracting image descriptors is discussed. The encoder of the MAE uses the ViT to extract global features and performs self-supervised fine-tuning by reconstructing the pixels of masked areas. The method works well on category-level image retrieval datasets, with marked improvements on instance-level datasets. For the instance-level datasets Oxford5k and Paris6k, the retrieval accuracy of the base model is improved by 7% and 17%, respectively, compared to that of the original model.
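The masked-reconstruction objective mentioned in this abstract can be illustrated with a small sketch. It shows only the generic masked autoencoder recipe (hide a random subset of patches, score the reconstruction on the hidden patches); the 75% mask ratio, the function names, and the flat-list patch representation are assumptions for illustration and are not taken from the paper.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists.

    The 0.75 default is an assumed value for this sketch.
    """
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


def masked_reconstruction_loss(original, reconstructed, masked):
    """Mean squared pixel error, averaged over the masked patches only.

    `original` and `reconstructed` are lists of patches; each patch is a
    flat list of pixel values.
    """
    total = 0.0
    for idx in masked:
        target, prediction = original[idx], reconstructed[idx]
        squared = sum((t - p) ** 2 for t, p in zip(target, prediction))
        total += squared / len(target)
    return total / len(masked)


# The encoder would see only the `visible` patches; the loss ignores them.
visible, masked = random_patch_mask(16, mask_ratio=0.75, seed=0)
```

Because the loss is computed on hidden patches only, the network cannot lower it by copying visible pixels, which is what makes the task usable without labels.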
Funding: Supported by the National Natural Science Foundation of China (No. 61862041).
Abstract: Existing ciphertext-domain image retrieval systems struggle to balance security, retrieval efficiency, and retrieval accuracy. To address this problem, this research proposes a secure image retrieval technique based on searchable encryption and deep hashing, which extracts more expressive image features and constructs a secure, searchable encryption scheme. First, a deep learning framework based on a residual network and a transfer learning model is designed to extract more representative deep image features. Second, central similarity is used to quantify and construct the deep hash sequence of features. Paillier homomorphic encryption encrypts the deep hash sequence to build a high-security, low-complexity searchable index. Finally, based on the additive homomorphic property of Paillier encryption, a similarity measurement method suitable for computation in the encrypted domain is used to ensure the retrieval system's security. Experimental results on the Web Image Database from the National University of Singapore (NUS-WIDE), Microsoft Common Objects in Context (MS COCO), and ImageNet datasets demonstrate the system's robust security and precise retrieval; the proposed scheme achieves efficient image retrieval without revealing user privacy. Retrieval accuracy is improved by at least 37% compared to traditional hashing schemes, and retrieval time is reduced by at least 9.7% compared to the latest deep hashing schemes.
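The additive homomorphic property this abstract relies on can be demonstrated with a toy textbook Paillier implementation. This is a sketch under stated assumptions and does not reproduce the paper's similarity measure: the tiny primes are insecure placeholders, and `encrypted_hamming` (a Hamming distance between encrypted hash bits and a plaintext query) is a hypothetical illustration of how additions over ciphertexts can yield a distance.

```python
import random
from math import gcd


def keygen(p=1789, q=1861):
    """Toy key generation; the small primes are for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (n, lam, mu)  # (public key, private key)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(private_key, c):
    n, lam, mu = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return c1 * c2 % (n * n)


def encrypted_hamming(n, encrypted_bits, query_bits):
    """Hypothetical helper: Enc(Hamming distance) from encrypted hash bits."""
    n2 = n * n
    total = encrypt(n, 0)
    for c, q in zip(encrypted_bits, query_bits):
        if q == 0:
            term = c  # h XOR 0 = h
        else:
            # h XOR 1 = 1 - h; inverting a ciphertext negates its plaintext.
            term = add(n, encrypt(n, 1), pow(c, -1, n2))
        total = add(n, total, term)
    return total


public_key, private_key = keygen()
stored_hash = [1, 0, 1, 1, 0, 0, 1, 0]
query_hash = [1, 1, 1, 0, 0, 0, 0, 0]
index = [encrypt(public_key, bit) for bit in stored_hash]
distance = decrypt(private_key, encrypted_hamming(public_key, index, query_hash))
```

In this sketch the party holding `index` never sees the stored hash bits, yet the key holder recovers the distance after a single decryption.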
Abstract: Content-based copy detection (CBCD) is widely used in copyright control to protect against unauthorized use of digital video, and its key issue is extracting a fingerprint that is robust across different attacked versions of the same video. In this paper, the "natural parts" (coarse scales) of the Shearlet coefficients are used to generate robust video fingerprints for content-based video copy detection applications. The proposed Shearlet-based video fingerprint (SBVF) is constructed from the Shearlet coefficients in Scale 1 (lowest coarse scale), which reveal the spatial features, and in Scale 2 (second lowest coarse scale), which reveal the directional features. To capture the spatiotemporal nature of video, the proposed SBVF is applied to the Temporal Informative Representative Image (TIRI) of the video sequences for final fingerprint generation. A TIRI-SBVF based CBCD system is constructed using the Invert Index File (IIF) hash searching approach for performance evaluation and comparison on the TRECVID 2010 dataset. Common attacks are imposed on the queries, such as luminance attacks (luminance change, salt and pepper noise, Gaussian noise, text insertion); geometry attacks (letter box and rotation); and temporal attacks (dropping frames, time shifting). The experimental results demonstrate that the proposed TIRI-SBVF fingerprinting algorithm is robust against most of the attacks in CBCD applications. It achieves an average F1 score of about 0.99, a false positive rate (FPR) of less than 0.01%, and a localization accuracy of 97%.
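A TIRI condenses a short run of frames into one image so that a spatial fingerprint also carries temporal information. The abstract does not give the formula, so the sketch below assumes a weighted per-pixel average with exponential weights `gamma ** k`; the normalisation by the weight sum, the default `gamma`, and the nested-list frame format are further assumptions made for this example.

```python
def temporal_representative_image(frames, gamma=0.65):
    """Blend a segment of grayscale frames into one representative image.

    `frames` is a list of equally sized 2D lists of pixel intensities.
    Frame k receives the assumed weight gamma ** k, and the result is
    normalised by the sum of the weights.
    """
    weights = [gamma ** k for k in range(len(frames))]
    total = sum(weights)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [
            sum(w * frame[r][c] for w, frame in zip(weights, frames)) / total
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Two tiny 1x2 frames; with gamma = 1.0 the result is a plain average.
segment = [[[0.0, 10.0]], [[10.0, 0.0]]]
plain_average = temporal_representative_image(segment, gamma=1.0)
weighted = temporal_representative_image(segment, gamma=0.5)
```

A fingerprint such as the SBVF described above would then be computed on the blended image rather than on each frame separately.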
Abstract: Medical video repositories play important roles in many health-related areas such as medical imaging, medical research and education, medical diagnostics, and the training of medical professionals. Due to the increasing availability of digital video data, indexing, annotating, and retrieving the information are crucial. Since performing these processes is both computationally expensive and time consuming, automated systems are needed. In this paper, we present a medical video segmentation and retrieval research initiative. We describe the key components of the system, including the video segmentation engine, the image retrieval engine, and the image quality assessment module. The aim of this research is to provide an online tool for indexing, browsing, and retrieving neurosurgical videotapes. This tool will allow people to retrieve the specific information they are interested in from a long video tape instead of looking through the entire content.
Abstract: Digital data is growing tremendously due to the rapid progress of the digital devices that capture it. Digital data include image, text, and video. Video represents a rich source of information. Thus, there is an urgent need to retrieve, organize, and automate videos. Video retrieval is a vital process in multimedia applications such as video search engines, digital museums, and video-on-demand broadcasting. In this paper, the different approaches to video retrieval are outlined and briefly categorized. Moreover, the different methods that bridge the semantic gap in video retrieval are discussed in more detail.
Abstract: In this paper, we present machine learning algorithms and systems for similar video retrieval. Here, the query is itself a video. For the similarity measurement, exemplars, or representative frames in each video, are extracted by unsupervised learning. For this learning, we chose order-aware competitive learning. After obtaining a set of exemplars for each video, the similarity is computed. Because the numbers and positions of the exemplars differ in each video, we use a similarity computing method called M-distance, which generalizes existing global and local alignment methods using followers to the exemplars. To represent each frame in the video, this paper emphasizes the Frame Signature of the ISO/IEC standard so that the total system, along with its graphical user interface, becomes practical. Experiments on the detection of inserted plagiaristic scenes showed excellent precision-recall curves, with precision values very close to 1. Thus, the proposed system can work as a plagiarism detector for videos. In addition, this method can be regarded as the structuring of unstructured data via numerical labeling by exemplars. Finally, further sophistication of this labeling is discussed.
Abstract: Recently, 3D display technology and content creation tools have undergone rigorous development, and as a result they have been widely adopted by home and professional users. 3D digital repositories are growing and becoming ubiquitously available. However, searching and visualizing 3D content remains a great challenge. In this paper, we propose and present the development of a novel approach for creating hypervideos, which eases 3D content search and retrieval. It is called the dynamic hyperlinker for the 3D content search and retrieval process. It advances 3D multimedia navigability and searchability by creating dynamic links for selectable and clickable objects in the video scene while the user consumes the 3D video clip. The proposed system involves 3D video processing, such as detecting and tracking clickable objects, annotating objects, and metadata engineering, including a 3D content descriptive protocol. Such a system attracts attention from both home and professional users, and more specifically from broadcasters and digital content providers. The experiment is conducted on full parallax holoscopic 3D videos (also known as integral images).