Every day, websites and personal archives accumulate more and more photos, and the size of these archives is immeasurable. The ease of use of these huge digital image collections contributes to their popularity. However, not all of these collections provide relevant indexing information, so it is difficult to find, among the results, data that the user is interested in. To determine the significance of the data, it is therefore important to identify the contents in an informative manner. Image annotation is one of the most challenging problems in multimedia research and computer vision. Hence, in this paper, an Adaptive Convolutional Deep Learning Model (ACDLM) is developed for automatic image annotation. Initially, the datasets (Corel 5K and MSRC v2), which consist of labelled images (for the training phase) and unlabelled images, are collected from open-source repositories. The images then undergo pre-processing, namely colour space quantization and construction of a texture colour class map. The pre-processed images are segmented with J-image segmentation (JSEG) to support efficient labelling. The final step is automatic annotation using ACDLM, which combines a Convolutional Neural Network (CNN) with the Honey Badger Algorithm (HBA). The proposed classifier then labels the unlabelled images. The methodology is implemented in MATLAB, and its performance is evaluated using accuracy, precision, recall and F1-measure.
In recent years, the multimedia annotation problem has attracted significant research attention in the multimedia and computer vision communities, especially automatic image annotation, whose purpose is to provide an efficient and effective search environment in which users can query their images more easily. In this paper, a semi-supervised learning based probabilistic latent semantic analysis (PLSA) model for automatic image annotation is presented. Since labeled images are often hard to obtain or create in large quantities while unlabeled ones are easier to collect, a transductive support vector machine (TSVM) is exploited to enhance the quality of the training image data. In addition, image features with different magnitudes lead to different annotation performance. To this end, a Gaussian normalization method is used to normalize the different features extracted from effective image regions segmented by the normalized cuts algorithm, so as to preserve the intrinsic content of images as completely as possible. Finally, a PLSA model with asymmetric modalities is constructed based on the expectation maximization (EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model significantly improves on the performance of traditional PLSA for automatic image annotation.
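The Gaussian normalization step in the abstract above can be illustrated with a small sketch. The abstract does not give the exact formula, so the 3-sigma variant used here (subtract the mean, divide by three standard deviations, clip to [-1, 1]) is an assumption, and the function name `gaussian_normalize` is hypothetical.

```python
import statistics


def gaussian_normalize(features, clip=True):
    """Normalize each feature dimension (column) with the 3-sigma rule.

    Assumption: this is one common form of Gaussian normalization,
    not necessarily the exact variant used in the paper.
    """
    columns = list(zip(*features))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; use 1.0 for constant columns
    # so that we never divide by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]

    normalized = []
    for row in features:
        new_row = []
        for value, mean, std in zip(row, means, stds):
            z = (value - mean) / (3.0 * std)
            if clip:
                # Values beyond three standard deviations are clipped.
                z = max(-1.0, min(1.0, z))
            new_row.append(z)
        normalized.append(new_row)
    return normalized


# Two features with very different magnitudes end up on the same scale.
regions = [[0.0, 100.0], [1.0, 200.0], [2.0, 300.0]]
scaled = gaussian_normalize(regions)
```

After normalization, a colour histogram bin in [0, 1] and a texture energy in the hundreds contribute comparably to any distance computed on the feature vectors, which is the motivation the abstract states.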
Automatic image annotation has been an active topic of research in computer vision and pattern recognition for decades. A two-stage automatic image annotation method based on the Gaussian mixture model (GMM) and the random walk model (abbreviated as GMM-RW) is presented. To start with, a GMM fitted by the rival penalized expectation maximization (RPEM) algorithm is employed to estimate the posterior probabilities of each annotation keyword. Subsequently, a random walk over the constructed label similarity graph is used to further mine the potential correlations among the candidate annotations and thereby refine the results, which plays a crucial role in semantic-based image retrieval. The contributions of this work are threefold. First, GMM is exploited to capture the initial semantic annotations; in particular, the RPEM algorithm is used to train the model because it can determine the number of GMM components automatically. Second, a label similarity graph is constructed by a weighted linear combination of label similarity and the visual similarity of images associated with the corresponding labels, which efficiently avoids the problems of polysemy and synonymy during image annotation. Third, the random walk is run over the constructed label graph to further refine the candidate set of annotations generated by GMM. Experiments on the standard Corel5k dataset demonstrate that GMM-RW is significantly more effective and efficient than several state-of-the-art approaches for automatic image annotation.
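The refinement stage described above can be sketched as a random walk with restart over a label graph. This is a generic formulation under stated assumptions, not the paper's exact update rule: the restart weight `alpha`, the dictionary-based graph, and the function name `refine_with_random_walk` are all hypothetical.

```python
def refine_with_random_walk(initial_scores, similarity, alpha=0.5, iterations=50):
    """Refine candidate annotation scores by a random walk with restart.

    initial_scores: {label: confidence from the first stage}
    similarity:     {label: {other_label: non-negative similarity}}
    alpha:          weight of the graph propagation term (assumed value)
    """
    labels = list(initial_scores)

    # Row-normalize similarities into transition probabilities.
    transition = {}
    for a in labels:
        total = sum(similarity[a][b] for b in labels if b != a)
        transition[a] = {
            b: (similarity[a][b] / total if total else 0.0)
            for b in labels
            if b != a
        }

    # The normalized first-stage scores act as the restart distribution.
    norm = sum(initial_scores.values())
    prior = {a: initial_scores[a] / norm for a in labels}

    scores = dict(prior)
    for _ in range(iterations):
        scores = {
            b: alpha * sum(scores[a] * transition[a].get(b, 0.0) for a in labels)
            + (1.0 - alpha) * prior[b]
            for b in labels
        }
    return scores


# Toy example: "cloud" is strongly related to the confident label "sky".
candidates = {"sky": 0.5, "cloud": 0.2, "car": 0.3}
graph = {
    "sky": {"cloud": 0.9, "car": 0.1},
    "cloud": {"sky": 0.9, "car": 0.1},
    "car": {"sky": 0.1, "cloud": 0.1},
}
refined = refine_with_random_walk(candidates, graph)
```

In this toy graph, "cloud" starts below "car" but ends above it after refinement, because it receives support from its strong link to "sky". That is the kind of correlation mining the abstract attributes to the second stage.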
This paper presents a new method for refining image annotation by integrating probabilistic latent semantic analysis (PLSA) with a conditional random field (CRF). First, a PLSA model with asymmetric modalities is constructed to predict a candidate set of annotations with confidence scores; then the semantic relationships among the candidate annotations are modeled with a conditional random field. In the CRF, the confidence scores generated by the PLSA model and the Flickr distance between pairwise candidate annotations are taken as local evidence and contextual potentials, respectively. The novelty of the method lies mainly in two aspects: exploiting PLSA to predict a candidate set of annotations with confidence scores, and using the CRF to further explore the semantic context among candidate annotations for precise image annotation. To demonstrate the effectiveness of the proposed method, an experiment is conducted on the standard Corel dataset, and its results compare favorably with several state-of-the-art approaches.
A novel image auto-annotation method is presented based on the probabilistic latent semantic analysis (PLSA) model and multiple Markov random fields (MRF). A PLSA model with asymmetric modalities is first constructed to estimate the joint probability between images and semantic concepts; a subgraph is then extracted to serve as the structure of the corresponding Markov random field, and inference over it is performed by iterated conditional modes to obtain the final annotation for the image. The novelty of the method lies mainly in two aspects: exploiting PLSA to estimate the joint probability between images and semantic concepts, and using multiple MRFs to further explore the semantic context among keywords for accurate image annotation. To demonstrate the effectiveness of this approach, an experiment is conducted on the Corel5k dataset, and its results compare favorably with current state-of-the-art approaches.
Automatic image annotation (AIA) has become an important and challenging problem in computer vision because of the semantic gap. In this paper, a novel support vector machine with a mixture of kernels (SVM-MK) is proposed for automatic image annotation. On the one hand, combined global and local block-based image features are extracted to reflect the intrinsic content of images as completely as possible. On the other hand, SVM-MK is constructed to achieve better annotation performance. Experimental results on the Corel dataset show that the proposed image feature representation and the SVM-MK annotation classifier achieve higher annotation accuracy than an SVM with any single kernel and than mi-SVM for semantic image annotation.
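A mixture of kernels, as named in the abstract above, is commonly built as a convex combination of base kernels. The sketch below assumes an RBF kernel (local) mixed with a polynomial kernel (global); the abstract does not say which kernels or weights SVM-MK uses, so the kernel choice, the parameter values and all function names are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: a local kernel that decays with distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: a global kernel based on the inner product."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def mixture_kernel(x, y, weight=0.7):
    """Convex combination of the two base kernels (weight is assumed)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * rbf_kernel(x, y) + (1.0 - weight) * poly_kernel(x, y)


def gram_matrix(samples, kernel=mixture_kernel):
    """Pairwise kernel values for a list of feature vectors."""
    return [[kernel(a, b) for b in samples] for a in samples]


samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
K = gram_matrix(samples)
```

Because a convex combination of positive semi-definite kernels is itself positive semi-definite, the resulting Gram matrix can be passed to any SVM solver that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.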
The vast number of images available on the Web calls for an effective and efficient search service to help users find relevant images. The prevalent way is to provide a keyword interface for users to submit queries. However, the number of images without any tags or annotations is beyond the reach of manual effort. To overcome this, automatic image annotation techniques have emerged, which generally select a suitable set of tags for a given image without user intervention. There are three main challenges in Web-scale image annotation: scalability, noise resistance and diversity. Scalability has a twofold meaning: first, an automatic image annotation system should scale to the billions of images on the Web; second, it should be able to automatically identify several relevant tags among a huge tag set for a given image within seconds or even faster. Noise resistance means that the system should be robust against typos and ambiguous terms used in tags. Diversity means that image content may include both scenes and objects, which are further described by multiple different image features constituting different facets in annotation. In this paper, we propose a unified framework to tackle these three challenges for automatic Web image annotation. It involves two main components: tag candidate retrieval and multi-facet annotation. In the former, content-based indexing and a concept-based codebook are leveraged to address scalability and noise resistance. In the latter, a joint feature map is designed to describe the different facets of tags in annotations and the relations between these facets. A tag graph is adopted to represent the tags in the entire annotation, and structured learning is employed to construct a learning model on top of the tag graph based on the generated joint feature map. Millions of images from Flickr are used in our evaluation. Experimental results show that we achieve a 33% performance improvement over single-facet approaches in terms of three metrics: precision, recall and F1 score.
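The three metrics named at the end of the abstract above can be computed per image from the predicted and ground-truth tag sets. The sketch below uses the standard set-based definitions; the tag lists are made-up illustration data and the function name `tag_metrics` is hypothetical.

```python
def tag_metrics(predicted, relevant):
    """Set-based precision, recall and F1 for one image's tags."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)

    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustration data: 2 of 3 predicted tags are correct, 2 of 4 true tags found.
precision, recall, f1 = tag_metrics(
    predicted=["sky", "sea", "car"],
    relevant=["sky", "sea", "beach", "sand"],
)
```

Annotation papers differ in whether they average these values per image or per tag, so a reported figure is only comparable once the averaging scheme is known.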
Object detection and tracking have recently gained importance among researchers and in industry. Deep learning (DL) approaches are now used for object tracking because they increase the performance and speed of the tracking process. This paper presents a novel, robust DL-based object detection and tracking algorithm using Automated Image Annotation with a ResNet-based Faster regional convolutional neural network (R-CNN), named the AIA-FRCNN model. The AIA-FRCNN method performs image annotation using a Discriminative Correlation Filter (DCF) with Channel and Spatial Reliability (CSR) tracker, called the DCF-CSRT model. The AIA-FRCNN model uses Faster R-CNN as the object detector and tracker, which comprises a region proposal network (RPN) and Fast R-CNN. The RPN is a fully convolutional network that simultaneously predicts the bounding boxes and scores of different objects; it is trained to generate high-quality region proposals, which Fast R-CNN uses for detection. In addition, the Residual Network (ResNet 101) model serves as the shared convolutional neural network (CNN) that generates the feature maps. The performance of the ResNet 101 model is further improved by the Adam optimizer, which tunes the hyperparameters, namely learning rate, batch size, momentum and weight decay. Finally, a softmax layer is applied to classify the images. The performance of the AIA-FRCNN method is assessed on a benchmark dataset, with a detailed comparative analysis of the results. The experimental outcome indicates the superior characteristics of the AIA-FRCNN model under diverse aspects.
The paper proposes a novel probabilistic generative model for simultaneous image classification and annotation. The model exploits the fact that category information can provide valuable information for image annotation. Once the category of an image is ascertained, the scope of annotation words can be narrowed and the probability of generating irrelevant annotation words can be reduced. To this end, the idea of annotating images according to class is introduced into the model. Using variational methods, approximate inference and parameter estimation algorithms are derived for the model, and efficient approximations for classifying and annotating new images are also given. The power of the model is demonstrated on two real-world datasets: a 1,600-image LabelMe dataset and a 1,791-image UIUC-Sport dataset. The experimental results show that the classification performance is on par with several state-of-the-art classification models, while the annotation performance is better than that of several state-of-the-art annotation models.
Big data has recently become inevitable owing to the massive increase in data generated by real-time applications. Object detection and tracking have become popular among research communities and are useful in applications such as vehicle navigation, augmented reality and surveillance. This paper introduces an effective deep learning based object tracker using the Automated Image Annotation with Inception v2 based Faster RCNN (AIA-IFRCNN) model in a big data environment. The AIA-IFRCNN model annotates the images using a Discriminative Correlation Filter (DCF) with Channel and Spatial Reliability (CSR) tracker, named the DCF-CSRT model. The AIA-IFRCNN technique employs Faster RCNN for object detection and tracking, which comprises a region proposal network (RPN) and Fast R-CNN. In addition, the Inception v2 model is applied as a shared convolutional neural network (CNN) to generate the feature map. Lastly, a softmax layer performs the classification task. The AIA-IFRCNN method is evaluated on a benchmark dataset, and the results are assessed under diverse aspects, with a maximum detection accuracy of 97.77%.
The proposed deep learning algorithm will be integrated as a binary classifier under the umbrella of a multi-class classification tool to facilitate the automated detection of non-healthy deformities, anatomical landmarks, pathological findings, other anomalies and normal cases by examining medical endoscopic images of the GI tract. Each binary classifier is trained to detect one specific non-healthy condition. The algorithm analyzed in the present work extends the detection ability of this tool by classifying GI tract image snapshots into two classes, haemorrhage and non-haemorrhage. The proposed algorithm is the result of a collaboration between interdisciplinary specialists in AI and data analysis, computer vision, and gastroenterologists from four university gastroenterology departments of Greek medical schools. The data comprise 195 videos (177 from non-healthy cases and 18 from healthy cases) captured with the PillCam® (Medtronic) device from 195 patients, all diagnosed with different forms of angioectasia, haemorrhages and other diseases at different sites of the gastrointestinal (GI) tract, mainly including cases that are difficult to diagnose. Our AI algorithm is based on a convolutional neural network (CNN) trained on images annotated at image level, using a semantic tag indicating whether or not the image contains angioectasia and haemorrhage traces. At least 22 CNN architectures were created and evaluated, some of which were pre-trained by applying transfer learning on ImageNet data. All the CNN variations were trained on a dataset with 50% prevalence and evaluated on unseen data. On test data, the best results were obtained from our CNN architectures that do not use a transfer-learning backbone. On a balanced dataset of non-healthy and healthy images from 39 videos of different patients, the model identified the correct diagnosis with sensitivity 90%, specificity 92%, precision 91.8%, FPR 8% and FNR 10%. In addition, we compared the performance of our best CNN algorithm with that of our algorithm for the same goal based on HSV colorimetric lesion features extracted from pixel-level annotations, with both algorithms trained and tested on the same data. The CNN trained on image-level annotated images is found to be 9% less sensitive and to achieve 2.6% less precision, 1.2% less FPR and 7% less FNR than the algorithm based on HSV filters extracted from pixel-level annotated training data.
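The five figures reported for the balanced test set are related through the confusion matrix, which the following sketch makes explicit. The counts are hypothetical: 100 images per class were chosen only because they reproduce the reported percentages; the abstract does not state the actual test-set size.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # positive predictive value
        "fpr": fp / (fp + tn),          # 1 - specificity
        "fnr": fn / (fn + tp),          # 1 - sensitivity
    }


# Hypothetical balanced test set: 100 haemorrhage and 100 healthy images.
metrics = binary_metrics(tp=90, fn=10, tn=92, fp=8)
```

With these counts, sensitivity is 0.90, specificity 0.92, FPR 0.08 and FNR 0.10, and precision is 90/98, which rounds to the reported 91.8%. On a balanced set, FPR and FNR follow directly from specificity and sensitivity, so only the precision figure carries extra information.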
Image categorization in massive image databases is an important problem. This paper proposes an approach to image categorization that uses a sparse set of salient semantic information and a hierarchy semantic label tree (HSLT) model. First, to provide more critical image semantics, a sparse set of salient regions, located only at the focuses of visual attention rather than across the entire scene, is formed by our proposed saliency detection model, which incorporates low- and high-level features, together with Shotton's semantic texton forests (STFs) method. Second, we propose a new HSLT model based on the sparse regional semantic information to automatically build a semantic image hierarchy, which explicitly encodes a general-to-specific image relationship. Last, we archive the image dataset using the hierarchical image semantics, which helps to improve the performance of image organizing and browsing. Extensive experimental results show that using semantic hierarchies as a hierarchical organizing framework provides better image annotation and organization, improves accuracy and reduces human effort.
Automatic web image annotation is a practical and effective approach to both web image retrieval and image understanding. However, current annotation techniques do not investigate the statement-level syntactic correlation among the annotated words, which makes it very difficult to render a natural language interpretation for images, such as "pandas eat bamboo". In this paper, we propose an approach to interpreting image semantics by mining the visible and textual information hidden in images. The approach consists of two main parts: first, the annotated words of target images are ranked according to two factors, namely visual correlation and pairwise co-occurrence; then the statement-level syntactic correlation among the annotated words is explored and a natural language interpretation of the target image is obtained. Experiments conducted on real-world web images show the effectiveness of the proposed approach.
Radiologists perform text-based image retrieval when they want to retrieve medical images. However, the accuracy and efficiency of such retrieval cannot keep up with requirements. An innovative algorithm is proposed to retrieve similar medical images. First, we extract the professional terms from the ontology structure and use them to annotate the CT images. Second, the semantic similarity matrix of ontology terms is calculated according to the structure of the ontology. Lastly, the corresponding semantic distance is calculated from the annotation vectors, which contain different annotations. We run the algorithm on 120 real liver CT images (divided into six categories) from a top-tier hospital. The results show that the retrieval precision is 80.81% and that the classification AUC (area under the receiver operating characteristic (ROC) curve) is 0.945.
Funding (semi-supervised PLSA annotation model): Supported by the National Program on Key Basic Research Project (No. 2013CB329502), the National Natural Science Foundation of China (No. 61202212), the Special Research Project of the Educational Department of Shaanxi Province of China (No. 15JK1038), and the Key Research Project of Baoji University of Arts and Sciences (No. ZK16047).
Funding (GMM-RW annotation method): Supported by the National Basic Research Program of China (No. 2013CB329502), the National Natural Science Foundation of China (No. 61202212), the Special Research Project of the Educational Department of Shaanxi Province of China (No. 15JK1038), and the Key Research Project of Baoji University of Arts and Sciences (No. ZK16047).
Funding (PLSA with CRF annotation refinement): Supported by the National Basic Research Priorities Programme (No. 2013CB329502), the National High Technology Research and Development Programme of China (No. 2012AA011003), the Natural Science Basic Research Plan in Shanxi Province of China (No. 2014JQ2-6036), and the Science and Technology R&D Program of Baoji City (No. 203020013, 2013R2-2).
Funding (PLSA with multiple MRF auto-annotation): Supported by the National Basic Research Priorities Program (No. 2013CB329502), the National High-tech R&D Program of China (No. 2012AA011003), the National Natural Science Foundation of China (No. 61035003, 61072085, 60933004, 60903141), and the National Science and Technology Support Program of China (No. 2012BA107B02).
Funding (SVM-MK annotation): Supported by the National Basic Research Priorities Programme (No. 2007CB311004) and the National Natural Science Foundation of China (No. 61035003, 60933004, 60903141, 60970088, 61072085).
Funding (Web image annotation framework): Supported by the National Natural Science Foundation of China under Grant No. 60931160445.
Abstract: The vast number of images available on the Web calls for an effective and efficient search service to help users find relevant images. The prevalent approach is to provide a keyword interface for users to submit queries. However, the number of images without any tags or annotations is beyond the reach of manual effort. To overcome this, automatic image annotation techniques have emerged; they generally select a suitable set of tags for a given image without user intervention. There are three main challenges with respect to Web-scale image annotation: scalability, noise resistance and diversity. Scalability has a twofold meaning: first, an automatic image annotation system should scale to billions of images on the Web; second, it should be able to automatically identify several relevant tags among a huge tag set for a given image within seconds or even faster. Noise resistance means that the system should be robust against typos and ambiguous terms used in tags. Diversity means that image content may include both scenes and objects, which are further described by multiple different image features constituting different facets in annotation. In this paper, we propose a unified framework to tackle the above three challenges for automatic Web image annotation. It mainly involves two components: tag candidate retrieval and multi-facet annotation. In the former, content-based indexing and a concept-based codebook are leveraged to address scalability and noise resistance. In the latter, a joint feature map is designed to describe the different facets of tags in annotations and the relations between these facets. A tag graph is adopted to represent the tags in the entire annotation, and a structured learning technique is employed to construct a learning model on top of the tag graph based on the generated joint feature map. Millions of images from Flickr are used in our evaluation. Experimental results show a 33% performance improvement over single-facet approaches in terms of three metrics: precision, recall and F1 score.
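The three metrics named in this abstract can be illustrated for a single image's tag set. The sketch below is not the authors' evaluation code; the function name `tag_prf` and the example tags are hypothetical, and it assumes the common set-based definitions of precision, recall and F1.

```python
def tag_prf(predicted, relevant):
    """Return (precision, recall, F1) for one image's predicted tag set.

    Assumes set-based definitions: a predicted tag counts as a hit
    only if it also appears in the ground-truth (relevant) tag set.
    """
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall; define it as 0 with no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical example: 3 predicted tags, 4 ground-truth tags, 2 in common.
p, r, f1 = tag_prf(["beach", "sea", "dog"], ["beach", "sea", "sky", "sand"])
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In a multi-image evaluation these per-image values would typically be averaged, or the hit counts pooled before dividing; the abstract does not say which convention was used.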
Abstract: Object detection and tracking have gained importance among researchers and practitioners. Deep learning (DL) approaches are now used for object tracking because they improve the performance and speed of the tracking process. This paper presents a novel, robust DL-based object detection and tracking algorithm using Automated Image Annotation with a ResNet-based Faster regional convolutional neural network (R-CNN), named the AIA-FRCNN model. The AIA-RFRCNN method performs image annotation using a Discriminative Correlation Filter (DCF) with Channel and Spatial Reliability tracker (CSR), called the DCF-CSRT model. The AIA-RFRCNN model uses Faster RCNN as an object detector and tracker, which involves a region proposal network (RPN) and Fast R-CNN. The RPN is a fully convolutional network that simultaneously predicts the bounding boxes and scores of different objects. The trained RPN generates high-quality region proposals, which Fast R-CNN uses for detection. In addition, the Residual Network (ResNet 101) model is used as a shared convolutional neural network (CNN) to generate feature maps. The performance of the ResNet 101 model is further improved by the Adam optimizer, which tunes the hyperparameters, namely learning rate, batch size, momentum and weight decay. Finally, a softmax layer is applied to classify the images. The performance of the AIA-RFRCNN method was assessed on a benchmark dataset, followed by a detailed comparative analysis of the results. The experimental outcomes indicate the superior characteristics of the AIA-RFRCNN model under diverse aspects.
Funding: Supported by the Major Research Plan of the National Natural Science Foundation of China (90920006)
Abstract: The paper proposes a novel probabilistic generative model for simultaneous image classification and annotation. The model exploits the fact that category information can provide valuable information for image annotation. Once the category of an image is ascertained, the scope of annotation words can be narrowed and the probability of generating irrelevant annotation words can be reduced. To this end, the idea of annotating images according to class is introduced into the model. Using variational methods, the approximate inference and parameter estimation algorithms of the model are derived, and efficient approximations for classifying and annotating new images are also given. The power of the model is demonstrated on two real-world datasets: a 1,600-image LabelMe dataset and a 1,791-image UIUC-Sport dataset. The experimental results show that the classification performance is on par with several state-of-the-art classification models, while the annotation performance is better than that of several state-of-the-art annotation models.
Abstract: Recently, big data has become unavoidable owing to the massive increase in data generated by real-time applications. Object detection and tracking have become popular among research communities and are useful in applications such as vehicle navigation, augmented reality and surveillance. This paper introduces an effective deep learning based object tracker using Automated Image Annotation with an Inception v2 based Faster RCNN (AIA-IFRCNN) model in a big data environment. The AIA-IFRCNN model annotates the images using a Discriminative Correlation Filter (DCF) with Channel and Spatial Reliability tracker (CSR), named the DCF-CSRT model. The AIA-IFRCNN technique employs Faster RCNN for object detection and tracking, which comprises a region proposal network (RPN) and Fast R-CNN. In addition, the Inception v2 model is applied as a shared convolutional neural network (CNN) to generate the feature map. Lastly, a softmax layer is applied to perform the classification task. The AIA-IFRCNN method was evaluated on a benchmark dataset and the results were assessed under diverse aspects, with a maximum detection accuracy of 97.77%.
Abstract: The proposed deep learning algorithm will be integrated as a binary classifier within a multi-class classification tool that facilitates the automated detection of non-healthy deformities, anatomical landmarks, pathological findings, other anomalies and normal cases by examining medical endoscopic images of the GI tract. Each binary classifier is trained to detect one specific non-healthy condition. The algorithm analyzed in the present work extends the detection ability of this tool by classifying GI tract image snapshots into two classes: haemorrhage and non-haemorrhage. The proposed algorithm is the result of a collaboration between interdisciplinary specialists in AI and data analysis, computer vision, and gastroenterologists from four university gastroenterology departments of Greek medical schools. The data comprise 195 videos (177 from non-healthy cases and 18 from healthy cases) captured with the PillCam® Medtronic device from 195 patients, all diagnosed with different forms of angioectasia, haemorrhage and other diseases at different sites of the gastrointestinal (GI) tract, mainly including cases that are difficult to diagnose. Our AI algorithm is based on a convolutional neural network (CNN) trained on images annotated at image level, using a semantic tag indicating whether or not the image contains angioectasia and haemorrhage traces. At least 22 CNN architectures were created and evaluated, some of which were pre-trained by applying transfer learning on ImageNet data. All the CNN variations were trained on a dataset with 50% prevalence and evaluated on unseen data. On test data, the best results were obtained by our CNN architectures that do not use a transfer-learning backbone. On a balanced dataset of non-healthy and healthy images from 39 videos from different patients, the model identified the correct diagnosis with sensitivity 90%, specificity 92%, precision 91.8%, FPR 8% and FNR 10%. In addition, we compared the performance of our best CNN algorithm with our algorithm for the same goal that is based on HSV colorimetric lesion features extracted from pixel-level annotations, with both algorithms trained and tested on the same data. The CNN trained on image-level annotated images is 9% less sensitive and achieves 2.6% less precision, 1.2% less FPR and 7% less FNR than the algorithm based on HSV filters extracted from pixel-level annotated training data.
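The five figures reported for the CNN (sensitivity 90%, specificity 92%, precision 91.8%, FPR 8%, FNR 10%) are mutually consistent on a balanced test set, which a short calculation makes visible. The sketch below assumes a hypothetical test set of 100 non-healthy and 100 healthy images; the abstract does not give the actual image counts, and `binary_metrics` is an illustrative name.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics for a binary classifier."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # positive predictive value
        "fpr": fp / (fp + tn),          # 1 - specificity
        "fnr": fn / (fn + tp),          # 1 - sensitivity
    }


# Hypothetical balanced test set: 100 non-healthy and 100 healthy images.
# 90% sensitivity -> 90 TP, 10 FN; 92% specificity -> 92 TN, 8 FP.
m = binary_metrics(tp=90, fn=10, tn=92, fp=8)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

With equal class sizes, precision is 90 / (90 + 8), which rounds to the 91.8% stated in the abstract; on an unbalanced set the same sensitivity and specificity would give a different precision.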
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61272258, 61170124, 61170020, 61070223) and the Application Foundation Research Plan of Suzhou City, China (SYG201116).
Abstract: Image categorization in massive image databases is an important problem. This paper proposes an approach to image categorization that uses a sparse set of salient semantic information and a hierarchy semantic label tree (HSLT) model. First, to provide more critical image semantics, a sparse set of salient regions is formed only at the focuses of visual attention, rather than over the entire scene, by our proposed saliency detection model, which incorporates low- and high-level features and Shotton's semantic texton forests (STFs) method. Second, we propose a new HSLT model based on the sparse regional semantic information to automatically build a semantic image hierarchy, which explicitly encodes a general-to-specific image relationship. Finally, we archive the image dataset using the image hierarchical semantics, which helps to improve the performance of image organizing and browsing. Extensive experimental results show that using semantic hierarchies as a hierarchical organizing framework provides better image annotation and organization, improves accuracy and reduces human effort.
Funding: Project supported by the National Natural Science Foundation of China (Nos 60533090 and 60603096) and the National High-Tech Research and Development Program (863) of China (No 2006AA 010107)
Abstract: Automatic web image annotation is a practical and effective way to support both web image retrieval and image understanding. However, current annotation techniques do not investigate the statement-level syntactic correlation among the annotated words, which makes it very difficult to render a natural language interpretation of an image such as "pandas eat bamboo". In this paper, we propose an approach to interpreting image semantics by mining the visible and textual information hidden in images. The approach consists of two main parts: first, the annotated words of the target image are ranked according to two factors, namely visual correlation and pairwise co-occurrence; then the statement-level syntactic correlation among the annotated words is explored and a natural language interpretation of the target image is obtained. Experiments conducted on real-world web images show the effectiveness of the proposed approach.
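The first stage, ranking annotated words by visual correlation and pairwise co-occurrence, can be sketched as follows. The abstract does not give the scoring formula, so the linear combination, the weight `alpha`, the function name `rank_words` and all scores below are assumptions made purely for illustration.

```python
def rank_words(visual, cooc, alpha=0.5):
    """Rank candidate words by a weighted mix of two factors.

    visual: dict mapping word -> visual correlation with the image (0..1).
    cooc:   dict mapping frozenset({w1, w2}) -> pairwise co-occurrence (0..1).
    alpha:  assumed weight on the visual factor (not from the paper).
    """
    words = list(visual)

    def score(word):
        others = [o for o in words if o != word]
        if others:
            # Average co-occurrence of this word with every other candidate.
            mean_cooc = sum(cooc.get(frozenset((word, o)), 0.0) for o in others) / len(others)
        else:
            mean_cooc = 0.0
        return alpha * visual[word] + (1 - alpha) * mean_cooc

    return sorted(words, key=score, reverse=True)


# Hypothetical scores: "car" looks plausible visually but rarely co-occurs
# with the other candidates, so it drops below "bamboo".
visual = {"panda": 0.9, "bamboo": 0.6, "car": 0.7}
cooc = {
    frozenset(("panda", "bamboo")): 0.8,
    frozenset(("panda", "car")): 0.05,
    frozenset(("bamboo", "car")): 0.05,
}
print(rank_words(visual, cooc))
```

The example shows why the second factor matters: a word with a strong visual score but weak co-occurrence with the remaining candidates is demoted before any sentence-level interpretation is attempted.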
Abstract: Radiologists perform text-based image retrieval when they want to retrieve medical images. However, the accuracy and efficiency of such retrieval cannot keep up with requirements. An innovative algorithm is proposed to retrieve similar medical images. First, we extract the professional terms from the ontology structure and use them to annotate the CT images. Second, the semantic similarity matrix of the ontology terms is calculated according to the structure of the ontology. Lastly, the corresponding semantic distance is calculated from the annotation vectors, which contain different annotations. We ran the algorithm on 120 real liver CT images (divided into six categories) from a top-tier hospital. The results show that the retrieval index "Precision" is 80.81%, and the classification index "AUC" (Area Under Curve) under the "ROC" (Receiver Operating Characteristic) curve is 0.945.
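The last two steps, a term-to-term similarity matrix followed by a semantic distance between annotation vectors, can be sketched with a soft-cosine measure. This is a swapped-in technique chosen for illustration, not the formula from the paper, which the abstract does not state; the term names, similarity values and the function name `semantic_distance` are all hypothetical.

```python
import math


def semantic_distance(a, b, sim):
    """Soft-cosine distance between two annotation vectors.

    a, b: lists of term weights (here binary: 1 if the image carries the term).
    sim:  symmetric term-by-term similarity matrix with 1.0 on the diagonal.
    Returns a value where 0.0 means semantically identical annotations.
    """
    def bilinear(x, y):
        return sum(x[i] * sim[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

    denom = math.sqrt(bilinear(a, a) * bilinear(b, b))
    if denom == 0:
        return 1.0  # at least one image has no annotations
    return 1.0 - bilinear(a, b) / denom


# Hypothetical ontology terms: ["cyst", "hemangioma", "normal"].
sim = [
    [1.0, 0.6, 0.1],
    [0.6, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
img_cyst, img_hemangioma, img_normal = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(semantic_distance(img_cyst, img_hemangioma, sim))
print(semantic_distance(img_cyst, img_normal, sim))
```

Under these assumed values, two images annotated with related lesion terms end up closer than a lesion image and a normal one, even though no annotation term is shared exactly, which is the benefit of routing the comparison through the ontology.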