Funding: Supported by the National Natural Science Foundation of China (62276192).
Abstract: Low-light images suffer from low quality due to poor lighting conditions, noise pollution, and improper camera settings. To enhance low-light images, most existing methods rely on normal-light images for guidance, but collecting suitable normal-light images is difficult. In contrast, a self-supervised method breaks free from the reliance on normal-light data, resulting in more convenience and better generalization. Existing self-supervised methods primarily focus on illumination adjustment and design pixel-based adjustment methods, leaving remnants of other degradations, uneven brightness, and artifacts. In response, this paper proposes a self-supervised enhancement method, termed SLIE. It can handle multiple degradations, including illumination attenuation, noise pollution, and color shift, all in a self-supervised manner. Illumination attenuation is estimated based on physical principles and local neighborhood information. Noise removal and color shift correction are realized solely with noisy images and images with color shifts. Finally, the comprehensive and fully self-supervised approach achieves better adaptability and generalization. It is applicable to various low-light conditions and can reproduce the original colors of scenes in natural light. Extensive experiments conducted on four public datasets demonstrate the superiority of SLIE over thirteen state-of-the-art methods. Our code is available at https://github.com/hanna-xu/SLIE.
Funding: Supported by a grant from the Basic Science Research Program through the National Research Foundation (NRF) (2021R1F1A1063634) funded by the Ministry of Science and ICT (MSIT), Republic of Korea. The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Group Funding Program Grant Code (NU/RG/SERC/13/40). The authors are also thankful to Prince Satam bin Abdulaziz University for supporting this study via funding from project number (PSAU/2024/R/1445). This work was also supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2023R54), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Road traffic monitoring is an important topic widely discussed among researchers. Systems used to monitor traffic frequently rely on cameras mounted on bridges or roadsides. However, aerial images provide the flexibility to use mobile platforms to detect the location and motion of vehicles over a larger area. To this end, different models have shown the ability to recognize and track vehicles. However, these methods are not mature enough to produce accurate results in complex road scenes. Therefore, this paper presents an algorithm that combines state-of-the-art techniques for identifying and tracking vehicles in conjunction with image bursts. The extracted frames were converted to grayscale, followed by the application of a georeferencing algorithm to embed coordinate information into the images. A masking technique eliminated irrelevant data and reduced the computational cost of the overall monitoring system. Next, Sobel edge detection combined with Canny edge detection and the Hough line transform was applied for noise reduction. After preprocessing, a blob detection algorithm detected the vehicles, and vehicles of varying sizes were handled by a dynamic thresholding scheme. Detection was performed on the first image of every burst. Then, to track vehicles, a template of each vehicle was matched against the succeeding images using a template matching algorithm. To further improve tracking accuracy by incorporating motion information, Scale Invariant Feature Transform (SIFT) features were used to find the best possible match among multiple candidates. Accuracy rates of 87% for detection and 80% for tracking were achieved on the A1 Motorway Netherlands dataset; for the Vehicle Aerial Imaging from Drone (VAID) dataset, the rates were 86% for detection and 78% for tracking.
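As an illustration of the tracking step described above, here is a minimal OpenCV sketch of template matching followed by SIFT-based disambiguation among candidate matches. The window sizes, thresholds, and synthetic test frames are assumptions for demonstration, not parameters from the paper.

```python
# Minimal sketch: template matching + SIFT disambiguation (assumed parameters).
import cv2
import numpy as np

def track_template(frame, template, top_k=3):
    """Return the top_k candidate locations of `template` in `frame`."""
    result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    flat = np.argsort(result.ravel())[::-1][:top_k]
    ys, xs = np.unravel_index(flat, result.shape)
    return list(zip(xs, ys))

def best_match_by_sift(frame, template, candidates):
    """Among candidate windows, pick the one with the most SIFT matches."""
    sift = cv2.SIFT_create()
    bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    kp_t, des_t = sift.detectAndCompute(template, None)
    h, w = template.shape[:2]
    best, best_score = candidates[0], -1
    for (x, y) in candidates:
        window = frame[y:y + h, x:x + w]
        kp_w, des_w = sift.detectAndCompute(window, None)
        score = 0 if des_w is None or des_t is None else len(bf.match(des_t, des_w))
        if score > best_score:
            best, best_score = (x, y), score
    return best

# Synthetic demo: a bright "vehicle" patch moving between two frames.
frame1 = np.zeros((200, 200), np.uint8)
frame1[50:70, 80:110] = 200
template = frame1[50:70, 80:110].copy()
frame2 = np.roll(frame1, (5, 8), axis=(0, 1))  # vehicle shifted by (8, 5)
print(best_match_by_sift(frame2, template, track_template(frame2, template)))
```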
Funding: Supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2023R66), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: The COVID-19 pandemic has devastated our daily lives, leaving horrific repercussions in its aftermath. Due to its rapid spread, it was quite difficult for medical personnel to diagnose it in such large numbers. Patients who test positive for COVID-19 are diagnosed via a nasal PCR test. However, polymerase chain reaction (PCR) findings take a few hours to a few days. The PCR test is expensive, although the government may bear the expense in certain places. Furthermore, subsets of the population resist invasive testing such as swabs. Therefore, chest X-rays or Computed Tomography (CT) scans are preferred in most cases; more importantly, they are non-invasive, inexpensive, and provide a faster response time. Recent advances in Artificial Intelligence (AI), in combination with state-of-the-art methods, have allowed for the diagnosis of COVID-19 using chest X-rays. This article proposes a method for classifying COVID-19 as positive or negative on a decentralized dataset based on the federated learning scheme. To build a progressive global COVID-19 classification model, two edge devices are employed to train the model on their respective localized datasets, and a 3-layered custom Convolutional Neural Network (CNN) model, which can be deployed from the server, is used in the training process. The two edge devices then communicate their learned parameters and weights to the server, which aggregates them and updates the global model. The proposed model is trained using an image dataset available on Kaggle. The Kaggle collection contains more than 13,000 X-ray images, of which 9000 normal and COVID-19-positive images are used. Each edge node possesses a different number of images: edge node 1 has 3200 images, while edge node 2 has 5800. There is no association between the datasets of the various nodes in the network; each node thus has access to a separate image collection with no correlation to the others. The diagnosis of COVID-19 has become considerably more efficient with the proposed algorithm and dataset, and the findings we have obtained are quite encouraging.
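The server-side aggregation described above is essentially federated averaging. Below is a minimal PyTorch sketch of one communication round with two edge nodes; the tiny stand-in CNN, input size, and size-weighted averaging rule are illustrative assumptions rather than the paper's exact design.

```python
# Minimal federated-averaging sketch (assumed 3-layer CNN and weighting rule).
import copy
import torch
import torch.nn as nn

def make_cnn():
    # Stand-in 3-layer CNN for 64x64 grayscale chest X-rays (assumption).
    return nn.Sequential(
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(16 * 16 * 16, 2),
    )

def local_update(global_model, data, labels, epochs=1):
    """Edge-node training on its private local dataset."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(data), labels).backward()
        opt.step()
    return model.state_dict()

def fed_avg(states, sizes):
    """Server step: average state dicts, weighted by node dataset size."""
    total = sum(sizes)
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = sum(s[key] * (n / total) for s, n in zip(states, sizes))
    return avg

global_model = make_cnn()
node_data = [(torch.randn(8, 1, 64, 64), torch.randint(0, 2, (8,))),
             (torch.randn(8, 1, 64, 64), torch.randint(0, 2, (8,)))]
states = [local_update(global_model, x, y) for x, y in node_data]
global_model.load_state_dict(fed_avg(states, sizes=[3200, 5800]))  # node sizes from the paper
```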
Funding: Supported by the National Key Research and Development Program Topics (Grant No. 2021YFB4000905), the National Natural Science Foundation of China (Grant Nos. 62101432 and 62102309), and in part by the Shaanxi Natural Science Fundamental Research Program Project (No. 2022JM-508).
Abstract: Low-light image enhancement methods have limitations in addressing issues such as color distortion, lack of vibrancy, and uneven light distribution, and they often require paired training data. To address these issues, we propose a two-stage unsupervised low-light image enhancement algorithm called Retinex and Exposure Fusion Network (RFNet), which can overcome the over-enhancement of the high dynamic range and under-enhancement of the low dynamic range seen in existing enhancement algorithms. By training with unpaired low-light and regular-light images, the algorithm can better manage the challenges brought about by complex environments in real-world scenarios. In the first stage, we design a multi-scale feature extraction module based on Retinex theory, capable of extracting details and structural information at different scales to generate high-quality illumination and reflection images. In the second stage, an exposure image generator is designed around the camera response function to acquire exposure images containing more dark features, and the generated images are fused with the original input images to complete the low-light image enhancement. Experiments show the effectiveness and rationality of each module designed in this paper. The method reconstructs details of contrast and color distribution, outperforms the current state-of-the-art methods in both qualitative and quantitative metrics, and shows excellent performance in the real world.
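For context, Retinex theory models an image as the pixel-wise product of reflectance and illumination, I = R ⊙ L. The following sketch shows a classical single-scale decomposition; the max-channel initialization and Gaussian smoothing are common textbook choices, assumed here, and not RFNet's learned multi-scale module.

```python
# Classical single-scale Retinex-style decomposition sketch (assumptions:
# illumination initialized as the max RGB channel, then Gaussian-smoothed).
import numpy as np
from scipy.ndimage import gaussian_filter

def retinex_decompose(image, sigma=15.0, eps=1e-4):
    """Split image (H, W, 3, float in [0, 1]) into reflectance and illumination."""
    illumination = image.max(axis=2)                     # brightest channel per pixel
    illumination = gaussian_filter(illumination, sigma)  # enforce spatial smoothness
    illumination = np.clip(illumination, eps, 1.0)
    reflectance = image / illumination[..., None]        # I = R * L  =>  R = I / L
    return np.clip(reflectance, 0.0, 1.0), illumination

rng = np.random.default_rng(0)
low_light = rng.uniform(0.0, 0.2, size=(64, 64, 3))     # synthetic dark image
reflectance, illumination = retinex_decompose(low_light)
enhanced = np.clip(reflectance * illumination ** 0.4, 0, 1)  # gamma-lift illumination
print(enhanced.mean() > low_light.mean())                # brightness increased
```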
Funding: This research was supported by the Deanship of Scientific Research, Islamic University of Madinah, Madinah (KSA), under the Tammayuz program, Grant Number 1442/505.
Abstract: This paper presents a large-gathering dataset of images extracted from publicly filmed videos by 24 cameras installed on the premises of Masjid Al-Nabvi, Madinah, Saudi Arabia. This dataset consists of raw and processed images reflecting a highly challenging and unconstrained environment. The methodology for building the dataset consists of four core phases: acquisition of videos, extraction of frames, localization of face regions, and cropping and resizing of detected face regions. The raw images in the dataset consist of a total of 4613 frames obtained from video sequences. The processed images consist of the face regions of 250 persons extracted from the raw images to ensure the authenticity of the presented data. The dataset further contains 8 images for each of the 250 subjects (persons), for a total of 2000 images. It portrays a highly unconstrained and challenging environment with human faces of varying sizes and pixel quality (resolution). Since the face regions in the video sequences are severely degraded by various unavoidable factors, the dataset can be used as a benchmark to test and evaluate face detection and recognition algorithms for research purposes. We have also gathered and displayed records of the presence of subjects appearing in the presented frames, in a temporal context. These records can serve as a temporal benchmark for tracking, finding persons, activity monitoring, and crowd counting in large-crowd scenarios.
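The localization, cropping, and resizing phases can be illustrated with a short OpenCV sketch. The Haar-cascade detector, its thresholds, and the 128x128 output size are assumptions for demonstration; the paper does not specify which detector was used.

```python
# Sketch of the face localization / crop / resize phases (detector and
# output size are assumptions, not the paper's exact pipeline).
import cv2

def extract_faces(frame_bgr, out_size=(128, 128)):
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    faces = []
    for (x, y, w, h) in boxes:
        crop = frame_bgr[y:y + h, x:x + w]         # crop the detected face region
        faces.append(cv2.resize(crop, out_size))   # normalize resolution
    return faces

# Usage on one extracted frame (the path is hypothetical):
# frame = cv2.imread("frames/cam01_000123.jpg")
# for i, face in enumerate(extract_faces(frame)):
#     cv2.imwrite(f"processed/subject_{i:03d}.jpg", face)
```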
Funding: Supported by the National Natural Science Foundation of China (61702528, 61806212).
Abstract: In the field of satellite imagery, remote sensing image captioning (RSIC) is a hot topic, with the challenges of overfitting and of aligning image and text. To address these issues, this paper proposes a vision-language aligning paradigm for RSIC to jointly represent vision and language. First, a new RSIC dataset, DIOR-Captions, is built by augmenting the object detection in optical remote sensing images (DIOR) dataset with manually annotated Chinese and English contents. Second, a Vision-Language aligning model with Cross-modal Attention (VLCA) is presented to generate accurate and abundant bilingual descriptions for remote sensing images. Third, a cross-modal learning network is introduced to address the problem of visual-lingual alignment. Notably, VLCA is also applied to end-to-end Chinese caption generation by using a pre-trained Chinese language model. Experiments are carried out with various baselines to validate VLCA on the proposed dataset. The results demonstrate that the proposed algorithm is more descriptive and informative than existing algorithms in producing captions.
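The cross-modal attention at the heart of VLCA can be sketched with PyTorch's built-in multi-head attention, with caption-token queries attending over image-region features; the dimensions and single-layer design below are illustrative assumptions.

```python
# Sketch of cross-modal attention: text queries attend to image features.
# Dimensions and the single-layer design are illustrative assumptions.
import torch
import torch.nn as nn

d_model = 256
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)

image_regions = torch.randn(2, 49, d_model)   # e.g., a 7x7 grid of CNN features
caption_tokens = torch.randn(2, 12, d_model)  # embedded caption tokens so far

# Each text token gathers the visual evidence relevant to the next word.
fused, weights = attn(query=caption_tokens, key=image_regions, value=image_regions)
print(fused.shape, weights.shape)  # (2, 12, 256), (2, 12, 49)
```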
Abstract: With the development of artificial intelligence-related technologies such as deep learning, various organizations, including the government, are making efforts to generate and manage big data for use in artificial intelligence. However, it is difficult to acquire big data due to various social problems and restrictions such as personal information leakage, and there are many obstacles to introducing the technology in fields that lack the training data necessary to apply deep learning. Therefore, this study proposes a mixed contour data augmentation technique, a data augmentation technique using contour images, to solve the problem caused by a lack of data. ResNet, a well-known convolutional neural network (CNN) architecture, and CIFAR-10, a benchmark dataset, are used for experimental performance evaluation to demonstrate the superiority of the proposed method. To show that a large performance improvement can be achieved even with a small training dataset, the training-set ratio was varied over 70%, 50%, and 30% for comparative analysis. Applying the mixed contour data augmentation technique yielded a classification accuracy improvement of up to 4.64% and high accuracy even with a small dataset. The demonstrated excellence of the proposed technique on benchmark datasets suggests it can be applied in various fields.
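One plausible reading of the mixed contour augmentation is blending each training image with its own contour image. The sketch below uses Canny edges and a random mixup-style coefficient; both are assumptions, since the abstract does not give the exact mixing rule.

```python
# Hedged sketch of contour-based mixing augmentation (the Canny thresholds
# and the mixup-style blend are assumptions, not the paper's exact rule).
import cv2
import numpy as np

def mixed_contour_augment(image_bgr, rng, low=50, high=150):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)                       # contour image
    edges_bgr = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)
    lam = rng.uniform(0.5, 1.0)                              # random mixing coefficient
    mixed = lam * image_bgr.astype(np.float32) + (1 - lam) * edges_bgr
    return mixed.astype(np.uint8)

rng = np.random.default_rng(42)
dummy = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # CIFAR-sized stand-in
augmented = mixed_contour_augment(dummy, rng)
print(augmented.shape)  # (32, 32, 3)
```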
Funding: Supported by the Security Research Center at Naif Arab University for Security Sciences (Project No. SRC-PR2-01).
Abstract: Deep learning is considered one of the most efficient and reliable methods through which the legitimacy of a digital image can be verified. In the current cyber world, where deepfakes have shaken the global community, confirming the legitimacy of a digital image is of great importance. With the advancements made in deep learning techniques, we can now efficiently train and develop state-of-the-art digital image forensic models. The most traditional and widely used method among researchers is convolutional neural networks (CNNs) for verification of image authenticity, but it consumes a considerable amount of resources and requires a large dataset for training. Therefore, in this study, a transfer-learning-based deep learning technique for image forgery detection is proposed. The proposed methodology consists of three modules: a preprocessing module, a convolutional module, and a classification module. By using the proposed technique, training time is drastically reduced by utilizing pre-trained weights. The performance of the proposed technique is evaluated on benchmark datasets, i.e., BOW and BOSSBase, covering five forensic types: JPEG compression, contrast enhancement (CE), median filtering (MF), additive Gaussian noise, and resampling. We evaluated the performance of the proposed technique across various experiments and case scenarios and achieved an accuracy of 99.92%. The results show the superiority of the proposed system.
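The training-time saving from pre-trained weights typically comes from freezing the convolutional backbone and training only a new classification head. A minimal torchvision sketch follows; the ResNet-18 backbone is an assumption, as the abstract does not name the network, while the five-class head mirrors the five forensic types.

```python
# Transfer-learning sketch: frozen pretrained backbone + trainable head.
# The ResNet-18 backbone is an assumption; the paper does not name its model.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False            # reuse pretrained features as-is

num_forensic_types = 5                     # JPEG, CE, MF, Gaussian noise, resampling
model.fc = nn.Linear(model.fc.in_features, num_forensic_types)  # new trainable head

# Only the head's parameters are passed to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
dummy = torch.randn(4, 3, 224, 224)
loss = nn.CrossEntropyLoss()(model(dummy), torch.randint(0, 5, (4,)))
loss.backward()
optimizer.step()
print(loss.item())
```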
Funding: National Natural Science Foundation of China (No. 61971036); Fundamental Research Funds for the Central Universities (No. 2023CX01011); Beijing Nova Program (No. 20230484361).
Abstract: This paper proposes a method to generate semi-experimental biomedical datasets based on full-wave simulation software. System noise such as antenna port coupling is fully considered in the proposed datasets, which are more realistic than synthetic datasets. In this paper, datasets containing different shapes are constructed based on the relative permittivities of human tissues. Then, a back-propagation scheme is used to obtain rough reconstructions, which are fed into a U-net convolutional neural network (CNN) to recover high-resolution images. Numerical results show that a network trained on datasets generated by the proposed method obtains satisfying reconstruction results and is promising for application in real-time biomedical imaging.
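The refinement stage feeds the rough back-propagation image into a U-net. Below is a deliberately tiny U-net-style network with a single skip connection, offered as an assumption-laden stand-in for the paper's full architecture.

```python
# Minimal U-net-style refiner: one down/up level with a skip connection.
# A stand-in sketch; the paper's actual U-net is deeper.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(ch, ch * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(ch * 2, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, 1, 1))  # high-resolution map out

    def forward(self, rough):
        skip = self.enc(rough)                          # encoder features
        deep = self.down(skip)                          # bottleneck
        up = self.up(deep)                              # upsample back
        return self.dec(torch.cat([up, skip], dim=1))   # fuse via skip connection

rough_reconstruction = torch.randn(1, 1, 64, 64)        # from the back-propagation step
print(TinyUNet()(rough_reconstruction).shape)           # torch.Size([1, 1, 64, 64])
```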
Abstract: Image retrieval for food ingredients is important work that is tremendously tiring, uninteresting, and expensive when done manually. Computer vision systems have made extraordinary advancements in image retrieval with CNNs, but applying convolutional neural networks directly is not feasible for small food datasets. In this study, a novel image retrieval approach is presented for small and medium-scale food datasets, which both augments images using image transformation techniques to enlarge the datasets and improves the average accuracy of food recognition with state-of-the-art deep learning technologies. First, typical image transformation techniques are used to augment the food images. Then transfer learning based on deep learning is applied to extract image features. Finally, a food recognition algorithm is applied to the extracted deep-feature vectors. The presented image retrieval architecture is analyzed on a small-scale food dataset composed of forty-one categories of food ingredients with one hundred pictures per category. Extensive experimental results demonstrate the advantages of the image augmentation architecture for small and medium datasets using deep learning. The novel approach combines image augmentation, ResNet feature vectors, and SMO classification, and comprehensive experiments show its superiority for food detection on small and medium-scale datasets.
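The final recognition stage pairs ResNet feature vectors with SMO classification (sequential minimal optimization, an SVM training algorithm). The sketch below uses torchvision for features and scikit-learn's SVC, whose libsvm solver is SMO-based, as a stand-in; tapping the 512-d pooled layer is an assumption.

```python
# Sketch: ResNet deep-feature extraction + SVM classification.
# sklearn's SVC (libsvm, an SMO-type solver) stands in for an SMO classifier.
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = nn.Identity()          # tap the 512-d pooled features (assumption)
resnet.eval()

@torch.no_grad()
def features(batch):               # batch: (N, 3, 224, 224)
    return resnet(batch).numpy()

# Dummy stand-ins for augmented food-ingredient images and their labels.
train_x, train_y = torch.randn(20, 3, 224, 224), [i % 4 for i in range(20)]
test_x = torch.randn(5, 3, 224, 224)

clf = SVC(kernel="rbf").fit(features(train_x), train_y)
print(clf.predict(features(test_x)))
```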
Funding: This work was supported by the China Social Science Foundation under Grant [17CG209]. The fabric samples were provided by Jiangsu Sunshine Group and Jiangsu Lianfa Textile Group.
Abstract: Historically, yarn-dyed plaid fabrics (YDPFs) have enjoyed enduring popularity with many rich plaid patterns, but production data are still classified and searched only according to production parameters. That process does not satisfy the visual needs of sample order production, fabric design, and stock management. This study produced an image dataset for YDPFs, collected from 10,661 fabric samples. The authors believe the dataset will have significant utility in further research into YDPFs. Convolutional neural networks, such as VGG, ResNet, and DenseNet, with different hyperparameter groups, seemed the most promising tools for the study. This paper reports the authors' exhaustive evaluation of the YDPF dataset. With an overall accuracy of 88.78%, CNNs proved effective in YDPF image classification. This was true even for the low-accuracy Windowpane fabrics, which are often confused with the Prince of Wales pattern. Classification of traditional patterns is also improved by utilizing the strip pooling model to extract local detail features along the horizontal and vertical directions. The strip pooling model characterizes the horizontal and vertical crisscross patterns of YDPFs with considerable success. The proposed method using the strip pooling model (SPM) improves classification performance on the YDPF dataset by 2.64% for ResNet18, by 3.66% for VGG16, and by 3.54% for DenseNet121. The results reveal that the SPM significantly improves YDPF classification accuracy and reduces the error rate on Windowpane patterns as well.
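Strip pooling, as introduced by Hou et al. (CVPR 2020), pools along full-height and full-width strips so horizontal and vertical stripes are captured directly. A PyTorch sketch of such a module follows; the kernel sizes and gating formulation are typical choices, assumed rather than taken from this paper.

```python
# Strip pooling module sketch (after Hou et al. 2020); channel sizes and
# the gating formulation are typical choices, assumed rather than exact.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # -> (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # -> (N, C, 1, W)
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        strip_h = self.conv_h(self.pool_h(x)).expand(n, c, h, w)  # vertical context
        strip_w = self.conv_w(self.pool_w(x)).expand(n, c, h, w)  # horizontal context
        gate = torch.sigmoid(self.fuse(F.relu(strip_h + strip_w)))
        return x * gate  # reweight features by crisscross context

x = torch.randn(2, 64, 32, 32)        # e.g., one ResNet stage's feature map
print(StripPooling(64)(x).shape)      # torch.Size([2, 64, 32, 32])
```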
Funding: This research was funded by the Taif University Researchers Supporting Project, grant number TURSP-2020/345, Taif University, Taif, Saudi Arabia.
Abstract: Recently, many researchers have tried to develop a robust, fast, and accurate algorithm for eye tracking and pupil position detection in applications such as head-mounted eye tracking, gaze-based human-computer interaction, medical applications (such as for deaf and diabetes patients), and attention analysis. Many real-world conditions challenge the eye's appearance, such as illumination, reflections, and occlusions; individual differences in eye physiology and other sources of noise, such as contact lenses or make-up, add further difficulty. The present work introduces a robust pupil detection algorithm with higher accuracy than previous attempts, suited to real-time analytics applications. The proposed circular Hough transform with morphing Canny edge detection for pupillometry (CHMCEP) algorithm can detect the pupil even in blurred or noisy images: filtering in the pre-processing phase removes blur and noise, and a second filtering step before the circular Hough transform fits the center ensures better accuracy. The performance of the proposed CHMCEP algorithm was tested against recent pupil detection methods. Simulations show that the proposed CHMCEP algorithm achieved detection rates of 87.11, 78.54, 58, and 78 on the Świrski, ExCuSe, ElSe, and Labelled Pupils in the Wild (LPW) datasets, respectively. These results show that the proposed approach outperforms the other pupil detection methods by a large margin, providing exact and robust pupil positions on challenging ordinary eye images.
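The core center-fitting step, smoothing followed by a Canny-based circular Hough transform, can be sketched with OpenCV as below; the filter choice, thresholds, and radius range are assumptions for demonstration, not the CHMCEP settings.

```python
# Sketch of the core detection step: smoothing, Canny-based Hough circle
# fit, and center extraction. Thresholds and radii are assumptions.
import cv2
import numpy as np

def detect_pupil(eye_gray):
    smoothed = cv2.medianBlur(eye_gray, 5)          # suppress noise/blur first
    circles = cv2.HoughCircles(
        smoothed, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
        param1=100,   # upper Canny threshold used internally
        param2=15,    # accumulator threshold: lower finds more circles
        minRadius=5, maxRadius=30)
    if circles is None:
        return None
    x, y, r = np.round(circles[0, 0]).astype(int)   # strongest circle
    return (x, y), r

# Synthetic eye image: dark pupil disk on a bright background.
eye = np.full((100, 100), 200, np.uint8)
cv2.circle(eye, (48, 52), 12, 20, -1)
print(detect_pupil(eye))  # approximately ((48, 52), 12)
```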
Abstract: Artificial intelligence, which has recently emerged with the rapid development of information technology, is drawing attention as a tool for solving various problems demanded by society and industry. In particular, convolutional neural networks (CNNs), a type of deep learning technology, are prominent in computer vision fields such as image classification, recognition, and object tracking. Training these CNN models requires a large amount of data, and a lack of data can lead to performance degradation due to overfitting. As CNN architecture development and optimization studies have become active, ensemble techniques have emerged that perform image classification by combining features extracted from multiple CNN models. In this study, data augmentation and contour image extraction were performed to overcome the data shortage problem. In addition, we propose a hierarchical ensemble technique to achieve high image classification accuracy even when trained on a small amount of data. First, we trained pretrained VGGNet, GoogLeNet, ResNet, DenseNet, and EfficientNet on the UCMerced land use dataset and on the contour images extracted from each image. We then apply the hierarchical ensemble technique across the possible combinations of these models. These experiments were performed with training-set proportions of 30%, 50%, and 70%, resulting in a performance improvement of up to 4.68% compared to the average accuracy of the individual models.
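One way to realize a two-level ensemble is to average softmax outputs within each group of models first and then across groups. The sketch below assumes the hierarchy splits into original-image and contour-image model groups, which is an interpretation of the abstract rather than the paper's confirmed design.

```python
# Hedged sketch of a two-level (hierarchical) ensemble: average class
# probabilities within each group, then across groups. The grouping into
# original-image vs. contour-image models is an assumption.
import numpy as np

def hierarchical_ensemble(groups):
    """groups: list of groups; each group is a list of (N, K) probability arrays."""
    group_means = [np.mean(g, axis=0) for g in groups]  # level 1: within group
    return np.mean(group_means, axis=0)                 # level 2: across groups

rng = np.random.default_rng(0)
def fake_probs():                        # stand-in for one CNN's softmax output
    p = rng.random((4, 21))              # 4 samples, 21 UCMerced classes
    return p / p.sum(axis=1, keepdims=True)

original_models = [fake_probs() for _ in range(3)]  # e.g., VGG, ResNet, DenseNet
contour_models = [fake_probs() for _ in range(3)]   # same nets on contour images
final = hierarchical_ensemble([original_models, contour_models])
print(final.argmax(axis=1))              # ensemble prediction per sample
```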