Image captioning has gained increasing attention in recent years. The visual characteristics of the input image play a crucial role in generating high-quality captions. Prior studies have used visual attention mechanisms to dynamically focus on localized regions of the input image, which helps identify the relevant image regions at each step of caption generation. Going further, giving a captioning model the ability to select the most relevant visual features from the input image and attend to them can significantly improve how those features are used, which in turn improves the captioning network's performance. In light of this, we present an image captioning framework that efficiently exploits the extracted image representations. Our framework comprises three key components: the Visual Feature Detector module (VFD), the Visual Feature Visual Attention module (VFVA), and the language model. The VFD module detects a subset of the most pertinent features among the local visual features, creating an updated visual features matrix. The VFVA then attends to the visual features matrix generated by the VFD, producing an updated context vector that the language model uses to generate an informative description. Integrating the VFD and VFVA modules adds a layer of processing for the visual features, which contributes to the improved performance of the image captioning model. Experiments on the MS-COCO dataset show that the proposed framework is competitive with state-of-the-art methods, effectively leveraging visual representations to improve performance. The implementation code is available at https://github.com/althobhani/VFDICM (accessed on 30 July 2024).
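As a rough illustration of the two-stage idea (first select a subset of local features, then attend over that subset), the sketch below keeps the top-k region features by a relevance score and computes a soft-attention context vector from them. The dot-product scoring against a decoder hidden state, the value of `k`, and the function names are assumptions made for illustration. This is not the authors' VFD/VFVA implementation, which is in the linked repository.

```python
import numpy as np

def select_features(features, hidden, k):
    """Hypothetical VFD-style step: keep the k local features most relevant to `hidden`.

    features: (N, D) local visual features; hidden: (D,) decoder state.
    Relevance is assumed to be a plain dot product.
    """
    scores = features @ hidden
    top = np.argsort(scores)[::-1][:k]          # indices of the k highest scores
    return features[top]

def attend(features, hidden):
    """Soft attention over the selected features; returns (context, weights)."""
    logits = features @ hidden
    weights = np.exp(logits - logits.max())     # numerically stable softmax
    weights /= weights.sum()
    return weights @ features, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(36, 8))                # e.g. 36 region features of dimension 8
h = rng.normal(size=8)
subset = select_features(feats, h, k=10)
context, w = attend(subset, h)
print(subset.shape, context.shape)
```

The selection step shrinks the set the attention has to spread its weights over, which is one plausible reading of the "additional layer of processing" described above.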
Applying machine learning to lemon defect recognition can improve the efficiency of lemon quality detection. This paper proposes a deep learning-based classification method with visual feature extraction and transfer learning to recognize defective lemons (i.e., green and mold defects). First, data augmentation and brightness compensation techniques are used for data preprocessing. Visual feature extraction is then used to quantify the defects and determine the feature variables that serve as the basis for classification. Next, we construct a convolutional neural network with an embedded Visual Geometry Group 16 (VGG16-based) network using transfer learning. The proposed model is compared with several benchmark models, such as K-Nearest Neighbor (KNN) and Support Vector Machine (SVM). Results show that the proposed model achieves the highest accuracy (95.44%) on the test data set. The research provides a new solution for lemon defect recognition.
As a well-known urban landscape concept describing urban space quality, urban street vitality is a subjective human perception of the urban environment and is difficult to evaluate directly from the physical space. This study applied a modern machine learning computer vision algorithm to the urban built environment to simulate the process that starts with the visual perception of the urban street landscape and ends with the human reaction to street vitality. By analyzing the optimized trained model, we tried to identify the visual features of urban street vitality and evaluate their importance. A region around Mochou Lake in Nanjing, China, was chosen as the study area. Seven investigators surveyed the area and recorded their evaluation score for each site's vitality level, along with a corresponding picture taken on site. A total of 370 pairs of pictures and recorded scores from 231 valid survey sites were used to train a convolutional neural network. After optimization, a deep neural network model with 43 layers, including 11 convolutional layers, was created. Heat maps were then used to identify the features that lead to high vitality score outputs. The spatial distributions of different types of feature entities were also analyzed to help identify spatial effects. The study found that visual features including humans, construction sites, shop fronts, and roadside/walking pavement are vital features that correspond to the vitality of an urban street. The consistency of these critical features with traditional urban vitality features indicates that the model learned useful knowledge during training. Applying the trained model in urban planning practice can help improve the city environment so that it better attracts residents' activities and communication.
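The abstract does not say how the heat maps were produced. One common technique is class activation mapping (CAM): a weighted sum of the last convolutional layer's feature maps, using the output layer's weights. The minimal sketch below assumes a network ending in global average pooling followed by a single linear vitality-score output. The toy feature maps and weights are made up.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights):
    """CAM-style heat map: weighted sum of the last conv layer's feature maps.

    feature_maps: (K, H, W) activations; fc_weights: (K,) weights from the
    global-average-pooled features to the (assumed) vitality score output.
    """
    cam = np.tensordot(fc_weights, feature_maps, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)                               # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                              # scale to [0, 1]
    return cam

# Toy example: channel 0 fires on the upper-left patch, channel 1 on the lower-right.
maps = np.zeros((2, 4, 4))
maps[0, :2, :2] = 1.0
maps[1, 2:, 2:] = 1.0
heat = class_activation_map(maps, np.array([2.0, -1.0]))
print(heat)
```

Regions with large positive contributions (here the upper-left patch) would appear "hot" when the map is upsampled and overlaid on the street photo.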
Novel view synthesis has recently attracted tremendous research attention for its applications in virtual reality and immersive telepresence. Rendering a locally immersive light field (LF) from arbitrary large-baseline RGB references is a challenging problem that existing novel view synthesis techniques do not solve efficiently. In this work, we aim to truthfully render locally immersive novel views/LF images based on large-baseline LF captures and a single RGB image in the target view. To fully exploit the precious information in the source LF captures, we propose a novel occlusion-aware source sampler (OSS) module, which efficiently transfers the pixels of source views to the target view's frustum in an occlusion-aware manner. An attention-based deep visual fusion module is proposed to fuse the revealed occluded background content with a preliminary LF into a final refined LF. The proposed source sampling and fusion mechanism not only provides information for occluded regions from varying observation angles but also effectively enhances the visual rendering quality. Experimental results show that our method renders high-quality LF images/novel views with sparse RGB references and outperforms state-of-the-art LF rendering and novel view synthesis methods.
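The OSS module itself is not described in enough detail to reproduce. As background, the sketch below shows a generic and much simpler way to move source pixels into a target view while respecting occlusion: forward warping with a z-buffer. It assumes rectified cameras with a purely horizontal shift, so a pixel moves by `focal_baseline / depth` columns. When several pixels collide, the closest one wins. Target pixels that receive nothing are reported as holes, which is the kind of disoccluded content a later fusion stage would need to fill. This is a standard technique swapped in for illustration, not the authors' sampler.

```python
import numpy as np

def forward_warp_zbuffer(src, depth, focal_baseline):
    """Warp a source view to a horizontally displaced target view.

    Assumes rectified cameras: a pixel at column x moves to x - f*B/depth,
    where focal_baseline = f*B. On collisions, the pixel with the smallest
    depth (closest to the camera) is kept: a z-buffer test.
    """
    h, w = depth.shape
    out = np.zeros_like(src)
    zbuf = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            xt = int(round(x - focal_baseline / depth[y, x]))
            if 0 <= xt < w and depth[y, x] < zbuf[y, xt]:
                zbuf[y, xt] = depth[y, x]
                out[y, xt] = src[y, x]
    holes = np.isinf(zbuf)          # target pixels no source pixel reached
    return out, holes

# One scanline: background at depth 10, a foreground object (pixels 3, 4) at depth 5.
src = np.array([[1, 2, 3, 4, 5, 6]], dtype=float)
depth = np.array([[10, 10, 5, 5, 10, 10]], dtype=float)
warped, holes = forward_warp_zbuffer(src, depth, focal_baseline=10.0)
print(warped, holes)
```

In the example, the foreground pixel 3 overwrites background pixel 2 at target column 0. Column 2 is left as a hole because the background there was hidden behind the object in the source view.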
Deception detection plays a crucial role in criminal investigation. Videos contain a wealth of information about apparent and physiological changes in individuals and can therefore serve as an effective means of deception detection. In this paper, we investigate video-based deception detection using both apparent visual features, such as eye gaze, head pose, and facial action units (AUs), and the non-contact heart rate detected by the remote photoplethysmography (rPPG) technique. Multiple wrapper-based feature selection methods combined with K-nearest neighbor (KNN) and support vector machine (SVM) classifiers are employed to screen the most effective features for deception detection. We evaluate the proposed method on both a self-collected physiological-assisted visual deception detection (PV3D) dataset and the public bag-of-lies (BOL) dataset. Experimental results show that the SVM classifier with symbiotic organisms search (SOS) feature selection yields the best overall performance, with an area under the curve (AUC) of 83.27% and an accuracy (ACC) of 83.33% on PV3D, and an AUC of 71.18% and an ACC of 70.33% on BOL. These results demonstrate the stability and effectiveness of the proposed method for video-based deception detection.
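A wrapper method scores candidate feature subsets by the accuracy of the classifier itself. The sketch below uses greedy sequential forward selection with leave-one-out KNN accuracy. This is a simpler search strategy swapped in for illustration; symbiotic organisms search is a metaheuristic that explores subsets differently. `k=3`, the stopping rule, and the synthetic data are assumptions.

```python
import numpy as np

def knn_loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbour classifier (binary labels 0/1)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a sample may not vote for itself
    nn = np.argsort(d, axis=1)[:, :k]
    pred = (y[nn].mean(axis=1) > 0.5).astype(int)  # majority vote
    return float((pred == y).mean())

def forward_selection(X, y, max_features):
    """Greedy wrapper: repeatedly add the feature that most improves accuracy."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = [(knn_loo_accuracy(X[:, selected + [j]], y), j) for j in remaining]
        score, j = max(scores)
        if score <= best:
            break                                # no remaining feature helps
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
informative = y + 0.1 * rng.normal(size=40)      # separates the two classes
X = np.column_stack([rng.normal(size=40), informative, rng.normal(size=40)])
print(forward_selection(X, y, max_features=2))
```

On this toy data the wrapper picks only column 1, the informative feature, and stops because adding a noise column cannot improve on perfect accuracy.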
The quality of an orange is judged by its appearance and diameter. Appearance refers to the skin's smoothness and surface cleanliness; diameter refers to the transverse diameter. Both are visual attributes that visual perception technologies can identify automatically. Nonetheless, current orange quality assessment faces two issues: 1) there are no image datasets for orange quality grading; 2) it is challenging to effectively learn the fine-grained and distinct visual semantics of oranges from diverse angles. This study collected 12,522 images of 2,087 oranges for multi-grained grading tasks. It also presents a visual learning graph convolution approach for multi-grained orange quality grading, consisting of a backbone network and a graph convolutional network (GCN). The backbone network's object detection, data augmentation, and feature extraction remove extraneous visual information. The GCN is used to learn the topological semantics of orange feature maps. Finally, evaluation results show recognition accuracies of 99.50%, 97.27%, and 97.99% for diameter size, appearance, and fine-grained orange quality, respectively, indicating that the proposed approach outperforms the compared methods.
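For readers unfamiliar with graph convolution, the sketch below implements one layer of the widely used Kipf and Welling propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W). The abstract does not say which GCN variant is used or how nodes and edges are built from the orange feature maps. Treating, for example, each view or spatial location as a node is an assumption.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (N, N) adjacency matrix of the (assumed) graph over feature-map nodes,
    H: (N, F) node features, W: (F, F') learnable weights.
    """
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d = A_hat.sum(axis=1)                        # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0)

# Three nodes in a path 0 - 1 - 2; identity features and weights expose the propagation matrix.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
out = gcn_layer(A, np.eye(3), np.eye(3))
print(np.round(out, 3))
```

Each node's new features mix its own features with those of its neighbours, weighted by the symmetric degree normalization.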
The rapid growth of multimedia content necessitates powerful technologies to filter, classify, index, and retrieve video documents more efficiently. However, the essential bottleneck of image and video analysis is the semantic gap: the low-level features extracted by computers always fail to coincide with the high-level concepts interpreted by humans. In this paper, we present a generic scheme for detecting video semantic concepts based on machine learning over multiple visual features. Various global and local low-level visual features are systematically investigated, and a kernel-based learning method enables the concept detection system to explore the potential of these features. We then combine the different features and subsystems through both classifier-level and kernel-level fusion, which contributes to a more robust system. Our proposed system is tested on the TRECVID dataset. The resulting Mean Average Precision (MAP) score is much better than the benchmark performance, which shows that our concept detection engine provides a generic model and performs well on both object-type and scene-type concepts.
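The two fusion levels can be contrasted in a few lines. Kernel-level fusion combines per-feature kernel matrices before training a single classifier. Classifier-level fusion averages the scores of separately trained classifiers. In the sketch below, the RBF kernels, the `gamma` values, the fusion weights, and the "colour" and "texture" feature names are all hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0))    # clip tiny negative rounding errors

def fuse_kernels(kernels, weights):
    """Kernel-level fusion: a convex combination of per-feature kernels.

    A non-negative weighted sum of positive semi-definite kernels is itself
    positive semi-definite, so the result is a valid kernel for an SVM.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

def late_fusion(scores, weights):
    """Classifier-level fusion: weighted average of per-classifier scores."""
    return np.average(np.vstack(scores), axis=0, weights=np.asarray(weights, dtype=float))

rng = np.random.default_rng(0)
colour = rng.normal(size=(6, 3))                 # hypothetical global colour features
texture = rng.normal(size=(6, 5))                # hypothetical local texture features
K = fuse_kernels([rbf_kernel(colour, 0.5), rbf_kernel(texture, 0.1)], [0.6, 0.4])
fused_scores = late_fusion([np.array([0.2, 0.8]), np.array([0.6, 0.4])], [1, 1])
print(np.round(K, 3), fused_scores)
```

The fused matrix `K` could be passed to any classifier that accepts a precomputed kernel.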
With the rapid development of the Internet, webpages have become far more varied than in previous decades. At the same time, people face increasingly significant network security risks and enormous losses caused by phishing webpages, which imitate the interface of real webpages to deceive victims. To better identify and distinguish phishing webpages, a visual feature extraction method and a visual similarity algorithm are proposed. First, the visual feature extraction method improves the Vision-based Page Segmentation (VIPS) algorithm to extract visual blocks and calculates each block's signature using perceptual hashing. Second, the visual similarity algorithm establishes a one-to-one correspondence between visual blocks based on their coordinates and thresholds. Weights are then assigned according to the tree structure, and the similarity of the visual blocks is calculated from the Hamming distance between their visual features. The visual similarity of two webpages is then obtained by integrating the similarities and weights of the different visual blocks. Finally, multiple pairs of phishing and legitimate webpages are evaluated to verify the feasibility of the algorithm. The method performs well in these experiments, achieving 94% accuracy.
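The hashing and similarity steps can be sketched with the average-hash (aHash) variant of perceptual hashing. The abstract does not name the variant, so aHash is an assumption. The sketch also assumes that blocks are already matched one-to-one and that their weights are given. VIPS segmentation and the tree-based weighting are not shown.

```python
import numpy as np

def average_hash(gray, size=8):
    """aHash: block-average down to size x size, then threshold at the mean.

    gray: 2-D array whose sides are multiples of `size` (a simplifying assumption).
    Returns a flat boolean array of size*size bits.
    """
    h, w = gray.shape
    small = gray.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return (small > small.mean()).ravel()

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))

def page_similarity(blocks_a, blocks_b, weights):
    """Weighted similarity of two pages from already-matched visual blocks.

    blocks_a[i] and blocks_b[i] are assumed to correspond; each block's
    similarity is 1 - (Hamming distance / number of hash bits).
    """
    sims = []
    for a, b in zip(blocks_a, blocks_b):
        ha, hb = average_hash(a), average_hash(b)
        sims.append(1 - hamming(ha, hb) / ha.size)
    return float(np.dot(weights, sims) / np.sum(weights))

rng = np.random.default_rng(0)
page = rng.integers(0, 256, size=(64, 64)).astype(float)
brighter = page + 5                              # uniform brightness shift
other = rng.integers(0, 256, size=(64, 64)).astype(float)
print(page_similarity([page], [brighter], [1.0]))
print(page_similarity([page, page], [brighter, other], [3.0, 1.0]))
```

Because aHash thresholds each block at its own mean, a uniform brightness shift leaves the hash unchanged. This is the kind of robustness that makes perceptual hashes suitable for spotting visually cloned pages.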
This paper proposes a referral system to assist medical experts in the screening/referral of diabetic retinopathy. The system was developed by applying several existing mathematical techniques in sequence: speeded-up robust features (SURF), K-means clustering, and visual dictionaries (VD). Three databases are mixed to test how the system performs when the image sources are dissimilar. In the experiments, an area under the curve (AUC) of 0.9343 was attained. The results obtained from the system are promising.
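The SURF, K-means, and visual-dictionary pipeline follows the usual bag-of-visual-words pattern, sketched below. The random 64-dimensional vectors stand in for SURF descriptors, which in practice would come from a feature extractor. The dictionary size `k=16` and the plain Lloyd's K-means are assumptions, and the downstream referral classifier is not shown.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centres (the visual words)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]   # copy of k random rows
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centres[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):              # leave empty clusters where they are
                centres[j] = X[labels == j].mean(axis=0)
    return centres

def bovw_histogram(descriptors, centres):
    """Encode one image as a normalised histogram of its nearest visual words."""
    labels = np.argmin(((descriptors[:, None, :] - centres[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(centres)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 64))               # stand-in for SURF descriptors from many images
words = kmeans(train, k=16)
image_desc = rng.normal(size=(40, 64))           # stand-in descriptors of one fundus image
hist = bovw_histogram(image_desc, words)
print(np.round(hist, 3))
```

Each retinal image, whatever its number of keypoints, becomes a fixed-length vector that a standard classifier can consume.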
Target tracking is a typical application of visual servoing technology. Tracking high-speed targets with current visual servo systems is still difficult, so the visual servoing scheme needs to be improved. A position-based visual servo parallel system is presented for tracking high-speed targets. A local Frenet frame is assigned to each sampling point of the spatial trajectory. Position estimation is derived from the differential features of intrinsic geometry, and orientation estimation is derived from a homogeneous transformation. The time spent on searching and processing can be greatly reduced by shifting the search window according to a prediction of the feature locations. Simulation results demonstrate the system's ability to track a spatially moving object.
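The local Frenet frame at a trajectory sample is built from the first two derivatives of position. The tangent is T = r'/|r'|, the binormal is B = (r' × r'')/|r' × r''|, and the normal is N = B × T. The curvature is κ = |r' × r''|/|r'|³. The sketch below estimates the derivatives with central finite differences (`np.gradient`), which is an assumption, since the abstract does not say how they are obtained. It checks the result on a helix, whose frame and curvature are known in closed form.

```python
import numpy as np

def frenet_frames(points, dt):
    """Tangent, normal, binormal and curvature at each sample of a 3-D trajectory.

    points: (N, 3) positions sampled every dt seconds.
    """
    v = np.gradient(points, dt, axis=0)          # velocity r'
    a = np.gradient(v, dt, axis=0)               # acceleration r''
    speed = np.linalg.norm(v, axis=1, keepdims=True)
    T = v / speed
    cross = np.cross(v, a)
    cross_norm = np.linalg.norm(cross, axis=1, keepdims=True)
    B = cross / cross_norm
    N = np.cross(B, T)                           # completes the right-handed frame
    kappa = cross_norm[:, 0] / speed[:, 0] ** 3
    return T, N, B, kappa

# Helix r(t) = (cos t, sin t, 0.5 t): N points at the axis, curvature 1 / 1.25 = 0.8.
dt = 0.01
t = np.arange(0.0, 2 * np.pi, dt)
helix = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])
T, N, B, kappa = frenet_frames(helix, dt)
i = len(t) // 2
print(np.round(N[i], 4), round(float(kappa[i]), 4))
```

Interior samples are accurate to roughly O(dt²). The first and last samples use one-sided differences and are less reliable.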
Localization plays a vital role in a mobile robot navigation system and is a fundamental capability for autonomous movement. In indoor environments, the current mainstream localization scheme uses two-dimensional (2D) laser light detection and ranging (LiDAR) to build an occupancy grid map with simultaneous localization and mapping (SLAM) technology, and then locates the robot on the known grid map. However, such solutions work effectively only in areas with salient geometrical features. In areas with repeated, symmetrical, or similar structures, such as a long corridor, the conventional particle filtering method fails. To solve this crucial problem, this paper presents a novel coarse-to-fine paradigm that uses visual features to assist mobile robot localization in a long corridor. First, the mobile robot is remote-controlled to move from the starting position to the end along the middle line. While it moves, a grid map is built using laser-based SLAM. At the same time, a visual map consisting of keyframe images is created according to a keyframe selection strategy. The keyframes are associated with the robot's poses through timestamps. Second, a moving strategy based on range features extracted from the laser scans is proposed to decide on an initial rough position. This step is vital because it tells the robot where to move to adjust its pose. Third, the mobile robot captures images from a suitable perspective according to the moving strategy and matches them against the image map to achieve coarse localization. Finally, an improved particle filtering method is presented to achieve fine localization. Experimental results show that our method is effective and robust for global localization. The localization success rate reaches 98.8%, while the average moving distance is only 0.31 m. In addition, the method works well when the mobile robot is kidnapped to another position in the corridor.
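Associating keyframes with SLAM poses "through timestamps" typically means a nearest-timestamp lookup. The sketch below does this with binary search over the sorted pose timestamps. The 0.05 s tolerance and the choice to return `None` for unmatched keyframes are hypothetical, not taken from the paper.

```python
import bisect

def associate(keyframe_times, pose_times, max_dt=0.05):
    """Pair each keyframe with the index of the pose whose timestamp is closest.

    pose_times must be sorted ascending. Keyframes with no pose within
    max_dt seconds (a hypothetical tolerance) are paired with None.
    """
    pairs = []
    for t in keyframe_times:
        i = bisect.bisect_left(pose_times, t)    # first pose not earlier than t
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pose_times)]
        if not candidates:
            pairs.append(None)
            continue
        best = min(candidates, key=lambda j: abs(pose_times[j] - t))
        pairs.append(best if abs(pose_times[best] - t) <= max_dt else None)
    return pairs

print(associate([0.12, 0.29, 0.9], [0.0, 0.1, 0.2, 0.3]))
```

Only the two poses that bracket each keyframe time need to be compared, so each lookup costs O(log n).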
Classifying the visual features in images to retrieve a specific image is a significant problem in computer vision, especially for historical images with faded colors. There have therefore been many efforts to automate classification and retrieve similar images accurately. To this end, we developed a VGG19 deep convolutional neural network to extract visual features from the images automatically. The distances among the extracted feature vectors are then measured, and a similarity score is generated using a Siamese deep neural network. The Siamese model was first built and trained from scratch, but it did not achieve high evaluation metrics. We therefore rebuilt it on the pre-trained VGG19 deep learning model, which yielded higher evaluation metrics. Three different distance metrics combined with the sigmoid activation function were then tested to find the most accurate way to measure the similarities among the retrieved images; the cosine distance metric produced the best evaluation results. Moreover, the code was run on a Graphics Processing Unit (GPU) instead of the Central Processing Unit (CPU), which further sped up both training and retrieval. After extensive experimentation, we reached a satisfactory solution, with F-scores of 0.98 for classification and 0.99 for retrieval.
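The distance-plus-sigmoid head can be illustrated on plain embedding vectors. The abstract does not name the three metrics. Cosine, Euclidean, and Manhattan distance are assumed here, and the sigmoid `scale` and `bias` are placeholders for parameters a trained Siamese head would learn.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def similarity_score(distance, scale=-10.0, bias=5.0):
    """Map a distance to a (0, 1) similarity with a sigmoid.

    scale and bias are placeholder values; a negative scale makes small
    distances score close to 1.
    """
    return 1.0 / (1.0 + math.exp(-(scale * distance + bias)))

a, b, c = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, 0.0, 1.0]
print(similarity_score(cosine_distance(a, b)), similarity_score(cosine_distance(a, c)))
print(euclidean_distance(a, b), manhattan_distance(a, b))
```

Vectors `a` and `b` point in the same direction but differ in magnitude. Cosine distance treats them as identical, while the Euclidean and Manhattan distances do not.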
Applying mature, cutting-edge information technology to library information retrieval is the development trend of library information management. To enable rapid retrieval of massive amounts of book information, this paper proposes a book retrieval method that combines QR codes with image retrieval technology. The method analyzes the visual features of book images, designs a book image retrieval method based on boundary contour and regional pixel distribution features, and links the retrieved images to book information through QR codes, improving the efficiency of book retrieval. The experimental results show that books can be retrieved effectively using the boundary contour and regional pixel distribution features, and that the book information can be displayed through a QR code, giving readers fast, intelligent retrieval over massive book collections.
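The exact definition of the "regional pixel distribution" feature is not given. The sketch below assumes one simple form: the fraction of foreground pixels in each cell of a grid laid over a binarized cover image, compared by L1 distance. The grid size and distance are assumptions, and the boundary contour feature and QR-code lookup are not shown.

```python
import numpy as np

def regional_distribution(binary, grid=4):
    """Fraction of foreground pixels in each cell of a grid x grid partition.

    binary: 2-D boolean array (e.g. a thresholded book-cover image) whose
    sides are assumed divisible by grid.
    """
    h, w = binary.shape
    cells = binary.reshape(grid, h // grid, grid, w // grid)
    return cells.mean(axis=(1, 3)).ravel()

def retrieve(query, database, grid=4):
    """Index of the database image whose feature vector is closest in L1 distance."""
    q = regional_distribution(query, grid)
    dists = [np.abs(q - regional_distribution(img, grid)).sum() for img in database]
    return int(np.argmin(dists))

rng = np.random.default_rng(0)
database = [rng.random((32, 32)) > 0.5 for _ in range(3)]
query = database[1].copy()
query[0, :3] = ~query[0, :3]                     # small perturbation of one cover
print(retrieve(query, database))
```

Because each cell summarises many pixels, small local changes (here three flipped pixels) barely move the feature vector, so the perturbed query still maps to the right cover.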
The dosages of five main reagents in a gold-antimony flotation process (Copper Sulfate, Lead Nitrate, Yellow Medicine, No. 2 Oil, and Black Medicine) were recorded together with the corresponding visual features of the froth images (Stability, Gray Scale, Mean R, Mean G, Mean B, Mean Average, Dimension, and Degree Variance). Parameter correlation analysis showed strong correlations among Copper Sulfate, Yellow Medicine, and Black Medicine, as well as among Gray Scale, Mean R, Mean G, and Mean B. In contrast, the correlations between Dimension and each of Gray Scale, Mean R, Mean G, and Mean B, and between Stability and each dosing parameter, are weak. This suggests a feasible way to reduce the complexity of the flotation control system by eliminating some parameters.
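A correlation screening of this kind can be sketched as a Pearson correlation matrix plus a threshold. The 0.8 cut-off for "strong", the use of Pearson's r, and the synthetic data below are all assumptions. The abstract gives neither the measurements nor the criterion.

```python
import numpy as np

def strong_pairs(data, names, threshold=0.8):
    """Pearson correlation between columns; report pairs with |r| >= threshold.

    data: (samples, variables). The threshold is an assumed cut-off.
    """
    r = np.corrcoef(data, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(r[i, j]), 3)))
    return r, pairs

# Synthetic illustration only: two colour channels driven by a shared brightness term.
rng = np.random.default_rng(0)
n = 200
brightness = rng.normal(size=n)
data = np.column_stack([
    brightness + 0.05 * rng.normal(size=n),      # "Mean R"
    brightness + 0.05 * rng.normal(size=n),      # "Mean G"
    rng.normal(size=n),                          # "Stability", independent here
])
names = ["Mean R", "Mean G", "Stability"]
r, pairs = strong_pairs(data, names)
print(pairs)
```

Variables that appear together in a strong pair carry largely redundant information. Keeping one of each group is the kind of parameter reduction the abstract proposes.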
Aerodynamic surrogate modeling mostly relies only on integrated loads data obtained from simulation or experiment, neglecting and wasting the valuable distributed physical information on the surface. To make full use of both integrated and distributed loads, a modeling paradigm called heterogeneous data-driven aerodynamic modeling is presented. The essential concept is to incorporate the physical information of distributed loads as additional constraints within end-to-end aerodynamic modeling. For such heterogeneous data, a novel and easily applicable physical feature embedding modeling framework is designed. This framework extracts low-dimensional physical features from the pressure distribution and then uses them to enhance the modeling of the integrated loads via feature embedding. The proposed framework can be coupled with multiple feature extraction methods, and its generalization across different airfoils is verified in a transonic case. Compared with traditional direct modeling, the proposed framework reduces testing errors by almost 50%. For the same prediction accuracy, it needs fewer than half as many training samples. Furthermore, visualization analysis reveals a significant correlation between the discovered low-dimensional physical features and the heterogeneous aerodynamic loads, which supports the interpretability and credibility of the proposed deep learning framework's superior performance.
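Since the framework "can be coupled with multiple feature extraction methods", principal component analysis (PCA) is one simple candidate for turning a pressure distribution into a few physical features. The sketch below assumes pressure coefficients sampled at fixed chordwise stations, and uses synthetic curves built from two modes so that two components reconstruct them exactly. How the features are embedded into the loads model is not shown.

```python
import numpy as np

def fit_pca(P, n_components):
    """Fit PCA on pressure distributions P (n_samples, n_stations).

    Returns encode (distribution -> low-dimensional features) and
    decode (features -> reconstructed distribution) functions.
    """
    mean = P.mean(axis=0)
    _, _, Vt = np.linalg.svd(P - mean, full_matrices=False)
    basis = Vt[:n_components]                    # leading principal directions

    def encode(X):
        return (X - mean) @ basis.T

    def decode(Z):
        return mean + Z @ basis

    return encode, decode

x = np.linspace(0.0, 1.0, 50)                    # chordwise stations (hypothetical)
rng = np.random.default_rng(0)
coeffs = rng.normal(size=(30, 2))
# Synthetic "pressure distributions": a base shape plus two modes.
P = -np.sqrt(x) + coeffs[:, :1] * np.sin(np.pi * x) + coeffs[:, 1:] * x * (1 - x)
encode, decode = fit_pca(P, n_components=2)
Z = encode(P)
print(Z.shape, float(np.abs(decode(Z) - P).max()))
```

A 50-point distribution is compressed to two numbers per sample, which could then serve as extra inputs or intermediate targets for a loads model.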
Objective image quality assessment (IQA) plays an important role in various visual communication systems because it can automatically and efficiently predict the perceived quality of images. The human eye is the ultimate evaluator of visual experience, so modeling the human visual system (HVS) is a core issue for objective IQA and visual experience optimization. Traditional models based on black-box fitting have low interpretability and are difficult to use for guiding experience optimization, while models based on physiological simulation are hard to integrate into practical visual communication services because of their high computational complexity. To bridge the gap between signal distortion and visual experience, this paper proposes a novel perceptual no-reference (NR) IQA algorithm based on structural computational modeling of the HVS. Following the mechanism of the human brain, we divide visual signal processing into a low-level visual layer, a middle-level visual layer, and a high-level visual layer, which perform pixel information processing, primitive information processing, and global image information processing, respectively. Natural scene statistics (NSS) based features, deep features, and free-energy based features are extracted from these three layers. Support vector regression (SVR) is employed to aggregate the features into the final quality prediction. Extensive experimental comparisons on three widely used benchmark IQA databases (LIVE, CSIQ, and TID2013) demonstrate that the proposed metric is highly competitive with or outperforms state-of-the-art NR IQA measures.
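The abstract does not specify its NSS features. A common starting point, assumed here, is the BRISQUE-style mean-subtracted contrast-normalized (MSCN) coefficients (I - μ)/(σ + C), where μ and σ are local Gaussian-weighted mean and standard deviation. Statistics of these coefficients (for example, fitted distribution parameters) are then fed to a regressor such as SVR, which is not shown. The 7×7 window, σ = 7/6, and C = 1 follow common BRISQUE practice.

```python
import numpy as np

def gaussian_kernel_1d(sigma=7 / 6, radius=3):
    """Normalised 1-D Gaussian taps (7 taps for radius 3)."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, k):
    """Separable Gaussian blur with reflected borders; output has img's shape."""
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def mscn(img, c=1.0):
    """Mean-subtracted contrast-normalised coefficients of a grayscale image."""
    img = img.astype(float)
    k = gaussian_kernel_1d()
    mu = blur(img, k)
    sigma = np.sqrt(np.abs(blur(img * img, k) - mu * mu))   # local std, guarded against rounding
    return (img - mu) / (sigma + c)

flat = mscn(np.full((16, 16), 100.0))            # a flat image has no local structure
rng = np.random.default_rng(0)
coeffs = mscn(rng.integers(0, 256, size=(32, 32)))
print(float(np.abs(flat).max()), coeffs.shape)
```

For undistorted natural images, MSCN coefficients tend to follow a near-Gaussian distribution, and distortions change that shape. This change is what NSS-based quality features measure.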
Funding (image captioning framework): supported by the National Natural Science Foundation of China (Nos. U22A2034, 62177047), the High Caliber Foreign Experts Introduction Plan funded by MOST, and the Central South University Research Programme of Advanced Interdisciplinary Studies (No. 2023QYJC020).
Funding (urban street vitality study): This work was supported by the China Scholarship Council [grant number 201706195004], the National Natural Science Foundation of China [grant numbers 41001093 and 51778278], and the Social Science Foundation of Jiangsu Province, China [grant number 18GLB014].
Funding (light field rendering study): the Theme-based Research Scheme, Research Grants Council of Hong Kong (No. T45-205/21-N).
Funding (deception detection study): National Natural Science Foundation of China (No. 62271186) and Anhui Key Project of Research and Development Plan (No. 202104d07020005).
Funding: Supported by the National Natural Science Foundation of China (31901240, 31971792), the Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2016-AⅡ), and the Central Public-interest Scientific Institution Basal Research Funds, China (Y2022QC17, CAAS-ZDRW202107).
Abstract: The quality of oranges is determined by their appearance and diameter. Appearance refers to the smoothness and cleanliness of the skin surface; diameter refers to the transverse diameter. Both are visual attributes that visual perception technologies can identify automatically. Nonetheless, current orange quality assessment faces two issues: 1) there are no image datasets for orange quality grading; 2) it is challenging to effectively learn the fine-grained and distinct visual semantics of oranges from diverse angles. This study collected 12522 images of 2087 oranges for multi-grained grading tasks. In addition, it presents a visual learning graph convolution approach for multi-grained orange quality grading, consisting of a backbone network and a graph convolutional network (GCN). The backbone network's object detection, data augmentation and feature extraction remove extraneous visual information, and the GCN is used to learn the topological semantics of the orange feature maps. Finally, the evaluation results show that the recognition accuracies for diameter size, appearance and fine-grained orange quality were 99.50%, 97.27% and 97.99%, respectively, indicating that the proposed approach outperforms the compared methods.
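The graph convolution step can be illustrated with the widely used propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). This is a minimal sketch assuming that graph nodes are regions of an orange feature map and that an adjacency matrix A is already given; the abstract does not state how the authors build the graph, so the helper names and toy inputs are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    out = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in out]
```

Self-loops keep each node's own features in the aggregation, and the symmetric normalization prevents high-degree nodes from dominating the propagated features.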
Funding: This paper was supported by the collaborative Research Project SEV under Grant No. 01100474 between Beijing University of Posts and Telecommunications and France Telecom R&D Beijing, the National Natural Science Foundation of China under Grant No. 90920001, and the Graduate Innovation Fund of SICE, BUPT, 2011.
Abstract: The rapid growth of multimedia content necessitates powerful technologies to filter, classify, index and retrieve video documents more efficiently. However, the essential bottleneck of image and video analysis is the semantic gap: the low-level features extracted by computers fail to coincide with the high-level concepts interpreted by humans. In this paper, we present a generic scheme for detecting video semantic concepts based on machine learning over multiple visual features. Various global and local low-level visual features are systematically investigated, and a kernel-based learning method enables the concept detection system to exploit the potential of these features. We then combine the different features and subsystems through both classifier-level and kernel-level fusion, which contributes to a more robust system. Our proposed system is tested on the TRECVID dataset. The resulting mean average precision (MAP) score is much better than the benchmark performance, which shows that our concept detection engine provides a generic model and performs well on both object-type and scene-type concepts.
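Kernel-level fusion can be sketched as a weighted average of per-feature kernel (Gram) matrices. Because a non-negative combination of positive semidefinite kernels is itself positive semidefinite, the fused matrix remains a valid kernel for an SVM or another kernel classifier. This sketch assumes RBF kernels and hand-picked weights; the abstract does not state which kernels or weights the authors used, and the feature names in the usage are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rbf_kernel(X: Sequence[Sequence[float]], gamma: float) -> Matrix:
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for one feature type."""
    n = len(X)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]


def fuse_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Kernel-level fusion: convex combination of per-feature Gram matrices."""
    if not kernels or len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights)) / total
             for j in range(n)] for i in range(n)]
```

Classifier-level fusion, by contrast, would train one classifier per feature type and combine their output scores instead of their kernels.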
Funding: This work is supported by the National Key R&D Program of China (2016QY05X1000) and the National Natural Science Foundation of China (201561402137).
Abstract: With the rapid development of the Internet, webpages have become far more diverse than in previous decades. At the same time, people face increasingly serious network security risks and enormous losses caused by phishing webpages, which imitate the interface of real webpages to deceive victims. To better identify and distinguish phishing webpages, a visual feature extraction method and a visual similarity algorithm are proposed. First, the visual feature extraction method improves the Vision-based Page Segmentation (VIPS) algorithm to extract visual blocks and computes the signature of each block with perceptual hashing. Second, the visual similarity algorithm establishes a one-to-one correspondence between blocks based on their coordinates and thresholds. Weights are then assigned according to the tree structure, and the similarity of the visual blocks is calculated from the Hamming distance between their visual features. Further, the visual similarity of two webpages is obtained by integrating the similarities and weights of the different visual blocks. Finally, multiple pairs of phishing and legitimate webpages are evaluated to verify the feasibility of the algorithm. The experimental results show excellent performance, with our method achieving 94% accuracy.
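The hashing and weighting steps can be sketched as follows. This minimal example assumes average hash (aHash) as the perceptual hash, blocks already downscaled to small grayscale grids, and block pairs already matched one-to-one; the abstract does not specify the hash variant or the weighting scheme, so the function names, weights and toy blocks are hypothetical.

```python
from typing import List, Sequence, Tuple

Hash = List[int]


def average_hash(gray: Sequence[Sequence[float]]) -> Hash:
    """aHash: one bit per pixel, set when the pixel is at least the block mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p >= mean else 0 for p in pixels]


def hamming(h1: Hash, h2: Hash) -> int:
    """Number of differing bits between two equal-length hashes."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have equal length")
    return sum(a != b for a, b in zip(h1, h2))


def block_similarity(h1: Hash, h2: Hash) -> float:
    """Map Hamming distance to [0, 1]; 1.0 means identical signatures."""
    return 1.0 - hamming(h1, h2) / len(h1)


def page_similarity(pairs: Sequence[Tuple[Hash, Hash]], weights: Sequence[float]) -> float:
    """Weighted average of matched-block similarities (weights e.g. from the block tree)."""
    total = sum(weights)
    return sum(w * block_similarity(a, b) for (a, b), w in zip(pairs, weights)) / total
```

Because the hash only records whether each pixel is above or below the block mean, small brightness or compression changes, which phishing copies often introduce, leave the signature largely unchanged.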
Abstract: In this paper, a referral system is proposed to assist medical experts in the screening and referral of diabetic retinopathy. The system has been developed through the sequential use of existing mathematical techniques: speeded up robust features (SURF), K-means clustering and visual dictionaries (VD). Three databases are mixed to test how the system works when the image sources are dissimilar. In the experiments, an area under the curve (AUC) of 0.9343 was attained. The results obtained from the system are promising.
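The SURF, K-means and visual dictionary pipeline follows the usual bag-of-visual-words pattern, sketched below. The sketch assumes local descriptors (for example SURF's 64-dimensional vectors) have already been extracted, since SURF itself is not reimplemented here; it uses toy 2-D descriptors and hypothetical function names, and the resulting histograms would then be passed to a classifier.

```python
import random
from typing import List, Sequence

Vector = List[float]


def nearest(p: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the closest center (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Lloyd's algorithm; the returned centers form the visual dictionary."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers


def bovw_histogram(descriptors: Sequence[Sequence[float]],
                   vocabulary: List[Vector]) -> List[float]:
    """Normalized histogram of visual-word assignments for one image."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(d, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

The histogram gives every image a fixed-length vector regardless of how many keypoints SURF finds, which is what lets images from dissimilar databases share one classifier.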
Funding: This project is supported by the National Electric Power Corporation Foundation of China (No. SPKJ010-27).
Abstract: Target tracking is a typical application of visual servoing technology. Tracking high-speed targets with current visual servo systems is still difficult, so improved visual servoing schemes are strongly required. A position-based visual servo parallel system is presented for tracking high-speed targets. A local Frenet frame is assigned to each sampling point of the spatial trajectory. Position estimation is formed from the differential features of the intrinsic geometry, and orientation estimation is formed by a homogeneous transformation. The time spent on searching and processing can be greatly reduced by shifting the search window according to the predicted feature locations. Simulation results demonstrate the ability of the system to track a spatially moving object.
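The Frenet frame at a sampling point can be estimated from three consecutive trajectory samples with finite differences and then packed into a homogeneous transform that carries the orientation. This is a minimal sketch assuming evenly spaced samples and non-zero curvature; the finite-difference scheme and the function names are illustrative assumptions, not the authors' estimator.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def normalize(a: Sequence[float]) -> Vec3:
    n = math.sqrt(dot(a, a))
    if n < 1e-12:
        raise ValueError("degenerate vector (zero speed or zero curvature)")
    return (a[0] / n, a[1] / n, a[2] / n)


def frenet_frame(prev: Vec3, cur: Vec3, nxt: Vec3) -> Tuple[Vec3, Vec3, Vec3]:
    """Tangent, normal and binormal at `cur` from central finite differences."""
    tangent = normalize(sub(nxt, prev))  # direction of the first derivative
    accel = tuple(q - 2.0 * c + p for p, c, q in zip(prev, cur, nxt))  # ~ second derivative
    # Remove the tangential part of the acceleration; the rest points toward the center of curvature.
    normal = normalize(sub(accel, tuple(t * dot(accel, tangent) for t in tangent)))
    binormal = cross(tangent, normal)
    return tangent, normal, binormal


def to_homogeneous(frame: Tuple[Vec3, Vec3, Vec3], origin: Vec3) -> List[List[float]]:
    """4x4 transform whose rotation columns are T, N, B and whose translation is the sample point."""
    t, n, b = frame
    return [[t[i], n[i], b[i], origin[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]
```

For a point moving counterclockwise on a circle in the xy-plane, the tangent follows the motion, the normal points to the circle's center, and the binormal is +z, which is a quick sanity check for the sign conventions.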
Funding: Supported by the National Natural Science Foundation of China (Nos. 61703067, 61803058, 51604056, and 51775076), the Science and Technology Research Project of Chongqing Education Commission, China (No. KJ1704072), and the Doctoral Talent Train Project of Chongqing University of Posts and Telecommunications, China (No. BYJS202006).
Abstract: Localization plays a vital role in mobile robot navigation systems and is a fundamental capability for autonomous movement. In indoor environments, the current mainstream localization scheme uses two-dimensional (2D) laser light detection and ranging (LiDAR) to build an occupancy grid map with simultaneous localization and mapping (SLAM) technology and then locates the robot on the known grid map. However, such solutions work effectively only in areas with salient geometrical features. In areas with repeated, symmetrical or similar structures, such as a long corridor, the conventional particle filtering method fails. To solve this crucial problem, this paper presents a novel coarse-to-fine paradigm that uses visual features to assist mobile robot localization in a long corridor. First, the mobile robot is remote-controlled to move from the starting position to the end along the middle line. During this movement, a grid map is built using the laser-based SLAM method. At the same time, a visual map consisting of keyframe images is created according to a keyframe selection strategy, and the keyframes are associated with the robot's poses through timestamps. Second, a moving strategy based on range features extracted from the laser scans is proposed to decide on an initial rough position. This is vital because it tells the robot where it needs to move to adjust its pose. Third, the mobile robot captures images from a proper perspective according to the moving strategy and matches them against the image map to achieve coarse localization. Finally, an improved particle filtering method is presented to achieve fine localization. Experimental results show that our method is effective and robust for global localization: the localization success rate reaches 98.8%, while the average moving distance is only 0.31 m. In addition, the method works well when the mobile robot is kidnapped to another position in the corridor.
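The fine-localization stage builds on a particle filter. The sketch below shows one predict, weight and resample cycle in a hypothetical one-dimensional corridor where the measurement is the laser range to the far wall. It is a generic textbook filter with systematic resampling, not the authors' improved variant, and the corridor length, noise levels and function names are assumptions.

```python
import math
import random
from typing import Callable, List


def gaussian_likelihood(error: float, sigma: float) -> float:
    """Unnormalized Gaussian likelihood of a measurement residual."""
    return math.exp(-0.5 * (error / sigma) ** 2)


def particle_filter_step(particles: List[float], control: float, measurement: float,
                         expected: Callable[[float], float], rng: random.Random,
                         motion_sigma: float = 0.05, meas_sigma: float = 0.2) -> List[float]:
    """One predict/update/resample cycle for a 1-D position filter."""
    n = len(particles)
    # Predict: apply the odometry increment plus motion noise.
    moved = [p + control + rng.gauss(0.0, motion_sigma) for p in particles]
    # Update: weight each particle by how well it explains the range reading.
    weights = [gaussian_likelihood(measurement - expected(p), meas_sigma) for p in moved]
    total = sum(weights)
    if total == 0.0:  # all weights underflowed: fall back to uniform weights
        weights, total = [1.0] * n, float(n)
    weights = [w / total for w in weights]
    # Systematic resampling: one random offset, n evenly spaced pointers.
    step = 1.0 / n
    u = rng.random() * step
    resampled: List[float] = []
    i, cumulative = 0, weights[0]
    for m in range(n):
        target = u + m * step
        while target > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(moved[i])
    return resampled
```

Systematic resampling draws a single random number per step, which gives lower variance than independent multinomial draws; in a truly symmetric corridor, however, the weights alone cannot separate mirror positions, which is the gap the visual cues are meant to fill.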
Funding: The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4400271DSR01).
Abstract: Classifying the visual features in images to retrieve a specific image is a significant problem in the computer vision field, especially when dealing with historical images with faded colors. Thus, there have been many efforts to automate the classification operation and retrieve similar images accurately. To reach this goal, we developed a VGG19 deep convolutional neural network to extract the visual features from the images automatically. Then, the distances between the extracted feature vectors are measured, and a similarity score is generated using a Siamese deep neural network. The Siamese model was first built and trained from scratch, but it did not achieve high evaluation metrics. Thus, we rebuilt it from the pre-trained VGG19 deep learning model to obtain higher evaluation metrics. Afterward, three different distance metrics combined with the Sigmoid activation function were tested to find the most accurate method for measuring the similarities among the retrieved images; the highest evaluation metrics were obtained with the cosine distance metric. Moreover, the code was run on the Graphics Processing Unit (GPU) instead of the Central Processing Unit (CPU), which further optimized execution by speeding up both training and retrieval. After extensive experimentation, we reached a satisfactory solution, recording F-scores of 0.98 and 0.99 for classification and retrieval, respectively.
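The similarity head can be illustrated as a distance between two embedding vectors mapped to a (0, 1) score by a sigmoid, followed by ranking a gallery by that score. In this sketch the embeddings are assumed to come from a VGG19-style backbone; the abstract does not name the two non-cosine metrics, so Euclidean and Manhattan are assumptions, and the sigmoid's scale and bias, which a trained Siamese network would learn, are fixed hypothetical values. The function names are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cos(angle between a and b); 0.0 for parallel vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (na * nb)


def euclidean_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "cosine": cosine_distance,
    "euclidean": euclidean_distance,
    "manhattan": manhattan_distance,
}


def similarity_score(a: Vector, b: Vector, metric: str = "cosine",
                     scale: float = 10.0, bias: float = 5.0) -> float:
    """Sigmoid of a shifted, scaled distance: small distance gives a score near 1."""
    d = METRICS[metric](a, b)
    return 1.0 / (1.0 + math.exp(-(bias - scale * d)))


def retrieve(query: Vector, gallery: Sequence[Vector], k: int = 1,
             metric: str = "cosine") -> List[int]:
    """Indices of the k gallery embeddings most similar to the query."""
    ranked = sorted(range(len(gallery)),
                    key=lambda i: similarity_score(query, gallery[i], metric),
                    reverse=True)
    return ranked[:k]
```

Cosine distance ignores vector magnitude, so two embeddings that differ mainly in overall activation strength (as faded and well-preserved copies of the same image might) can still be ranked as near-duplicates.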
Abstract: Applying mature and cutting-edge information technology to library information retrieval is a development trend in library information management. To realize rapid retrieval of massive book information, this paper proposes a book retrieval method that combines QR codes with image retrieval technology. The method analyzes the visual features of book images, designs a book image retrieval method based on boundary contour and regional pixel distribution features, and realizes associative retrieval of book information in combination with QR codes, so as to improve the efficiency of book retrieval. The experimental results show that books can be retrieved effectively through the boundary contour and regional pixel distribution features, that book information can be displayed through QR codes, and that readers can be provided with fast and intelligent retrieval services over massive book collections.
Funding: This work is supported by the Natural Science Foundation of China (Nos. 61621062, 61773407 and 61872408) and the Hunan Province Science Foundation of China (No. 2016JJ6136).
Abstract: The dosages of five main reagents used in a gold-antimony flotation process (Copper Sulfate, Lead Nitrate, Yellow Medicine, No. 2 Oil and Black Medicine) were recorded together with the corresponding visual features of froth images (Stability, Gray Scale, Mean R, Mean G, Mean B, Mean Average, Dimension and Degree Variance). Parameter correlation analysis showed strong correlations among Copper Sulfate, Yellow Medicine and Black Medicine, and among Gray Scale, Mean R, Mean G and Mean B, whereas the correlations among Dimension, Gray Scale, Mean R, Mean G and Mean B, as well as the correlation between Stability and each dosing parameter, were weak. This indicates a feasible way to reduce the complexity of the flotation control system by eliminating some parameters.
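The correlation screening can be sketched by computing the Pearson coefficient for every parameter pair and flagging pairs whose absolute correlation exceeds a cut-off. The 0.8 threshold, the parameter keys and the short toy series in the usage are hypothetical, not the authors' data; the abstract does not state which correlation measure or threshold was used.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(series: Dict[str, Sequence[float]],
                 threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """All parameter pairs whose |r| reaches the threshold, in insertion order."""
    names = list(series)
    out: List[Tuple[str, str, float]] = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(series[names[i]], series[names[j]])
            if abs(r) >= threshold:
                out.append((names[i], names[j], r))
    return out
```

In each strongly correlated pair, one member carries little extra information, so dropping it is one way such an analysis can shrink the input set of a flotation control model.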
Funding: Supported by the National Natural Science Foundation of China (Nos. 92152301, 12072282).
Abstract: Aerodynamic surrogate modeling mostly relies only on integrated loads data obtained from simulation or experiment, neglecting and wasting the valuable distributed physical information on the surface. To make full use of both integrated and distributed loads, a modeling paradigm called heterogeneous data-driven aerodynamic modeling is presented. The essential concept is to incorporate the physical information of distributed loads as additional constraints within end-to-end aerodynamic modeling. For such heterogeneous data, a novel and easily applicable physical feature embedding modeling framework is designed. This framework extracts low-dimensional physical features from the pressure distribution and then effectively enhances the modeling of the integrated loads via feature embedding. The proposed framework can be coupled with multiple feature extraction methods, and its good generalization across different airfoils is verified through a transonic case. Compared with traditional direct modeling, the proposed framework reduces testing errors by almost 50%. At the same prediction accuracy, it saves more than half of the training samples. Furthermore, visualization analysis reveals a significant correlation between the discovered low-dimensional physical features and the heterogeneous aerodynamic loads, which demonstrates the interpretability and credibility of the superior performance offered by the proposed deep learning framework.
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 61831015 and 61901260) and the Key Research and Development Program of China (No. 2019YFB1405902).
Abstract: Objective image quality assessment (IQA) plays an important role in various visual communication systems because it can automatically and efficiently predict the perceived quality of images. The human eye is the ultimate evaluator of visual experience, so modeling the human visual system (HVS) is a core issue for objective IQA and visual experience optimization. Traditional models based on black-box fitting have low interpretability and can hardly guide experience optimization effectively, while models based on physiological simulation are hard to integrate into practical visual communication services because of their high computational complexity. To bridge the gap between signal distortion and visual experience, in this paper we propose a novel perceptual no-reference (NR) IQA algorithm based on structural computational modeling of the HVS. Following the mechanism of the human brain, we divide visual signal processing into a low-level, a middle-level and a high-level visual layer, which perform pixel information processing, primitive information processing and global image information processing, respectively. Natural scene statistics (NSS) based features, deep features and free-energy based features are extracted from these three layers. Support vector regression (SVR) is employed to aggregate the features into the final quality prediction. Extensive experimental comparisons on three widely used benchmark IQA databases (LIVE, CSIQ and TID2013) demonstrate that our proposed metric is highly competitive with or outperforms state-of-the-art NR IQA measures.
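As one example of the kind of low-level NSS feature mentioned above, mean-subtracted contrast-normalized (MSCN) coefficients, as used in BRISQUE-style models, can be computed as sketched below. This is an assumption for illustration: the abstract does not list the exact NSS features, the usual Gaussian weighting window is replaced here by a plain box window for brevity, and the summary statistics from the hypothetical `mscn_moments` helper would be only a small part of a feature vector passed to the SVR.

```python
import math
from typing import List, Sequence, Tuple


def mscn(image: Sequence[Sequence[float]], radius: int = 1,
         c: float = 1.0) -> List[List[float]]:
    """MSCN coefficients (I - mu) / (sigma + c) with a (2r+1)x(2r+1) box window."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Local window, clipped at the image border.
            window = [image[y][x]
                      for y in range(max(0, i - radius), min(h, i + radius + 1))
                      for x in range(max(0, j - radius), min(w, j + radius + 1))]
            mu = sum(window) / len(window)
            sigma = math.sqrt(sum((v - mu) ** 2 for v in window) / len(window))
            out[i][j] = (image[i][j] - mu) / (sigma + c)  # c avoids division by zero
    return out


def mscn_moments(coeffs: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Mean and variance of an MSCN map, two simple NSS summary statistics."""
    vals = [v for row in coeffs for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var
```

Distortions such as blur or noise change the distribution of MSCN coefficients relative to that of pristine natural images, which is why statistics of this map are useful inputs for no-reference quality regression.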