Funding: Supported in part by the National Natural Science Foundation of China (Grants 41771473, 41231171) and the National Key Research and Development Program of China (Grant 2017YFB0503802).
Abstract: A salient scene is an area within an image whose visual elements stand out from the surrounding areas. Salient scenes are important for distinguishing landmarks in first-person-view (FPV) applications and for determining spatial relations in images. The relative spatial relations between salient scenes act as visual guides that are easily accepted and understood by users of FPV applications. However, current digitally navigable maps and location-based services fall short of providing such visual spatial-relation information, a shortcoming that critically limits the popularity and innovation of FPV applications. This paper addresses the issue by proposing a method for detecting visually salient scene areas (SSAs) and deriving their relative spatial relationships from continuous panoramas. The method comprises three critical steps. First, an SSA detection approach is introduced that fuses region-based saliency derived from super-pixel segmentation with the frequency-tuned saliency model, focusing on a segmented landmark area in a panorama. Second, a street-view-oriented SSA generation method is introduced that matches and merges the visual SSAs from continuous panoramas. Third, a referencing approach based on continuous geotagged panoramas is introduced to derive the relative spatial relationships of SSAs, including the relative azimuth, the elevation angle, and the relative distance. Experimental results with Baidu street-view panoramas show that the error of the SSA relative azimuth angle is approximately ±6° (with an average error of 2.67°) and the error of the SSA relative elevation angle is approximately ±4° (with an average error of 1.32°). These results demonstrate the feasibility of the proposed approach, which can facilitate FPV applications such as augmented reality (AR) and pedestrian navigation that rely on proper spatial relations.
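As an illustration of the saliency-fusion step described above, the sketch below combines a frequency-tuned (FT) saliency map with SLIC super-pixel averaging. It is a minimal reading of the abstract rather than the authors' implementation: the library choices (scikit-image, NumPy), the mean-pooling fusion rule, and the threshold are illustrative assumptions.

```python
# Hedged sketch: FT saliency fused with SLIC super-pixel regions.
# Not the paper's exact method; fusion rule and parameters are assumptions.
import numpy as np
from skimage import io, color
from skimage.filters import gaussian
from skimage.segmentation import slic

def frequency_tuned_saliency(rgb):
    """Per-pixel FT saliency: distance between the mean Lab colour of the
    image and a lightly blurred Lab image (Achanta et al., 2009)."""
    lab = color.rgb2lab(rgb)
    blurred = gaussian(lab, sigma=3, channel_axis=-1)
    mean_lab = lab.reshape(-1, 3).mean(axis=0)
    sal = np.linalg.norm(blurred - mean_lab, axis=-1)
    return (sal - sal.min()) / (np.ptp(sal) + 1e-8)

def region_fused_saliency(rgb, n_segments=400):
    """Average the pixel-level FT saliency inside each SLIC super-pixel,
    yielding a region-level map that favours coherent landmark areas."""
    sal = frequency_tuned_saliency(rgb)
    labels = slic(rgb, n_segments=n_segments, compactness=10, start_label=0)
    region_sal = np.zeros_like(sal)
    for lbl in np.unique(labels):
        mask = labels == lbl
        region_sal[mask] = sal[mask].mean()
    return region_sal

# Usage (hypothetical file name and threshold):
# panorama = io.imread("panorama_tile.jpg")
# ssa_map = region_fused_saliency(panorama)
# candidate_mask = ssa_map > 0.6
```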
Abstract: A two-stage deep-learning algorithm for detecting and recognizing spray-printed code numbers on can bottoms is proposed to address the small character areas and fast production-line speeds involved in can-bottom code recognition. In the code-number detection stage, a Differentiable Binarization Network is used as the backbone, combined with an Attention and Dilation Convolutions Path Aggregation Network feature-fusion structure to enhance detection performance. For text recognition, the Scene Visual Text Recognition network is trained end-to-end, alleviating recognition errors caused by image color distortion due to lighting variations and background noise. In addition, model pruning and quantization reduce the number of model parameters to meet deployment requirements in resource-constrained environments. A comparative experiment was conducted on a dataset of can-bottom spray-code numbers collected on site, and a transfer experiment was conducted on a dataset of packaging-box production dates. The results show that the proposed algorithm can effectively locate can codes at different positions on the roller conveyor and accurately identify the code numbers at high production-line speeds. The Hmean of code-number detection is 97.32%, and the accuracy of code-number recognition is 98.21%, verifying that the proposed algorithm achieves high accuracy in code-number detection and recognition.
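The two-stage structure described above can be illustrated with the minimal sketch below: a detector proposes quadrilateral code regions, each region is perspective-rectified, and a recognizer reads the code. The detector and recognizer are hypothetical callables standing in for the DB-based detection network and the SVTR-style recognition network; only the pipeline wiring and cropping are shown.

```python
# Hedged sketch of a detect-then-recognise pipeline; models are placeholders.
from dataclasses import dataclass
from typing import Callable, List, Tuple

import cv2
import numpy as np

@dataclass
class CodeReadout:
    box: np.ndarray      # 4x2 polygon of the spray-code region (pixels)
    text: str            # recognised code number
    score: float         # recognition confidence

def crop_quad(image: np.ndarray, quad: np.ndarray, out_h: int = 48) -> np.ndarray:
    """Perspective-rectify a quadrilateral text region (tl, tr, br, bl order)
    to a fixed-height strip for the recogniser."""
    quad = quad.astype(np.float32)
    w = int(max(np.linalg.norm(quad[0] - quad[1]),
                np.linalg.norm(quad[2] - quad[3])))
    dst = np.float32([[0, 0], [w, 0], [w, out_h], [0, out_h]])
    M = cv2.getPerspectiveTransform(quad, dst)
    return cv2.warpPerspective(image, M, (w, out_h))

def read_spray_codes(
    image: np.ndarray,
    detect: Callable[[np.ndarray], List[np.ndarray]],       # -> list of 4x2 boxes
    recognise: Callable[[np.ndarray], Tuple[str, float]],   # -> (text, confidence)
) -> List[CodeReadout]:
    results = []
    for quad in detect(image):             # stage 1: locate code regions
        strip = crop_quad(image, quad)     # rectify the small character area
        text, score = recognise(strip)     # stage 2: recognise the code number
        results.append(CodeReadout(box=quad, text=text, score=score))
    return results
```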
Funding: Project supported by the Chinese Academy of Engineering, the National Natural Science Foundation of China (No. L1522023), the National Basic Research Program (973) of China (No. 2015CB351703), and the National Key Research and Development Plan (Nos. 2016YFB1001004 and 2016YFB1000903).
Abstract: The long-term goal of artificial intelligence (AI) is to make machines learn and think like human beings. Because of the high levels of uncertainty and vulnerability in human life and the open-ended nature of the problems humans face, machines, however intelligent, cannot completely replace humans. It is therefore necessary to introduce human cognitive capabilities or human-like cognitive models into AI systems to develop a new form of AI: hybrid-augmented intelligence. This form of AI, or machine intelligence, is a feasible and important development model. Hybrid-augmented intelligence can be divided into two basic models: human-in-the-loop augmented intelligence with human-computer collaboration, and cognitive-computing-based augmented intelligence, in which a cognitive model is embedded in the machine-learning system. This survey describes a basic framework for human-computer collaborative hybrid-augmented intelligence and the basic elements of hybrid-augmented intelligence based on cognitive computing. These elements include intuitive reasoning, causal models, and the evolution of memory and knowledge; in particular, the role and basic principles of intuitive reasoning in complex problem solving, and a cognitive learning framework for visual scene understanding based on memory and reasoning, are discussed. Several typical applications of hybrid-augmented intelligence in related fields are given.
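The human-in-the-loop model mentioned above can be sketched as a simple deferral loop: the machine keeps its confident predictions and routes uncertain ones to a human, whose corrections are collected for retraining. The confidence threshold, interfaces, and feedback store below are illustrative assumptions, not elements of the survey's framework.

```python
# Hedged sketch of a human-in-the-loop deferral pattern (illustrative only).
from typing import Callable, List, Tuple

def human_in_the_loop(
    samples: List[object],
    predict: Callable[[object], Tuple[str, float]],   # -> (label, confidence)
    ask_human: Callable[[object], str],               # human annotator
    threshold: float = 0.9,
) -> Tuple[List[Tuple[object, str]], List[Tuple[object, str]]]:
    decisions, feedback = [], []
    for x in samples:
        label, conf = predict(x)
        if conf >= threshold:
            decisions.append((x, label))      # machine decides autonomously
        else:
            corrected = ask_human(x)          # defer to human judgement
            decisions.append((x, corrected))
            feedback.append((x, corrected))   # collected for later retraining
    return decisions, feedback
```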