Funding: Supported by the National Natural Science Foundation of China (Nos. 62073322 and 61973302), the CIE-Tencent Robotics X Rhino-Bird Focused Research Program (No. 2022-07), and the Beijing Natural Science Foundation (No. 2022MQ05).
Abstract: The ability to recognize novel objects from a few visual samples is critical in robotic applications. Existing methods mainly address the recognition of inter-category objects; recognizing objects from different sub-classes within the same category remains challenging because of their similar appearances. In this paper, we propose a key-part attention retrieval solution that distinguishes novel objects of different sub-classes from a few samples without re-training. Specifically, an object encoder, consisting of a convolutional neural network with attention and key-part aggregation, is designed to generate an object attention map and extract the object-level embedding; the object attention map from the middle stage of the backbone guides the key-part aggregation. In addition, to overcome the non-differentiability of key-part attention, the object encoder is trained in a two-step scheme, which yields a more stable object-level embedding. On this basis, potential objects are located in a scene image by mining connected regions of the attention map. Each potential object is then recognized by matching its embedding against the embeddings of the support data. The effectiveness of the proposed method is verified by experiments.
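The last two steps of the abstract (locating potential objects from connected regions of the attention map, then matching embeddings against the support data) can be illustrated with a minimal sketch. This is not the authors' implementation: the object encoder is not reproduced, and the function names `connected_regions` and `match_to_support`, the threshold, the minimum area, the use of 4-connectivity, and cosine similarity as the matching score are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import deque

import numpy as np


def connected_regions(attention, threshold=0.5, min_area=4):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    regions whose attention is >= threshold (threshold is an assumption)."""
    mask = attention >= threshold
    visited = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    boxes = []
    for r in range(height):
        for c in range(width):
            if not mask[r, c] or visited[r, c]:
                continue
            # Breadth-first flood fill over one connected region.
            queue = deque([(r, c)])
            visited[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:  # drop tiny, likely spurious regions
                ys, xs = zip(*pixels)
                boxes.append((min(ys), min(xs), max(ys) + 1, max(xs) + 1))
    return boxes


def match_to_support(query, support_embeddings, support_labels):
    """Return the label and cosine similarity of the closest support
    embedding (cosine similarity is an assumed choice of metric)."""
    q = query / np.linalg.norm(query)
    s = support_embeddings / np.linalg.norm(
        support_embeddings, axis=1, keepdims=True
    )
    similarities = s @ q
    best = int(np.argmax(similarities))
    return support_labels[best], float(similarities[best])


# Toy attention map with two separate high-attention blobs.
attention = np.zeros((8, 8))
attention[1:3, 1:4] = 0.9
attention[5:8, 5:7] = 0.8
boxes = connected_regions(attention)

# Hypothetical support set: one embedding per sub-class.
support = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
labels = ["mug_a", "mug_b"]
label, score = match_to_support(np.array([0.1, 0.9, 0.0]), support, labels)
```

In a full pipeline, each bounding box would be cropped from the scene image and passed through the object encoder to obtain the query embedding; here a hand-written vector stands in for that output. Because recognition reduces to a nearest-neighbor lookup, adding a new sub-class only requires adding its embeddings to the support set, which is consistent with the abstract's claim of recognition without re-training.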