Funding: This paper is supported by the 2019 Innovation and Entrepreneurship Training Program for College Students in Jiangsu Province (Project: Human posture recognition based on teaching interaction, No. 201911460042Y), by the National Natural Science Foundation of China Youth Science Foundation (Project: Research on Deep Discriminant Sparse Representation Learning Method for Feature Extraction, No. 61806098), and by the Scientific Research Project of Nanjing Xiaozhuang University (Project: Multi-robot collaborative system, No. 2017NXY16).
Abstract: In view of the high time cost and low accuracy of manual supervision in traditional classroom teaching, this paper proposes a human body pose recognition system based on teaching interaction. An enhanced backbone network (ResNeXt-101+FPN) was used in Mask R-CNN to extract features from the input images. Then, based on a behavior analysis algorithm and face detection data, the behavior data of each student in the classroom were obtained. Moreover, the behavior data were applied to support multi-dimensional visualization. Experimental results show that the system can promptly and effectively reflect students' learning status and help teachers accurately grasp the classroom learning state of students, so that they can adjust teaching strategies in a targeted way and improve the quality of teaching.
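The multi-dimensional visualization described above consumes per-student behavior statistics. A minimal sketch of that aggregation step is shown below; the behavior labels, student IDs, and the `summarize_behaviors` helper are hypothetical, standing in for the output of the paper's Mask R-CNN-based behavior analysis stage.

```python
from collections import Counter

def summarize_behaviors(frame_labels):
    """Aggregate per-frame behavior labels into per-student time fractions.

    frame_labels: dict mapping student id -> list of per-frame behavior
    labels (e.g. "listening", "writing", "head_down").
    Returns dict: student id -> {behavior: fraction of observed frames}.
    """
    summary = {}
    for student, labels in frame_labels.items():
        counts = Counter(labels)
        total = len(labels)
        summary[student] = {b: n / total for b, n in counts.items()}
    return summary

# Toy example: two students observed over four frames each.
stats = summarize_behaviors({
    "s01": ["listening", "listening", "writing", "head_down"],
    "s02": ["listening", "head_down", "head_down", "head_down"],
})
print(stats["s01"]["listening"])  # 0.5
```

Fractions like these can then be charted per student or per class interval, which is one plausible reading of the "multi-dimensional visualization" in the abstract.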
Abstract: Cities are in constant change, and city managers aim to keep an updated digital model of the city for city governance. Many images are uploaded daily to image-sharing platforms (such as Flickr or Twitter). These images carry only a rough localization and no orientation information. Nevertheless, they can help populate an active collaborative database of street images usable to maintain a city 3D model, provided their localization and orientation are known. Based on these images, we propose the Data Gathering system for image Pose Estimation (DGPE), which recovers the pose (position and orientation) of the camera used to shoot them with better accuracy than the GPS localization alone that may be embedded in the image header. DGPE uses both the visual and the semantic information in a single image, processed by a fully automatic chain composed of three main layers: a data retrieval and preprocessing layer, a feature extraction layer, and a decision-making layer. In this article, we present the whole system in detail and compare its detection results with a state-of-the-art method. Finally, we show the localization, and often orientation, results obtained by combining semantic and visual information processing on 47 images. Using only the image content and associated metadata, our multilayer system finds a better localization and orientation of the original photo in 26% of our test cases. Adding semantic information found on social media, such as comments and hashtags, doubles the success rate to 59%, as it reduces the search area and thus makes the visual search more accurate.
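As a rough illustration of how a GPS prior and social-media semantics can shrink the visual search area, the sketch below filters candidate reference views by great-circle distance from the photo's embedded GPS position and by shared tags. The candidate field names and the 300 m radius are assumptions for the example, not parameters from DGPE.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS-84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def filter_candidates(photo_gps, photo_tags, candidates, radius_m=300.0):
    """Keep reference views near the GPS prior that share a semantic tag."""
    kept = []
    for c in candidates:
        near = haversine_m(photo_gps[0], photo_gps[1], c["lat"], c["lon"]) <= radius_m
        if near and (photo_tags & c["tags"]):
            kept.append(c["id"])
    return kept

# Toy example: one candidate at the photo location, one a few km away.
cands = [
    {"id": "A", "lat": 48.8584, "lon": 2.2945, "tags": {"tower"}},
    {"id": "B", "lat": 48.8606, "lon": 2.3376, "tags": {"tower"}},
]
print(filter_candidates((48.8584, 2.2945), {"tower"}, cands))  # ['A']
```

Only the surviving candidates would then go through the expensive visual-matching stage, which is the mechanism behind the reduced search area mentioned in the abstract.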
Abstract: In view of the growing number of people participating in dance rating assessments, this paper proposes a dance assessment technology based on human body posture recognition. The technique performs human target detection on the dance video, extracts skeletal key points, and then uses a video dataset collected from professional dancers to train a PoseC3D model, enabling the model to classify the basic movements of the dance; a dynamic time warping algorithm is then used to evaluate the classified movements. Experimental results show that this technology can accurately identify the basic movements of various dances and give accurate evaluation scores for the corresponding movements, thus reducing the workload of the assessment staff.
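The movement-evaluation step can be sketched with a plain dynamic time warping (DTW) distance between a performer's key-point sequence and a reference sequence. The scalar per-frame features and the linear score mapping below are illustrative assumptions, not the paper's actual features or scoring rule.

```python
def dtw_distance(seq_a, seq_b, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping distance between two feature sequences.

    Fills the standard (n+1) x (m+1) cumulative-cost table, allowing
    match, insertion, and deletion moves, and returns the total cost
    of the best alignment.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(seq_a[i - 1], seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# A performance that repeats a frame still aligns perfectly with the
# reference, which is exactly why DTW suits tempo-varying dance clips.
score = max(0.0, 100.0 - dtw_distance([1, 2, 3, 2], [1, 2, 2, 3, 2]))
```

In practice each element would be a vector of joint coordinates rather than a scalar, with `dist` replaced by a per-frame pose distance.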
Funding: Co-supported by the National Natural Science Foundation of China (Grant Nos. 61371134 and 61071137) and the National Basic Research Program of China (No. 2010CB327900).
Abstract: The application of high-performance imaging sensors in space-based space surveillance systems makes it possible to recognize space objects and estimate their poses using vision-based methods. In this paper, we propose a kernel regression-based method for joint multi-view space object recognition and pose estimation. We built a new simulated satellite image dataset, BUAA-SID 1.5, to test our method with different image representations, and evaluated it on recognition-only, pose estimation-only, and joint recognition and pose estimation tasks. Experimental results show that our method outperforms the state of the art in space object recognition, and can recognize space objects and estimate their poses effectively and robustly under noise and varying lighting conditions.
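The core tool named in this abstract is kernel regression over image features. A minimal Nadaraya-Watson sketch for the pose-regression half is shown below; the one-dimensional features, Gaussian kernel, and bandwidth are made up for illustration and are not taken from the paper.

```python
import math

def kernel_regress_pose(features, train_feats, train_poses, bandwidth=1.0):
    """Nadaraya-Watson kernel regression.

    Estimates a pose angle as the Gaussian-weighted average of training
    poses, with weights given by feature-space distance to the query.
    """
    weights = []
    for f in train_feats:
        d2 = sum((a - b) ** 2 for a, b in zip(features, f))
        weights.append(math.exp(-d2 / (2 * bandwidth ** 2)))
    s = sum(weights)
    return sum(w * p for w, p in zip(weights, train_poses)) / s

# A query halfway between two training views is pulled equally toward
# both training poses.
angle = kernel_regress_pose([5.0], [[0.0], [10.0]], [0.0, 90.0])
```

A joint recognize-and-estimate variant would additionally weight per-class training sets and pick the class whose regression residual is smallest; the sketch keeps only the regression itself.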
Abstract: Conventionally, image object recognition and pose estimation are two independent components in machine vision. This paper presents a simple but effective method, KNNSNG, which tightly couples these two components within a single algorithmic framework. The basic idea comes from bionic pattern recognition and the manifold ways of perception. First, a shortest neighborhood graph (SNG) is established for each registered object; an SNG can be regarded as a covering and triangulation of the hypersurface on which the training data are distributed. Then, for the recognition task, the SNG on which a test image lies is determined by employing the parameter k, which can be calculated adaptively. Finally, a local linear approximation method is adopted to build a local map between the high-dimensional image space and the low-dimensional manifold for pose estimation; the projective coordinates on the manifold describe the pose of the object. Experimental results demonstrate the effectiveness of the method.
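The recognition step, deciding which object's SNG a test sample lies on, can be caricatured as a nearest-node test over per-object point sets. The toy 2-D features below stand in for high-dimensional image features; real SNG construction, triangulation, and the adaptive choice of k are omitted.

```python
def classify_by_nearest_graph(x, class_graphs):
    """Assign x to the registered object whose node set is closest.

    class_graphs: dict mapping object label -> list of node feature
    vectors (the vertices of that object's SNG). A crude stand-in for
    testing which covered hypersurface the sample lies on.
    """
    def d2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    best_label, best = None, float("inf")
    for label, nodes in class_graphs.items():
        m = min(d2(x, n) for n in nodes)
        if m < best:
            best_label, best = label, m
    return best_label

# Two toy objects, each represented by two nodes in a 2-D feature space.
graphs = {
    "cup": [(0.0, 0.0), (1.0, 0.0)],
    "box": [(5.0, 5.0), (6.0, 5.0)],
}
print(classify_by_nearest_graph((0.4, 0.1), graphs))  # cup
```

Once the owning graph is known, the paper's local linear approximation would map the sample's neighborhood onto the low-dimensional manifold to read off pose coordinates.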
Funding: Supported by the National Natural Science Foundation of China (Nos. 61872354, 61772523, 61620106003, and 61802406), the National Key R&D Program of China (No. 2019YFB2204104), the Beijing Natural Science Foundation (Nos. L182059 and Z190004), the Intelligent Science and Technology Advanced Subject Project of the University of Chinese Academy of Sciences (No. 115200S001), and the Alibaba Group through the Alibaba Innovative Research Program.
Abstract: We present a novel and efficient method for real-time multiple facial pose estimation and tracking in a single frame or video. First, we combine two standard convolutional neural network models, for face detection and mean shape learning, to generate initial estimates of alignment and pose. Then, we design a bi-objective optimization strategy to iteratively refine these estimates, achieving faster speed and more accurate outputs. Finally, we apply algebraic filtering, including a Gaussian filter for background removal and an extended Kalman filter for target prediction, to maintain real-time tracking performance. Only ordinary RGB photos or videos captured by a commodity monocular camera are required, without any priors or labels. We demonstrate the advantages of our approach by comparing it with the most recent work in terms of performance and accuracy.
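The target-prediction step above uses an extended Kalman filter over the full pose state. The scalar random-walk filter below is a heavily reduced sketch of the same predict/update cycle applied to one noisy coordinate track; the process and measurement noise values are chosen arbitrarily, not taken from the paper.

```python
def kalman_1d(z_seq, q=1e-3, r=0.1):
    """Scalar Kalman filter smoothing a noisy coordinate track.

    z_seq: measurements of one coordinate (e.g. a face-center x position
    per frame). q: process noise, r: measurement noise variance.
    Returns the filtered estimates, one per measurement.
    """
    x, p = z_seq[0], 1.0          # initialize state from the first frame
    out = [x]
    for z in z_seq[1:]:
        p = p + q                  # predict: random-walk process noise grows p
        k = p / (p + r)            # Kalman gain
        x = x + k * (z - x)        # update: blend prediction with measurement
        p = (1 - k) * p            # shrink the posterior variance
        out.append(x)
    return out

# Alternating measurements are smoothed toward a value between them.
track = kalman_1d([0.0, 1.0, 0.0, 1.0])
```

A production tracker would carry a position-plus-velocity state (and a nonlinear measurement model, hence the *extended* filter), but the gain/update structure is the same.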
Funding: Supported by the National Natural Science Foundation of China (Nos. U1613211 and U1813218) and the Shenzhen Research Program (Nos. JCYJ20170818164704758 and JCYJ20150925163005055).
Abstract: Recent years have witnessed significant progress in image-based 3D face reconstruction using deep convolutional neural networks. However, current reconstruction methods often perform improperly in self-occluded regions and can lead to inaccurate correspondences between a 2D input image and a 3D face template, hindering use in real applications. To address these problems, we propose a deep shape reconstruction and texture completion network, SRTC-Net, which jointly reconstructs 3D facial geometry and completes texture with correspondences from a single input face image. In SRTC-Net, we leverage geometric cues from the completed 3D texture to reconstruct detailed structures of 3D shapes. The SRTC-Net pipeline has three stages. The first introduces a correspondence network to identify pixel-wise correspondences between the input 2D image and a 3D template model, and transfers the input 2D image to a U-V texture map. We then complete the invisible and occluded areas in the U-V texture map using an inpainting network. To obtain the 3D facial geometry, we predict a coarse shape (U-V position map) from the face segmented by the correspondence network using a shape network, and then refine the coarse shape by regressing the U-V displacement map from the completed U-V texture map in a pixel-to-pixel way. We evaluate our method on 3D reconstruction tasks as well as face frontalization and pose-invariant face recognition tasks, using both in-the-lab datasets (MICC, MultiPIE) and in-the-wild datasets (CFP). Qualitative and quantitative results demonstrate the effectiveness of our method in inferring 3D facial geometry and complete texture; it outperforms or is comparable to the state of the art.
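The final refinement stage described above, adding a regressed U-V displacement map to the coarse U-V position map, amounts to a per-pixel vector sum. The nested-list representation below is an illustrative simplification of what would be tensor arithmetic inside SRTC-Net.

```python
def refine_shape(position_map, displacement_map):
    """Refine a coarse U-V position map with a U-V displacement map.

    Both maps are H x W grids of (x, y, z) tuples; the refined shape is
    their pixel-to-pixel sum, mirroring the last stage of the pipeline.
    """
    return [
        [tuple(p + d for p, d in zip(pos, disp))
         for pos, disp in zip(row_p, row_d)]
        for row_p, row_d in zip(position_map, displacement_map)
    ]

# One-pixel toy map: the coarse vertex is shifted by the regressed offset.
refined = refine_shape([[(0.0, 0.0, 0.0)]], [[(0.1, -0.2, 0.05)]])
```

In the actual network the displacement map is regressed from the completed U-V texture, so occlusion-filled texture directly informs the recovered fine geometry.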