In the new era of technology, daily human activities are becoming more challenging to monitor against complex scenes and backgrounds. To understand scenes and activities from human life logs, human–object interaction (HOI) detection is important for visual relationship detection and human pose estimation. We explain activity understanding and interaction recognition between human and object, together with pose estimation and interaction modeling. Some existing algorithms and feature-extraction procedures remain problematic, including accurate detection of rare human postures and occluded regions, and unsatisfactory detection of objects, especially small ones. Existing HOI detection techniques are instance-centric (object-based): an interaction is predicted for every human–object pair, and such estimation depends on appearance features and spatial information. We therefore propose a novel approach demonstrating that appearance features alone are not sufficient to predict HOI. We detect human body parts using a Gaussian Mixture Model (GMM), followed by object detection using YOLO. We predict interaction points that directly classify the interaction and pair them with densely predicted HOI vectors using an interaction algorithm. The interactions are then linked with the human and the object to predict actions. Experiments on two benchmark HOI datasets demonstrate the proposed approach.
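As a sketch of the interaction-point idea above, the snippet below links a predicted interaction point to the nearest detected human and object centers. The pairing rule and all coordinates are hypothetical stand-ins, since the abstract does not specify its interaction algorithm.

```python
import numpy as np

# Hypothetical pairing rule: the abstract's "interaction algorithm"
# is not specified, so nearest-center linking stands in for it.
def link_interaction(point, humans, objects):
    """Link a predicted interaction point to the closest human
    center and the closest object center."""
    h = min(humans, key=lambda c: np.hypot(*(np.asarray(c, float) - point)))
    o = min(objects, key=lambda c: np.hypot(*(np.asarray(c, float) - point)))
    return h, o

humans = [(10, 10), (80, 80)]   # detected person centers (toy values)
objects = [(20, 12), (90, 85)]  # detected object centers (toy values)
point = np.array([15.0, 11.0])  # predicted interaction point
print(link_interaction(point, humans, objects))
```

In a real detector the interaction point would come from a dense heatmap head rather than being given directly; only the linking step is illustrated here.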
Human–object interaction (HOI) detection is crucial for human-centric image understanding, which aims to infer ⟨human, action, object⟩ triplets within an image. Recent studies often exploit visual features and the spatial configuration of a human–object pair in order to learn the action linking the human and the object in the pair. We argue that this paradigm of pairwise feature extraction and action inference can be applied not only at the level of whole human and object instances, but also at the part level, at which a body part interacts with an object, and at the semantic level, by considering the semantic label of an object along with human appearance and human–object spatial configuration. We thus propose a multi-level pairwise feature network (PFNet) for detecting human–object interactions. The network consists of three parallel streams that characterize HOI using pairwise features at the above three levels; the three streams are finally fused to give the action prediction. Extensive experiments show that the proposed PFNet outperforms other state-of-the-art methods on the V-COCO dataset and achieves results comparable to the state of the art on the HICO-DET dataset.
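The three-stream design can be illustrated with a toy late-fusion step. The stream names and the score-averaging rule below are our assumptions for illustration, not necessarily PFNet's actual fusion operation.

```python
import numpy as np

# Toy late fusion of three parallel streams (instance, part,
# semantic), as described in the PFNet abstract. Averaging the
# per-action scores is an assumed fusion rule for illustration.
def fuse_streams(instance_scores, part_scores, semantic_scores):
    """Average per-action scores from the three levels."""
    return (np.asarray(instance_scores)
            + np.asarray(part_scores)
            + np.asarray(semantic_scores)) / 3.0

# Two candidate actions, scored by each stream independently.
scores = fuse_streams([0.9, 0.1], [0.6, 0.4], [0.9, 0.1])
print(scores.tolist())
```

In practice each stream would be a learned sub-network and the fusion could be weighted or feature-level rather than a plain score average.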
Efficient, interactive foreground/background segmentation in video is of great practical importance in video editing. This paper proposes an interactive, unsupervised video object segmentation algorithm named E-GrabCut that targets both segmentation quality and time efficiency, as highly demanded in the field. The proposed algorithm has three features. First, we have developed a powerful, non-iterative version of the optimization process for each frame. Second, additional user interaction in the first frame is used to improve the Gaussian Mixture Model (GMM). Third, a robust algorithm for segmenting the following frames has been developed by reusing the previous GMM. Extensive experiments demonstrate that our method outperforms state-of-the-art video segmentation algorithms in the combination of time efficiency and segmentation quality.
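The GMM-reuse step can be sketched as follows, simplified to a single diagonal Gaussian per class (the actual method uses full GMMs inside GrabCut's graph-cut optimization): colour models fitted on the first frame classify pixels of the next frame without refitting.

```python
import numpy as np

# Simplified sketch of E-GrabCut's model reuse: one diagonal
# Gaussian per class stands in for the paper's full GMMs, and
# the graph-cut smoothing step is omitted.
def fit_gauss(X):
    """Fit a diagonal Gaussian to pixel colours X (N x 3)."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_lik(X, mu, var):
    """Per-pixel Gaussian log-likelihood under (mu, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)

rng = np.random.default_rng(0)
fg1 = rng.normal(200, 10, (500, 3))  # frame-1 foreground pixels (toy)
bg1 = rng.normal(50, 10, (500, 3))   # frame-1 background pixels (toy)
fg_mu, fg_var = fit_gauss(fg1)
bg_mu, bg_var = fit_gauss(bg1)

# Frame 2: reuse the frame-1 models instead of refitting.
frame2 = np.vstack([rng.normal(195, 10, (5, 3)),
                    rng.normal(55, 10, (5, 3))])
is_fg = log_lik(frame2, fg_mu, fg_var) > log_lik(frame2, bg_mu, bg_var)
print(is_fg.tolist())
```

Reusing the previous frame's colour models is what lets consecutive frames be segmented without repeating the expensive fitting step, at the cost of drift when appearance changes sharply.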
Funding: Supported by the Priority Research Centers Program through NRF funded by MEST (2018R1A6A1A03024003), and by the Grand Information Technology Research Center support program (IITP-2020-2020-0-01612) supervised by the IITP under MSIT, Korea.
Funding: Supported by the National Natural Science Foundation of China (Project No. 61902210), a Research Grant of Beijing Higher Institution Engineering Research Center, and the Tsinghua–Tencent Joint Laboratory for Internet Innovation Technology.