Temporal localization is crucial for action video recognition.Since the manual annotations are expensive and time-consuming in videos,temporal localization with weak video-level labels is challenging but indispensable...Temporal localization is crucial for action video recognition.Since the manual annotations are expensive and time-consuming in videos,temporal localization with weak video-level labels is challenging but indispensable.In this paper,we propose a weakly-supervised temporal action localization approach in untrimmed videos.To settle this issue,we train the model based on the proxies of each action class.The proxies are used to measure the distances between action segments and different original action features.We use a proxy-based metric to cluster the same actions together and separate actions from backgrounds.Compared with state-of-the-art methods,our method achieved competitive results on the THUMOS14 and ActivityNet1.2 datasets.展开更多
基金supported by the National Key Research and Development Program of China(2018AAA0100104 and 2018AAA0100100)the National Natural Science Foundation of China(Grant No.61702095)+1 种基金Natural Science Foundation of Jiangsu Province(BK20211164,BK20190341,and BK20210002)the Big Data Computing Center of Southeast University.
文摘Temporal localization is crucial for action video recognition.Since the manual annotations are expensive and time-consuming in videos,temporal localization with weak video-level labels is challenging but indispensable.In this paper,we propose a weakly-supervised temporal action localization approach in untrimmed videos.To settle this issue,we train the model based on the proxies of each action class.The proxies are used to measure the distances between action segments and different original action features.We use a proxy-based metric to cluster the same actions together and separate actions from backgrounds.Compared with state-of-the-art methods,our method achieved competitive results on the THUMOS14 and ActivityNet1.2 datasets.