Associating faces appearing in Web videos with names presented in the surrounding context is an important task in many applications. However, the problem is not well investigated particularly under large-scale realist...Associating faces appearing in Web videos with names presented in the surrounding context is an important task in many applications. However, the problem is not well investigated particularly under large-scale realistic scenario,mainly due to the scarcity of dataset constructed in such circumstance. In this paper, we introduce a Web video dataset of celebrities, named WebV-Cele, for name-face association. The dataset consists of 75 073 Internet videos of over 4 000 hours,covering 2 427 celebrities and 649 001 faces. This is, to our knowledge, the most comprehensive dataset for this problem.We describe the details of dataset construction, discuss several interesting findings by analyzing this dataset like celebrity community discovery, and provide experimental results of name-face association using five existing techniques. We also outline important and challenging research problems that could be investigated in the future.展开更多
The massive web videos prompt an imperative demand on efficiently grasping the major events. However, the distinct characteristics of web videos, such as the limited number of features, the noisy text information, and...The massive web videos prompt an imperative demand on efficiently grasping the major events. However, the distinct characteristics of web videos, such as the limited number of features, the noisy text information, and the unavoidable error in near-duplicate keyframes (NDKs) detection, make web video event mining a challenging task. In this paper, we propose a novel four-stage framework to improve the performance of web video event mining. Data preprocessing is the first stage. Multiple Correspondence Analysis (MCA) is then applied to explore the correlation between terms and classes, targeting for bridging the gap between NDKs and high-level semantic concepts. Next, co-occurrence information is used to detect the similarity between NDKs and classes using the NDK-within-video information. Finally, both of them are integrated for web video event mining through negative NDK pruning and positive NDK enhancement. Moreover, both NDKs and terms with relatively low frequencies are treated as useful information in our experiments. Experimental results on large-scale web videos from YouTube demonstrate that the proposed framework outperforms several existing mining methods and obtains good results for web video event mining.展开更多
We present a method of 3D image mosaicing for real 3D representation of roadside buildings, and implement a Web-based interactive visualization environment for the 3D video mosaics created by 3D image mosaicing. The 3...We present a method of 3D image mosaicing for real 3D representation of roadside buildings, and implement a Web-based interactive visualization environment for the 3D video mosaics created by 3D image mosaicing. The 3D image mo- saicing technique developed in our previous work is a very powerful method for creating textured 3D-GIS data without excessive data processing like the laser or stereo system. For the Web-based open access to the 3D video mosaics, we build an interactive visualization environment using X3D, the emerging standard of Web 3D. We conduct the data preprocessing for 3D video mosaics and the X3D modeling for textured 3D data. The data preprocessing includes the conversion of each frame of 3D video mosaics into concatenated image files that can be hyperlinked on the Web. The X3D modeling handles the representation of concatenated images using necessary X3D nodes. By employing X3D as the data format for 3D image mosaics, the real 3D representation of roadside buildings is extended to the Web and mobile service systems.展开更多
基金supported by a research grant from City University of Hong Kong under Grant No.7008178the National Natural Science Foundation of China under Grant Nos.61228205,61303175 and 61172153
文摘Associating faces appearing in Web videos with names presented in the surrounding context is an important task in many applications. However, the problem is not well investigated particularly under large-scale realistic scenario,mainly due to the scarcity of dataset constructed in such circumstance. In this paper, we introduce a Web video dataset of celebrities, named WebV-Cele, for name-face association. The dataset consists of 75 073 Internet videos of over 4 000 hours,covering 2 427 celebrities and 649 001 faces. This is, to our knowledge, the most comprehensive dataset for this problem.We describe the details of dataset construction, discuss several interesting findings by analyzing this dataset like celebrity community discovery, and provide experimental results of name-face association using five existing techniques. We also outline important and challenging research problems that could be investigated in the future.
基金supported by the National Natural Science Foundation of China under Grant Nos. 61373121, 61071184, 60972111,61036008the Research Funds for the Doctoral Program of Higher Education of China under Grant No. 20100184120009+2 种基金the Program for Sichuan Provincial Science Fund for Distinguished Young Scholars under Grant Nos. 2012JQ0029, 13QNJJ0149the Fundamental Research Funds for the Central Universities of China under Grant Nos. SWJTU09CX032, SWJTU10CX08the Program of China Scholarships Council under Grant No. 201207000050
文摘The massive web videos prompt an imperative demand on efficiently grasping the major events. However, the distinct characteristics of web videos, such as the limited number of features, the noisy text information, and the unavoidable error in near-duplicate keyframes (NDKs) detection, make web video event mining a challenging task. In this paper, we propose a novel four-stage framework to improve the performance of web video event mining. Data preprocessing is the first stage. Multiple Correspondence Analysis (MCA) is then applied to explore the correlation between terms and classes, targeting for bridging the gap between NDKs and high-level semantic concepts. Next, co-occurrence information is used to detect the similarity between NDKs and classes using the NDK-within-video information. Finally, both of them are integrated for web video event mining through negative NDK pruning and positive NDK enhancement. Moreover, both NDKs and terms with relatively low frequencies are treated as useful information in our experiments. Experimental results on large-scale web videos from YouTube demonstrate that the proposed framework outperforms several existing mining methods and obtains good results for web video event mining.
文摘We present a method of 3D image mosaicing for real 3D representation of roadside buildings, and implement a Web-based interactive visualization environment for the 3D video mosaics created by 3D image mosaicing. The 3D image mo- saicing technique developed in our previous work is a very powerful method for creating textured 3D-GIS data without excessive data processing like the laser or stereo system. For the Web-based open access to the 3D video mosaics, we build an interactive visualization environment using X3D, the emerging standard of Web 3D. We conduct the data preprocessing for 3D video mosaics and the X3D modeling for textured 3D data. The data preprocessing includes the conversion of each frame of 3D video mosaics into concatenated image files that can be hyperlinked on the Web. The X3D modeling handles the representation of concatenated images using necessary X3D nodes. By employing X3D as the data format for 3D image mosaics, the real 3D representation of roadside buildings is extended to the Web and mobile service systems.