Abstract: Recently, 3D display technology and content creation tools have undergone rapid development and, as a result, have been widely adopted by home and professional users. 3D digital repositories are growing and becoming ubiquitously available. However, searching and visualizing 3D content remains a great challenge. In this paper, we propose and present the development of a novel approach for creating hypervideos that eases 3D content search and retrieval, called the dynamic hyperlinker for the 3D content search and retrieval process. It advances 3D multimedia navigability and searchability by creating dynamic links for selectable, clickable objects in the video scene while the user consumes the 3D video clip. The proposed system involves 3D video processing, such as detecting and tracking clickable objects, annotating objects, and metadata engineering, including a 3D content descriptive protocol. Such a system attracts attention from both home and professional users, and more specifically from broadcasters and digital content providers. The experiment is conducted on full-parallax holoscopic 3D videos (also known as integral images).
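The core of a dynamic hyperlinker is per-frame metadata mapping clickable screen regions to retrieval targets. The sketch below illustrates that idea only; the class names, fields, and link scheme are invented for illustration and are not the paper's actual 3D content descriptive protocol.

```python
from dataclasses import dataclass, field

@dataclass
class ClickableObject:
    # One tracked object in a video scene; fields are illustrative.
    label: str
    bbox: tuple          # (x, y, w, h) in frame coordinates
    link: str            # retrieval target resolved when the object is clicked

@dataclass
class FrameAnnotation:
    frame_index: int
    objects: list = field(default_factory=list)

def resolve_click(annotations, frame_index, x, y):
    """Return the link of the object under (x, y) at the given frame, if any."""
    for ann in annotations:
        if ann.frame_index != frame_index:
            continue
        for obj in ann.objects:
            bx, by, bw, bh = obj.bbox
            if bx <= x < bx + bw and by <= y < by + bh:
                return obj.link
    return None

# A two-frame annotation track for one tracked object.
track = [
    FrameAnnotation(0, [ClickableObject("vase", (10, 10, 40, 60), "search://vase")]),
    FrameAnnotation(1, [ClickableObject("vase", (12, 10, 40, 60), "search://vase")]),
]
print(resolve_click(track, 1, 30, 30))  # inside the box -> search://vase
```

A real system would populate such tracks from the detector/tracker stage and refresh the bounding boxes per frame while the clip plays.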
Funding: Supported by the Beijing Natural Science Foundation (No. 6212007), the National Key Technology R&D Program of China (No. 2022YFD2001701), and the Youth Research Fund of the Beijing Academy of Agricultural and Forestry Sciences (No. QNJJ202014).
Abstract: Analyzing the feeding behavior of fish in real time is the premise of, and key to, accurate guidance on feeding. Identifying fish behavior from a single information source is susceptible to various factors. To overcome these problems, this paper proposes an adaptive deep modular co-attention unified multi-modal transformer (DMCA-UMT). By fusing video, audio, and water quality parameters, the whole process of fish feeding behavior can be identified. Firstly, features are extracted from the input video, audio, and water quality parameters to obtain feature vectors for the different modalities. Secondly, deep modular co-attention (DMCA) is introduced on the basis of the original cross-modal encoder, and adaptive learnable weights are added. The feature vector of the joint video-audio representation is obtained by automatic learning based on each modality's fusion contribution. Finally, the fused visual-audio information and text features are used to generate clip-level moment queries. The query decoder decodes the input features and uses the prediction head to obtain the final joint moment retrieval, i.e., the start and end times of fish feeding. The results show that the average mAP of the proposed algorithm reaches 75.3%, which is 37.8% higher than that of the unified multi-modal transformers (UMT) algorithm.
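The "adaptive learnable weights" idea can be sketched at its simplest: each modality carries a raw weight, a softmax turns the weights into fusion contributions, and the fused vector is the contribution-weighted sum. This is only an assumed stand-in; in DMCA-UMT the weights are learned inside a co-attention transformer, not set by hand.

```python
import numpy as np

def fuse_modalities(features, weights):
    """Weighted fusion of modality feature vectors.

    `features`: modality name -> feature vector; `weights`: one raw
    (learnable) scalar per modality. A softmax turns the raw weights into
    fusion contributions that sum to 1.
    """
    names = sorted(features)
    raw = np.array([weights[n] for n in names], dtype=float)
    contrib = np.exp(raw - raw.max())
    contrib /= contrib.sum()
    fused = sum(c * features[n] for c, n in zip(contrib, names))
    return fused, dict(zip(names, contrib))

video = np.ones(4)
audio = np.zeros(4)
fused, contrib = fuse_modalities({"video": video, "audio": audio},
                                 {"video": 1.0, "audio": 1.0})
print(fused)  # equal raw weights -> elementwise mean of the two vectors
```

Training would adjust the raw weights by backpropagation so that the more informative modality earns a larger fusion contribution.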
Abstract: The recognition and retrieval of identical videos by combing through entire video files requires a great deal of time and memory space. Therefore, most current video-matching methods analyze only part of each video's image frame information. All these methods, however, share the critical problem of erroneously categorizing identical videos as different if they have merely been altered in resolution or converted with a different codec. This paper instead presents an identical-video-retrieval method using the low-peak feature of audio data. The low-peak feature remains relatively stable even with changes in bit rate or codec. The proposed method showed a search success rate of 93.7% in a video matching experiment. This approach could provide a technique for recognizing identical content on video file-sharing sites.
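The low-peak idea can be sketched as follows: per audio window, record the strongest low-frequency FFT bin, then compare the resulting peak sequences. The window size, band limit, and matching rule below are assumptions for illustration, not the paper's exact feature definition.

```python
import numpy as np

def low_peak_signature(samples, rate, win=1024, low_hz=500):
    """Per-window index of the strongest low-frequency FFT bin.

    Only bins below `low_hz` are examined, since low-frequency peaks
    survive re-encoding better than the full spectrum.
    """
    low_bins = int(low_hz * win / rate)
    sig = []
    for start in range(0, len(samples) - win + 1, win):
        spectrum = np.abs(np.fft.rfft(samples[start:start + win]))
        sig.append(int(np.argmax(spectrum[:low_bins])))
    return sig

def match_rate(sig_a, sig_b):
    """Fraction of windows whose low-peak bins agree."""
    n = min(len(sig_a), len(sig_b))
    if n == 0:
        return 0.0
    return sum(a == b for a, b in zip(sig_a, sig_b)) / n

rate = 8000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 220 * t)                  # 220 Hz tone
noisy = tone + 0.01 * np.sin(2 * np.pi * 3000 * t)  # mild high-band distortion
print(match_rate(low_peak_signature(tone, rate),
                 low_peak_signature(noisy, rate)))  # close to 1.0
```

High-band distortion (a crude proxy for codec artifacts) leaves the low-band peak positions untouched, which is why the match rate stays high.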
Funding: The work was supported by the National Natural Science Foundation of China (Grant Nos. 61722204, 61732007 and 61632007).
Abstract: Emerging Internet services and applications attract increasing numbers of users to engage in diverse video-related activities, such as video searching, downloading, and sharing. These routine operations lead to explosive growth in the volume of online video and inevitably give rise to massive near-duplicate content. Near-duplicate video retrieval (NDVR) has therefore long been a hot topic. The primary purpose of this paper is to present a comprehensive survey and an updated review of advances in large-scale NDVR to provide guidance for researchers. Specifically, we summarize and compare the definitions of near-duplicate videos (NDVs) in the literature, analyze the relationship between NDVR and related research topics, describe its generic framework in detail, and investigate existing state-of-the-art NDVR systems. Finally, we present the development trends and research directions of this topic.
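The generic NDVR framework the survey describes typically fingerprints frames and compares fingerprint sets. The toy sketch below (coarse block-mean signatures plus Jaccard similarity) is one assumed instance of that pattern, not any particular surveyed system.

```python
import numpy as np

def frame_signature(frame, levels=4):
    """Quantize a grayscale frame into a coarse, hashable signature.

    A toy stand-in for the fingerprinting stage: 2x2 regional means,
    quantized to a few gray levels, so mild edits map to the same tuple.
    """
    h, w = frame.shape
    blocks = [frame[:h//2, :w//2], frame[:h//2, w//2:],
              frame[h//2:, :w//2], frame[h//2:, w//2:]]
    return tuple(int(b.mean() * levels / 256) for b in blocks)

def near_duplicate_score(video_a, video_b):
    """Jaccard similarity of the two videos' frame-signature sets."""
    sigs_a = {frame_signature(f) for f in video_a}
    sigs_b = {frame_signature(f) for f in video_b}
    if not sigs_a and not sigs_b:
        return 1.0
    return len(sigs_a & sigs_b) / len(sigs_a | sigs_b)

a = [np.full((8, 8), 100.0), np.full((8, 8), 200.0)]
b = [f + 5 for f in a]   # mild brightness change, as in a near-duplicate
print(near_duplicate_score(a, b))  # 1.0 -> flagged as near-duplicates
```

Real systems replace the block means with robust descriptors and the set comparison with indexed lookup, but the retrieve-by-fingerprint-overlap structure is the same.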
Abstract: Multimedia document annotation is used in traditional multimedia database systems. However, without human help, it is very difficult to extract the semantic content of multimedia automatically. On the other hand, manually annotating multimedia documents in large databases one by one is a tedious job. This paper first introduces a method to construct a semantic network on top of a multimedia database. Second, a useful and efficient annotation strategy based on this framework is presented to obtain accurate and rapid annotation of any multimedia database. Third, two joint similarity measures combining semantic and low-level features are evaluated.
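The simplest form a joint semantic/low-level similarity measure can take is a linear blend: keyword-set overlap for the semantic side, cosine similarity for the feature side. The blend weight and the two component measures below are assumptions for illustration; the paper evaluates its own pair of measures.

```python
import numpy as np

def joint_similarity(sem_a, sem_b, feat_a, feat_b, alpha=0.5):
    """Linear combination of semantic and low-level similarity.

    `alpha` trades off keyword overlap (Jaccard on annotation sets)
    against cosine similarity of low-level feature vectors.
    """
    sem = len(sem_a & sem_b) / len(sem_a | sem_b) if sem_a | sem_b else 1.0
    num = float(np.dot(feat_a, feat_b))
    den = float(np.linalg.norm(feat_a) * np.linalg.norm(feat_b)) or 1.0
    return alpha * sem + (1 - alpha) * num / den

f = np.array([1.0, 2.0, 3.0])
# Half the keywords shared, identical features -> 0.5 * 0.5 + 0.5 * 1.0
print(joint_similarity({"beach", "sunset"}, {"beach"}, f, f))  # 0.75
```

Tuning `alpha` per database is what makes such a measure useful: annotation-rich collections can lean on the semantic term, sparsely annotated ones on the feature term.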
Abstract: Searching, recognizing, and retrieving a video of interest from a large collection of video data is a pressing requirement, and it has been recognized as an active area of research in computer vision, machine learning, and pattern recognition. Flower video recognition and retrieval is vital in the fields of floriculture and horticulture. In this paper we propose a model for the retrieval of flower videos. Initially, videos are represented by keyframes, and flowers in the keyframes are segmented from their background. The model then uses features extracted from the flower regions of the keyframes. Linear Discriminant Analysis (LDA) is adopted to extract discriminating features, and a Multiclass Support Vector Machine (MSVM) classifier is applied to identify the class of the query video. Experiments have been conducted on a relatively large dataset of our own, consisting of 7788 videos of 30 different species of flowers captured with three different devices. Generally, flower video retrieval is addressed with a query video containing flowers of a single species; in this work we attempt to develop a system that retrieves similar videos for a query video containing flowers of different species.
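The LDA-then-MSVM pipeline can be sketched with scikit-learn on synthetic data. Everything below the comments is an assumption: three Gaussian blobs stand in for flower-region features, and the real system would feed in features from segmented keyframes, not random vectors.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

# Toy stand-in for keyframe flower-region features: three "species",
# each a Gaussian blob in a 4-D feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(20, 4)) for c in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 20)

lda = LinearDiscriminantAnalysis(n_components=2)    # discriminative projection
X_lda = lda.fit_transform(X, y)

clf = SVC(kernel="linear", decision_function_shape="ovr")  # multiclass SVM
clf.fit(X_lda, y)

query = rng.normal(2.0, 0.3, size=(1, 4))           # a "query video" feature
print(clf.predict(lda.transform(query)))            # predicts species 1
```

LDA caps its projection at (number of classes - 1) dimensions, which is why it pairs naturally with a 30-species problem: the 30-class dataset in the paper would yield at most a 29-dimensional discriminative subspace.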
Funding: China-American Digital Academic Library (CADAL) project; partially supported by the Research Project on Context-Based Multiple Digital Media Semantic Organization and System Development, and the One Hundred Talents Plan of the Chinese Academy of Sciences (CAS).
Abstract: Current investigations of visual information retrieval are generally content-based methods. The significant difference between similarity in low-level features and similarity in high-level semantic meaning is still a major challenge in the area of image retrieval. In this work, a scheme for constructing a visual ontology to retrieve art images is proposed. The proposed ontology describes images in various aspects, including type and style, objects, and global perceptual effects. Concepts in the ontology can be derived automatically. Various art image classification methods are employed based on low-level image features. Non-objective semantics are introduced, along with ways to express them. The proposed ontology scheme lets users find visual information more naturally and thus narrows the "semantic gap". Experimental implementation demonstrates its good potential for retrieving art images in a human-centered manner.
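A visual ontology of the kind described is, at minimum, a concept hierarchy that queries can be expanded through. The concept names and the flat dict encoding below are invented for illustration; the paper's actual ontology is richer than a single "is-a" chain.

```python
# Concepts with "is-a" parents across the aspects the abstract names
# (type & style, objects, global perceptual effects).
ontology = {
    "oil-painting": {"is_a": "painting", "aspect": "type & style"},
    "watercolor":   {"is_a": "painting", "aspect": "type & style"},
    "painting":     {"is_a": "artwork",  "aspect": "type & style"},
    "warm-tone":    {"is_a": "perceptual-effect", "aspect": "global effects"},
}

def ancestors(concept):
    """Walk is-a links upward so a query can also match broader concepts."""
    chain = []
    while concept in ontology:
        concept = ontology[concept]["is_a"]
        chain.append(concept)
    return chain

print(ancestors("oil-painting"))  # ['painting', 'artwork']
```

Query expansion along these chains is one concrete way such a scheme narrows the semantic gap: a search for "painting" can retrieve images whose classifiers only emitted "oil-painting".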
Abstract: Developments in multimedia technologies have paved the way for the storage of huge collections of video documents on computer systems. It is essential to design tools for content-based access to these documents, so as to allow efficient exploitation of the collections. Content-based analysis provides a flexible and powerful way to access video data compared with traditional video analysis techniques. The area of content-based video indexing and retrieval (CBVIR), which focuses on automating the indexing, retrieval, and management of video, has attracted extensive research in the last decade and enjoys enduring acknowledgment from several domains. Herein we give a critical assessment of contemporary research associated with the content-based indexing and retrieval of visual information. In this paper, we present an extensive review of significant research on CBVIR, with a concise description of content-based video analysis and the techniques associated with content-based video indexing and retrieval.
Abstract: In video information retrieval, key frame extraction has been recognized as one of the important research issues. Although much progress has been made, the existing approaches are either computationally expensive or ineffective in capturing salient visual content. In this paper, we first discuss the importance of key frame extraction and then briefly review and evaluate the existing approaches. To overcome their shortcomings, we introduce a new algorithm for key frame extraction based on unsupervised clustering, together with a feedback chain to adjust the granularity of the extraction result. The proposed algorithm is both computationally simple and able to capture the visual content. Its efficiency and effectiveness are validated on a large number of real-world videos.
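A clustering-based extractor in this spirit can be sketched compactly: cluster per-frame histograms, then emit the frame nearest each cluster centre. The histogram size, initialisation, and k-means loop below are assumptions for illustration, and the paper's feedback chain for granularity is not modelled.

```python
import numpy as np

def extract_key_frames(frames, k=2, iters=5):
    """Select key frames by clustering per-frame grayscale histograms."""
    hists = np.array([np.histogram(f, bins=8, range=(0, 256))[0] / f.size
                      for f in frames])
    # Farthest-point initialisation keeps the toy example deterministic.
    centres = [hists[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(hists - c, axis=1) for c in centres], axis=0)
        centres.append(hists[d.argmax()])
    centres = np.array(centres)
    for _ in range(iters):                       # plain k-means refinement
        dists = np.linalg.norm(hists[:, None] - centres[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centres[j] = hists[labels == j].mean(axis=0)
    dists = np.linalg.norm(hists[:, None] - centres[None], axis=2)
    labels = dists.argmin(axis=1)
    # The member closest to each centre becomes that cluster's key frame.
    keys = [int(np.where(labels == j)[0][dists[labels == j, j].argmin()])
            for j in range(k) if (labels == j).any()]
    return sorted(keys)

# Two visually distinct groups of frames -> one key frame per group.
dark = [np.full((4, 4), 20.0) for _ in range(3)]     # frames 0-2
bright = [np.full((4, 4), 230.0) for _ in range(3)]  # frames 3-5
print(extract_key_frames(dark + bright, k=2))        # [0, 3]
```

Varying `k` is the crude analogue of the paper's granularity control: a larger `k` yields more, finer-grained key frames for the same sequence.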
Funding: Supported by the National Natural Science Foundation of China (No. 60072009).
Abstract: This paper proposes a novel algorithm for extracting key frames to represent video shots. Regarding whether, or how well, a key frame represents a shot, different interpretations have been suggested. We develop our algorithm on the assumption that more important content may demand more attention and may last relatively more frames. Unsupervised clustering is used to divide the frames within a shot into clusters, and then a key frame is selected from each candidate cluster. To make the algorithm independent of the video sequence, we employ a statistical model to calculate the clustering threshold. The proposed algorithm can capture the important yet salient content as key frames. Its robustness and adaptability are validated by experiments with various kinds of video sequences.
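The abstract does not spell out its statistical model for the clustering threshold; one common choice, used here purely as an assumed stand-in, is to derive the threshold from the mean and standard deviation of consecutive-frame distances, so it scales with each sequence's own variability.

```python
import numpy as np

def clustering_threshold(frames, a=1.0):
    """Sequence-adaptive clustering threshold: mean + a * std of
    consecutive-frame histogram distances."""
    hists = np.array([np.histogram(f, bins=8, range=(0, 256))[0] / f.size
                      for f in frames])
    d = np.linalg.norm(np.diff(hists, axis=0), axis=1)
    return float(d.mean() + a * d.std())

static = [np.full((4, 4), 50.0) for _ in range(4)]
varied = [np.full((4, 4), v) for v in (10.0, 100.0, 200.0, 40.0)]
print(clustering_threshold(static))  # 0.0: no motion, tight clusters
print(clustering_threshold(varied))  # larger: threshold adapts upward
```

This is what "independent of the video sequence" buys: a hand-tuned fixed threshold would over-segment calm shots and under-segment busy ones, while a statistic of the sequence itself tracks both.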
Abstract: Automatic content analysis of sports videos is a valuable and challenging task. Motivated by analogies between a class of sports videos and languages, the authors propose a novel approach for sports video analysis based on compiler principles. It integrates both semantic analysis and syntactic analysis to automatically create an index and a table of contents for a sports video. Each shot of the video sequence is first annotated and indexed with semantic labels through detection of events using domain knowledge. A grammar-based parser is then constructed to identify the tree structure of the video content based on the labels. Meanwhile, the grammar can be used to detect and recover from errors during the analysis. As a case study, a sports video parsing system is presented in the particular domain of diving. Experimental results indicate the proposed approach is effective.
Keywords: sports video; event detection; grammar; video retrieval; content analysis and indexing
Funding: This work was supported in part by the State Physical Culture Administration of China under Grant No. 02005.
Fei Wang was born in 1977. He is a Ph.D. candidate at the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS). He received the B.S. degree in electrical engineering from Zhejiang University in 1999 and the M.S. degree in computer science from the Graduate School of the Chinese Academy of Sciences in 2001. His current research interests include content-based video analysis and retrieval.
Jin-Tao Li was born in 1962. He is a professor and Ph.D. supervisor at ICT, CAS. His main research areas include multimedia data compression, virtual reality, and home networks.
Yong-Dong Zhang was born in 1973. He is an associate professor at ICT, CAS. His main research areas include multimedia data compression and multimedia information retrieval.
Shou-Xun Lin was born in 1948. He is a professor and Ph.D. supervisor at ICT, CAS. His main research areas include multimedia technology and systems.
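The parse-labels-with-a-grammar idea can be shown with a drastically simplified, regular-grammar stand-in: a well-formed diving "round" is a takeoff shot, one or more flight shots, then an entry shot. The labels and grammar are invented for illustration; the paper's parser handles a real tree-structured grammar with error recovery.

```python
import re

# Toy grammar over shot labels: one round = takeoff, flight+, entry.
GRAMMAR = re.compile(r"(takeoff flight( flight)* entry ?)+")

def parse_rounds(labels):
    """Return the number of complete rounds, or -1 on a structure error."""
    text = " ".join(labels).strip()
    if not GRAMMAR.fullmatch(text):
        return -1    # structure error: the grammar rejects the sequence
    return labels.count("entry")

print(parse_rounds(["takeoff", "flight", "flight", "entry",
                    "takeoff", "flight", "entry"]))   # 2 rounds
print(parse_rounds(["takeoff", "entry"]))             # -1: missing flight
```

The rejection case is where the compiler analogy earns its keep: a mislabeled or missed shot shows up as a parse failure, which the full system can use to localize and recover from the detection error.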