13 articles found
Multi-Stream Temporally Enhanced Network for Video Salient Object Detection
1
Authors: Dan Xu, Jiale Ru, Jinlong Shi. Computers, Materials & Continua (SCIE, EI), 2024, no. 1, pp. 85-104 (20 pages)
Video salient object detection (VSOD) aims to locate the most attractive objects in a video by exploring spatial and temporal features. VSOD is a challenging computer vision task because it involves processing complex spatial data that is also influenced by temporal dynamics. Despite the progress of existing VSOD models, they still struggle in scenes with great background diversity within and between frames. They also suffer from accumulated noise and high time consumption when extracting temporal features over long durations. We propose a multi-stream temporal enhanced network (MSTENet) to address these problems. It explores the collaboration of saliency cues in the spatial domain with a multi-stream structure to handle the background diversity challenge. A straightforward yet efficient approach to temporal feature extraction is developed to avoid accumulated noise and reduce time consumption. MSTENet differs from other VSOD methods in that it incorporates both foreground supervision and background supervision, which facilitates the extraction of collaborative saliency cues. Another notable difference is its integration of spatial and temporal features: the temporal module is embedded in the multi-stream structure, enabling comprehensive spatial-temporal interactions within an end-to-end framework. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on five benchmark datasets while running in real time at 27 fps (Titan XP). Our code and models are available at https://github.com/RuJiaLe/MSTENet.
Keywords: video salient object detection; deep learning; temporally enhanced; foreground-background collaboration
Customized Convolutional Neural Network for Accurate Detection of Deep Fake Images in Video Collections
2
Authors: Dmitry Gura, Bo Dong (+1 more author), Duaa Mehiar, Nidal Al Said. Computers, Materials & Continua (SCIE, EI), 2024, no. 5, pp. 1995-2014 (20 pages)
The motivation for this study is that the quality of deep fakes is constantly improving, which calls for new detection methods. The proposed Customized Convolutional Neural Network method extracts structured data from video frames using facial landmark detection, and this data is then used as input to the CNN. The customized CNN is a data-augmentation-based model that generates 'fake data' or 'fake images'. This study was carried out using Python and its libraries. We used 242 videos from the dataset gathered for the Deep Fake Detection Challenge, of which 199 were fake and the remaining 53 were real. Each video was allotted 10 seconds. In total, 318 videos were used, 199 of which were fake and 119 of which were real. Our proposed method achieved a testing accuracy of 91.47%, a loss of 0.342, and an AUC score of 0.92, outperforming two alternative approaches, CNN and MLP-CNN. Furthermore, it achieved higher accuracy than contemporary models such as XceptionNet, Meso-4, EfficientNet-B0, MesoInception-4, VGG-16, and DST-Net. The novelty of this investigation is the development of a new Convolutional Neural Network (CNN) learning model that can accurately detect deep fake face photos.
Keywords: deep fake detection; video analysis; convolutional neural network; machine learning; video dataset collection; facial landmark prediction; accuracy models
SwinVid: Enhancing Video Object Detection Using Swin Transformer
3
Authors: Abdelrahman Maharek, Amr Abozeid (+1 more author), Rasha Orban, Kamal ElDahshan. Computer Systems Science & Engineering, 2024, no. 2, pp. 305-320 (16 pages)
Why is object detection in video less accurate than in still images? The reason is that the appearance of some video frames is degraded by fast motion, out-of-focus camera shots, and pose changes. These issues have made video object detection (VID) a growing area of research in recent years. VID can be used in various healthcare applications, such as detecting and tracking tumors in medical imaging, monitoring the movement of patients in hospitals and long-term care facilities, and analyzing surgical videos to improve technique and training. It can also be used in telemedicine to help diagnose and monitor patients remotely. Existing VID techniques rely on recurrent neural networks or optical flow for feature aggregation to produce reliable features for detection. Some of these methods aggregate features at the full-sequence level, others from nearby frames. To create feature maps, existing VID techniques frequently use Convolutional Neural Networks (CNNs) as the backbone network. Vision Transformers, on the other hand, have outperformed CNNs in various vision tasks, including object detection in still images and image classification. In this research, we propose using the Swin Transformer, a state-of-the-art Vision Transformer, as an alternative to CNN-based backbone networks for object detection in videos. The proposed architecture improves the accuracy of existing VID methods. The ImageNet VID and EPIC KITCHENS datasets are used to evaluate the proposed methodology. We demonstrate the efficiency of our method by achieving 84.3% mean average precision (mAP) on ImageNet VID while using less memory than other leading VID techniques. The source code is available at https://github.com/amaharek/SwinVid.
Keywords: video object detection; vision transformers; convolutional neural networks; deep learning
COVAD: Content-oriented video anomaly detection using a self attention-based deep learning model
4
Authors: Wenhao SHAO, Praboda RAJAPAKSHA (+3 more authors), Yanyan WEI, Dun LI, Noel CRESPI, Zhigang LUO. Virtual Reality & Intelligent Hardware, 2023, no. 1, pp. 24-41 (18 pages)
Background: Video anomaly detection has long been a hot topic and continues to attract increasing attention. Many existing methods for video anomaly detection process the entire video rather than considering only the significant context. Method: This paper proposes a novel video anomaly detection method called COVAD that focuses mainly on the regions of interest in a video instead of the entire video. COVAD is based on an autoencoder-style convolutional neural network and a coordinate attention mechanism, which can effectively capture meaningful objects in the video and the dependencies among them. Building on an existing memory-guided video frame prediction network, our algorithm predicts the future motion and appearance of objects in a video more effectively. Result: The proposed algorithm obtained better experimental results on multiple datasets and outperformed the baseline models considered in our analysis. In addition, we provide an improved visual test that gives pixel-level anomaly explanations.
Keywords: video surveillance; video anomaly detection; machine learning; deep learning; neural network; coordinate attention
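Prediction-based anomaly detectors, including memory-guided frame prediction networks of the kind COVAD builds on, commonly score each frame by the PSNR between the predicted and the actual frame and then min-max normalize the scores over the video. The sketch below shows only that common scoring step, as an assumption; it is not COVAD's own code, and frames are simplified to flat lists of pixel intensities in [0, 1].

```python
import math

def psnr(pred, actual, max_val=1.0, eps=1e-10):
    """PSNR in dB between two equally sized flat lists of pixel intensities."""
    mse = sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)
    mse = max(mse, eps)  # clamp so a perfect prediction gives a finite PSNR
    return 10.0 * math.log10(max_val ** 2 / mse)

def regularity_scores(psnrs):
    """Min-max normalize per-frame PSNRs over one video.

    A low score means the frame was poorly predicted, i.e. it is a
    candidate anomaly.
    """
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [1.0] * len(psnrs)
    return [(p - lo) / (hi - lo) for p in psnrs]

# Usage: one "actual" frame and three increasingly wrong predictions.
actual = [0.5, 0.5, 0.5, 0.5]
preds = [[0.5, 0.5, 0.5, 0.5], [0.6, 0.4, 0.5, 0.5], [1.0, 0.0, 0.5, 0.5]]
scores = regularity_scores([psnr(p, actual) for p in preds])
```

Normalizing per video keeps the threshold independent of each clip's overall prediction difficulty, which is why this convention is widely used.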
Video Shot Boundary Detection in MPEG Compressed Sequences Using SVM Learning (cited by: 1)
5
Authors: GUO Lihua, YANG Shutang, LI Jianhua, TONG Zhipeng (School of Electronic and Information Technology, Shanghai Jiao Tong University, Shanghai 200030, China). Journal of Electronic Science and Technology of China, 2003, no. 1, pp. 15-17, 28 (4 pages)
A number of automated video shot boundary detection methods for indexing video sequences to facilitate browsing and retrieval have been proposed in recent years. Among these methods, dissolve shot boundaries are not detected accurately because they involve camera operations and object movement. In this paper, a method based on the support vector machine (SVM) is proposed to detect dissolve shot boundaries in MPEG compressed sequences. Distinguishing dissolve shot boundaries from other boundaries is treated as a two-class classification problem. Features are extracted directly from the compressed sequences without decoding them, and the optimal boundary between the two classes is learned from training data using an SVM. Experiments comparing various classification methods show that the proposed method achieves encouraging video shot boundary detection performance.
Keywords: video shot boundary detection; dissolve detection; MPEG compressed sequences; support vector machine (SVM)
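The abstract treats dissolve detection as two-class SVM classification on features taken directly from the compressed stream, but it does not specify the features or the solver. The sketch below trains a linear SVM with Pegasos-style stochastic subgradient descent on the regularized hinge loss, in pure Python. The 2-D feature vectors are hypothetical placeholders, and the paper may well use a kernel SVM and a different training method.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (for simplicity, the bias is regularized along with the other weights).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink (regularizer)
            if margin < 1:  # hinge loss active: step toward the sample
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """+1 (e.g. 'dissolve') or -1 (e.g. 'other boundary') for feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Usage with hypothetical, linearly separable 2-D features.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```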
A New Fire Detection Method Using a Multi-Expert System Based on Color Dispersion, Similarity and Centroid Motion in Indoor Environment (cited by: 7)
6
Authors: Teng Wang, Leping Bu (+2 more authors), Zhikai Yang, Peng Yuan, Jineng Ouyang. IEEE/CAA Journal of Automatica Sinica (EI, CSCD), 2020, no. 1, pp. 263-275 (13 pages)
In this paper, a video fire detection method is proposed that demonstrates good performance in indoor environments. Three main novel ideas are introduced. First, a flame color model in the RGB and HIS color spaces is used to extract pre-detected regions instead of the traditional motion differential method, as it is more suitable for indoor fire detection. Second, based on the flicker characteristic of flames, similarity and two main values of centroid motion are proposed. At the same time, a simple but effective method is established for tracking the same regions across consecutive frames. Third, a multi-expert system combining color component dispersion, similarity, and centroid motion is built to identify flames. The proposed method has been tested on a very large dataset of fire videos acquired both in real indoor tests and from the Internet. The experimental results show that the proposed approach balances the false positive rate and the false negative rate, and achieves better overall accuracy and F-measure than other similar indoor fire detection methods.
Keywords: color dispersion; centroid motion; expert system; RGB-HIS color model; similarity; video fire detection
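The fire-detection abstract uses the motion of a candidate region's centroid across consecutive frames as a flicker cue, but it does not define its "two main values". As a hypothetical illustration only, the sketch computes the centroid of a binary region mask in each frame and summarizes the frame-to-frame displacement by its mean and variance; these two statistics are assumptions, not the paper's definitions.

```python
import math

def centroid(mask):
    """Centroid (row, col) of the nonzero pixels in a 2-D binary mask (list of lists)."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pts:
        return None  # region not present in this frame
    n = len(pts)
    return (sum(r for r, _ in pts) / n, sum(c for _, c in pts) / n)

def centroid_motion_stats(masks):
    """Mean and variance of centroid displacement for one tracked region.

    Frames in which the region is empty are skipped.
    """
    cents = [c for c in (centroid(m) for m in masks) if c is not None]
    steps = [math.dist(a, b) for a, b in zip(cents, cents[1:])]
    if not steps:
        return 0.0, 0.0
    mean = sum(steps) / len(steps)
    var = sum((s - mean) ** 2 for s in steps) / len(steps)
    return mean, var

# Usage: a one-pixel region drifting right by one column per frame.
masks = [
    [[1, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0]],
]
mean_step, var_step = centroid_motion_stats(masks)
```

Intuitively, a flickering flame tends to show small, irregular centroid jitter, while a rigidly moving flame-colored object shows steadier displacement. Statistics like these are one way such a difference could be fed to an expert rule.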
Transfer Learning on Deep Neural Networks to Detect Pornography
7
Author: Saleh Albahli. Computer Systems Science & Engineering (SCIE, EI), 2022, no. 11, pp. 701-717 (17 pages)
While the internet has many positive impacts on society, it also has negative components. Pornography, accessible to everyone through online platforms, induces psychological and health-related issues among people of all ages. Although it is a difficult task, detecting pornography can be an important step in identifying porn and adult content in a video. In this paper, an architecture is proposed that yields high scores in both training and testing. The dataset was produced from 190 videos totaling more than 19 h. The main sources of the content were YouTube, movies, torrents, and websites that host both pornographic and non-pornographic content. The videos feature people of different ethnicities and skin colors, which is intended to ensure that the models can detect any kind of video. VGG16, Inception V3, and ResNet50 models were initially trained to detect pornographic images but failed to achieve high testing accuracy, scoring 0.49, 0.49, and 0.78, respectively. Finally, using transfer learning, a convolutional neural network was designed that achieved an accuracy of 0.98.
Keywords: pornographic video detection; classification; convolutional neural network; InceptionV3; ResNet50; VGG16
Full-duplex strategy for video object segmentation
8
Authors: Ge-Peng Ji, Deng-Ping Fan (+3 more authors), Keren Fu, Zhe Wu, Jianbing Shen, Ling Shao. Computational Visual Media (SCIE, EI, CSCD), 2023, no. 1, pp. 155-175 (21 pages)
Previous video object segmentation approaches mainly focus on simplex solutions linking appearance and motion, which limits effective feature collaboration between these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue, using a better mutual restraint scheme between motion and appearance that allows cross-modal features to be exploited in the fusion and decoding stages. Specifically, we introduce a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding subspaces. To improve the model's robustness and update inconsistent features from the spatiotemporal embeddings, we adopt a bidirectional purification module after the RCAM. Extensive experiments on five popular benchmarks show that FSNet is robust to various challenging scenarios (e.g., motion blur and occlusion) and compares well with leading methods for both video object segmentation and video salient object detection. The project is publicly available at https://github.com/GewelsJI/FSNet.
Keywords: video object segmentation (VOS); video salient object detection (V-SOD); visual attention
Video Copy Detection Based on Spatiotemporal Fusion Model (cited by: 4)
9
Authors: Jianmin Li, Yingyu Liang, Bo Zhang. Tsinghua Science and Technology (EI, CAS), 2012, no. 1, pp. 51-59 (9 pages)
Content-based video copy detection is an active research field due to the need for copyright protection and business intellectual property protection. This paper presents a probabilistic spatiotemporal fusion approach for video copy detection. The approach directly estimates the location of the copy segment with a probabilistic graphical model. The spatial and temporal consistency of the video copy is embedded in the local probability function. An effective local descriptor and a two-level descriptor pairing method are used to build a video copy detection system for evaluating the approach. Tests show that it outperforms the popular voting algorithm and the probabilistic fusion framework based on the Hidden Markov Model, improving the F-score (F1) by 8%.
Keywords: video copy detection; probabilistic graphical model; spatiotemporal fusion model
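For reference, the F-score (F1) cited in the copy-detection abstract is the harmonic mean of precision and recall. The minimal sketch below computes it from counts of true positives, false positives, and false negatives. How detected copy segments are matched to ground truth is not stated in the abstract, so the counts are simply assumed to be given.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Usage: 6 correct detections, no false alarms, 4 missed copies.
score = f1_score(tp=6, fp=0, fn=4)  # precision 1.0, recall 0.6
```

Because F1 penalizes whichever of precision or recall is lower, it rewards methods that both localize copies correctly and avoid false alarms.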
Multiple hypergraph ranking for video concept detection (cited by: 1)
10
Authors: Ya-hong HAN, Jian SHAO, Fei WU, Bao-gang WEI. Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2010, no. 7, pp. 525-537 (13 pages)
This paper tackles the problem of video concept detection using multi-modality fusion. Motivated by multi-view learning algorithms, the multi-modality features of videos can be represented by multiple graphs, and graph-based semi-supervised learning methods can be extended to multiple graphs to predict semantic labels for unlabeled video data. However, traditional graphs represent only homogeneous pairwise linking relations, so the high-order correlations inherent in videos, such as high-order visual similarities, are ignored. In this paper, we represent heterogeneous features by multiple hypergraphs, so that high-order correlated samples can be associated through hyperedges. Furthermore, the multi-hypergraph ranking (MHR) algorithm is proposed, which defines a Markov random walk on each hypergraph and then forms a mixture of Markov chains to perform transductive learning over multiple hypergraphs. In experiments on the TRECVID dataset, a triple-hypergraph consisting of visual features, textual features, and multiple labeled tags is constructed to predict concept labels for unlabeled video shots with the MHR framework. Experimental results show that our approach is effective.
Keywords: multiple hypergraph ranking; video concept detection; multi-view learning; multiple labeled tags; clustering
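The MHR abstract defines a Markov random walk on each hypergraph and mixes the resulting chains. One widely used hypergraph random walk moves from vertex u to an incident hyperedge e with probability proportional to its weight w(e), then to a vertex of e uniformly at random. This gives the transition matrix P = D_v^-1 H W D_e^-1 H^T. The pure-Python sketch below builds that matrix, mixes several hypergraphs with fixed weights, and ranks vertices with a random walk with restart from the labeled ones. The fixed mixture weights, the restart scheme, and the function names are illustrative assumptions, not necessarily the exact MHR formulation.

```python
def hypergraph_transition(H, w):
    """P = Dv^-1 H W De^-1 H^T.

    H[v][e] is 1 if vertex v belongs to hyperedge e; w[e] is the edge weight.
    """
    n, m = len(H), len(w)
    d_e = [sum(H[v][e] for v in range(n)) for e in range(m)]       # edge degrees
    d_v = [sum(w[e] * H[v][e] for e in range(m)) for v in range(n)]  # vertex degrees
    P = [[0.0] * n for _ in range(n)]
    for u in range(n):
        if d_v[u] == 0:
            P[u][u] = 1.0  # isolated vertex: a self-loop keeps the row stochastic
            continue
        for v in range(n):
            P[u][v] = sum(w[e] * H[u][e] * H[v][e] / d_e[e]
                          for e in range(m) if d_e[e]) / d_v[u]
    return P

def mix_chains(Ps, alphas):
    """Convex combination sum_k alpha_k * P_k of row-stochastic matrices."""
    n = len(Ps[0])
    return [[sum(a * P[u][v] for a, P in zip(alphas, Ps)) for v in range(n)]
            for u in range(n)]

def rank(P, y, restart=0.15, iters=100):
    """Random walk with restart to the labeled distribution y; higher = more relevant."""
    n = len(P)
    f = list(y)
    for _ in range(iters):
        f = [(1 - restart) * sum(f[u] * P[u][v] for u in range(n)) + restart * y[v]
             for v in range(n)]
    return f

# Usage: 3 shots, two hypergraphs (e.g. visual and textual) over the same shots.
H = [[1, 0], [1, 1], [0, 1]]  # hyperedges {0, 1} and {1, 2}
P_visual = hypergraph_transition(H, [1.0, 1.0])
P_text = hypergraph_transition(H, [3.0, 1.0])
P_mix = mix_chains([P_visual, P_text], [0.5, 0.5])
scores = rank(P_mix, [1.0, 0.0, 0.0])  # shot 0 is labeled positive
```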
A Novel Divide and Conquer Solution for Long-term Video Salient Object Detection
11
Authors: Yun-Xiao Li, Cheng-Li-Zhao Chen (+2 more authors), Shuai Li, Ai-Min Hao, Hong Qin. Machine Intelligence Research (EI), 2024, no. 4, pp. 684-703 (20 pages)
Recently, a new trend in video salient object detection (VSOD) research has focused on enhancing detection results through model self-fine-tuning using sparsely mined high-quality keyframes from the given sequence. Although such a learning scheme is generally effective, it has a critical limitation: a model learned on sparse frames has only weak generalization ability. This situation can become worse on “long” videos, since they tend to have intensive scene variations. Moreover, in such videos, keyframe information from a longer time span is less relevant to earlier frames, which can cause learning conflicts and deteriorate model performance. Thus, the learning scheme is usually incapable of handling complex pattern modeling. To solve this problem, we propose a divide-and-conquer framework that converts a complex problem domain into multiple simple ones. First, we devise a novel background consistency analysis (BCA) that effectively divides the mined frames into disjoint groups. Then, for each group, we assign an individual deep model to capture its key attributes during the fine-tuning phase. During the testing phase, we design a model-matching strategy that dynamically selects the best-matched model from the fine-tuned ones to handle the given testing frame. Comprehensive experiments show that our method can adapt to severe background appearance variation coupled with object movement and obtains more robust saliency detection than the previous scheme and state-of-the-art methods.
Keywords: video salient object detection; background consistency analysis; weakly supervised learning; long-term information; background shift
Unsupervised object detection with scene-adaptive concept learning (cited by: 2)
12
Authors: Shiliang PU, Wei ZHAO (+3 more authors), Weijie CHEN, Shicai YANG, Di XIE, Yunhe PAN. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2021, no. 5, pp. 638-651 (14 pages)
Object detection is one of the hottest research directions in computer vision; it has already made impressive progress in academia and has many valuable applications in industry. However, mainstream detection methods still have two shortcomings: (1) even a model well trained on large amounts of data generally cannot be used across different kinds of scenes; (2) once a model is deployed, it cannot autonomously evolve with the accumulated unlabeled scene data. To address these problems, and inspired by visual knowledge theory, we propose a novel scene-adaptive evolution unsupervised video object detection algorithm that reduces the impact of scene changes through the concept of object groups. First, we extract a large number of object proposals from unlabeled data using a pre-trained detection model. Second, we build a visual knowledge dictionary of object concepts by clustering the proposals, where each cluster center represents an object prototype. Third, we examine the relations between different clusters and the object information of different groups, and propose a graph-based group information propagation strategy to determine the category of an object concept, which effectively distinguishes positive and negative proposals. With these pseudo labels, the pre-trained model can easily be fine-tuned. The effectiveness of the proposed method is verified through a range of experiments, which show significant improvements.
Keywords: visual knowledge; unsupervised video object detection; scene-adaptive learning
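The scene-adaptive detection abstract builds its visual knowledge dictionary by clustering object proposals, with each cluster center serving as an object prototype. The abstract does not name the clustering algorithm. The sketch below assumes plain k-means with deterministic farthest-point initialization and uses hypothetical 2-D vectors in place of real proposal embeddings, which would normally be high-dimensional CNN features.

```python
import math

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization.

    Returns (centers, assign), where assign[i] is the cluster index of points[i]
    and each center plays the role of an object prototype.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: mean of each cluster's members.
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's old center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, assign

# Usage: hypothetical proposal features forming two well-separated groups.
proposals = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
prototypes, labels = kmeans(proposals, k=2)
```

In the paper's pipeline, the relations between such clusters are then used to decide which concepts are positive; that propagation step is not sketched here.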
A novel robotic visual perception framework for underwater operation
13
Authors: Yue LU, Xingyu CHEN (+2 more authors), Zhengxing WU, Junzhi YU, Li WEN. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2022, no. 11, pp. 1602-1619 (18 pages)
Underwater robotic operation usually requires visual perception (e.g., object detection and tracking), but underwater scenes have poor visual quality and represent a special domain that can affect the accuracy of visual perception. In addition, detection continuity and stability are important for robotic perception, but the commonly used static accuracy-based evaluation (i.e., average precision) is insufficient to reflect detector performance over time. In response to these two problems, we present the design of a novel robotic visual perception framework. First, we investigate the relationship between a quality-diverse data domain and visual restoration in terms of detection performance. We find that, although domain quality has a negligible effect on within-domain detection accuracy, visual restoration benefits detection in real sea scenarios by reducing the domain shift. Moreover, non-reference assessments of detection continuity and stability are proposed based on object tracklets. Further, online tracklet refinement is developed to improve the temporal performance of detectors. Finally, combined with visual restoration, an accurate and stable underwater robotic visual perception framework is established. Small-overlap suppression is proposed to extend video object detection (VID) methods to the single-object tracking task, providing the flexibility to switch between detection and tracking. Extensive experiments were conducted on the ImageNet VID dataset and real-world robotic tasks to verify the correctness of our analysis and the superiority of our proposed approaches. The code is available at https://github.com/yrqs/VisPerception.
Keywords: underwater operation; robotic perception; visual restoration; video object detection