17 articles found
Customized Convolutional Neural Network for Accurate Detection of Deep Fake Images in Video Collections (cited by: 1)
1
Authors: Dmitry Gura, Bo Dong, Duaa Mehiar, Nidal Al Said. Computers, Materials & Continua (SCIE, EI), 2024, No. 5, pp. 1995-2014 (20 pages)
The motivation for this study is that the quality of deep fakes is constantly improving, which leads to the need to develop new methods for their detection. The proposed Customized Convolutional Neural Network method involves extracting structured data from video frames using facial landmark detection, which is then used as input to the CNN. The customized Convolutional Neural Network method is a data-augmentation-based CNN model to generate 'fake data' or 'fake images'. This study was carried out using Python and its libraries. We used 242 films from the dataset gathered by the Deep Fake Detection Challenge, of which 199 were fake and the remaining 53 were real. Ten seconds were allotted for each video. There were 318 videos used in all, 199 of which were fake and 119 of which were real. Our proposed method achieved a testing accuracy of 91.47%, a loss of 0.342, and an AUC score of 0.92, outperforming two alternative approaches, CNN and MLP-CNN. Furthermore, our method achieved greater accuracy than contemporary models such as XceptionNet, Meso-4, EfficientNet-B0, MesoInception-4, VGG-16, and DST-Net. The novelty of this investigation is the development of a new Convolutional Neural Network (CNN) learning model that can accurately detect deep fake face photos.
Keywords: deep fake detection; video analysis; convolutional neural network; machine learning; video dataset collection; facial landmark prediction; accuracy models
Multi-Stream Temporally Enhanced Network for Video Salient Object Detection
2
Authors: Dan Xu, Jiale Ru, Jinlong Shi. Computers, Materials & Continua (SCIE, EI), 2024, No. 1, pp. 85-104 (20 pages)
Video salient object detection (VSOD) aims at locating the most attractive objects in a video by exploring the spatial and temporal features. VSOD poses a challenging task in computer vision, as it involves processing complex spatial data that is also influenced by temporal dynamics. Despite the progress made in existing VSOD models, they still struggle in scenes of great background diversity within and between frames. Additionally, they encounter difficulties related to accumulated noise and high time consumption during the extraction of temporal features over a long-term duration. We propose a multi-stream temporal enhanced network (MSTENet) to address these problems. It investigates saliency cue collaboration in the spatial domain with a multi-stream structure to deal with the great background diversity challenge. A straightforward yet efficient approach for temporal feature extraction is developed to avoid accumulated noise and reduce time consumption. The distinction between MSTENet and other VSOD methods stems from its incorporation of both foreground supervision and background supervision, facilitating enhanced extraction of collaborative saliency cues. Another notable difference is the integration of spatial and temporal features, wherein the temporal module is integrated into the multi-stream structure, enabling comprehensive spatial-temporal interactions within an end-to-end framework. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art performance on five benchmark datasets while maintaining a real-time speed of 27 fps (Titan XP). Our code and models are available at https://github.com/RuJiaLe/MSTENet.
Keywords: video salient object detection; deep learning; temporally enhanced; foreground-background collaboration
SwinVid:Enhancing Video Object Detection Using Swin Transformer
3
Authors: Abdelrahman Maharek, Amr Abozeid, Rasha Orban, Kamal ElDahshan. Computer Systems Science & Engineering, 2024, No. 2, pp. 305-320 (16 pages)
What causes object detection in video to be less accurate than it is in still images? Some video frames are degraded in appearance by fast movement, out-of-focus camera shots, and changes in posture. These reasons have made video object detection (VID) a growing area of research in recent years. Video object detection can be used for various healthcare applications, such as detecting and tracking tumors in medical imaging, monitoring the movement of patients in hospitals and long-term care facilities, and analyzing videos of surgeries to improve technique and training. Additionally, it can be used in telemedicine to help diagnose and monitor patients remotely. Existing VID techniques are based on recurrent neural networks or optical flow for feature aggregation to produce reliable features which can be used for detection. Some of those methods aggregate features on the full-sequence level or from nearby frames. To create feature maps, existing VID techniques frequently use Convolutional Neural Networks (CNNs) as the backbone network. On the other hand, Vision Transformers have outperformed CNNs in various vision tasks, including object detection in still images and image classification. We propose in this research to use Swin-Transformer, a state-of-the-art Vision Transformer, as an alternative to CNN-based backbone networks for object detection in videos. The proposed architecture enhances the accuracy of existing VID methods. The ImageNet VID and EPIC KITCHENS datasets are used to evaluate the suggested methodology. We have demonstrated that our proposed method is efficient by achieving 84.3% mean average precision (mAP) on ImageNet VID using less memory in comparison to other leading VID techniques. The source code is available at https://github.com/amaharek/SwinVid.
Keywords: video object detection; vision transformers; convolutional neural networks; deep learning
COVAD: Content-oriented video anomaly detection using a self attention-based deep learning model
4
Authors: Wenhao SHAO, Praboda RAJAPAKSHA, Yanyan WEI, Dun LI, Noel CRESPI, Zhigang LUO. Virtual Reality & Intelligent Hardware, 2023, No. 1, pp. 24-41 (18 pages)
Background: Video anomaly detection has always been a hot topic and has attracted increasing attention. Many of the existing methods for video anomaly detection depend on processing the entire video rather than considering only the significant context. Method: This paper proposes a novel video anomaly detection method called COVAD that mainly focuses on the region of interest in the video instead of the entire video. Our proposed COVAD method is based on an autoencoded convolutional neural network and a coordinated attention mechanism, which can effectively capture meaningful objects in the video and dependencies among different objects. Relying on the existing memory-guided video frame prediction network, our algorithm can predict the future motion and appearance of objects in a video more effectively. Result: The proposed algorithm obtained better experimental results on multiple datasets and outperformed the baseline models considered in our analysis. In addition, we provide an improved visual test that can provide pixel-level anomaly explanations.
Keywords: video surveillance; video anomaly detection; machine learning; deep learning; neural network; coordinate attention
Road boundary estimation to improve vehicle detection and tracking in UAV video (cited by: 1)
5
Authors: Zhang Liye (张立业), Peng Zhongren (彭仲仁), Li Li (李立), Wang Hua (王华). Journal of Central South University (SCIE, EI, CAS), 2014, No. 12, pp. 4732-4741 (10 pages)
Video processing is one challenge in collecting vehicle trajectories from unmanned aerial vehicles (UAVs), and road boundary estimation is one way to improve the video processing algorithms. However, current methods do not work well for low-volume roads, which are not well marked and contain noise such as vehicle tracks. A fusion-based method termed Dempster-Shafer-based road detection (DSRD) is proposed to address this issue. This method detects the road boundary by combining multiple information sources using Dempster-Shafer theory (DST). To test the performance of the proposed method, two field experiments were conducted, one on a highway partially covered by snow and the other on a highway with dense traffic. The results show that DSRD is robust and accurate, with detection rates of 100% and 99.8% compared with manual detection results. DSRD is then adopted to improve the UAV video processing algorithm, and the vehicle detection and tracking rates are improved by 2.7% and 5.5%, respectively. The computation time also decreased by 5% and 8.3% for the two experiments, respectively.
Keywords: road boundary detection; vehicle detection and tracking; airborne video; unmanned aerial vehicle; Dempster-Shafer theory
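The abstract above says DSRD fuses several information sources with Dempster-Shafer theory but does not name the sources or their mass values. As a minimal sketch of the fusion step only, the snippet below applies Dempster's rule of combination to two hypothetical cues (`color_cue` and `edge_cue`, with made-up masses) over the frame {road, non_road}. It illustrates the rule itself and is not the authors' implementation.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence accumulates on the intersection of the two sets.
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # Contradictory evidence is collected and normalised away below.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Hypothetical frame of discernment and cue masses (assumptions, not from the paper).
ROAD = frozenset({"road"})
OFF = frozenset({"non_road"})
EITHER = ROAD | OFF  # mass placed here expresses "don't know"

color_cue = {ROAD: 0.6, OFF: 0.1, EITHER: 0.3}
edge_cue = {ROAD: 0.5, OFF: 0.2, EITHER: 0.3}

fused = combine(color_cue, edge_cue)
print({"/".join(sorted(h)): round(w, 4) for h, w in fused.items()})
```

With these illustrative masses the fused belief in "road" is 0.63 / 0.83, about 0.76, higher than either cue alone, because the two cues mostly agree and the 0.17 of conflicting mass is renormalised away.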
Video Shot Boundary Detection in MPEG Compressed Sequences Using SVM Learning (cited by: 1)
6
Authors: GUO Lihua, YANG Shutang, LI Jianhua, TONG Zhipeng (School of Electronic and Information Technology, Shanghai Jiao Tong University, Shanghai 200030, China). Journal of Electronic Science and Technology of China, 2003, No. 1, pp. 15-17, 28 (4 pages)
A number of automated video shot boundary detection methods for indexing a video sequence to facilitate browsing and retrieval have been proposed in recent years. Among these methods, the dissolve shot boundary is not accurately detected because it involves camera operation and object movement. In this paper, a method based on the support vector machine (SVM) is proposed to detect the dissolve shot boundary in MPEG compressed sequences. The problem of distinguishing the dissolve shot boundary from other boundaries is treated as two-class classification in our method. Features are extracted directly from the compressed sequences without decoding them, and the optimal class boundary between the two classes is learned from training data by using SVM. Experiments that compare various classification methods show that the proposed method improves the performance of video shot boundary detection.
Keywords: video shot boundary detection; dissolve detection; MPEG compressed sequences; support vector machine (SVM)
A Novel Divide and Conquer Solution for Long-term Video Salient Object Detection
7
Authors: Yun-Xiao Li, Cheng-Li-Zhao Chen, Shuai Li, Ai-Min Hao, Hong Qin. Machine Intelligence Research (EI, CSCD), 2024, No. 4, pp. 684-703 (20 pages)
Recently, a new research trend in our video salient object detection (VSOD) research community has focused on enhancing the detection results via model self-fine-tuning using sparsely mined high-quality keyframes from the given sequence. Although such a learning scheme is generally effective, it has a critical limitation, i.e., the model learned on sparse frames only possesses weak generalization ability. This situation could become worse on “long” videos since they tend to have intensive scene variations. Moreover, in such videos, the keyframe information from a longer time span is less relevant to the previous, which could also cause learning conflict and deteriorate the model performance. Thus, the learning scheme is usually incapable of handling complex pattern modeling. To solve this problem, we propose a divide-and-conquer framework, which can convert a complex problem domain into multiple simple ones. First, we devise a novel background consistency analysis (BCA) which effectively divides the mined frames into disjoint groups. Then, for each group, we assign an individual deep model to capture its key attribute during the fine-tuning phase. During the testing phase, we design a model-matching strategy, which can dynamically select the best-matched model from those fine-tuned ones to handle the given testing frame. Comprehensive experiments show that our method can adapt to severe background appearance variation coupled with object movement and obtain robust saliency detection compared with the previous scheme and the state-of-the-art methods.
Keywords: video salient object detection; background consistency analysis; weakly supervised learning; long-term information; background shift
JudPriNet: Video transition detection based on semantic relationship and Monte Carlo sampling
8
Authors: Bo Ma, Jinsong Wu, Wei Qi Yan. Intelligent and Converged Networks (EI), 2024, No. 2, pp. 134-146 (13 pages)
Video understanding and content boundary detection are vital stages in video recommendation. However, previous content boundary detection methods require collecting information, including location, cast, action, and audio, and if any of these elements are missing, the results may be adversely affected. To address this issue and effectively detect transitions in video content, in this paper, we introduce a video classification and boundary detection method named JudPriNet. The focus of this paper is on objects in videos along with their labels, enabling automatic scene detection in video clips and establishing semantic connections among local objects in the images. As a significant contribution, JudPriNet presents a framework that maps labels to a “Continuous Bag of Visual Words Model” to cluster labels and generates new standardized labels as video-type tags. This facilitates automatic classification of video clips. Furthermore, JudPriNet employs the Monte Carlo sampling method to classify video clips, treating the features of video clips as elements within the framework. This proposed method seamlessly integrates video and textual components without compromising training and inference speed. Through experimentation, we have demonstrated that JudPriNet, with its semantic connections, is able to effectively classify videos alongside textual content. Our results indicate that, compared with several other detection approaches, JudPriNet excels in high-level content detection without disrupting the integrity of the video content, outperforming existing methods.
Keywords: video scene detection; Monte Carlo; object detection; Continuous Bag-of-Words
A New Fire Detection Method Using a Multi-Expert System Based on Color Dispersion, Similarity and Centroid Motion in Indoor Environment (cited by: 8)
9
Authors: Teng Wang, Leping Bu, Zhikai Yang, Peng Yuan, Jineng Ouyang. IEEE/CAA Journal of Automatica Sinica (EI, CSCD), 2020, No. 1, pp. 263-275 (13 pages)
In this paper, a video fire detection method is proposed, which demonstrated good performance in indoor environments. Three main novel ideas have been introduced. Firstly, a flame color model in RGB and HIS color space is used to extract pre-detected regions instead of the traditional motion differential method, as it is more suitable for fire detection in indoor environments. Secondly, according to the flicker characteristic of the flame, similarity and two main values of centroid motion are proposed. At the same time, a simple but effective method for tracking the same regions in consecutive frames is established. Thirdly, a multi-expert system consisting of color component dispersion, similarity and centroid motion is established to identify flames. The proposed method has been tested on a very large dataset of fire videos acquired both in real indoor environment tests and from the Internet. The experimental results show that the proposed approach achieved a balance between the false positive rate and the false negative rate, and demonstrated a better performance in terms of overall accuracy and F standard with respect to other similar fire detection methods in indoor environments.
Keywords: color dispersion; centroid motion; expert system; RGB-HIS color model; similarity; video fire detection
A robust system for real-time pedestrian detection and tracking (cited by: 2)
10
Authors: Li Qi (李琦), Shao Chunfu (邵春福), Zhao Yi (赵熠). Journal of Central South University (SCIE, EI, CAS), 2014, No. 4, pp. 1643-1653 (11 pages)
A real-time pedestrian detection and tracking system using a single video camera was developed to monitor pedestrians. This system contained six modules: video flow capture, pre-processing, movement detection, shadow removal, tracking, and object classification. The Gaussian mixture model was utilized to extract the moving object from an image sequence segmented by the mean-shift technique in the pre-processing module. Shadow removal was used to alleviate the negative impact of shadows on the detected objects. A model-free method was adopted to identify pedestrians. The maximum and minimum integration methods were developed to integrate multiple cues into the mean-shift algorithm and the initial tracking iteration with the competent integrated probability distribution map for object tracking. A simple but effective algorithm was proposed to handle full occlusion cases. The system was tested using real traffic videos from different sites. The results of the test confirm that the system is reliable and has an overall accuracy of over 85%.
Keywords: image processing technique; pedestrian detection; tracking; video camera
Transfer Learning on Deep Neural Networks to Detect Pornography
11
Author: Saleh Albahli. Computer Systems Science & Engineering (SCIE, EI), 2022, No. 11, pp. 701-717 (17 pages)
While the internet has a lot of positive impact on society, there are negative components. Accessible to everyone through online platforms, pornography is inducing psychological and health-related issues among people of all ages. While a difficult task, detecting pornography can be an important step in determining the porn and adult content in a video. In this paper, an architecture is proposed which yielded high scores for both training and testing. The dataset was produced from 190 videos, yielding more than 19 h of video. The main sources for the content were YouTube, movies, torrents, and websites that host both pornographic and non-pornographic content. The videos covered different ethnicities and skin colors, which ensures the models can detect any kind of video. VGG16, Inception V3 and Resnet 50 models were initially trained to detect these pornographic images but failed to achieve a high testing accuracy, with accuracies of 0.49, 0.49 and 0.78, respectively. Finally, utilizing transfer learning, a convolutional neural network was designed and yielded an accuracy of 0.98.
Keywords: pornographic video detection; classification; convolutional neural network; InceptionV3; Resnet50; VGG16
Full-duplex strategy for video object segmentation (cited by: 1)
12
Authors: Ge-Peng Ji, Deng-Ping Fan, Keren Fu, Zhe Wu, Jianbing Shen, Ling Shao. Computational Visual Media (SCIE, EI, CSCD), 2023, No. 1, pp. 155-175 (21 pages)
Previous video object segmentation approaches mainly focus on simplex solutions linking appearance and motion, limiting effective feature collaboration between these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue, by considering a better mutual restraint scheme linking motion and appearance, allowing exploitation of cross-modal features from the fusion and decoding stage. Specifically, we introduce a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding sub-spaces. To improve the model's robustness and update inconsistent features from the spatiotemporal embeddings, we adopt a bidirectional purification module after the RCAM. Extensive experiments on five popular benchmarks show that our FSNet is robust to various challenging scenarios (e.g., motion blur and occlusion), and compares well to leading methods both for video object segmentation and video salient object detection. The project is publicly available at https://github.com/GewelsJI/FSNet.
Keywords: video object segmentation (VOS); video salient object detection (V-SOD); visual attention
Video Copy Detection Based on Spatiotemporal Fusion Model (cited by: 4)
13
Authors: Jianmin Li, Yingyu Liang, Bo Zhang. Tsinghua Science and Technology (EI, CAS), 2012, No. 1, pp. 51-59 (9 pages)
Content-based video copy detection is an active research field due to the need for copyright protection and business intellectual property protection. This paper gives a probabilistic spatiotemporal fusion approach for video copy detection. This approach directly estimates the location of the copy segment with a probabilistic graphical model. The spatial and temporal consistency of the video copy is embedded in the local probability function. An effective local descriptor and a two-level descriptor pairing method are used to build a video copy detection system to evaluate the approach. Tests show that it outperforms the popular voting algorithm and the probabilistic fusion framework based on the Hidden Markov Model, improving F-score (F1) by 8%.
Keywords: video copy detection; probabilistic graphical model; spatiotemporal fusion model
Multiple hypergraph ranking for video concept detection (cited by: 1)
14
Authors: Ya-hong HAN, Jian SHAO, Fei WU, Bao-gang WEI. Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2010, No. 7, pp. 525-537 (13 pages)
This paper tackles the problem of video concept detection using the multi-modality fusion method. Motivated by multi-view learning algorithms, multi-modality features of videos can be represented by multiple graphs, and graph-based semi-supervised learning methods can be extended to multiple graphs to predict the semantic labels for unlabeled video data. However, traditional graphs represent only homogeneous pairwise linking relations, and therefore the high-order correlations inherent in videos, such as high-order visual similarities, are ignored. In this paper we represent heterogeneous features by multiple hypergraphs, so that high-order correlated samples can be associated with hyperedges. Furthermore, the multi-hypergraph ranking (MHR) algorithm is proposed by defining a Markov random walk on each hypergraph and then forming the mixture Markov chains so as to perform transductive learning in multiple hypergraphs. In experiments on the TRECVID dataset, a triple-hypergraph consisting of visual features, textual features, and multiple labeled tags is constructed to predict concept labels for unlabeled video shots by the MHR framework. Experimental results show that our approach is effective.
Keywords: multiple hypergraph ranking; video concept detection; multi-view learning; multiple labeled tags; clustering
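The abstract above describes a Markov random walk on each hypergraph followed by a mixture of the resulting chains, without giving formulas. The sketch below uses the standard hypergraph random walk (choose an incident hyperedge in proportion to its weight, then a member vertex uniformly) and, as a simplifying assumption, mixes two chains with fixed weights. The two toy hypergraphs, the names `visual_incidence` and `text_incidence`, and the 0.5/0.5 mixing weights are all hypothetical; the paper's actual mixing scheme is not specified in this listing.

```python
def hypergraph_transition(incidence, edge_weights):
    """Row-stochastic random-walk matrix of a weighted hypergraph.

    incidence[v][e] is 1 when vertex v belongs to hyperedge e, else 0.
    Assumes every vertex lies in at least one hyperedge.
    """
    n, m = len(incidence), len(edge_weights)
    edge_size = [sum(incidence[v][e] for v in range(n)) for e in range(m)]
    vertex_degree = [
        sum(edge_weights[e] * incidence[v][e] for e in range(m)) for v in range(n)
    ]
    transition = []
    for u in range(n):
        row = []
        for v in range(n):
            # Probability of leaving u through hyperedge e and landing on v.
            mass = sum(
                edge_weights[e] * incidence[u][e] * incidence[v][e] / edge_size[e]
                for e in range(m)
                if edge_size[e] > 0
            )
            row.append(mass / vertex_degree[u])
        transition.append(row)
    return transition


def mix_chains(transitions, mix_weights):
    """Convex combination of transition matrices (weights must sum to 1)."""
    n = len(transitions[0])
    return [
        [sum(w * t[u][v] for w, t in zip(mix_weights, transitions)) for v in range(n)]
        for u in range(n)
    ]


# Four video shots; each modality groups them differently (toy data).
visual_incidence = [[1, 0], [1, 0], [1, 1], [0, 1]]  # edges {0,1,2} and {2,3}
text_incidence = [[1, 0], [0, 1], [0, 1], [1, 0]]    # edges {0,3} and {1,2}

P_visual = hypergraph_transition(visual_incidence, [1.0, 1.0])
P_text = hypergraph_transition(text_incidence, [1.0, 1.0])
P_mixed = mix_chains([P_visual, P_text], [0.5, 0.5])
```

In this toy example shots 0 and 3 share no visual hyperedge, so the visual walk can never step directly between them, while the mixed chain can because the textual hypergraph links them. That is the kind of cross-modality complement a mixture of chains is meant to capture.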
A sparse representation-based approach for video copy detection
15
Authors: Jianmin LI, Chen SUN, Bo ZHANG. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2012, No. 2, pp. 208-215 (8 pages)
Content-based video copy detection has become an active research field due to the requirements of copyright protection, business intelligence, video retrieval, etc. Although existing methods assume that the reference database consists of original videos, these videos are difficult to obtain in many practical cases. In this paper, a copy detection method based on sparse representation is proposed to make use of some imperfect prototypes of original videos maintained in the reference database. A query video is represented as a linear combination of all the videos in the database. We can then determine whether the query has sibling videos in the database based on the distribution of coefficients, and find them based on reconstruction error. The experiments show that even with very low-dimensional features, this method can achieve high performance.
Keywords: video copy detection; near-duplicate video; sparse representation
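The abstract above represents a query as a linear combination of database videos and then inspects the coefficients and the reconstruction error, but it does not say which solver or features are used. As a rough illustration, the sketch below solves an L1-regularised least-squares problem with ISTA (iterative soft thresholding) on toy 4-dimensional feature vectors. The solver choice, the `penalty` value, and the data are assumptions made here, not details from the paper.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_code(columns, query, penalty=0.05, iterations=500):
    """Minimise 0.5 * ||D x - q||^2 + penalty * ||x||_1 with ISTA.

    columns holds the dictionary D column by column (one feature vector
    per reference video); query is the feature vector q of the query video.
    """
    k, dim = len(columns), len(query)
    # Step size 1/L, with L bounded above by the squared Frobenius norm of D.
    step = 1.0 / sum(c * c for column in columns for c in column)
    x = [0.0] * k
    for _ in range(iterations):
        recon = [sum(x[j] * columns[j][i] for j in range(k)) for i in range(dim)]
        residual = [recon[i] - query[i] for i in range(dim)]
        gradient = [
            sum(columns[j][i] * residual[i] for i in range(dim)) for j in range(k)
        ]
        x = [soft_threshold(x[j] - step * gradient[j], step * penalty) for j in range(k)]
    return x


def reconstruction_error(columns, coefficients, query):
    """Euclidean distance between the query and its sparse reconstruction."""
    recon = [
        sum(coefficients[j] * columns[j][i] for j in range(len(columns)))
        for i in range(len(query))
    ]
    return math.sqrt(sum((r - q) ** 2 for r, q in zip(recon, query)))


# Toy reference database: three videos with 4-dimensional features (made up).
reference = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.6, 0.8, 0.0],
]
query = [0.0, 0.6, 0.8, 0.0]  # a copy of reference video 2

coefficients = sparse_code(reference, query)
best_match = max(range(len(coefficients)), key=lambda j: abs(coefficients[j]))
error = reconstruction_error(reference, coefficients, query)
```

Here the coefficient mass concentrates on a single reference video and the reconstruction error stays small, which is the pattern the abstract associates with a query that has a sibling in the database. A query unrelated to every reference would instead give spread-out or near-zero coefficients and a large error.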
Unsupervised object detection with scene-adaptive concept learning (cited by: 2)
16
Authors: Shiliang PU, Wei ZHAO, Weijie CHEN, Shicai YANG, Di XIE, Yunhe PAN. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2021, No. 5, pp. 638-651 (14 pages)
Object detection is one of the hottest research directions in computer vision, has already made impressive progress in academia, and has many valuable applications in industry. However, the mainstream detection methods still have two shortcomings: (1) even a model that is well trained using large amounts of data still cannot generally be used across different kinds of scenes; (2) once a model is deployed, it cannot autonomously evolve along with the accumulated unlabeled scene data. To address these problems, and inspired by visual knowledge theory, we propose a novel scene-adaptive evolution unsupervised video object detection algorithm that can decrease the impact of scene changes through the concept of object groups. We first extract a large number of object proposals from unlabeled data through a pre-trained detection model. Second, we build the visual knowledge dictionary of object concepts by clustering the proposals, in which each cluster center represents an object prototype. Third, we look into the relations between different clusters and the object information of different groups, and propose a graph-based group information propagation strategy to determine the category of an object concept, which can effectively distinguish positive and negative proposals. With these pseudo labels, we can easily fine-tune the pre-trained model. The effectiveness of the proposed method is verified by performing different experiments, and significant improvements are achieved.
Keywords: visual knowledge; unsupervised video object detection; scene-adaptive learning
A novel robotic visual perception framework for underwater operation (cited by: 1)
17
Authors: Yue LU, Xingyu CHEN, Zhengxing WU, Junzhi YU, Li WEN. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2022, No. 11, pp. 1602-1619 (18 pages)
Underwater robotic operation usually requires visual perception (e.g., object detection and tracking), but underwater scenes have poor visual quality and represent a special domain which can affect the accuracy of visual perception. In addition, detection continuity and stability are important for robotic perception, but the commonly used static accuracy-based evaluation (i.e., average precision) is insufficient to reflect detector performance across time. In response to these two problems, we present a design for a novel robotic visual perception framework. First, we investigate the relationship between a quality-diverse data domain and visual restoration in detection performance. We find that, although domain quality has a negligible effect on within-domain detection accuracy, visual restoration is beneficial to detection in real sea scenarios by reducing the domain shift. Moreover, non-reference assessments are proposed for detection continuity and stability based on object tracklets. Further, online tracklet refinement is developed to improve the temporal performance of detectors. Finally, combined with visual restoration, an accurate and stable underwater robotic visual perception framework is established. Small-overlap suppression is proposed to extend video object detection (VID) methods to a single-object tracking task, leading to the flexibility to switch between detection and tracking. Extensive experiments were conducted on the ImageNet VID dataset and real-world robotic tasks to verify the correctness of our analysis and the superiority of our proposed approaches. The code is available at https://github.com/yrqs/VisPerception.
Keywords: underwater operation; robotic perception; visual restoration; video object detection