The motivation for this study is that the quality of deep fakes is constantly improving, which leads to the need to develop new methods for their detection. The proposed Customized Convolutional Neural Network method involves extracting structured data from video frames using facial landmark detection, which is then used as input to the CNN. The customized Convolutional Neural Network method is a data augmentation-based CNN model that generates 'fake data' or 'fake images'. This study was carried out using Python and its libraries. We used 242 videos from the dataset gathered by the Deep Fake Detection Challenge, of which 199 were fake and the remaining 53 were real. Ten seconds were allotted for each video. There were 318 videos used in all, 199 of which were fake and 119 of which were real. Our proposed method achieved a testing accuracy of 91.47%, a loss of 0.342, and an AUC score of 0.92, outperforming two alternative approaches, CNN and MLP-CNN. Furthermore, our method achieved greater accuracy than contemporary models such as XceptionNet, Meso-4, EfficientNet-B0, MesoInception-4, VGG-16, and DST-Net. The novelty of this investigation is the development of a new Convolutional Neural Network (CNN) learning model that can accurately detect deep fake face photos.
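The landmark-to-structured-data step this abstract describes can be illustrated with a minimal sketch (the `landmarks_to_features` helper and the four-point example are hypothetical stand-ins, not the paper's actual pipeline): each detected facial landmark is normalized to the face bounding box so the CNN receives a fixed-length, scale-invariant feature vector.

```python
def landmarks_to_features(landmarks):
    """Normalize (x, y) facial landmarks into [0, 1] relative to their
    bounding box, yielding a fixed-length structured vector for a CNN."""
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # guard against a zero-width box
    span_y = (max(ys) - min_y) or 1.0
    features = []
    for x, y in landmarks:
        features.append((x - min_x) / span_x)
        features.append((y - min_y) / span_y)
    return features

# Example: four corner-like landmarks of a hypothetical face crop.
points = [(10, 20), (50, 20), (10, 80), (50, 80)]
vec = landmarks_to_features(points)
```

A real detector would emit 68 or more landmarks per frame; the normalization step is the same regardless of count.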
Cloud computing has drastically changed the delivery and consumption of live streaming content. The designs, challenges, and possible uses of cloud computing for live streaming are studied. A comprehensive overview of the technical and business issues surrounding cloud-based live streaming is provided, including the benefits of cloud computing, the various live streaming architectures, and the challenges that live streaming service providers face in delivering high-quality, real-time services. The different techniques used to improve the performance of video streaming, such as adaptive bit-rate streaming, multicast distribution, and edge computing, are discussed, and the necessity of low-latency, high-quality video transmission in cloud-based live streaming is underlined. Issues such as improving user experience and live streaming service performance using cutting-edge technology, like artificial intelligence and machine learning, are discussed. In addition, the legal and regulatory implications of cloud-based live streaming, including issues with network neutrality, data privacy, and content moderation, are addressed. The future of cloud computing for live streaming is examined in the section that follows, looking at the most likely new developments in terms of trends and technology. For technology vendors, live streaming service providers, and regulators, the findings have major policy-relevant implications. Suggestions on how stakeholders should address these concerns and take advantage of the potential presented by this rapidly evolving sector are provided, as well as insights into the key challenges and opportunities associated with cloud-based live streaming.
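Of the techniques surveyed, adaptive bit-rate streaming is the most mechanical: the client periodically measures throughput and picks the highest rung of a bitrate ladder that fits. A rough sketch (the ladder values and safety factor are illustrative assumptions, not drawn from any particular service):

```python
BITRATE_LADDER_KBPS = [400, 1200, 2500, 5000]  # hypothetical encoding rungs

def select_bitrate(throughput_kbps, safety_factor=0.8):
    """Pick the highest ladder rung that fits within a safety margin of
    the measured throughput; fall back to the lowest rung otherwise."""
    budget = throughput_kbps * safety_factor
    chosen = BITRATE_LADDER_KBPS[0]
    for rung in BITRATE_LADDER_KBPS:
        if rung <= budget:
            chosen = rung
    return chosen
```

Production players (HLS, MPEG-DASH clients) combine throughput estimates with buffer occupancy, but the ladder-selection core looks like this.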
Video captioning aims at automatically generating a natural language caption to describe the content of a video. However, most existing methods in the video captioning task ignore the relationships between objects in the video and the correlations between multimodal features, and they also ignore the effect of caption length on the task. This study proposes a novel video captioning framework (ORMF) based on an object relation graph and multimodal feature fusion. ORMF uses the similarity and spatio-temporal relationships of objects in the video to construct an object relation feature graph and introduces a graph convolutional network (GCN) to encode the object relations. At the same time, ORMF constructs a multimodal feature fusion network to learn the relationships between features of different modalities and to fuse them. Furthermore, the proposed model calculates a caption length loss, encouraging the caption to carry richer information. Experimental results on two public datasets (Microsoft video captioning corpus [MSVD] and Microsoft research-video to text [MSR-VTT]) demonstrate the effectiveness of our method.
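As a rough illustration of the graph-convolution encoding the abstract mentions, one simplified GCN step aggregates each object node's features with its neighbours' (mean aggregation with a self-loop; a real GCN applies learned weight matrices and symmetric degree normalization, both omitted here):

```python
def gcn_layer(features, adjacency):
    """One simplified graph-convolution step: each node's new feature is
    the mean of its neighbours' features plus its own (self-loop)."""
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adjacency[i][j] or j == i]
        agg = [sum(features[j][d] for j in neigh) / len(neigh)
               for d in range(dim)]
        out.append(agg)
    return out

# Toy graph: three objects in a chain, 2-D features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
out = gcn_layer(features, adjacency)
```

Stacking several such layers lets information propagate along multi-hop object relations.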
Currently, worldwide industries and communities are concerned with building, expanding, and exploring the assets and resources found in the oceans and seas. More precisely, for stock analysis, archaeology, and surveillance, several cameras are installed undersea to collect videos. However, these large videos require a lot of time and memory to process in order to extract relevant information. Hence, to automate this manual procedure of video assessment, an accurate and efficient automated system is a great necessity. From this perspective, we intend to present a complete framework solution for the task of video summarization and object detection in underwater videos. We employ a perceived motion energy (PME) method to first extract the keyframes, followed by an object detection model, namely YOLOv3, to perform object detection in underwater videos. The issues of blurriness and low contrast in underwater images are also taken into account in the presented approach by applying an image enhancement method. Furthermore, the suggested framework of underwater video summarization and object detection has been evaluated on the publicly available Brackish dataset. The proposed framework shows good performance and can thus assist marine researchers and scientists working in the fields of underwater archaeology, stock assessment, and surveillance.
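The keyframe-extraction stage can be caricatured with plain frame differencing (flattened grayscale frames as lists; the actual PME method works on compressed-domain motion vectors, so this is only a stand-in sketch of the idea of keeping frames where motion energy spikes):

```python
def motion_energy(prev_frame, frame):
    """Sum of absolute pixel differences between two flattened frames."""
    return sum(abs(a - b) for a, b in zip(prev_frame, frame))

def select_keyframes(frames, threshold):
    """Keep the first frame, plus any frame whose motion energy relative
    to the previously kept frame exceeds the threshold."""
    keyframes = [0]
    for i in range(1, len(frames)):
        if motion_energy(frames[keyframes[-1]], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes
```

Only the selected keyframes would then be passed to the detector, cutting processing time on long underwater recordings.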
Football is one of the most-watched sports, but analyzing players’ performance is currently difficult and labor intensive. Performance analysis is done manually, which means that someone must watch video recordings and then log each player’s performance. This includes the number of passes and shots taken by each player, the location of the action, and whether or not the play had a successful outcome. Due to the time-consuming nature of manual analyses, interest in automatic analysis tools is high despite the many interdependent phases involved, such as pitch segmentation, player and ball detection, assigning players to their teams, identifying individual players, activity recognition, etc. This paper proposes a system for developing an automatic video analysis tool for sports. The proposed system is the first to integrate multiple phases, such as segmenting the field, detecting the players and the ball, assigning players to their teams, and identifying players’ jersey numbers. In team assignment, this research employed unsupervised learning based on convolutional autoencoders (CAEs) to learn discriminative latent representations, minimizing the latent embedding distance between players on the same team while simultaneously maximizing the distance between those on opposing teams. This paper also created a highly accurate approach for the real-time detection of the ball. Furthermore, it addressed the lack of jersey number datasets by creating a new dataset with more than 6,500 images for numbers ranging from 0 to 99. Since achieving high performance in deep learning requires a large training set, and the collected dataset was not enough, this research utilized transfer learning (TL) to first pretrain the jersey number detection model on another large dataset and then fine-tune it on the target dataset to increase accuracy. To test the proposed system, this paper presents a comprehensive evaluation of its individual stages as well as of the system as a whole.
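Team assignment by embedding distance can be sketched with a tiny two-means loop over scalar embeddings (the CAE that would produce the embeddings is out of scope here, and the values are invented): players whose embeddings cluster together get the same team label.

```python
def assign_teams(embeddings, iterations=10):
    """Split scalar player embeddings into two clusters with a small
    two-means loop, mimicking team assignment by embedding distance."""
    c0, c1 = min(embeddings), max(embeddings)  # initial centroids
    labels = [0] * len(embeddings)
    for _ in range(iterations):
        labels = [0 if abs(e - c0) <= abs(e - c1) else 1
                  for e in embeddings]
        group0 = [e for e, l in zip(embeddings, labels) if l == 0]
        group1 = [e for e, l in zip(embeddings, labels) if l == 1]
        if group0:
            c0 = sum(group0) / len(group0)
        if group1:
            c1 = sum(group1) / len(group1)
    return labels
```

With high-dimensional CAE embeddings the same loop applies, with Euclidean distances and vector centroids in place of the scalar arithmetic.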
Semantic video analysis plays an important role in the field of machine intelligence and pattern recognition. In this paper, a semantic recognition framework for compressed videos, based on the Hidden Markov Model (HMM), is proposed to analyze video events according to six low-level features. After a detailed analysis of video events, the global motion pattern and five foreground features (the foreground being the principal part of a video) are employed as the observations of the Hidden Markov Model to classify events in videos. Applications of the proposed framework to several video event detection tasks demonstrate its promising success in semantic video analysis.
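Classifying an event with an HMM amounts to scoring each candidate model against the observation sequence; the standard tool is the forward algorithm, sketched below with a toy two-state model (all probabilities are illustrative, not taken from the paper):

```python
def forward_prob(obs, start, trans, emit):
    """Forward algorithm: probability of an observation sequence under an
    HMM, summing over all hidden-state paths."""
    states = range(len(start))
    # alpha[s] = P(observations so far, current state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
                 for s in states]
    return sum(alpha)

# Toy two-state HMM with two observation symbols (0 and 1).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.5], [0.1, 0.9]]
```

Event classification then picks the event whose HMM assigns the sequence the highest likelihood.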
Foreground detection methods can be applied to efficiently distinguish foreground objects, including moving or static objects, from the background, which is very important in video analysis applications, especially video surveillance. An excellent background model yields good foreground detection results. A lot of background modeling methods have been proposed, but few comprehensive evaluations of them are available. These methods suffer from various challenges such as illumination changes and dynamic backgrounds. This paper first analyzes the advantages and disadvantages of various background modeling methods in video analysis applications and then compares their performance in terms of quality and computational cost. The ChangeDetection.net (CDnet2014) dataset and another video dataset with different environmental conditions (indoor, outdoor, snow) were used to test each method. The experimental results sufficiently demonstrate the strengths and drawbacks of traditional and recently proposed state-of-the-art background modeling methods. This work is helpful for both researchers and engineering practitioners. Code for the background modeling methods evaluated in this paper is available at www.yongxu.org/lunwen.html.
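One of the simplest background models that such evaluations typically include as a baseline is the exponential running average; a sketch (per-pixel lists stand in for image arrays, and the learning rate and threshold are arbitrary):

```python
def update_background(background, frame, alpha=0.05):
    """Exponential running average: bg <- (1 - alpha) * bg + alpha * frame."""
    return [(1 - alpha) * b + alpha * f
            for b, f in zip(background, frame)]

def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose deviation from the background exceeds a threshold."""
    return [1 if abs(f - b) > threshold else 0
            for b, f in zip(background, frame)]
```

More sophisticated models (mixtures of Gaussians, sample-based methods) replace the single per-pixel mean with a richer per-pixel distribution, which is what makes them robust to dynamic backgrounds.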
Strabismus is a medical condition defined as a lack of coordination between the eyes. When strabismus is detected at an early age, the chances of curing it are higher. The methods used to detect strabismus and measure its degree of deviation are complex and time-consuming, and they always require the presence of a physician. In this paper, we present a method for detecting strabismus and measuring its degree of deviation using videos of the patient’s eye region under a cover test. Our method involves extracting features from a set of training videos (the training corpus) and using them to build a classifier. A decision tree (ID3) is built using labeled cases from actual strabismus diagnoses. Patterns are extracted from the corresponding patient videos, and an association between the extracted features and actual diagnoses is established. Matching rules from the correlation plot are used to predict diagnoses for future patients. The classifier was tested using a set of testing videos (the testing corpus). The results showed 95.9% accuracy; the remaining 4.1% were light cases that could not be detected correctly from the videos, half of them false positives and the other half false negatives.
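The ID3 algorithm mentioned above chooses each split by information gain, i.e. the entropy reduction a candidate feature achieves. A self-contained sketch of that computation (feature extraction from the videos is not modeled here):

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

def information_gain(labels, groups):
    """Entropy reduction after splitting `labels` into `groups`."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder
```

ID3 greedily picks, at every tree node, the feature whose split maximizes this gain.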
Trials is a specialty of off-road cycling in which the rider has to face obstacle courses without resting their feet on the ground. Technique in this sport has great importance, since it reduces the risk of committing penalties and allows more efficient execution of the gesture. To improve technique, motion analysis allows the gesture to be studied both qualitatively and quantitatively. In this work, video analysis was used to study the side hop from the rear wheel technique. Two different executions of this technique were analyzed. The primary purpose is the identification of the phases that make up the technical gesture. An explanation is given of the movement strategies adopted in the execution of the jump in the two different situations.
Deep learning-based action classification technology has been applied to various fields, such as social safety, medical services, and sports. Analyzing an action on a practical level requires tracking multiple human bodies in an image in real-time and simultaneously classifying their actions. There are various related studies on the real-time classification of actions in an image. However, existing deep learning-based action classification models have prolonged response times, so there is a limit to real-time analysis. In addition, they have low per-object action accuracy when multiple objects appear in the image, and they incur a memory overhead in processing image data. Deep learning-based action classification using one-shot object detection is proposed to overcome the limitations of multiframe-based analysis technology. The proposed method uses a one-shot object detection model and a multi-object tracking algorithm to detect and track multiple objects in the image. Then, a deep learning-based pattern classification model classifies the body action of each object in the image by reducing the data for each object to an action vector. Compared to existing studies, the constructed model shows a higher accuracy of 74.95%, and in terms of speed, it offers better performance than current studies at 0.234 s per frame. The proposed model makes it possible to classify some actions through action vector learning alone, without additional image learning, because of the vector learning feature of the posterior neural network. Therefore, it is expected to contribute significantly to commercializing realistic streaming data analysis technologies, such as CCTV.
Artificial intelligence is increasingly being applied in the field of video analysis, particularly in the area of public safety, where video surveillance equipment such as closed-circuit television (CCTV) is used and automated analysis of video information is required. However, various issues such as data size limitations and low processing speeds make real-time extraction of video data challenging. Video analysis technology applies object classification, detection, and relationship analysis to continuous 2D frame data, and the various meanings within the video are thus analyzed based on the extracted basic data. Motion recognition is key in this analysis. Motion recognition is a challenging field that analyzes human body movements, requiring the interpretation of complex movements of human joints and the relationships between various objects. The deep learning-based human skeleton detection algorithm is a representative motion recognition algorithm. Recently, motion analysis models, such as the SlowFast network algorithm, have also been developed with excellent performance. However, these models do not operate properly in most outdoor wide-angle video environments, displaying the low response speed expected of motion classification on high-resolution images. The proposed method achieves a high level of extraction and accuracy by improving SlowFast’s input data preprocessing and data structures. The input data are preprocessed through object tracking and background removal using YOLO and DeepSORT. Higher performance than that of a single model is achieved by improving the existing SlowFast data structure into a frame-unit structure. Based on the confusion matrix, accuracies of 70.16% and 70.74% were obtained for the existing SlowFast and the proposed model, respectively, an increase of 0.58 percentage points. Comparing detection based on behavioral classification, the existing SlowFast detected 2,341,164 cases, whereas the proposed model detected 3,119,323 cases, an increase of 33.23%.
Abnormal event detection aims to automatically identify unusual events that do not comply with expectations. Recently, many methods have been proposed to obtain the temporal locations of abnormal events under various determined thresholds. However, the specific categories of abnormal events are mostly neglected, even though they are important for helping monitoring agents make decisions. In this study, a Temporal Attention Network (TANet) is proposed to capture both the specific categories and temporal locations of abnormal events in a weakly supervised manner. TANet learns the anomaly score and specific category for each video segment with only video-level abnormal event labels. An event recognition module predicts the event scores for each video segment, while a temporal attention module learns a temporal attention value. Finally, to learn anomaly scores and specific categories, three constraints are considered: an event category constraint, an event separation constraint, and a temporal smoothness constraint. Experiments on the University of Central Florida Crime dataset demonstrate the effectiveness of the proposed method.
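The effect of a temporal attention module can be sketched as softmax pooling over per-segment scores: segments with large attention logits dominate the video-level score (the logits and scores below are invented for illustration; TANet learns them from video-level labels):

```python
import math

def attention_pool(segment_scores, attention_logits):
    """Softmax the attention logits over time, then return the
    attention-weighted video-level anomaly score."""
    m = max(attention_logits)  # shift for numerical stability
    exps = [math.exp(a - m) for a in attention_logits]
    total = sum(exps)
    return sum(e / total * s for e, s in zip(exps, segment_scores))
```

With uniform attention this reduces to the mean segment score; with sharply peaked attention it approaches the score of the most-attended segment, which is how the attention can localize an abnormal event temporally.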
We propose a mobile system, called PotholeEye+, for automatically monitoring the surface of a roadway and detecting pavement distress in real-time through video analysis. PotholeEye+ pre-processes the images, extracts features, and classifies the distress into a variety of types while the road manager is driving. Every day for a year, we tested PotholeEye+ on a real highway in a realistic setting, with a camera, a mini computer, a GPS receiver, and so on. Consequently, PotholeEye+ detected pavement distress with an average accuracy of 92%, precision of 87%, and recall of 74% while driving at an average speed of 110 km/h on a real highway.
Avoiding lameness or leg weakness in pig production is crucial to reduce cost and improve animal welfare and meat quality. Detection of lameness by vision systems may assist the farmer or breeder in obtaining a more accurate and robust measurement of lameness. The paper presents a low-cost vision system for measuring the locomotion of moving pigs based on motion detection, frame-grabbing, and multivariate image analysis. The first step is to set up a video system based on web camera technology and choose a test area. Secondly, a motion detection and data storage system is used to build a processing system for the video data. The video data are analyzed by measuring the properties of each image, stacking them for each animal, and then analyzing these stacks using multivariate image analysis. The system was able to obtain and decompose information from these stacks, from which components could be extracted representing a particular motion pattern. These components could be used to classify or score animals according to this pattern, which might be an indicator of lameness. However, further improvement is needed with respect to standardization of herding, the test area, and tracking of animals in order to have a robust system for use in a farm environment.
The advent of the COVID-19 pandemic has adversely affected the entire world and has put forth high demand for techniques that remotely manage crowd-related tasks. Video surveillance and crowd management using video analysis techniques have significantly impacted today’s research, and numerous applications have been developed in this domain. This research proposes an anomaly detection technique applied to Umrah videos in the Kaaba during the COVID-19 pandemic through sparse crowd analysis. Managing the Kaaba rituals is crucial, since the crowd gathers from around the world and requires proper analysis during these days of the pandemic. The Umrah videos are analyzed, and a system is devised that can track and monitor the crowd flow in the Kaaba. The crowd in these videos is sparse due to the pandemic, and we have developed a technique to track the majority crowd flow and detect any object (person) moving in a direction unlike that of the major flow. We detect abnormal movement by creating histograms for the vertical and horizontal flows and applying thresholds to identify the non-majority flow. Our algorithm aims to analyze the crowd through video surveillance and detect any abnormal activity in a timely manner, to maintain a smooth crowd flow in the Kaaba during the pandemic.
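The non-majority-flow test described above can be sketched with a direction histogram over flow vectors: a person whose motion direction falls in a rarely populated bin is flagged (the bin count and minimum-share threshold are invented parameters, and real optical-flow fields would supply the vectors):

```python
import math

def flag_against_flow(flows, bins=8, min_share=0.05):
    """Build a direction histogram over (dx, dy) flow vectors and flag
    those whose direction bin holds less than `min_share` of all vectors."""
    def bin_of(dx, dy):
        angle = math.atan2(dy, dx) % (2 * math.pi)
        return int(angle / (2 * math.pi / bins)) % bins

    hist = [0] * bins
    for dx, dy in flows:
        hist[bin_of(dx, dy)] += 1
    return [hist[bin_of(dx, dy)] / len(flows) < min_share
            for dx, dy in flows]
```

Splitting the test into separate horizontal and vertical histograms, as the abstract describes, works the same way with 1-D bins per axis.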
A novel moving object segmentation method is proposed in this paper. A modified three-dimensional recursive search (3DRS) algorithm is used in order to obtain motion information accurately. A motion feature descriptor (MFD) is designed to describe the motion of each block in a picture based on motion intensity, motion in occlusion areas, and motion correlation among neighbouring blocks. Then, a fuzzy C-means clustering algorithm (FCM) is applied to those MFDs so as to segment moving objects. Moreover, a new parameter, the gathering degree, is used to distinguish foreground moving objects from background motion. Experimental results demonstrate the effectiveness of the proposed method.
Focusing on the problem of goal event detection in soccer videos, a novel method based on the Hidden Markov Model (HMM) and semantic rules is proposed. Firstly, an HMM for the goal event is constructed. Then a Normalized Semantic Weighted Sum (NSWS) rule is established by defining a new shot feature, the semantic observation weight. The test video is detected based on the HMM and the NSWS rule, respectively. Finally, a fusion scheme based on logic distance is proposed, and the detection results of the HMM and the NSWS rule are fused by optimal weights at the decision level, obtaining the final result. Experimental results indicate that the proposed method achieves 96.43% precision and 100% recall, demonstrating its effectiveness.
To overcome the limitations of traditional methods for detecting dairy cow rumination, a video-based intelligent monitoring method for cow ruminant behavior was proposed in this study. The Mean Shift algorithm was used to accurately track the jaw motion of dairy cows. The centroid trajectory curve of the cow’s mouth motion was subsequently extracted from the video. In this way, monitoring of the ruminant behavior of dairy cows was realized. To verify the accuracy of the method, six videos, totaling 99′00″ and 24,000 frames, were selected. The test results demonstrated that the success rate of this method was 92.03%, despite the interference of behaviors such as raising or turning of the cow’s head. The results demonstrate that this method of monitoring the ruminant behavior of dairy cows is effective and feasible.
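Mean Shift, used above to follow the jaw, iteratively moves a window to the mean of the samples inside it until it settles on a local density mode. A one-dimensional sketch (the sample positions and bandwidth are invented; tracking a jaw uses the same idea over a 2-D color histogram back-projection):

```python
def mean_shift_1d(points, start, bandwidth=1.0, iterations=50):
    """Shift a query point toward the mean of samples within `bandwidth`
    until it converges on a local density mode."""
    x = start
    for _ in range(iterations):
        window = [p for p in points if abs(p - x) <= bandwidth]
        if not window:
            break
        new_x = sum(window) / len(window)
        if abs(new_x - x) < 1e-6:  # converged
            break
        x = new_x
    return x
```

Starting the next frame's search from the previous frame's mode is what turns this mode-seeking step into a tracker.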
In order to realize automatic monitoring of the ruminant activities of cows, an automatic detection method for the mouth area of ruminating cows based on machine vision technology was studied. Optical flow was used to calculate the relative motion speed of each pixel in the video frame images. Candidate mouth regions with large motion ranges were extracted, and a series of processing steps, such as grayscale conversion, threshold segmentation, pixel expansion, and adjacent region merging, were carried out to extract the real area of the cow’s mouth. To verify the accuracy of the proposed method, six videos with a total length of 96 min were selected for this research. The results showed that the highest accuracy was 87.80%, the average accuracy was 76.46%, and the average running time of the algorithm was 6.39 s. These results show that this method can detect the mouth area automatically, which lays the foundation for automatic monitoring of cows’ ruminant behavior.
Human group activity recognition (GAR) has attracted significant attention from computer vision researchers due to its wide practical applications in security surveillance, social role understanding, and sports video analysis. In this paper, we give a comprehensive overview of the advances in group activity recognition in videos during the past 20 years. First, we provide a summary and comparison of 11 GAR video datasets in this field. Second, we survey group activity recognition methods, including those based on handcrafted features and those based on deep learning networks. For a better understanding of the pros and cons of these methods, we compare various models from the past to the present. Finally, we outline several challenging issues and possible directions for future research. From this comprehensive literature review, readers can obtain an overview of progress in group activity recognition for future studies.
Funding (deepfake detection study): Science and Technology Funds from the Liaoning Education Department (Serial Number: LJKZ0104).
Funding (video captioning study): National Natural Science Foundation of China, Grant/Award Number: 62077015; Zhejiang Normal University.
Funding: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1G1A1099559).
Abstract: Currently, worldwide industries and communities are concerned with building, expanding, and exploring the assets and resources found in the oceans and seas. More precisely, for stock analysis, archaeology, and surveillance, several cameras are installed under the sea to collect videos. These large videos, however, require a great deal of time and memory to process in order to extract relevant information. Hence, to automate this manual procedure of video assessment, an accurate and efficient automated system is a pressing necessity. From this perspective, we present a complete framework for video summarization and object detection in underwater videos. We employ a perceived motion energy (PME) method to first extract keyframes, followed by an object detection model, YOLOv3, to perform object detection in underwater videos. The issues of blurriness and low contrast in underwater images are also taken into account by applying an image enhancement method. Furthermore, the suggested framework for underwater video summarization and object detection has been evaluated on the publicly available Brackish dataset. The proposed framework shows good performance and can thus assist marine researchers and scientists working in underwater archaeology, stock assessment, and surveillance.
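The abstract does not give the PME formulation; purely as an illustration of the keyframe-selection idea (score frames by motion energy, keep the peaks), a crude frame-differencing stand-in, not the paper's PME method, might look like this:

```python
import numpy as np

def motion_energy(frames):
    """Per-frame motion energy: mean absolute difference between
    consecutive grayscale frames (a crude stand-in for PME)."""
    frames = np.asarray(frames, dtype=np.float64)
    return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))  # shape (T-1,)

def keyframe_indices(frames, top_k=3):
    """Keep the frames that follow the top_k largest motion-energy peaks."""
    energy = motion_energy(frames)
    return sorted(int(i) + 1 for i in np.argsort(energy)[-top_k:])

# Toy clip: brightness jumps between frames 3->4 and 6->7.
vals = [0, 0, 0, 0, 10, 10, 10, 50, 50, 50]
clip = [np.full((4, 4), float(v)) for v in vals]
print(keyframe_indices(clip, top_k=2))  # → [4, 7]
```

A real pipeline would compute energy from compressed-domain motion vectors or optical flow rather than raw differences.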
Abstract: Football is one of the most-watched sports, but analyzing players' performance is currently difficult and labor intensive. Performance analysis is done manually, which means that someone must watch video recordings and then log each player's performance. This includes the number of passes and shots taken by each player, the location of the action, and whether or not the play had a successful outcome. Due to the time-consuming nature of manual analyses, interest in automatic analysis tools is high despite the many interdependent phases involved, such as pitch segmentation, player and ball detection, assigning players to their teams, identifying individual players, activity recognition, etc. This paper proposes a system for developing an automatic video analysis tool for sports. The proposed system is the first to integrate multiple phases, such as segmenting the field, detecting the players and the ball, assigning players to their teams, and identifying players' jersey numbers. In team assignment, this research employed unsupervised learning based on convolutional autoencoders (CAEs) to learn discriminative latent representations, minimizing the latent embedding distance between players on the same team while simultaneously maximizing the distance between those on opposing teams. This paper also created a highly accurate approach for the real-time detection of the ball. Furthermore, it addressed the lack of jersey number datasets by creating a new dataset with more than 6,500 images for numbers ranging from 0 to 99. Since achieving high performance in deep learning requires a large training set, and the collected dataset was not enough, this research utilized transfer learning (TL) to first pretrain the jersey number detection model on another large dataset and then fine-tune it on the target dataset to increase accuracy. To test the proposed system, this paper presents a comprehensive evaluation of its individual stages as well as of the system as a whole.
Funding: Supported in part by the National Natural Science Foundation of China (No. 60572045), the Ministry of Education of China Ph.D. Program Foundation (No. 20050698033), and a Cooperation Project (2005.7-2007.6) with Microsoft Research Asia.
Abstract: Semantic video analysis plays an important role in the field of machine intelligence and pattern recognition. In this paper, based on the Hidden Markov Model (HMM), a semantic recognition framework for compressed videos is proposed to analyze video events according to six low-level features. After a detailed analysis of video events, the pattern of global motion and five features of the foreground, the principal parts of videos, are employed as the observations of the Hidden Markov Model to classify events in videos. The application of the proposed framework to several video event detection tasks demonstrates its promise for semantic video analysis.
Abstract: Foreground detection methods can efficiently distinguish foreground objects, whether moving or static, from the background, which is very important in video analysis applications, especially video surveillance. An excellent background model yields good foreground detection results. Many background modeling methods have been proposed, but few comprehensive evaluations of them are available. These methods face various challenges such as illumination changes and dynamic backgrounds. This paper first analyzes the advantages and disadvantages of various background modeling methods in video analysis applications and then compares their performance in terms of quality and computational cost. The ChangeDetection.net (CDnet 2014) dataset and another video dataset with different environmental conditions (indoor, outdoor, snow) were used to test each method. The experimental results demonstrate the strengths and drawbacks of both traditional and recently proposed state-of-the-art background modeling methods. This work is helpful for both researchers and engineering practitioners. Code for the background modeling methods evaluated in this paper is available at www.yongxu.org/lunwen.html.
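As a minimal reference point for the class of methods compared, one of the simplest classical approaches, a running-average background model with threshold-based foreground masking, can be sketched as follows (illustrative parameters, not any specific evaluated method):

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential running average: the model adapts slowly, so gradual
    illumination changes are absorbed while sudden changes stand out."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=30):
    """Pixels that deviate strongly from the background model are foreground."""
    return np.abs(frame - bg) > thresh

# Toy sequence: a static scene (intensity 50) with an object (intensity 200)
# entering the top-left 2x2 corner in the final frame.
bg = np.full((6, 6), 50.0)
frames = [np.full((6, 6), 50.0) for _ in range(10)]
frames[-1][:2, :2] = 200.0

for f in frames[:-1]:
    bg = update_background(bg, f)
mask = foreground_mask(bg, frames[-1])
print(int(mask.sum()))  # → 4
```

State-of-the-art models handle dynamic backgrounds with per-pixel mixtures or sample sets rather than a single running mean; this sketch only shows the baseline idea they improve on.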
Funding: Funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University through the Research Funding Program (Grant No. FRP-1440-32).
Abstract: Strabismus is a medical condition defined as a lack of coordination between the eyes. When strabismus is detected at an early age, the chances of curing it are higher. The methods used to detect strabismus and measure its degree of deviation are complex and time-consuming, and they always require the presence of a physician. In this paper, we present a method for detecting strabismus and measuring its degree of deviation using videos of the patient's eye region under a cover test. Our method involves extracting features from a set of training videos (training corpora) and using them to build a classifier. A decision tree (ID3) is built using labeled cases from actual strabismus diagnoses. Patterns are extracted from the corresponding patient videos, and an association between the extracted features and the actual diagnoses is established. Matching rules from the correlation plot are used to predict diagnoses for future patients. The classifier was tested using a set of testing videos (testing corpora). The results showed 95.9% accuracy; the remaining 4.1% were mild cases that could not be detected correctly from the videos, half of them false positives and the other half false negatives.
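The abstract names ID3 but not its features. ID3's split criterion, information gain, is sketched below on hypothetical cover-test attributes (the feature names and cases are invented for illustration; the paper's real features come from the videos):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """ID3 split criterion: expected entropy reduction from splitting on attr."""
    n = len(labels)
    split = {}
    for row, y in zip(rows, labels):
        split.setdefault(row[attr], []).append(y)
    return entropy(labels) - sum(len(ys) / n * entropy(ys) for ys in split.values())

# Hypothetical cover-test features, invented for this sketch.
cases = [
    {"deviation": "yes", "refixation": "fast"},
    {"deviation": "yes", "refixation": "slow"},
    {"deviation": "no",  "refixation": "fast"},
    {"deviation": "no",  "refixation": "fast"},
]
diagnosis = ["strabismus", "strabismus", "healthy", "healthy"]
print(information_gain(cases, diagnosis, "deviation"))   # → 1.0 (perfect split)
print(information_gain(cases, diagnosis, "refixation"))  # lower gain
```

ID3 picks the attribute with the highest gain at each node and recurses on the resulting subsets.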
Abstract: Trials is a specialty of off-road cycling in which the rider has to face obstacle courses without resting the feet on the ground. Technique is of great importance in this sport, since it reduces the risk of incurring penalties and allows more efficient execution of the gesture. To improve technique, motion analysis allows the gesture to be studied both qualitatively and quantitatively. In this work, video analysis was used to study the side hop from the rear wheel. Two different executions of this technique were analyzed. The primary purpose is the identification of the phases that make up the technical gesture. An explanation is given for the movement strategies adopted in executing the jump in the two different situations.
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2022R1I1A1A01069526).
Abstract: Deep learning-based action classification technology has been applied to various fields, such as public safety, medical services, and sports. Analyzing an action at a practical level requires tracking multiple human bodies in an image in real time and simultaneously classifying their actions. There are various related studies on the real-time classification of actions in an image. However, existing deep learning-based action classification models have slow response speeds, so there is a limit to real-time analysis. In addition, their accuracy drops when multiple objects appear in the image, and they incur a memory overhead when processing image data. Deep learning-based action classification using one-shot object detection is proposed to overcome the limitations of multiframe-based analysis technology. The proposed method uses a one-shot object detection model and a multi-object tracking algorithm to detect and track multiple objects in the image. Then, a deep learning-based pattern classification model classifies the body action of each object by reducing its data to an action vector. Compared to existing studies, the constructed model shows a higher accuracy of 74.95%, and in terms of speed it offers better performance at 0.234 s per frame. The proposed model makes it possible to classify some actions through action vector learning alone, without additional image learning, because of the vector learning feature of the posterior neural network. Therefore, it is expected to contribute significantly to commercializing realistic streaming data analysis technologies, such as CCTV.
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2020R1A6A1A03040583), and by Kyonggi University's Graduate Research Assistantship 2023.
Abstract: Artificial intelligence is increasingly being applied in the field of video analysis, particularly in public safety, where video surveillance equipment such as closed-circuit television (CCTV) is used and automated analysis of video information is required. However, various issues, such as data size limitations and low processing speeds, make real-time extraction of video data challenging. Video analysis technology applies object classification, detection, and relationship analysis to continuous 2D frame data, and the various meanings within the video are then analyzed based on the extracted basic data. Motion recognition is key in this analysis. Motion recognition is a challenging field that analyzes human body movements, requiring the interpretation of complex movements of human joints and the relationships between various objects. The deep learning-based human skeleton detection algorithm is a representative motion recognition algorithm. Recently, motion analysis models with excellent performance, such as the SlowFast network, have also been developed. However, these models do not operate properly in most outdoor wide-angle video environments, exhibiting slow response when extracting motion classifications from high-resolution images. The proposed method achieves a high level of extraction and accuracy by improving SlowFast's input data preprocessing and data structure. The input data are preprocessed through object tracking and background removal using YOLO and DeepSORT. Higher performance than that of a single model is achieved by improving the existing SlowFast data structure into a frame-unit structure. Based on the confusion matrix, accuracies of 70.16% and 70.74% were obtained for the existing SlowFast and the proposed model, respectively, a 0.58 percentage point increase. Comparing detection based on behavioral classification, the existing SlowFast detected 2,341,164 cases, whereas the proposed model detected 3,119,323 cases, an increase of 33.23%.
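The reported detection counts can be checked directly; recomputing the relative increase from the two figures gives roughly 33.24%, consistent with the stated 33.23% up to rounding:

```python
# Detection counts reported in the abstract.
baseline, proposed = 2_341_164, 3_119_323
increase_pct = 100 * (proposed - baseline) / baseline
print(f"increase: {increase_pct:.2f}%")  # ~33.24%, matching the reported figure

# Accuracy figures reported in the abstract.
accuracy_gain = 70.74 - 70.16
print(f"accuracy gain: {accuracy_gain:.2f} percentage points")  # → 0.58
```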
Funding: Supported in part by the National Science Fund for Distinguished Young Scholars under Grant No. 61925112; in part by the National Natural Science Foundation of China under Grant Nos. 61806193 and 61772510; in part by the Support Program of Shaanxi under Grant No. 2020KJXX-091; and in part by the Key Research Program of Frontier Sciences, Chinese Academy of Sciences, under Grant No. QYZDY-SSW-JSC044.
Abstract: Abnormal event detection aims to automatically identify unusual events that do not comply with expectations. Recently, many methods have been proposed to obtain the temporal locations of abnormal events under various thresholds. However, the specific categories of abnormal events are mostly neglected, even though they are important in helping monitoring agents make decisions. In this study, a Temporal Attention Network (TANet) is proposed to capture both the specific categories and the temporal locations of abnormal events in a weakly supervised manner. TANet learns the anomaly score and specific category of each video segment with only video-level abnormal event labels. An event recognition module predicts the event scores for each video segment, while a temporal attention module learns a temporal attention value. Finally, to learn anomaly scores and specific categories, three constraints are considered: an event category constraint, an event separation constraint, and a temporal smoothness constraint. Experiments on the University of Central Florida Crime dataset demonstrate the effectiveness of the proposed method.
Abstract: We propose a mobile system, called PotholeEye+, for automatically monitoring the surface of a roadway and detecting pavement distress in real time through video analysis. PotholeEye+ pre-processes the images, extracts features, and classifies the distress into a variety of types while the road manager is driving. Every day for a year, we tested PotholeEye+ on a real highway with a real setup involving a camera, a mini computer, a GPS receiver, and so on. PotholeEye+ detected pavement distress with an average accuracy of 92%, precision of 87%, and recall of 74% while driving at an average speed of 110 km/h on a real highway.
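The abstract reports precision and recall but not their harmonic mean; for readers who want the F1 score implied by those averages, it works out to about 0.80 (derived here from the stated figures, not reported by the authors):

```python
# Averages reported in the abstract.
precision, recall = 0.87, 0.74
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(f"F1 = {f1:.3f}")  # → F1 = 0.800
```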
Funding: The Norwegian Research Council is gratefully acknowledged for providing financial support for this research as part of the Robust Pig project.
Abstract: Avoiding lameness or leg weakness in pig production is crucial to reduce costs and improve animal welfare and meat quality. Detection of lameness by vision systems may assist the farmer or breeder in obtaining a more accurate and robust measurement of lameness. This paper presents a low-cost vision system for measuring the locomotion of moving pigs based on motion detection, frame grabbing, and multivariate image analysis. The first step is to set up a video system based on web camera technology and choose a test area. Secondly, a motion detection and data storage system is used to build a processing system for the video data. The video data are analyzed by measuring the properties of each image, stacking the images for each animal, and then analyzing these stacks using multivariate image analysis. The system was able to extract and decompose information from these stacks into components, each representing a particular motion pattern. These components could be used to classify or score animals according to that pattern, which might be an indicator of lameness. However, further improvement is needed with respect to standardization of herding, the test area, and tracking of animals in order to have a robust system for use in a farm environment.
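The multivariate image analysis of frame stacks is not detailed in the abstract. One common instance, PCA via SVD on a centred stack of flattened frames, can be sketched on synthetic data (the "gait pattern" here is invented for illustration, not taken from the paper):

```python
import numpy as np

# Toy "stack": each row is one flattened frame of a walking sequence.
# We synthesize a single dominant motion pattern plus small noise.
rng = np.random.default_rng(42)
pattern = np.sin(np.linspace(0, 2 * np.pi, 64))  # the underlying motion component
weights = rng.normal(size=20)                    # per-frame activation of the pattern
stack = np.outer(weights, pattern) + 0.01 * rng.normal(size=(20, 64))

# Multivariate image analysis, here as PCA: centre the stack and decompose it.
centred = stack - stack.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
explained = S**2 / np.sum(S**2)      # variance explained by each component
scores = centred @ Vt[0]             # per-frame score on the dominant pattern

print(f"first component explains {explained[0]:.1%} of the variance")
```

The per-animal scores on such components are what could then be thresholded or modelled to flag a motion pattern associated with lameness.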
Funding: The authors extend their appreciation to the Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia, for funding this research work through Project Number QURDO001 (Project title: Intelligent Real-Time Crowd Monitoring System Using Unmanned Aerial Vehicle (UAV) Video and Global Positioning Systems (GPS) Data).
Abstract: The advent of the COVID-19 pandemic has adversely affected the entire world and created high demand for techniques that remotely manage crowd-related tasks. Video surveillance and crowd management using video analysis techniques have significantly impacted today's research, and numerous applications have been developed in this domain. This research proposes an anomaly detection technique applied, through sparse crowd analysis, to videos of the Umrah in the Kaaba during the COVID-19 pandemic. Managing the Kaaba rituals is crucial, since the crowd gathers from around the world and requires proper analysis during the days of the pandemic. The Umrah videos are analyzed, and a system is devised that can track and monitor the crowd flow in the Kaaba. The crowd in these videos is sparse due to the pandemic, and we have developed a technique to track the dominant crowd flow and detect any object (person) moving in a direction unlike that of the major flow. We detect abnormal movement by creating histograms of the vertical and horizontal flows and applying thresholds to identify non-majority flow. Our algorithm aims to analyze the crowd through video surveillance and detect any abnormal activity in a timely fashion, to maintain a smooth crowd flow in the Kaaba during the pandemic.
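The flow-histogram idea can be illustrated with a small sketch: summarize per-track horizontal and vertical displacements, take the majority direction, and flag tracks moving against it. This is a simplified sign-based stand-in (medians in place of histogram peaks), not the paper's exact thresholding:

```python
import numpy as np

def against_majority(dx, dy):
    """Flag tracks whose horizontal and vertical motion signs both disagree
    with the majority flow (medians stand in for histogram peaks)."""
    dx, dy = np.asarray(dx), np.asarray(dy)
    maj_x, maj_y = np.sign(np.median(dx)), np.sign(np.median(dy))
    outliers = (np.sign(dx) != maj_x) & (np.sign(dy) != maj_y)
    return [int(i) for i in np.nonzero(outliers)[0]]

# Nine tracks follow the dominant right-and-down flow; one moves against it.
dx = [1.0] * 9 + [-1.2]
dy = [0.5] * 9 + [-0.7]
print(against_majority(dx, dy))  # → [9]
```

A histogram over binned flow angles would generalize this to arbitrary dominant directions rather than axis-aligned signs.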
Funding: Supported by the National Natural Science Foundation of China (Nos. 60772134, 60902081, 60902052), the 111 Project (No. B08038), and the Fundamental Research Funds for the Central Universities (No. 72105457).
Abstract: A novel moving object segmentation method is proposed in this paper. A modified three-dimensional recursive search (3DRS) algorithm is used in order to obtain accurate motion information. A motion feature descriptor (MFD) is designed to describe the motion of each block in a picture based on motion intensity, motion in occlusion areas, and motion correlation among neighbouring blocks. Then, a fuzzy C-means (FCM) clustering algorithm is applied to those MFDs to segment moving objects. Moreover, a new parameter, the gathering degree, is used to distinguish foreground moving objects from background motion. Experimental results demonstrate the effectiveness of the proposed method.
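The MFD design and 3DRS details are beyond the abstract, but the clustering stage, fuzzy C-means, is standard and can be sketched in a few lines (toy 2-D descriptors here, not the paper's actual MFDs):

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=50, seed=0):
    """Minimal fuzzy C-means: returns soft memberships U (n x c) and centroids."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        W = U ** m                                    # fuzzified memberships
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        U = d ** (-p) / np.sum(d ** (-p), axis=1, keepdims=True)
    return U, centroids

# Toy 2-D motion descriptors: two well-separated groups of blocks.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.0, 0.0],
              [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]])
U, centroids = fuzzy_cmeans(X)
labels = U.argmax(axis=1)
print(labels)  # first three blocks share one cluster, last three the other
```

Unlike hard k-means, each block keeps a graded membership in every cluster, which suits ambiguous motion near object boundaries.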
Funding: Supported by the National Natural Science Foundation of China (No. 61072110), the Industrial Tackling Project of Shaanxi Province (2010K06-20), and the Natural Science Foundation of Shaanxi Province (SJ08F15).
Abstract: Focusing on the problem of goal event detection in soccer videos, a novel method based on the Hidden Markov Model (HMM) and a semantic rule is proposed. Firstly, an HMM for the goal event is constructed. Then a Normalized Semantic Weighted Sum (NSWS) rule is established by defining a new shot feature, the semantic observation weight. The test video is detected using the HMM and the NSWS rule, respectively. Finally, a fusion scheme based on logic distance is proposed, and the detection results of the HMM and the NSWS rule are fused with optimal weights at the decision level to obtain the final result. Experimental results indicate that the proposed method achieves 96.43% precision and 100% recall, which shows the effectiveness of this letter.
Funding: Supported by the National Key Technology R&D Program of China (No. 2017YFD0701603) and the Natural Science Foundation of China (No. 60975007).
Abstract: To overcome the limitations of traditional methods for detecting rumination in dairy cows, a video-based intelligent monitoring method for cow ruminant behavior was proposed in this study. The Mean Shift algorithm was used to track the jaw motion of dairy cows accurately. The centroid trajectory curve of the cow's mouth motion was then extracted from the video. In this way, monitoring of the ruminant behavior of dairy cows was realized. To verify the accuracy of the method, six videos totalling 99 min (24,000 frames) were selected. The test results demonstrated that the success rate of this method was 92.03%, despite interference from behaviors such as raising or turning of the cow's head. The results demonstrate that this method of monitoring the ruminant behavior of dairy cows is effective and feasible.
Funding: This work was supported by the National Key Research and Development Program of China (2017YFD0701603) and the Natural Science Foundation of China (61473235).
Abstract: In order to realize automatic monitoring of the ruminant activities of cows, an automatic detection method for the mouth area of ruminating cows based on machine vision technology was studied. Optical flow was used to calculate the relative motion speed of each pixel in the video frames. Candidate mouth regions with large motion ranges were extracted, and a series of processing steps, such as grayscale conversion, threshold segmentation, pixel expansion, and adjacent-region merging, were carried out to extract the real area of the cow's mouth. To verify the accuracy of the proposed method, six videos with a total length of 96 min were selected for this research. The results showed that the highest accuracy was 87.80%, the average accuracy was 76.46%, and the average running time of the algorithm was 6.39 s. These results show that this method can detect the mouth area automatically, which lays the foundation for automatic monitoring of cows' ruminant behavior.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61976010, 61802011), the Beijing Postdoctoral Research Foundation (No. ZZ2019-63), the Beijing Excellent Young Talent Cultivation Project (No. 2017000020124G075), and the "Ri Xin" Training Programme Foundation for Talents of Beijing University of Technology.
文摘Human group activity recognition(GAR)has attracted significant attention from computer vision researchers due to its wide practical applications in security surveillance,social role understanding and sports video analysis.In this paper,we give a comprehensive overview of the advances in group activity recognition in videos during the past 20 years.First,we provide a summary and comparison of 11 GAR video datasets in this field.Second,we survey the group activity recognition methods,including those based on handcrafted features and those based on deep learning networks.For better understanding of the pros and cons of these methods,we compare various models from the past to the present.Finally,we outline several challenging issues and possible directions for future research.From this comprehensive literature review,readers can obtain an overview of progress in group activity recognition for future studies.