In today’s information age, video data, as an important carrier of information, is growing explosively in production volume. The quick and accurate extraction of useful information from massive video data has become a focus of research in the field of computer vision. AI dynamic recognition technology has become one of the key technologies for addressing this issue due to its powerful data processing capabilities and intelligent recognition functions. On this basis, this paper first elaborates on the development of intelligent video AI dynamic recognition technology, then proposes several optimization strategies for it, and finally analyzes its performance for reference.
Text perception is crucial for understanding the semantics of outdoor scenes, making it a key requirement for building intelligent systems for driver assistance or autonomous driving. Text information in car-mounted videos can assist drivers in making decisions. However, car-mounted video text images pose challenges such as complex backgrounds, small fonts, and the need for real-time detection. We propose a robust Car-mounted Video Text Detector (CVTD), a lightweight text detection model based on ResNet18 for feature extraction, capable of detecting text of arbitrary shapes. The model efficiently extracts global text positions through Coordinate Attention Threshold Activation (CATA) and strengthens feature representation by stacking two Feature Pyramid Enhancement Fusion Modules (FPEFM), integrating local text features with global position information to reinforce the representation capability of the CVTD model. The enhanced feature maps, acted upon by Text Activation Maps (TAM), effectively distinguish text foreground from non-text regions. Additionally, we collected and annotated a dataset of 2200 Car-mounted Video Text (CVT) images under various road conditions for training and evaluating the model. We further tested the model on four other challenging public natural scene text detection benchmark datasets, demonstrating its strong generalization ability and real-time detection speed. The model holds potential for practical applications in real-world scenarios.
In the realm of contemporary artificial intelligence, machine learning enables automation, allowing systems to acquire and enhance their capabilities through learning. Here, video recommendation is performed using machine learning techniques. A recommendation system is an information filtering system used to predict the “rating” or “preference” given by different users. The prediction is based on past ratings, history, interests, IMDb ratings, and so on. This can be implemented with collaborative and content-based filtering approaches, which take the data provided by different users, analyze it, and then recommend the video that suits the user at that particular time. The required video datasets are taken from GroupLens. The recommender system is implemented in the Python programming language. Two algorithms are used to build it: K-means clustering and KNN classification. K-means is an unsupervised machine learning algorithm whose main goal is to group similar data points together and discover patterns; to do so, K-means searches for a fixed number k of clusters in a dataset, where a cluster is a collection of data points grouped together due to certain similarities. K-Nearest Neighbors is a supervised learning algorithm used for classification; given the data, KNN can classify new data points by examining the k nearest data points. The final results are obtained through the clustering quality and the root mean squared error; using these algorithms, videos can be recommended more appropriately based on the user’s previous records and ratings.
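The two algorithms named above can be sketched in a few lines of Python. The toy rating matrix, the Euclidean distance metric, and the parameter choices below are illustrative assumptions for the sketch, not the paper's actual GroupLens setup:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal K-means: group similar rating vectors, return labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each point to its nearest centroid
        labels = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

def knn_recommend(ratings, user, k=2):
    """Recommend the unrated video (0 = unrated) best liked by the user's k nearest neighbours."""
    others = sorted(
        (u for u in range(len(ratings)) if u != user),
        key=lambda u: np.linalg.norm(ratings[u] - ratings[user]),
    )
    neighbours = others[:k]
    unseen = np.where(ratings[user] == 0)[0]
    scores = {v: np.mean([ratings[u][v] for u in neighbours]) for v in unseen}
    return max(scores, key=scores.get)

# toy user x video rating matrix (rows: users, columns: videos; 0 = unrated)
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 1],
    [1, 1, 5, 4],
    [1, 2, 4, 5],
], dtype=float)

labels, _ = kmeans(R, k=2)
print(knn_recommend(R, user=0))  # the only unrated video for user 0 is index 2
```

In a real system the rows would be the GroupLens rating vectors and the recommendation score would be combined with the clustering result; the sketch only shows the mechanics of the two algorithms.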
Due to the increasing demand for a secure and smart living environment, intelligent video surveillance technology has attracted considerable attention. Building an automatic, reliable, secure, and intelligent video surveillance system has spawned large research projects and triggered many popular research topics at several recent international conferences and workshops. This special issue of the Journal of Electronic Science and Technology (JEST) aims to present recent advances in video surveillance systems that address the observation of people in an environment, leading to a real-time description of their actions and interactions.
Generating ground truth data for developing object detection algorithms for intelligent surveillance systems is an important yet time-consuming task; therefore, a user-friendly tool to annotate videos efficiently and accurately is required. In this paper, the development of a semi-automatic video annotation tool is described. For efficiency, the tool can automatically generate initial annotation data for input videos using automatic object detection modules, which are developed independently and registered in the tool. To guarantee the accuracy of the ground truth data, the system also provides several user-friendly functions that help users check and edit the initial annotation data generated by the automatic object detection modules. According to the experimental results, the developed annotation tool is considerably beneficial for reducing annotation time; compared with manual annotation schemes, using the tool reduced annotation time by a factor of up to 2.3.
In this paper, we study big data and artificial-intelligence-aided decision-making mechanisms as applied to the innovation of video websites’ self-produced programs. Self-produced video programs give new-media platforms new possibilities for content production and offer traditional media a breakthrough point in the Internet age. Producing video programs in-house helps a site reduce its demand for purchased copyrights and thereby its costs, avoid homogeneous competition, enrich advertising and marketing, improve its profit model, organically combine content production with operations, and complete its strategic transformation. Building on these advantages, a site’s self-produced video programs can form a brand with greater brand influence. Our later research provides a literature survey of the related issues.
This paper describes the structure, geometric model, and geometric calibration of Photogrammetron I, the first type of photogrammetron, designed as a coherent stereophotogrammetric system in which two cameras are mounted on a physical base but driven by an intelligent agent architecture. The system calibration is divided into two parts: the in-lab calibration determines the fixed parameters in advance of system operation, and the in-situ calibration keeps tracking the free parameters in real time during system operation. In a video surveillance set-up, prepared control points are tracked in stereo image sequences, so that the free parameters of the system can be continuously updated through iterative bundle adjustment and Kalman filtering.
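The in-situ tracking of a free parameter can be illustrated with a minimal one-dimensional Kalman filter. The noise variances, initial state, and measurement values below are illustrative assumptions for the sketch, not values taken from the paper:

```python
# Minimal 1-D Kalman filter sketch for tracking a single slowly drifting
# "free" calibration parameter from noisy in-situ measurements.
def kalman_track(measurements, x0=0.0, p0=1.0, q=1e-3, r=0.1):
    """Return the filtered estimate after each measurement."""
    x, p = x0, p0            # state estimate and its variance
    estimates = []
    for z in measurements:
        p = p + q            # predict: parameter may drift (process noise q)
        k = p / (p + r)      # Kalman gain: weight measurement vs. prediction
        x = x + k * (z - x)  # update with measurement z (noise variance r)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# noisy observations of a parameter whose true value is ~1.0
est = kalman_track([1.02, 0.98, 1.05, 1.01, 0.99])
print(round(est[-1], 3))  # the estimate settles near 1.0
```

In the actual system the state would be a vector of free parameters and the measurement update would come from the bundle adjustment residuals of the tracked control points; the sketch only shows the predict/update cycle.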
A wide range of camera apps and online video conferencing services support changing the background in real time for aesthetic, privacy, and security reasons. Numerous studies show that Deep Learning (DL) is a suitable option for human segmentation, and that an ensemble of multiple DL-based segmentation models can improve the segmentation result. However, these approaches are not as effective when applied directly to image segmentation in a video. This paper proposes an Adaptive N-Frames Ensemble (AFE) approach for high-movement human segmentation in a video using an ensemble of multiple DL models. In contrast to an ensemble that executes multiple DL models simultaneously for every single video frame, the proposed AFE approach executes only a single DL model on the current video frame; it combines the segmentation outputs of previous frames into the final segmentation output when the frame difference is below a particular threshold. Our method builds on the N-Frames Ensemble (NFE) method, which ensembles the image segmentations of the current and previous video frames. However, NFE is not suitable for segmenting fast-moving objects in a video, nor for videos with low frame rates; the proposed AFE approach addresses these limitations. Our experiments use three human segmentation models, namely Fully Convolutional Network (FCN), DeepLabv3, and Mediapipe. We evaluated our approach using 1711 single-person videos of the TikTok50f dataset, a reconstructed version of the publicly available TikTok dataset obtained by cropping, resizing, and dividing it into videos of 50 frames each. This paper compares the proposed AFE with single models, a Two-Models Ensemble, and the NFE models. The experimental results show that the proposed AFE is suitable for low-movement as well as high-movement human segmentation in a video.
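The core gating decision of AFE, combining previous outputs only when the frame difference is small, can be sketched as follows. The mean-absolute-difference metric and the threshold value are illustrative assumptions, since the abstract does not specify them:

```python
import numpy as np

def afe_gate(prev_frame, curr_frame, threshold=10.0):
    """Decide whether previous segmentation outputs may be reused.

    Returns True when the mean absolute pixel difference is below the
    threshold (low movement: combining previous outputs is safe), and
    False for high movement, where only the current frame's result is used.
    """
    diff = np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32))
    return bool(diff.mean() < threshold)

# low movement: nearly identical frames -> previous outputs may be combined
a = np.full((4, 4), 100, dtype=np.uint8)
b = a.copy()
b[0, 0] += 5
print(afe_gate(a, b))  # True

# high movement: large frame difference -> fall back to the current frame only
c = np.full((4, 4), 200, dtype=np.uint8)
print(afe_gate(a, c))  # False
```

In the full approach the True branch would merge the stored masks from the N previous frames with the current mask, while the False branch would return the current model's mask alone.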
Gastrointestinal angiodysplasia (GIAD) is defined as the pathological process in which blood vessels, typically venules and capillaries, become engorged, tortuous, and thin walled, forming arteriovenous connections within the mucosal and submucosal layers of the gastrointestinal tract. GIADs are a significant cause of gastrointestinal bleeding and the main cause of suspected small bowel bleeding. To make the diagnosis, gastroenterologists rely on video capsule endoscopy (VCE) to “target” GIAD. However, the use of VCE can be cumbersome owing to reader fatigue, suboptimal preparation, and difficulty in distinguishing images. The human eye is imperfect: the same capsule study read by two different readers is noted to have miss rates similar to those of other forms of endoscopy. Artificial intelligence (AI) has been a means to bridge the gap between human imperfection and the recognition of GIAD. The use of AI in VCE has been shown to improve detection; however, the other burdens and limitations still need to be addressed. The use of AI for the diagnosis of GIAD shows promise, and the changes needed to enhance the current practice of VCE are near.
Deepfake technology can be used to replace people’s faces in videos or pictures to show them saying or doing things they never said or did. Deepfake media are often used to extort, defame, and manipulate public opinion. However, despite deepfake technology’s risks, current deepfake detection methods lack generalization and are inconsistent when applied to unknown videos, i.e., videos on which they have not been trained. The purpose of this study is to develop a generalizable deepfake detection model by training convolutional neural networks (CNNs) to classify human facial features in videos. The study formulated the research question: “How effectively does the developed model provide reliable generalizations?” A CNN model was trained to distinguish between real and fake videos using the facial features of human subjects in videos. The model was trained, validated, and tested using the FaceForensics++ dataset, which contains more than 500,000 frames, and subsets of the DFDC dataset, totaling more than 22,000 videos. The study demonstrated high generalizability, as the accuracy on the unknown dataset was only marginally (about 1%) lower than that on the known dataset. The findings of this study indicate that detection systems can be made more generalizable, lighter, and faster by focusing on just a small region (the human face) of an entire video.
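One plausible way to turn per-frame face classifications into a video-level real/fake verdict is majority voting over frame scores. The abstract does not state the aggregation rule, so the sketch below, including the threshold value, is an assumption:

```python
# Hypothetical aggregation of per-frame CNN outputs into a video-level
# decision. frame_scores are assumed to be per-frame probabilities that
# the face crop is fake; the 0.5 threshold is an illustrative choice.
def video_verdict(frame_scores, fake_threshold=0.5):
    """Label a video 'fake' when a majority of frames score above the threshold."""
    fake_votes = sum(1 for s in frame_scores if s > fake_threshold)
    return "fake" if fake_votes > len(frame_scores) / 2 else "real"

print(video_verdict([0.9, 0.8, 0.4, 0.95]))  # fake (3 of 4 frames vote fake)
print(video_verdict([0.1, 0.2, 0.6, 0.05]))  # real (only 1 of 4 frames votes fake)
```

Averaging the scores before thresholding is an equally common alternative; which rule generalizes better would need to be measured on the held-out dataset.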
To address the lack of security monitoring of intelligent traffic management equipment itself, and the high latency, low image quality, and poor stability of traditional video surveillance, this paper proposes a multi-threaded encoded video streaming scheme based on FFmpeg. FFmpeg invokes the h264_nvenc encoder to achieve macroblock-row-level GPU multi-threaded acceleration and reduce encoding latency. Audio/video processing software based on FFmpeg was developed with Visual Studio 2019 and QT15.5 to mux and push multiple video streams, and an Nginx streaming media server was set up for distribution. Experiments show that the overall transmission latency of the system is below 1 s, with good rate-distortion characteristics; the surveillance picture is clear and highly stable, achieving real-time, stable security monitoring of traffic management equipment.
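The kind of FFmpeg invocation this pipeline implies, GPU encoding with h264_nvenc and an RTMP push to an Nginx streaming server, can be sketched as a command built from Python. The exact options the authors used are not given; the flags, input name, and publish URL below are common, illustrative choices:

```python
import subprocess  # used only if the command is actually executed

def build_push_command(src, rtmp_url):
    """Build an FFmpeg command that GPU-encodes src and pushes it over RTMP."""
    return [
        "ffmpeg",
        "-i", src,             # input video stream or file
        "-c:v", "h264_nvenc",  # NVIDIA GPU H.264 encoder
        "-f", "flv",           # FLV container, as RTMP expects
        rtmp_url,              # e.g. an Nginx-RTMP publish point
    ]

cmd = build_push_command("camera.mp4", "rtmp://localhost/live/traffic")
print(" ".join(cmd))
# To actually run it (requires an FFmpeg build with NVENC support):
# subprocess.run(cmd, check=True)
```

In the paper's setup, one such push would be issued per video stream, with Nginx fanning the streams out to viewers.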
Funding (Car-mounted Video Text Detector paper): This work is supported in part by the National Natural Science Foundation of China (Grant No. 61971078), which provided domain expertise and computational power that greatly assisted the activity; by the Chongqing Municipal Education Commission Grants for Major Science and Technology Projects (KJZD-M202301901); and by the Science and Technology Research Project of the Jiangxi Department of Education (GJJ2201049).
Funding (Photogrammetron I paper): Project supported by the National Natural Science Foundation (No. 40171080).
Funding (Adaptive N-Frames Ensemble paper): This research was financially supported by the Ministry of Small and Medium-sized Enterprises (SMEs) and Startups (MSS), Korea, under the “Regional Specialized Industry Development Program (R&D, S3091627)” supervised by the Korea Institute for Advancement of Technology (KIAT).