Journal Articles
42,958 articles found
1. From Make-A-Video to Sora: Advances and Challenges in AI Video Generation Technology
Authors: 郑凯, 王菂, 袁堂青. 《科技视界》, 2024, No. 4, pp. 74-77.
With the rapid development of artificial intelligence, AI video generation has become a focus of both research and application. From Meta's Make-A-Video, Runway AI's Runway Gen-2, and Stability AI's Stable Video Diffusion, to Google's Lumiere and OpenAI's Sora, each model's release has not only marked progress in AI video generation technology but also brought new challenges. This paper reviews the principles and characteristics of these key AI video models, compares their respective strengths and weaknesses, discusses the main challenges facing AI video generation, and looks ahead to future directions of development.
Keywords: AI; Make-A-Video; Runway Gen-2; Stable Video Diffusion; Lumiere; Sora
2. Customized Convolutional Neural Network for Accurate Detection of Deep Fake Images in Video Collections [Cited by 1]
Authors: Dmitry Gura, Bo Dong, Duaa Mehiar, Nidal Al Said. Computers, Materials & Continua (SCIE, EI), 2024, No. 5, pp. 1995-2014.
The motivation for this study is that the quality of deep fakes is constantly improving, which leads to the need to develop new methods for their detection. The proposed Customized Convolutional Neural Network method involves extracting structured data from video frames using facial landmark detection, which is then used as input to the CNN. The customized method is a data-augmentation-based CNN model that generates "fake data" or "fake images". This study was carried out using Python and its libraries. We used 242 films from the dataset gathered by the Deep Fake Detection Challenge, of which 199 were made up and the remaining 53 were real. Ten seconds were allotted for each video. There were 318 videos used in all, 199 of which were fake and 119 of which were real. Our proposed method achieved a testing accuracy of 91.47%, loss of 0.342, and AUC score of 0.92, outperforming two alternative approaches, CNN and MLP-CNN. Furthermore, our method achieved greater accuracy than contemporary models such as XceptionNet, Meso-4, EfficientNet-B0, MesoInception-4, VGG-16, and DST-Net. The novelty of this investigation is the development of a new Convolutional Neural Network (CNN) learning model that can accurately detect deep fake face photos.
Keywords: deep fake detection; video analysis; convolutional neural network; machine learning; video dataset collection; facial landmark prediction; accuracy models
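As a rough illustration of the pipeline this abstract describes, per-frame facial landmarks used as structured input to a CNN classifier, the following sketch shows one way it could be wired up. The 68-point landmark layout, the 30-frame sampling, and every layer size are our assumptions, not the authors' implementation.

```python
# A minimal sketch (our assumptions, not the authors' code) of the described
# pipeline: facial landmarks extracted per frame become structured input to a
# small CNN that classifies a clip as real or fake.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

NUM_LANDMARKS = 68    # assumed dlib-style 68-point landmark layout
FRAMES_PER_CLIP = 30  # assumed number of frames sampled per 10 s video

def build_landmark_cnn():
    # Input: one (frames x landmark-coordinates) matrix per clip
    model = tf.keras.Sequential([
        layers.Input(shape=(FRAMES_PER_CLIP, NUM_LANDMARKS * 2)),
        layers.Conv1D(64, 3, activation="relu"),
        layers.Conv1D(128, 3, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # P(clip is fake)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy", tf.keras.metrics.AUC()])
    return model

# Random stand-ins for landmark matrices from real/fake clips
X = np.random.rand(8, FRAMES_PER_CLIP, NUM_LANDMARKS * 2).astype("float32")
y = np.random.randint(0, 2, size=(8,))
model = build_landmark_cnn()
model.fit(X, y, epochs=1, verbose=0)
```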
3. A HEVC Video Steganalysis Method Using the Optimality of Motion Vector Prediction
Authors: Jun Li, Minqing Zhang, Ke Niu, Yingnan Zhang, Xiaoyuan Yang. Computers, Materials & Continua (SCIE, EI), 2024, No. 5, pp. 2085-2103.
Among steganalysis techniques, detection against MV (motion vector) domain-based video steganography in the HEVC (High Efficiency Video Coding) standard remains a challenging issue. To improve detection performance, this paper proposes a steganalysis method that can perfectly detect MV-based steganography in HEVC. First, we define the local optimality of MVP (Motion Vector Prediction) based on the technology of AMVP (Advanced Motion Vector Prediction). Second, we analyze how, in HEVC video, message embedding using either the MVP index or the MVD (Motion Vector Difference) may destroy this optimality of the MVP. We then define the optimal rate of the MVP as a steganalysis feature. Finally, we conduct steganalysis detection experiments on two general datasets for three popular steganography methods and compare the performance with four state-of-the-art steganalysis methods. The experimental results demonstrate the effectiveness of the proposed feature set. Furthermore, our method stands out for its practical applicability, requiring no model training and exhibiting low computational complexity, making it a viable solution for real-world scenarios.
Keywords: video steganography; video steganalysis; motion vector prediction; motion vector difference; advanced motion vector prediction; local optimality
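The feature the abstract defines, the fraction of blocks whose chosen MV predictor remains cost-optimal among the AMVP candidates, can be illustrated with a toy computation. The cost proxy and synthetic data below are ours; a real detector would use an actual rate cost over syntax parsed from the HEVC bitstream.

```python
# A toy illustration (not the authors' code) of the core idea: in AMVP, the
# chosen MV predictor should minimize the rate cost among candidates.
# Embedding in the MVP index or MVD tends to break this local optimality, so
# the fraction of still-optimal predictors serves as a steganalysis feature.
import numpy as np

def mvd_cost(mv, pred):
    # Simplified rate proxy: magnitude of the motion vector difference
    return abs(mv[0] - pred[0]) + abs(mv[1] - pred[1])

def mvp_optimal_rate(mvs, candidate_lists, chosen_idx):
    """Fraction of blocks whose chosen MVP candidate has minimal cost."""
    optimal = 0
    for mv, cands, idx in zip(mvs, candidate_lists, chosen_idx):
        costs = [mvd_cost(mv, c) for c in cands]
        if costs[idx] == min(costs):
            optimal += 1
    return optimal / len(mvs)

# Synthetic example: 3 blocks, each with 2 AMVP candidates
mvs = [(4, 2), (-1, 3), (0, 0)]
cands = [[(4, 1), (0, 0)], [(-1, 2), (5, 5)], [(2, 2), (0, 1)]]
chosen = [0, 0, 1]
print(mvp_optimal_rate(mvs, cands, chosen))  # 1.0 for this clean example
```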
4. Trends in Event Understanding and Caption Generation/Reconstruction in Dense Video: A Review
Authors: Ekanayake Mudiyanselage Chulabhaya Lankanatha Ekanayake, Abubakar Sulaiman Gezawa, Yunqi Lei. Computers, Materials & Continua (SCIE, EI), 2024, No. 3, pp. 2941-2965.
Video description generates natural language sentences that describe the subject, verb, and objects of the targeted video. Video description has been used to help visually impaired people understand video content, and it also plays an essential role in developing human-robot interaction. Dense video description is more difficult than simple video captioning because of object interactions and event overlapping. Deep learning is reshaping computer vision (CV) and natural language processing (NLP), and there are hundreds of deep learning models, datasets, and evaluations that can close the gaps in current research. This article fills this gap by evaluating state-of-the-art approaches, focusing especially on deep learning and machine learning for video captioning in dense environments. It reviews classic techniques from existing machine learning, surveys deep learning models, and details benchmark datasets with their respective domains. The paper also reviews various evaluation metrics, including Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Word Mover's Distance (WMD), and Recall-Oriented Understudy for Gisting Evaluation (ROUGE), with their pros and cons. Finally, the article lists future directions and proposes work on context enhancement using key scene extraction with object detection in a particular frame, especially how to improve the context of video description by analyzing key frame detection through morphological image analysis. Additionally, the paper discusses a novel approach involving sentence reconstruction and context improvement through key frame object detection, which incorporates the fusion of large language models for refining results. The ultimate results arise from enhancing the generated text of the proposed model by improving the predicted text and isolating objects using various keyframes. These keyframes identify dense events occurring in the video sequence.
Keywords: video description; video to text; video caption; sentence reconstruction
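For one of the metrics the review covers, here is a minimal example (ours, not from the paper) of scoring a generated caption against a reference with BLEU via NLTK:

```python
# Score a candidate caption against a reference caption with sentence-level
# BLEU; smoothing avoids zero scores when higher-order n-grams do not match.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["a", "man", "is", "riding", "a", "bike", "on", "the", "road"]
candidate = ["a", "man", "rides", "a", "bike", "on", "a", "road"]

score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```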
5. Workout Action Recognition in Video Streams Using an Attention Driven Residual DC-GRU Network
Authors: Arnab Dey, Samit Biswas, Dac-Nhuong Le. Computers, Materials & Continua (SCIE, EI), 2024, No. 5, pp. 3067-3087.
Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers the likelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions in video streams holds significant importance in computer vision research, as it aims to enhance exercise adherence, enable instant recognition, advance fitness tracking technologies, and optimize fitness routines. However, existing action datasets often lack diversity and specificity for workout actions, hindering the development of accurate recognition models. To address this gap, the Workout Action Video dataset (WAVd) has been introduced as a significant contribution. WAVd comprises a diverse collection of labeled workout action videos, meticulously curated to encompass various exercises performed by numerous individuals in different settings. This research proposes an innovative framework based on the Attention driven Residual Deep Convolutional-Gated Recurrent Unit (ResDC-GRU) network for workout action recognition in video streams. Unlike image-based action recognition, videos contain spatio-temporal information, making the task more complex and challenging. While substantial progress has been made in this area, challenges persist in detecting subtle and complex actions, handling occlusions, and managing the computational demands of deep learning approaches. The proposed ResDC-GRU Attention model demonstrated exceptional classification performance with 95.81% accuracy in classifying workout action videos and also outperformed various state-of-the-art models. The method also yielded 81.6%, 97.2%, 95.6%, and 93.2% accuracy on established benchmark datasets, namely HMDB51, Youtube Actions, UCF50, and UCF101, respectively, showcasing its superiority and robustness in action recognition. The findings suggest practical implications in real-world scenarios where precise video action recognition is paramount, addressing the persisting challenges in the field. The WAVd dataset serves as a catalyst for the development of more robust and effective fitness tracking systems and ultimately promotes healthier lifestyles through improved exercise monitoring and analysis.
Keywords: workout action recognition; video stream action recognition; residual network; GRU; attention
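The general pattern behind the described framework, per-frame convolutional features, a GRU over time, and attention pooling before classification, can be sketched as follows. This stands in for, and is far smaller than, the authors' ResDC-GRU; every layer size here is an assumption.

```python
# An illustrative sketch (assumed, not the authors' ResDC-GRU code) of the
# pattern: per-frame CNN features -> GRU over time -> attention pooling ->
# action class probabilities.
import tensorflow as tf
from tensorflow.keras import layers

def build_action_model(frames=16, h=64, w=64, classes=10):
    inp = layers.Input(shape=(frames, h, w, 3))
    # TimeDistributed CNN backbone (stand-in for the residual DC blocks)
    x = layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu"))(inp)
    x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)
    x = layers.GRU(64, return_sequences=True)(x)      # temporal modelling
    # Simple additive attention over time steps
    scores = layers.Dense(1)(x)                       # (batch, frames, 1)
    weights = layers.Softmax(axis=1)(scores)
    context = tf.reduce_sum(weights * x, axis=1)      # weighted sum over time
    out = layers.Dense(classes, activation="softmax")(context)
    return tf.keras.Model(inp, out)

model = build_action_model()
model.summary()
```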
6. A Hybrid Machine Learning Approach for Improvised QoE in Video Services over 5G Wireless Networks
Authors: K. B. Ajeyprasaath, P. Vetrivelan. Computers, Materials & Continua (SCIE, EI), 2024, No. 3, pp. 3195-3213.
Video streaming applications have grown considerably in recent years and have become one of the most significant contributors to global internet traffic. According to recent studies, the telecommunications industry loses millions of dollars due to poor video Quality of Experience (QoE) for users. Among the standard proposals for standardizing the quality of video streaming over internet service providers (ISPs) is the Mean Opinion Score (MOS). However, accurately determining QoE via MOS is subjective and laborious, and it varies from user to user. A fully automated data analytics framework is required to reduce the inter-operator variability characteristic of QoE assessment. This work addresses this concern by suggesting a novel hybrid XGBStackQoE analytical model using a two-level layering technique. Level one combines multiple Machine Learning (ML) models via a layer-one Hybrid XGBStackQoE model; the individual ML models at level one are trained using the entire training data set. The level-two Hybrid XGBStackQoE model is fitted using the outputs (meta-features) of the layer-one ML models. The proposed model outperformed the conventional models, with an accuracy improvement of 4 to 5 percent over current traditional models. The proposed framework could significantly improve video QoE accuracy.
Keywords: Hybrid XGBStackQoE-model; machine learning; MOS; performance metrics; QoE; 5G; video services
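The two-level layering idea, base learners whose outputs become meta-features for a final XGBoost combiner, maps onto a standard stacking ensemble. The sketch below uses synthetic data and off-the-shelf scikit-learn/XGBoost components as stand-ins for the paper's QoE features and exact model mix.

```python
# A hedged sketch of the two-level stacking idea: level-one learners' outputs
# become meta-features for an XGBoost level-two combiner. The data and model
# choices are placeholders, not the paper's QoE dataset or configuration.
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=XGBClassifier(eval_metric="logloss"),  # level-two model
)
stack.fit(X_tr, y_tr)
print("accuracy:", stack.score(X_te, y_te))
```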
7. Improving Video Watermarking through Galois Field GF(2^4) Multiplication Tables with Diverse Irreducible Polynomials and Adaptive Techniques
Authors: Yasmin Alaa Hassan, Abdul Monem S. Rahma. Computers, Materials & Continua (SCIE, EI), 2024, No. 1, pp. 1423-1442.
Video watermarking plays a crucial role in protecting intellectual property rights and ensuring content authenticity. This study delves into the integration of Galois Field (GF) multiplication tables, especially GF(2^4), and their interaction with distinct irreducible polynomials. The primary aim is to enhance watermarking techniques to achieve imperceptibility, robustness, and efficient execution time. The research employs scene selection and adaptive thresholding techniques to streamline the watermarking process. Scene selection is used strategically to embed watermarks in the most vital frames of the video, while adaptive thresholding ensures that the watermarking process adheres to imperceptibility criteria, maintaining the video's visual quality. Concurrently, careful consideration is given to execution time, crucial in real-world scenarios, to balance efficiency and efficacy. The Peak Signal-to-Noise Ratio (PSNR) serves as the pivotal metric to gauge the watermark's imperceptibility and video quality. The study explores various irreducible polynomials, navigating the trade-offs between computational efficiency and watermark imperceptibility. This comprehensive analysis provides valuable insights into the interplay of GF multiplication tables, diverse irreducible polynomials, scene selection, adaptive thresholding, imperceptibility, and execution time. The robustness of the proposed algorithm was evaluated using PSNR and NC metrics under five distinct attack scenarios. These findings contribute to the development of watermarking strategies that balance imperceptibility, robustness, and processing efficiency, enhancing the field's practicality and effectiveness.
Keywords: video watermarking; Galois field; irreducible polynomial; multiplication table; scene selection; adaptive thresholding
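The arithmetic underlying those multiplication tables can be shown compactly. The sketch below multiplies GF(2^4) elements modulo x^4 + x + 1, one common irreducible polynomial; the paper compares several such polynomials, and which one it pairs with which table is not stated in this abstract.

```python
# A compact illustration of the underlying arithmetic: multiplication in
# GF(2^4), reduced by a chosen irreducible polynomial. The polynomial
# x^4 + x + 1 (0b10011) is one standard choice among those the paper compares.
def gf16_mul(a: int, b: int, poly: int = 0b10011) -> int:
    """Multiply two GF(2^4) elements (0..15) modulo the given polynomial."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1
        if a & 0b10000:          # degree reached 4: reduce modulo poly
            a ^= poly
    return result

# Full 16x16 multiplication table for this polynomial
table = [[gf16_mul(i, j) for j in range(16)] for i in range(16)]
print(table[7][9])  # 7 * 9 in GF(2^4) with x^4 + x + 1 -> 10
```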
8. Video Recommendation System Using Machine-Learning Techniques
Authors: Meesala Sravani, Ch Vidyadhari, S Anjali Devi. Journal of Harbin Institute of Technology (New Series) (CAS), 2024, No. 4, pp. 24-33.
In contemporary artificial intelligence, machine learning enables automation, allowing systems to acquire and enhance their capabilities through learning. In this work, video recommendation is performed using machine-learning techniques. A recommendation system is an information-filtering system used to predict the "rating" or "preference" given by different users; the prediction is based on past ratings, history, interests, IMDB rating, and so on. This can be implemented using collaborative and content-based filtering approaches, which take the data provided by various users, analyze it, and then recommend the video that suits the user at that particular time. The required video datasets are taken from GroupLens. The recommender system is implemented in the Python programming language using two algorithms: K-means clustering and KNN classification. K-means is an unsupervised machine-learning algorithm whose main goal is to group similar data points together and discover underlying patterns; to do so, K-means searches for a fixed number 'k' of clusters in a dataset, where a cluster is a collection of data points grouped together because of certain similarities. K-Nearest Neighbors is a supervised learning algorithm used for classification: given the data, KNN classifies new points by examining the 'k' nearest data points. The final results are obtained through the clustering quality and root mean squared error; using these algorithms, we can recommend videos more appropriately based on a user's previous records and ratings.
Keywords: video recommendation system; KNN algorithms; collaborative filtering; content-based filtering; classification algorithms; artificial intelligence
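A rough sketch of the described pipeline: cluster users by their rating vectors with K-means, then use KNN to find the nearest users and suggest what they rated highly. The random rating matrix below stands in for the GroupLens data, and the neighborhood logic is our simplification of the paper's method.

```python
# Cluster users by rating vectors with K-means, then recommend via the
# nearest neighbors' highest-rated video. Ratings are random stand-ins
# for the GroupLens data the paper uses.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
ratings = rng.integers(0, 6, size=(100, 20)).astype(float)  # users x videos

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(ratings)
knn = NearestNeighbors(n_neighbors=6).fit(ratings)

target_user = 0
_, idx = knn.kneighbors(ratings[[target_user]])
neighbors = idx[0][1:]                     # skip the target user themselves
scores = ratings[neighbors].mean(axis=0)   # neighbors' average per video
print("cluster:", kmeans.labels_[target_user],
      "recommend video:", scores.argmax())
```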
9. Pulse rate estimation based on facial videos: an evaluation and optimization of the classical methods using both self-constructed and public datasets [Cited by 1]
Authors: Chao-Yong Wu, Jian-Xin Chen, Yu Chen, Ai-Ping Chen, Lu Zhou, Xu Wang. Traditional Medicine Research, 2024, No. 1, pp. 14-22.
Pulse rate is one of the important characteristics of traditional Chinese medicine pulse diagnosis, and it is of great significance for determining the nature of cold and heat in diseases. The prediction of pulse rate from facial video is an exciting research field for obtaining palpation information through observation diagnosis. However, most studies focus on optimizing the algorithm on a small sample of participants without systematically investigating multiple influencing factors. A total of 209 participants and 2,435 facial videos, based on our self-constructed Multi-Scene Sign Dataset and public datasets, were used to perform a multi-level, multi-factor comprehensive comparison. The effects of different datasets, blood volume pulse signal extraction algorithms, regions of interest, time windows, color spaces, pulse rate calculation methods, and video recording scenes were analyzed. Furthermore, we propose a blood volume pulse signal quality optimization strategy based on the inverse Fourier transform and an improvement strategy for pulse rate estimation based on signal-to-noise-ratio threshold sliding. We found that video-based pulse rate estimation performed better on the Multi-Scene Sign Dataset and the Pulse Rate Detection Dataset than on other datasets. Compared with Fast independent component analysis and Single Channel algorithms, the chrominance-based and plane-orthogonal-to-skin algorithms have stronger anti-interference ability and higher robustness. The five-organs fusion area and the full-face area performed better than single sub-regions, and fewer motion artifacts and better lighting improve the precision of pulse rate estimation.
Keywords: pulse rate; heart rate; photoplethysmography; observation and pulse diagnosis; facial videos
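The estimation step common to these classical methods, taking the dominant frequency of the recovered blood volume pulse signal within the physiological band, can be sketched as follows. The synthetic 72-bpm signal and the 30 fps frame rate are assumptions for illustration only.

```python
# A simplified sketch of the final estimation step: given a blood volume
# pulse (BVP) signal recovered from facial video, take the dominant
# frequency in the plausible heart-rate band as the pulse rate.
import numpy as np

fs = 30.0                                   # assumed video frame rate (Hz)
t = np.arange(0, 10, 1 / fs)
bvp = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(t.size)  # ~72 bpm

spectrum = np.abs(np.fft.rfft(bvp))
freqs = np.fft.rfftfreq(bvp.size, 1 / fs)
band = (freqs >= 0.7) & (freqs <= 3.0)      # 42-180 bpm physiological band
peak = freqs[band][np.argmax(spectrum[band])]
print(f"estimated pulse rate: {peak * 60:.1f} bpm")
```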
10. Rate-distortion optimized frame dropping and scheduling for multi-user conversational and streaming video [Cited by 1]
Authors: Jacob Chakareski, Eckehard Steinbach. Journal of Zhejiang University - Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2006, No. 5, pp. 864-872.
We propose a Rate-Distortion (RD) optimized strategy for frame dropping and scheduling of multi-user conversational and streaming videos. We consider a scenario where conversational and streaming videos share the forwarding resources at a network node. Two buffers are set up on the node to temporarily store the packets for these two types of video applications. For streaming video, a big buffer is used, as the associated delay constraint of the application is moderate, while a very small buffer is used for conversational video to ensure that the forwarding delay of every packet is limited. A scheduler located behind these two buffers dynamically assigns transmission slots on the outgoing link to the two buffers. Rate-distortion side information is used to perform RD-optimized frame dropping in case of node overload. Sharing the data rate on the outgoing link between the conversational and streaming videos is done either based on the fullness of the two associated buffers or on the mean incoming rates of the respective videos. Simulation results showed that our proposed RD-optimized frame dropping and scheduling approach provides significant improvements in performance over the popular priority-based random dropping (PRD) technique.
Keywords: rate-distortion optimization; video frame dropping; conversational video; streaming video; distortion matrix; hint tracks; scheduling; resource assignment
11. Video Summarization Approach Based on Binary Robust Invariant Scalable Keypoints and Bisecting K-Means
Authors: Sameh Zarif, Eman Morad, Khalid Amin, Abdullah Alharbi, Wail S. Elkilani, Shouze Tang. Computers, Materials & Continua (SCIE, EI), 2024, No. 3, pp. 3565-3583.
Due to the exponential growth of video data, aided by rapid advancements in multimedia technologies, it has become difficult for users to obtain information from a long video series. The process of providing an abstract of an entire video that includes the most representative frames is known as static video summarization; it enables rapid exploration, indexing, and retrieval of massive video libraries. We propose a framework for static video summarization based on Binary Robust Invariant Scalable Keypoints (BRISK) and the bisecting K-means clustering algorithm. The method effectively recognizes relevant frames by extracting BRISK keypoints and descriptors from video sequences. The frames' BRISK features are clustered using bisecting K-means, and the keyframe is determined by selecting the frame nearest the cluster center. Without tuning any clustering parameters, the appropriate number of clusters is determined using the silhouette coefficient. Experiments were carried out on the publicly available Open Video Project (OVP) dataset, which contains videos of different genres. The proposed method's effectiveness is compared to existing methods using a variety of evaluation metrics, and the proposed method achieves a trade-off between computational cost and quality.
Keywords: BRISK; bisecting K-means; video summarization; keyframe extraction; shot detection
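An illustrative sketch of the framework follows (our assumptions, not the authors' code): BRISK descriptors are pooled into one feature per frame, the features are clustered with bisecting K-means, and the frame nearest each cluster center becomes a keyframe. Synthetic frames replace real video, and the cluster count is fixed here rather than chosen by the silhouette coefficient.

```python
# BRISK descriptors per frame -> mean-pooled frame feature -> bisecting
# K-means -> the frame nearest each cluster centre becomes a keyframe.
import cv2
import numpy as np
from sklearn.cluster import BisectingKMeans  # requires scikit-learn >= 1.1

brisk = cv2.BRISK_create()

def frame_feature(gray):
    _, desc = brisk.detectAndCompute(gray, None)
    if desc is None:                 # no keypoints found in this frame
        return np.zeros(64, dtype=np.float32)
    return desc.mean(axis=0).astype(np.float32)  # pool 64-byte descriptors

# Stand-in "frames": black images with a moving bright square
frames = []
for i in range(20):
    img = np.zeros((120, 160), np.uint8)
    cv2.rectangle(img, (10 + 5 * i, 30), (40 + 5 * i, 60), 255, -1)
    frames.append(img)

feats = np.stack([frame_feature(f) for f in frames])
km = BisectingKMeans(n_clusters=4, random_state=0).fit(feats)
keyframes = [int(np.argmin(np.linalg.norm(feats - c, axis=1)))
             for c in km.cluster_centers_]
print("selected keyframes:", sorted(set(keyframes)))
```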
12. A Video Captioning Method by Semantic Topic-Guided Generation
Authors: Ou Ye, Xinli Wei, Zhenhua Yu, Yan Fu, Ying Yang. Computers, Materials & Continua (SCIE, EI), 2024, No. 1, pp. 1071-1093.
In encoder-decoder video captioning methods, limited visual features are extracted by an encoder, and a natural sentence describing the video content is generated by a decoder. However, this kind of method depends on a single video input source and few visual labels, and there is a problem with semantic alignment between video contents and the generated sentences, making it unsuitable for accurately comprehending and describing video contents. To address this issue, this paper proposes a video captioning method based on semantic topic-guided generation. First, a 3D convolutional neural network is utilized to extract the spatiotemporal features of videos during encoding. Then, the semantic topics of the video data are extracted using visual labels retrieved from similar video data. In decoding, a decoder is constructed by combining a novel Enhance-TopK sampling algorithm with a Generative Pre-trained Transformer-2 deep neural network, which decreases the influence of "deviation" in the semantic mapping process between videos and texts by jointly decoding a baseline and the semantic topics of video contents. During this process, the designed Enhance-TopK sampling algorithm alleviates the long-tail problem by dynamically adjusting the probability distribution of the predicted words. Finally, experiments are conducted on the two publicly used Microsoft Research Video Description and Microsoft Research-Video to Text datasets. The experimental results demonstrate that the proposed method outperforms several state-of-the-art approaches. Specifically, the performance indicators Bilingual Evaluation Understudy, Metric for Evaluation of Translation with Explicit Ordering, Recall-Oriented Understudy for Gisting Evaluation-longest common subsequence, and Consensus-based Image Description Evaluation are improved by 1.2%, 0.1%, 0.3%, and 2.4% on the Microsoft Research Video Description dataset, and 0.1%, 1.0%, 0.1%, and 2.8% on the Microsoft Research-Video to Text dataset, respectively, compared with existing video captioning methods. As a result, the proposed method can generate video captions that align more closely with natural human language expression habits.
Keywords: video captioning; encoder-decoder; semantic topic; jointly decoding; Enhance-TopK sampling
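The Enhance-TopK idea of reshaping the probability distribution over the retained top-k tokens can be illustrated with a toy sampler. The fixed flattening exponent below is our simplified stand-in for the paper's dynamic adjustment rule.

```python
# A toy NumPy sketch of top-k sampling with a probability-reshaping step.
# The exact "Enhance-TopK" redistribution is the paper's; the flattening
# exponent here is our simplified stand-in for its dynamic adjustment.
import numpy as np

def topk_sample(logits, k=5, flatten=0.8, rng=np.random.default_rng(0)):
    """Keep the k most likely tokens, soften their distribution, sample."""
    top = np.argsort(logits)[-k:]      # indices of the top-k logits
    probs = np.exp(logits[top])
    probs = probs ** flatten           # flatten: lift lower-ranked tokens
    probs /= probs.sum()
    return rng.choice(top, p=probs)

vocab_logits = np.array([2.0, 0.5, 1.8, -1.0, 0.9, 3.2, 0.1])
print(topk_sample(vocab_logits))
```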
13. Adaptive Graph Convolutional Adjacency Matrix Network for Video Summarization
Authors: Jing Zhang, Guangli Wu, Shanshan Song. Computers, Materials & Continua (SCIE, EI), 2024, No. 8, pp. 1947-1965.
Video summarization aims to select key frames or key shots to create summaries for fast retrieval, compression, and efficient browsing of videos. Graph neural networks efficiently capture information about graph nodes and their neighbors but ignore the dynamic dependencies between nodes. To address this challenge, we propose an innovative Adaptive Graph Convolutional Adjacency Matrix Network (TAMGCN) that leverages the attention mechanism to dynamically adjust the dependencies between graph nodes. Specifically, we first segment shots and extract features for each frame, then compute the representative features of each shot. Subsequently, we utilize the attention mechanism to dynamically adjust the adjacency matrix of the graph convolutional network to better capture the dynamic dependencies between graph nodes. Finally, we fuse temporal features extracted by a Bi-directional Long Short-Term Memory network with structural features extracted by the graph convolutional network to generate high-quality summaries. Extensive experiments are conducted on two benchmark datasets, TVSum and SumMe, yielding F1-scores of 60.8% and 53.2%, respectively. Experimental results demonstrate that our method outperforms most state-of-the-art video summarization techniques.
Keywords: attention mechanism; deep learning; graph neural network; key-shot; video summarization
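The core operation, an attention-derived adjacency matrix feeding a graph-convolution step, can be written schematically in a few lines of NumPy. The dimensions and the single dot-product attention head are our simplifications of TAMGCN, not the paper's architecture.

```python
# A schematic sketch (our simplification) of the core operation: dot-product
# attention over shot features forms a dynamic adjacency matrix, which then
# drives one graph-convolution step A_hat @ X @ W.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((12, 16))         # 12 shots x 16-dim features
Wq, Wk = rng.standard_normal((2, 16, 16)) # query/key projections
W = rng.standard_normal((16, 16))         # graph-conv weight

scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(16)
scores -= scores.max(axis=1, keepdims=True)            # numerical stability
A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row softmax
H = np.maximum(A @ X @ W, 0)              # one ReLU graph-conv layer
print(H.shape)                            # (12, 16): refined shot features
```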
14. Exploring Frontier Technologies in Video-Based Person Re-Identification: A Survey on Deep Learning Approach
Authors: Jiahe Wang, Xizhan Gao, Fa Zhu, Xingchi Chen. Computers, Materials & Continua (SCIE, EI), 2024, No. 10, pp. 25-51.
Video-based person re-identification (Re-ID), a subset of retrieval tasks, faces challenges like uncoordinated sample capturing, viewpoint variations, occlusions, cluttered backgrounds, and sequence uncertainties. Recent advancements in deep learning have significantly improved video-based person Re-ID, laying a solid foundation for further progress in the field. To enrich researchers' insights into the latest findings and prospective developments, we offer an extensive overview and meticulous analysis of contemporary video-based person Re-ID methodologies, with a specific emphasis on network architecture design and loss function design. First, we introduce methods based on network architecture design and loss function design from multiple perspectives and analyze their advantages and disadvantages. Furthermore, we provide a synthesis of prevalent datasets and key evaluation metrics utilized within this field to assist researchers in assessing methodological efficacy and establishing benchmarks for performance evaluation. Lastly, through a critical evaluation of the experimental outcomes of various methodologies across four prominent public datasets, we identify promising research avenues and offer valuable insights to steer future exploration and innovation in this vibrant and evolving field. This comprehensive analysis aims to equip researchers with the necessary knowledge and strategic foresight to navigate the complexities of video-based person Re-ID, fostering continued progress and breakthroughs in this challenging yet promising research domain.
Keywords: video-based person Re-ID; deep learning; survey of video Re-ID; loss function
15. CVTD: A Robust Car-Mounted Video Text Detector
Authors: Di Zhou, Jianxun Zhang, Chao Li, Yifan Guo, Bowen Li. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 1821-1842.
Text perception is crucial for understanding the semantics of outdoor scenes, making it a key requirement for building intelligent systems for driver assistance or autonomous driving. Text information in car-mounted videos can assist drivers in making decisions. However, car-mounted video text images pose challenges such as complex backgrounds, small fonts, and the need for real-time detection. We propose a robust Car-mounted Video Text Detector (CVTD), a lightweight text detection model based on ResNet18 for feature extraction, capable of detecting text of arbitrary shapes. Our model efficiently extracts global text positions through Coordinate Attention Threshold Activation (CATA) and enhances representation capability by stacking two Feature Pyramid Enhancement Fusion Modules (FPEFM), strengthening feature representation and integrating local text features with global position information. The enhanced feature maps, when acted upon by Text Activation Maps (TAM), effectively distinguish text foreground from non-text regions. Additionally, we collected and annotated a dataset containing 2200 images of Car-mounted Video Text (CVT) under various road conditions for training and evaluating our model's performance. We further tested our model on four other challenging public natural scene text detection benchmark datasets, demonstrating its strong generalization ability and real-time detection speed. This model holds potential for practical applications in real-world scenarios.
Keywords: deep learning; text detection; car-mounted video text detector; intelligent driving assistance; arbitrary shape text detector
16. How Rustic Videos Influence Youth's Linguistic Expression: An Analysis Based on Semiotics, Meme Theory and Media Spectacle Theory
Authors: Jiaxin Yang. Open Journal of Applied Sciences, 2024, No. 7, pp. 1648-1665.
The surge in popularity of rustic videos has spawned a great number of Internet memes, such as Internet trendy words growing from dialects and strange pronunciations, picture memes made from video screenshots, and mesmerizing music with a vernacular flavor. Due to their reproducibility, social interaction, and involvement, these rustic videos adhere to the fundamental logic of the propagation of online memes. Rustic videos are widely disseminated as online memes on TikTok (the Chinese version), are often reproduced and used by young people in social contact, and have become a unique linguistic symbol in modern internet culture. As a symbolic carrier transporting the consciousness of video creator and viewer, the rustic video is widely employed in young people's everyday communication and engagement, progressively altering their linguistic expression. This specific semiotic interaction has deconstructed and recreated the conventional media culture spectacle. This research examines the influence of rustic videos on TikTok on the linguistic expressions of modern youth from the perspectives of meme theory and semiotics, as well as the impact of rustic videos on the media spectacle from the standpoint of media spectacle theory. It also examines in depth the effects of the popularity of rustic videos on China's economy and culture.
Keywords: rustic videos; linguistic expression; semiotics; meme; media spectacle
17. Video Games and School Performances of Dakar High School Students
Authors: Souleymane Diallo, Papa Mamadou Gaye. Open Journal of Applied Sciences, 2024, No. 7, pp. 1851-1862.
The triptych of smartphones, video games, and adolescents has imposed itself on the collective consciousness as a matter of concern, to the point of becoming a social phenomenon. It raises questions about the uses adolescents make of these technologies and the effects on their academic performance, which we studied among middle school students under the Dakar academy inspection. We used mixed methods, combining a questionnaire survey and semi-structured interviews with middle school students and adults (parents, supervisors, teachers, etc.), participant observation, and a literature review. Concretely, before gaining access to video games, the middle school students were more idle and well-behaved: they watched a lot of TV, played with their brothers and sisters, or did household chores. The majority of young people had good, very good, or excellent conduct, and their averages were fair, fairly good, or good. With access to a diverse digital environment, middle school students have turned passionately to video games. As a result, their learning time, concentration, and compliance with parental instructions have declined significantly. This situation has negatively affected their academic performance and encouraged bad behavior.
Keywords: digital revolution; video games; middle school student; education; academic performance
18. Real-Time Mosaic Method of Aerial Video Based on Two-Stage Key Frame Selection Method
Authors: Minwen Yuan, Yonghong Long, Xin Li. Open Journal of Applied Sciences, 2024, No. 4, pp. 1008-1021.
A two-stage automatic key frame selection method is proposed to enhance stitching speed and quality for UAV aerial videos. In the first stage, to reduce redundancy, the overlapping rate of the UAV aerial video sequence within the sampling period is calculated, and Lagrange interpolation is used to fit the overlapping rate curve of the sequence. An empirical threshold for the overlapping rate is then applied to filter candidate key frames from the sequence. In the second stage, the principle of minimizing remapping error is used to dynamically adjust and determine the final key frame close to the candidate key frames. Comparative experiments show that the proposed method improves stitching speed and accuracy by more than 40%.
Keywords: UAV aerial video; image stitching; key frame selection; overlapping rate; remap error
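Stage one of the method, fitting the sampled overlapping-rate curve with Lagrange interpolation and thresholding it to pick candidate key frames, might look like the following sketch. The sample values and the threshold are invented for illustration.

```python
# Fit the sampled overlap-rate curve with Lagrange interpolation, then keep
# the first frame whose fitted overlap drops below an empirical threshold.
import numpy as np
from scipy.interpolate import lagrange

sample_idx = np.array([0, 10, 20, 30, 40])           # sampled frame indices
overlap = np.array([0.95, 0.85, 0.72, 0.61, 0.50])   # measured overlap rate

poly = lagrange(sample_idx, overlap)   # fitted overlap-rate curve (poly1d)
THRESH = 0.65                          # assumed empirical threshold

candidates = [i for i in range(0, 41) if poly(i) <= THRESH]
print("first candidate keyframe:", candidates[0] if candidates else None)
```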
19. A Multimodal Critical Discourse Analysis of Lingnan Cultural Promotional Videos on Social Media
Authors: Lu Jiang. Journal of Contemporary Educational Research, 2024, No. 4, pp. 59-68.
In recent years, more and more directors of culture and tourism bureaus have taken part in promoting local cultural tourism through cross-dressing, talent shows, and pushing their limits on self-media platforms. This study investigates short videos promoting Lingnan culture, posted on social media by directors general and deputy directors general of the Culture, Radio, Television, Tourism, and Sports Bureaus of counties and cities in Guangdong Province, using the method of multimodal critical discourse analysis. The analysis of 33 videos shows that Lingnan culture is presented as a domineering and confident culture, a historical culture, a graceful and elegant culture, and a vibrant and active culture. The domineering and confident culture is embedded in the utterances and behaviors of the directors general or deputy directors general in the videos. The historical culture is realized through conversations with historical figures via time travel. The graceful and elegant culture is constructed in the depiction of sceneries and of the characters' manners. The vibrant and active culture is represented in the depiction of the characters' actional process and analytical process.
Keywords: Lingnan culture; multimodal critical discourse analysis; promotional videos; TikTok; WeChat
20. Application Progress of Teaching in the Form of Traffic Short Videos in Nursing Education
Authors: Qingxia Yu. Journal of Contemporary Educational Research, 2024, No. 5, pp. 160-166.
With the rapid development of the information technology era, demands on teaching quality continue to rise, and colleges and universities have introduced corresponding innovations in their modes of education. Integrating modern information technology into the teaching process and combining it with medical content has become a new focus of reform and innovation in medical education at home and abroad. This paper describes the application of traffic short videos as a primary teaching form in nursing education in domestic and foreign studies, the role this teaching form plays in improving nursing students' theoretical knowledge and clinical skills, and its impact on cultivating nursing students' professional cognition, communication skills, and critical thinking, with the aim of providing new perspectives for subsequent nursing education.
Keywords: traffic short video; nursing education; nursing theory; nursing operation; clinical thinking