Recently,there have been significant advancements in the study of semantic communication in single-modal scenarios.However,the ability to process information in multi-modal environments remains limited.Inspired by the...Recently,there have been significant advancements in the study of semantic communication in single-modal scenarios.However,the ability to process information in multi-modal environments remains limited.Inspired by the research and applications of natural language processing across different modalities,our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality videos.Specifically,we propose a deep learning-basedMulti-ModalMutual Enhancement Video Semantic Communication system,called M3E-VSC.Built upon a VectorQuantized Generative AdversarialNetwork(VQGAN),our systemaims to leverage mutual enhancement among different modalities by using text as the main carrier of transmission.With it,the semantic information can be extracted fromkey-frame images and audio of the video and performdifferential value to ensure that the extracted text conveys accurate semantic information with fewer bits,thus improving the capacity of the system.Furthermore,a multi-frame semantic detection module is designed to facilitate semantic transitions during video generation.Simulation results demonstrate that our proposed model maintains high robustness in complex noise environments,particularly in low signal-to-noise ratio conditions,significantly improving the accuracy and speed of semantic transmission in video communication by approximately 50 percent.展开更多
Semantic video analysis plays an important role in the field of machine intelligence and pattern recognition. In this paper, based on the Hidden Markov Model (HMM), a semantic recognition framework on compressed video...Semantic video analysis plays an important role in the field of machine intelligence and pattern recognition. In this paper, based on the Hidden Markov Model (HMM), a semantic recognition framework on compressed videos is proposed to analyze the video events according to six low-level features. After the detailed analysis of video events, the pattern of global motion and five features in foreground—the principal parts of videos, are employed as the observations of the Hidden Markov Model to classify events in videos. The applications of the proposed framework in some video event detections demonstrate the promising success of the proposed framework on semantic video analysis.展开更多
Focusing on the problem of goal event detection in soccer videos,a novel method based on Hidden Markov Model(HMM) and the semantic rule is proposed.Firstly,a HMM for a goal event is constructed.Then a Normalized Seman...Focusing on the problem of goal event detection in soccer videos,a novel method based on Hidden Markov Model(HMM) and the semantic rule is proposed.Firstly,a HMM for a goal event is constructed.Then a Normalized Semantic Weighted Sum(NSWS) rule is established by defining a new feature of shots,semantic observation weight.The test video is detected based on the HMM and the NSWS rule,respectively.Finally,a fusion scheme based on logic distance is proposed and the detection results of the HMM and the NSWS rule are fused by optimal weights in the decision level,obtaining the final result.Experimental results indicate that the proposed method achieves 96.43% precision and 100% recall,which shows the effectiveness of this letter.展开更多
With the proliferation of the internet,big data continues to grow exponentially,and video has become the largest source.Video big data intro-duces many technological challenges,including compression,storage,trans-miss...With the proliferation of the internet,big data continues to grow exponentially,and video has become the largest source.Video big data intro-duces many technological challenges,including compression,storage,trans-mission,analysis,and recognition.The increase in the number of multimedia resources has brought an urgent need to develop intelligent methods to organize and process them.The integration between Semantic link Networks and multimedia resources provides a new prospect for organizing them with their semantics.The tags and surrounding texts of multimedia resources are used to measure their semantic association.Two evaluation methods including clustering and retrieval are performed to measure the semantic relatedness between images accurately and robustly.A Fuzzy Rule-Based Model for Semantic Content Extraction is designed which performs classification with fuzzy rules.The features extracted are trained with the neural network where each network contains several layers among them each layer of neurons is dedicated to measuring the weight towards different semantic events.Each neuron measures its weight according to different features like shape,size,direction,speed,and other features.The object is identified by subtracting the background features and trained to detect based on the features like size,shape,and direction.The weight measurement is performed according to the fuzzy rules and based on the weight measures.These frameworks enhance the video analytics feature and help in video surveillance systems with better accuracy and precision.展开更多
基金supported by the National Key Research and Development Project under Grant 2020YFB1807602Key Program of Marine Economy Development Special Foundation of Department of Natural Resources of Guangdong Province(GDNRC[2023]24)the National Natural Science Foundation of China under Grant 62271267.
文摘Recently,there have been significant advancements in the study of semantic communication in single-modal scenarios.However,the ability to process information in multi-modal environments remains limited.Inspired by the research and applications of natural language processing across different modalities,our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality videos.Specifically,we propose a deep learning-basedMulti-ModalMutual Enhancement Video Semantic Communication system,called M3E-VSC.Built upon a VectorQuantized Generative AdversarialNetwork(VQGAN),our systemaims to leverage mutual enhancement among different modalities by using text as the main carrier of transmission.With it,the semantic information can be extracted fromkey-frame images and audio of the video and performdifferential value to ensure that the extracted text conveys accurate semantic information with fewer bits,thus improving the capacity of the system.Furthermore,a multi-frame semantic detection module is designed to facilitate semantic transitions during video generation.Simulation results demonstrate that our proposed model maintains high robustness in complex noise environments,particularly in low signal-to-noise ratio conditions,significantly improving the accuracy and speed of semantic transmission in video communication by approximately 50 percent.
基金Supported in part by the National Natural Science Foundation of China (No. 60572045)the Ministry of Education of China Ph.D. Program Foundation (No.20050698033)Cooperation Project (2005.7-2007.6) with Microsoft Research Asia.
文摘Semantic video analysis plays an important role in the field of machine intelligence and pattern recognition. In this paper, based on the Hidden Markov Model (HMM), a semantic recognition framework on compressed videos is proposed to analyze the video events according to six low-level features. After the detailed analysis of video events, the pattern of global motion and five features in foreground—the principal parts of videos, are employed as the observations of the Hidden Markov Model to classify events in videos. The applications of the proposed framework in some video event detections demonstrate the promising success of the proposed framework on semantic video analysis.
基金Supported by the National Natural Science Foundation of China (No. 61072110)the Industrial Tackling Project of Shaanxi Province (2010K06-20)the Natural Science Foundation of Shaanxi Province (SJ08F15)
文摘Focusing on the problem of goal event detection in soccer videos,a novel method based on Hidden Markov Model(HMM) and the semantic rule is proposed.Firstly,a HMM for a goal event is constructed.Then a Normalized Semantic Weighted Sum(NSWS) rule is established by defining a new feature of shots,semantic observation weight.The test video is detected based on the HMM and the NSWS rule,respectively.Finally,a fusion scheme based on logic distance is proposed and the detection results of the HMM and the NSWS rule are fused by optimal weights in the decision level,obtaining the final result.Experimental results indicate that the proposed method achieves 96.43% precision and 100% recall,which shows the effectiveness of this letter.
基金funded in part by Major projects of the National Social Science Fund(16ZDA054)of Chinathe Postgraduate Research&Practice Innovation Program of Jiansu Province(NO.KYCX18_0999)of Chinathe Engineering Research Center for Software Testing and Evaluation of Fujian Province(ST2018004)of China.
文摘With the proliferation of the internet,big data continues to grow exponentially,and video has become the largest source.Video big data intro-duces many technological challenges,including compression,storage,trans-mission,analysis,and recognition.The increase in the number of multimedia resources has brought an urgent need to develop intelligent methods to organize and process them.The integration between Semantic link Networks and multimedia resources provides a new prospect for organizing them with their semantics.The tags and surrounding texts of multimedia resources are used to measure their semantic association.Two evaluation methods including clustering and retrieval are performed to measure the semantic relatedness between images accurately and robustly.A Fuzzy Rule-Based Model for Semantic Content Extraction is designed which performs classification with fuzzy rules.The features extracted are trained with the neural network where each network contains several layers among them each layer of neurons is dedicated to measuring the weight towards different semantic events.Each neuron measures its weight according to different features like shape,size,direction,speed,and other features.The object is identified by subtracting the background features and trained to detect based on the features like size,shape,and direction.The weight measurement is performed according to the fuzzy rules and based on the weight measures.These frameworks enhance the video analytics feature and help in video surveillance systems with better accuracy and precision.