Degraded broadcast channels(DBC) are a typical multiuser communication scenario, Semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on mult...Degraded broadcast channels(DBC) are a typical multiuser communication scenario, Semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on multi-user semantic fusion for wireless image transmission over DBC. The transmitter extracts semantic features for two users separately and then effectively fuses them for broadcasting by leveraging semantic similarity. Unlike traditional allocation of time, power, or bandwidth, the semantic fusion scheme can dynamically control the weight of the semantic features of the two users to balance their performance. Considering the different channel state information(CSI) of both users over DBC,a DBC-Aware method is developed that embeds the CSI of both users into the joint source-channel coding encoder and fusion module to adapt to the channel.Experimental results show that the proposed system outperforms the traditional broadcasting schemes.展开更多
With the rapid development of the mobile communication and the Internet,the previous web anomaly detectionand identificationmodels were built relying on security experts’empirical knowledge and attack features.Althou...With the rapid development of the mobile communication and the Internet,the previous web anomaly detectionand identificationmodels were built relying on security experts’empirical knowledge and attack features.Althoughthis approach can achieve higher detection performance,it requires huge human labor and resources to maintainthe feature library.In contrast,semantic feature engineering can dynamically discover new semantic featuresand optimize feature selection by automatically analyzing the semantic information contained in the data itself,thus reducing dependence on prior knowledge.However,current semantic features still have the problem ofsemantic expression singularity,as they are extracted from a single semantic mode such as word segmentation,character segmentation,or arbitrary semantic feature extraction.This paper extracts features of web requestsfrom dual semantic granularity,and proposes a semantic feature fusion method to solve the above problems.Themethod first preprocesses web requests,and extracts word-level and character-level semantic features of URLs viaconvolutional neural network(CNN),respectively.By constructing three loss functions to reduce losses betweenfeatures,labels and categories.Experiments on the HTTP CSIC 2010,Malicious URLs and HttpParams datasetsverify the proposedmethod.Results show that compared withmachine learning,deep learningmethods and BERTmodel,the proposed method has better detection performance.And it achieved the best detection rate of 99.16%in the dataset HttpParams.展开更多
Image fusion aims to integrate complementary information in source images to synthesize a fused image comprehensively characterizing the imaging scene. However, existing image fusion algorithms are only applicable to ...Image fusion aims to integrate complementary information in source images to synthesize a fused image comprehensively characterizing the imaging scene. However, existing image fusion algorithms are only applicable to strictly aligned source images and cause severe artifacts in the fusion results when input images have slight shifts or deformations. In addition,the fusion results typically only have good visual effect, but neglect the semantic requirements of high-level vision tasks.This study incorporates image registration, image fusion, and semantic requirements of high-level vision tasks into a single framework and proposes a novel image registration and fusion method, named Super Fusion. Specifically, we design a registration network to estimate bidirectional deformation fields to rectify geometric distortions of input images under the supervision of both photometric and end-point constraints. The registration and fusion are combined in a symmetric scheme, in which while mutual promotion can be achieved by optimizing the naive fusion loss, it is further enhanced by the mono-modal consistent constraint on symmetric fusion outputs. In addition, the image fusion network is equipped with the global spatial attention mechanism to achieve adaptive feature integration. Moreover, the semantic constraint based on the pre-trained segmentation model and Lovasz-Softmax loss is deployed to guide the fusion network to focus more on the semantic requirements of high-level vision tasks. Extensive experiments on image registration, image fusion,and semantic segmentation tasks demonstrate the superiority of our Super Fusion compared to the state-of-the-art alternatives.The source code and pre-trained model are publicly available at https://github.com/Linfeng-Tang/Super Fusion.展开更多
With the rapid spread of Internet information and the spread of fake news,the detection of fake news becomes more and more important.Traditional detection methods often rely on a single emotional or semantic feature t...With the rapid spread of Internet information and the spread of fake news,the detection of fake news becomes more and more important.Traditional detection methods often rely on a single emotional or semantic feature to identify fake news,but these methods have limitations when dealing with news in specific domains.In order to solve the problem of weak feature correlation between data from different domains,a model for detecting fake news by integrating domain-specific emotional and semantic features is proposed.This method makes full use of the attention mechanism,grasps the correlation between different features,and effectively improves the effect of feature fusion.The algorithm first extracts the semantic features of news text through the Bi-LSTM(Bidirectional Long Short-Term Memory)layer to capture the contextual relevance of news text.Senta-BiLSTM is then used to extract emotional features and predict the probability of positive and negative emotions in the text.It then uses domain features as an enhancement feature and attention mechanism to fully capture more fine-grained emotional features associated with that domain.Finally,the fusion features are taken as the input of the fake news detection classifier,combined with the multi-task representation of information,and the MLP and Softmax functions are used for classification.The experimental results show that on the Chinese dataset Weibo21,the F1 value of this model is 0.958,4.9% higher than that of the sub-optimal model;on the English dataset FakeNewsNet,the F1 value of the detection result of this model is 0.845,1.8% higher than that of the sub-optimal model,which is advanced and feasible.展开更多
Data fusion is usually an important process in multi-sensor remotely sensed imagery integration environments with the aim of enriching features lacking in the sensors involved in the fusion process. This technique has...Data fusion is usually an important process in multi-sensor remotely sensed imagery integration environments with the aim of enriching features lacking in the sensors involved in the fusion process. This technique has attracted much interest in many researches especially in the field of agriculture. On the other hand, deep learning (DL) based semantic segmentation shows high performance in remote sensing classification, and it requires large datasets in a supervised learning way. In the paper, a method of fusing multi-source remote sensing images with convolution neural networks (CNN) for semantic segmentation is proposed and applied to identify crops. Venezuelan Remote Sensing Satellite-2 (VRSS-2) and the high-resolution of Google Earth (GE) imageries have been used and more than 1000 sample sets have been collected for supervised learning process. The experiment results show that the crops extraction with an average overall accuracy more than 93% has been obtained, which demonstrates that data fusion combined with DL is highly feasible to crops extraction from satellite images and GE imagery, and it shows that deep learning techniques can serve as an invaluable tools for larger remote sensing data fusion frameworks, specifically for the applications in precision farming.展开更多
Power Shell has been widely deployed in fileless malware and advanced persistent threat(APT)attacks due to its high stealthiness and live-off-theland technique.However,existing works mainly focus on deobfuscation and ...Power Shell has been widely deployed in fileless malware and advanced persistent threat(APT)attacks due to its high stealthiness and live-off-theland technique.However,existing works mainly focus on deobfuscation and malicious detection,lacking the malicious Power Shell families classification and behavior analysis.Moreover,the state-of-the-art methods fail to capture fine-grained features and semantic relationships,resulting in low robustness and accuracy.To this end,we propose Power Detector,a novel malicious Power Shell script detector based on multimodal semantic fusion and deep learning.Specifically,we design four feature extraction methods to extract key features from character,token,abstract syntax tree(AST),and semantic knowledge graph.Then,we intelligently design four embeddings(i.e.,Char2Vec,Token2Vec,AST2Vec,and Rela2Vec) and construct a multi-modal fusion algorithm to concatenate feature vectors from different views.Finally,we propose a combined model based on transformer and CNN-Bi LSTM to implement Power Shell family detection.Our experiments with five types of Power Shell attacks show that PowerDetector can accurately detect various obfuscated and stealth PowerShell scripts,with a 0.9402 precision,a 0.9358 recall,and a 0.9374 F1-score.Furthermore,through singlemodal and multi-modal comparison experiments,we demonstrate that PowerDetector’s multi-modal embedding and deep learning model can achieve better accuracy and even identify more unknown attacks.展开更多
Aiming at the problem that the existing models have a poor segmentation effect on imbalanced data sets with small-scale samples,a bilateral U-Net network model with a spatial attention mechanism is designed.The model ...Aiming at the problem that the existing models have a poor segmentation effect on imbalanced data sets with small-scale samples,a bilateral U-Net network model with a spatial attention mechanism is designed.The model uses the lightweight MobileNetV2 as the backbone network for feature hierarchical extraction and proposes an Attentive Pyramid Spatial Attention(APSA)module compared to the Attenuated Spatial Pyramid module,which can increase the receptive field and enhance the information,and finally adds the context fusion prediction branch that fuses high-semantic and low-semantic prediction results,and the model effectively improves the segmentation accuracy of small data sets.The experimental results on the CamVid data set show that compared with some existing semantic segmentation networks,the algorithm has a better segmentation effect and segmentation accuracy,and its mIOU reaches 75.85%.Moreover,to verify the generality of the model and the effectiveness of the APSA module,experiments were conducted on the VOC 2012 data set,and the APSA module improved mIOU by about 12.2%.展开更多
Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving(AD)system.However,most proposed methods aim at addressing one of the two challenges...Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving(AD)system.However,most proposed methods aim at addressing one of the two challenges mentioned above with a single model.To tackle this dilemma,this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting(STSIGMA),an efficient end-to-end method to jointly and accurately perceive the AD environment and forecast the trajectories of the surrounding traffic agents within a unified framework.ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird’s-eye view(BEV)maps simultaneously.Specifically,an iterative aggregation network is first employed as the scene semantic encoder(SSE)to learn diverse scene information.To preserve dynamic interactions of traffic agents,ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder.Meanwhile,a simple yet efficient feature fusion method to fuse semantic and interaction features into a unified feature space as the input to a novel hierarchical aggregation decoder for downstream prediction tasks is designed.Extensive experiments on the nuScenes data set have demonstrated that the proposed ST-SIGMA achieves significant improvements compared to the state-of-theart(SOTA)methods in terms of scene perception and trajectory forecasting,respectively.Therefore,the proposed approach outperforms SOTA in terms of model generalisation and robustness and is therefore more feasible for deployment in realworld AD scenarios.展开更多
The application of unmanned driving in the Internet of Things is one of the concrete manifestations of the application of artificial intelligence technology.Image semantic segmentation can help the unmanned driving sy...The application of unmanned driving in the Internet of Things is one of the concrete manifestations of the application of artificial intelligence technology.Image semantic segmentation can help the unmanned driving system by achieving road accessibility analysis.Semantic segmentation is also a challenging technology for image understanding and scene parsing.We focused on the challenging task of real-time semantic segmentation in this paper.In this paper,we proposed a novel fast architecture for real-time semantic segmentation named DuFNet.Starting from the existing work of Bilateral Segmentation Network(BiSeNet),DuFNet proposes a novel Semantic Information Flow(SIF)structure for context information and a novel Fringe Information Flow(FIF)structure for spatial information.We also proposed two kinds of SIF with cascaded and paralleled structures,respectively.The SIF encodes the input stage by stage in the ResNet18 backbone and provides context information for the feature fusionmodule.Features from previous stages usually contain rich low-level details but high-level semantics for later stages.Themultiple convolutions embed in Parallel SIF aggregate the corresponding features among different stages and generate a powerful global context representation with less computational cost.The FIF consists of a pooling layer and an upsampling operator followed by projection convolution layer.The concise component provides more spatial details for the network.Compared with BiSeNet,our work achieved faster speed and comparable performance with 72.34%mIoU accuracy and 78 FPS on Cityscapes Dataset based on the ResNet18 backbone.展开更多
Focusing on the problem of goal event detection in soccer videos,a novel method based on Hidden Markov Model(HMM) and the semantic rule is proposed.Firstly,a HMM for a goal event is constructed.Then a Normalized Seman...Focusing on the problem of goal event detection in soccer videos,a novel method based on Hidden Markov Model(HMM) and the semantic rule is proposed.Firstly,a HMM for a goal event is constructed.Then a Normalized Semantic Weighted Sum(NSWS) rule is established by defining a new feature of shots,semantic observation weight.The test video is detected based on the HMM and the NSWS rule,respectively.Finally,a fusion scheme based on logic distance is proposed and the detection results of the HMM and the NSWS rule are fused by optimal weights in the decision level,obtaining the final result.Experimental results indicate that the proposed method achieves 96.43% precision and 100% recall,which shows the effectiveness of this letter.展开更多
There is a tremendous growth of digital data due to the stunning progress of digital devices which facilitates capturing them. Digital data include image, text, and video. Video represents a rich source of information...There is a tremendous growth of digital data due to the stunning progress of digital devices which facilitates capturing them. Digital data include image, text, and video. Video represents a rich source of information. Thus, there is an urgent need to retrieve, organize, and automate videos. Video retrieval is a vital process in multimedia applications such as video search engines, digital museums, and video-on-demand broadcasting. In this paper, the different approaches of video retrieval are outlined and briefly categorized. Moreover, the different methods that bridge the semantic gap in video retrieval are discussed in more details.展开更多
Recent convolutional neural networks(CNNs)based deep learning has significantly promoted fire detection.Existing fire detection methods can efficiently recognize and locate the fire.However,the accurate flame boundary...Recent convolutional neural networks(CNNs)based deep learning has significantly promoted fire detection.Existing fire detection methods can efficiently recognize and locate the fire.However,the accurate flame boundary and shape information is hard to obtain by them,which makes it difficult to conduct automated fire region analysis,prediction,and early warning.To this end,we propose a fire semantic segmentation method based on Global Position Guidance(GPG)and Multi-path explicit Edge information Interaction(MEI).Specifically,to solve the problem of local segmentation errors in low-level feature space,a top-down global position guidance module is used to restrain the offset of low-level features.Besides,an MEI module is proposed to explicitly extract and utilize the edge information to refine the coarse fire segmentation results.We compare the proposed method with existing advanced semantic segmentation and salient object detection methods.Experimental results demonstrate that the proposed method achieves 94.1%,93.6%,94.6%,95.3%,and 95.9%Intersection over Union(IoU)on five test sets respectively which outperforms the suboptimal method by a large margin.In addition,in terms of accuracy,our approach also achieves the best score.展开更多
真实场景点云不仅具有点云的空间几何信息,还具有三维物体的颜色信息,现有的网络无法有效利用真实场景的局部特征以及空间几何特征信息,因此提出了一种双通道特征融合的真实场景点云语义分割方法DCFNet(dual-channel feature fusion of ...真实场景点云不仅具有点云的空间几何信息,还具有三维物体的颜色信息,现有的网络无法有效利用真实场景的局部特征以及空间几何特征信息,因此提出了一种双通道特征融合的真实场景点云语义分割方法DCFNet(dual-channel feature fusion of real scene for point cloud semantic segmentation)可用于不同场景下的室内外场景语义分割。更具体地说,为了解决不能充分提取真实场景点云颜色信息的问题,该方法采用上下两个输入通道,通道均采用相同的特征提取网络结构,其中上通道的输入是完整RGB颜色和点云坐标信息,该通道主要关注于复杂物体对象场景特征,下通道仅输入点云坐标信息,该通道主要关注于点云的空间几何特征;在每个通道中为了更好地提取局部与全局信息,改善网络性能,引入了层间融合模块和Transformer通道特征扩充模块;同时,针对现有的三维点云语义分割方法缺乏关注局部特征与全局特征的联系,导致对复杂场景的分割效果不佳的问题,对上下两个通道所提取的特征通过DCFFS(dual-channel feature fusion segmentation)模块进行融合,并对真实场景进行语义分割。对室内复杂场景和大规模室内外场景点云分割基准进行了实验,实验结果表明,提出的DCFNet分割方法在S3DIS Area5室内场景数据集以及STPLS3D室外场景数据集上,平均交并比(MIOU)分别达到71.18%和48.87%,平均准确率(MACC)和整体准确率(OACC)分别达到77.01%与86.91%,实现了真实场景的高精度点云语义分割。展开更多
基金supported in part by National Key R&D Project of China (2023YFB2906201)the National Natural Science Foundation of China (62222111, 62125108 and 62431015)the Fundamental Research Funds for the Central Universities。
文摘Degraded broadcast channels(DBC) are a typical multiuser communication scenario, Semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on multi-user semantic fusion for wireless image transmission over DBC. The transmitter extracts semantic features for two users separately and then effectively fuses them for broadcasting by leveraging semantic similarity. Unlike traditional allocation of time, power, or bandwidth, the semantic fusion scheme can dynamically control the weight of the semantic features of the two users to balance their performance. Considering the different channel state information(CSI) of both users over DBC,a DBC-Aware method is developed that embeds the CSI of both users into the joint source-channel coding encoder and fusion module to adapt to the channel.Experimental results show that the proposed system outperforms the traditional broadcasting schemes.
基金a grant from the National Natural Science Foundation of China(Nos.11905239,12005248 and 12105303).
文摘With the rapid development of the mobile communication and the Internet,the previous web anomaly detectionand identificationmodels were built relying on security experts’empirical knowledge and attack features.Althoughthis approach can achieve higher detection performance,it requires huge human labor and resources to maintainthe feature library.In contrast,semantic feature engineering can dynamically discover new semantic featuresand optimize feature selection by automatically analyzing the semantic information contained in the data itself,thus reducing dependence on prior knowledge.However,current semantic features still have the problem ofsemantic expression singularity,as they are extracted from a single semantic mode such as word segmentation,character segmentation,or arbitrary semantic feature extraction.This paper extracts features of web requestsfrom dual semantic granularity,and proposes a semantic feature fusion method to solve the above problems.Themethod first preprocesses web requests,and extracts word-level and character-level semantic features of URLs viaconvolutional neural network(CNN),respectively.By constructing three loss functions to reduce losses betweenfeatures,labels and categories.Experiments on the HTTP CSIC 2010,Malicious URLs and HttpParams datasetsverify the proposedmethod.Results show that compared withmachine learning,deep learningmethods and BERTmodel,the proposed method has better detection performance.And it achieved the best detection rate of 99.16%in the dataset HttpParams.
基金supported by the National Natural Science Foundation of China(62276192,62075169,62061160370)the Key Research and Development Program of Hubei Province(2020BAB113)。
文摘Image fusion aims to integrate complementary information in source images to synthesize a fused image comprehensively characterizing the imaging scene. However, existing image fusion algorithms are only applicable to strictly aligned source images and cause severe artifacts in the fusion results when input images have slight shifts or deformations. In addition,the fusion results typically only have good visual effect, but neglect the semantic requirements of high-level vision tasks.This study incorporates image registration, image fusion, and semantic requirements of high-level vision tasks into a single framework and proposes a novel image registration and fusion method, named Super Fusion. Specifically, we design a registration network to estimate bidirectional deformation fields to rectify geometric distortions of input images under the supervision of both photometric and end-point constraints. The registration and fusion are combined in a symmetric scheme, in which while mutual promotion can be achieved by optimizing the naive fusion loss, it is further enhanced by the mono-modal consistent constraint on symmetric fusion outputs. In addition, the image fusion network is equipped with the global spatial attention mechanism to achieve adaptive feature integration. Moreover, the semantic constraint based on the pre-trained segmentation model and Lovasz-Softmax loss is deployed to guide the fusion network to focus more on the semantic requirements of high-level vision tasks. Extensive experiments on image registration, image fusion,and semantic segmentation tasks demonstrate the superiority of our Super Fusion compared to the state-of-the-art alternatives.The source code and pre-trained model are publicly available at https://github.com/Linfeng-Tang/Super Fusion.
基金The authors are highly thankful to the National Social Science Foundation of China(20BXW101,18XXW015)Innovation Research Project for the Cultivation of High-Level Scientific and Technological Talents(Top-Notch Talents of theDiscipline)(ZZKY2022303)+3 种基金National Natural Science Foundation of China(Nos.62102451,62202496)Basic Frontier Innovation Project of Engineering University of People’s Armed Police(WJX202316)This work is also supported by National Natural Science Foundation of China(No.62172436)Engineering University of PAP’s Funding for Scientific Research Innovation Team,Engineering University of PAP’s Funding for Basic Scientific Research,and Engineering University of PAP’s Funding for Education and Teaching.Natural Science Foundation of Shaanxi Province(No.2023-JCYB-584).
文摘With the rapid spread of Internet information and the spread of fake news,the detection of fake news becomes more and more important.Traditional detection methods often rely on a single emotional or semantic feature to identify fake news,but these methods have limitations when dealing with news in specific domains.In order to solve the problem of weak feature correlation between data from different domains,a model for detecting fake news by integrating domain-specific emotional and semantic features is proposed.This method makes full use of the attention mechanism,grasps the correlation between different features,and effectively improves the effect of feature fusion.The algorithm first extracts the semantic features of news text through the Bi-LSTM(Bidirectional Long Short-Term Memory)layer to capture the contextual relevance of news text.Senta-BiLSTM is then used to extract emotional features and predict the probability of positive and negative emotions in the text.It then uses domain features as an enhancement feature and attention mechanism to fully capture more fine-grained emotional features associated with that domain.Finally,the fusion features are taken as the input of the fake news detection classifier,combined with the multi-task representation of information,and the MLP and Softmax functions are used for classification.The experimental results show that on the Chinese dataset Weibo21,the F1 value of this model is 0.958,4.9% higher than that of the sub-optimal model;on the English dataset FakeNewsNet,the F1 value of the detection result of this model is 0.845,1.8% higher than that of the sub-optimal model,which is advanced and feasible.
文摘Data fusion is usually an important process in multi-sensor remotely sensed imagery integration environments with the aim of enriching features lacking in the sensors involved in the fusion process. This technique has attracted much interest in many researches especially in the field of agriculture. On the other hand, deep learning (DL) based semantic segmentation shows high performance in remote sensing classification, and it requires large datasets in a supervised learning way. In the paper, a method of fusing multi-source remote sensing images with convolution neural networks (CNN) for semantic segmentation is proposed and applied to identify crops. Venezuelan Remote Sensing Satellite-2 (VRSS-2) and the high-resolution of Google Earth (GE) imageries have been used and more than 1000 sample sets have been collected for supervised learning process. The experiment results show that the crops extraction with an average overall accuracy more than 93% has been obtained, which demonstrates that data fusion combined with DL is highly feasible to crops extraction from satellite images and GE imagery, and it shows that deep learning techniques can serve as an invaluable tools for larger remote sensing data fusion frameworks, specifically for the applications in precision farming.
基金This work was supported by National Natural Science Foundation of China(No.62172308,No.U1626107,No.61972297,No.62172144,and No.62062019).
文摘Power Shell has been widely deployed in fileless malware and advanced persistent threat(APT)attacks due to its high stealthiness and live-off-theland technique.However,existing works mainly focus on deobfuscation and malicious detection,lacking the malicious Power Shell families classification and behavior analysis.Moreover,the state-of-the-art methods fail to capture fine-grained features and semantic relationships,resulting in low robustness and accuracy.To this end,we propose Power Detector,a novel malicious Power Shell script detector based on multimodal semantic fusion and deep learning.Specifically,we design four feature extraction methods to extract key features from character,token,abstract syntax tree(AST),and semantic knowledge graph.Then,we intelligently design four embeddings(i.e.,Char2Vec,Token2Vec,AST2Vec,and Rela2Vec) and construct a multi-modal fusion algorithm to concatenate feature vectors from different views.Finally,we propose a combined model based on transformer and CNN-Bi LSTM to implement Power Shell family detection.Our experiments with five types of Power Shell attacks show that PowerDetector can accurately detect various obfuscated and stealth PowerShell scripts,with a 0.9402 precision,a 0.9358 recall,and a 0.9374 F1-score.Furthermore,through singlemodal and multi-modal comparison experiments,we demonstrate that PowerDetector’s multi-modal embedding and deep learning model can achieve better accuracy and even identify more unknown attacks.
基金Ministry of Science and Technology Basic Resources Survey Special Project,Grant/Award Number:2019FY100900High-level Hospital Construction Project,Grant/Award Number:DFJH2019015+2 种基金National Natural Science Foundation of China,Grant/Award Number:61871021Guangdong Natural Science Foundation,Grant/Award Number:2019A1515011676Beijing Key Laboratory of Robotics Bionic and Functional Research。
文摘Aiming at the problem that the existing models have a poor segmentation effect on imbalanced data sets with small-scale samples,a bilateral U-Net network model with a spatial attention mechanism is designed.The model uses the lightweight MobileNetV2 as the backbone network for feature hierarchical extraction and proposes an Attentive Pyramid Spatial Attention(APSA)module compared to the Attenuated Spatial Pyramid module,which can increase the receptive field and enhance the information,and finally adds the context fusion prediction branch that fuses high-semantic and low-semantic prediction results,and the model effectively improves the segmentation accuracy of small data sets.The experimental results on the CamVid data set show that compared with some existing semantic segmentation networks,the algorithm has a better segmentation effect and segmentation accuracy,and its mIOU reaches 75.85%.Moreover,to verify the generality of the model and the effectiveness of the APSA module,experiments were conducted on the VOC 2012 data set,and the APSA module improved mIOU by about 12.2%.
基金Basic and Advanced Research Projects of CSTC,Grant/Award Number:cstc2019jcyj-zdxmX0008Science and Technology Research Program of Chongqing Municipal Education Commission,Grant/Award Numbers:KJQN202100634,KJZDK201900605National Natural Science Foundation of China,Grant/Award Number:62006065。
文摘Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving(AD)system.However,most proposed methods aim at addressing one of the two challenges mentioned above with a single model.To tackle this dilemma,this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting(STSIGMA),an efficient end-to-end method to jointly and accurately perceive the AD environment and forecast the trajectories of the surrounding traffic agents within a unified framework.ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird’s-eye view(BEV)maps simultaneously.Specifically,an iterative aggregation network is first employed as the scene semantic encoder(SSE)to learn diverse scene information.To preserve dynamic interactions of traffic agents,ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder.Meanwhile,a simple yet efficient feature fusion method to fuse semantic and interaction features into a unified feature space as the input to a novel hierarchical aggregation decoder for downstream prediction tasks is designed.Extensive experiments on the nuScenes data set have demonstrated that the proposed ST-SIGMA achieves significant improvements compared to the state-of-theart(SOTA)methods in terms of scene perception and trajectory forecasting,respectively.Therefore,the proposed approach outperforms SOTA in terms of model generalisation and robustness and is therefore more feasible for deployment in realworld AD scenarios.
基金supported in part by the National Key RD Program of China (2021YFF0602104-2,2020YFB1804604)in part by the 2020 Industrial Internet Innovation and Development Project from Ministry of Industry and Information Technology of Chinain part by the Fundamental Research Fund for the Central Universities (30918012204,30920041112).
文摘The application of unmanned driving in the Internet of Things is one of the concrete manifestations of the application of artificial intelligence technology.Image semantic segmentation can help the unmanned driving system by achieving road accessibility analysis.Semantic segmentation is also a challenging technology for image understanding and scene parsing.We focused on the challenging task of real-time semantic segmentation in this paper.In this paper,we proposed a novel fast architecture for real-time semantic segmentation named DuFNet.Starting from the existing work of Bilateral Segmentation Network(BiSeNet),DuFNet proposes a novel Semantic Information Flow(SIF)structure for context information and a novel Fringe Information Flow(FIF)structure for spatial information.We also proposed two kinds of SIF with cascaded and paralleled structures,respectively.The SIF encodes the input stage by stage in the ResNet18 backbone and provides context information for the feature fusionmodule.Features from previous stages usually contain rich low-level details but high-level semantics for later stages.Themultiple convolutions embed in Parallel SIF aggregate the corresponding features among different stages and generate a powerful global context representation with less computational cost.The FIF consists of a pooling layer and an upsampling operator followed by projection convolution layer.The concise component provides more spatial details for the network.Compared with BiSeNet,our work achieved faster speed and comparable performance with 72.34%mIoU accuracy and 78 FPS on Cityscapes Dataset based on the ResNet18 backbone.
基金Supported by the National Natural Science Foundation of China (No. 61072110)the Industrial Tackling Project of Shaanxi Province (2010K06-20)the Natural Science Foundation of Shaanxi Province (SJ08F15)
文摘Focusing on the problem of goal event detection in soccer videos,a novel method based on Hidden Markov Model(HMM) and the semantic rule is proposed.Firstly,a HMM for a goal event is constructed.Then a Normalized Semantic Weighted Sum(NSWS) rule is established by defining a new feature of shots,semantic observation weight.The test video is detected based on the HMM and the NSWS rule,respectively.Finally,a fusion scheme based on logic distance is proposed and the detection results of the HMM and the NSWS rule are fused by optimal weights in the decision level,obtaining the final result.Experimental results indicate that the proposed method achieves 96.43% precision and 100% recall,which shows the effectiveness of this letter.
文摘There is a tremendous growth of digital data due to the stunning progress of digital devices which facilitates capturing them. Digital data include image, text, and video. Video represents a rich source of information. Thus, there is an urgent need to retrieve, organize, and automate videos. Video retrieval is a vital process in multimedia applications such as video search engines, digital museums, and video-on-demand broadcasting. In this paper, the different approaches of video retrieval are outlined and briefly categorized. Moreover, the different methods that bridge the semantic gap in video retrieval are discussed in more details.
基金This work was supported in part by the Important Science and Technology Project of Hainan Province under Grant ZDKJ2020010in part by Frontier Exploration Project Independently Deployed by Institute of Acoustics,Chinese Academy of Sciences under Grant QYTS202015 and Grant QYTS202115.
文摘Recent convolutional neural networks(CNNs)based deep learning has significantly promoted fire detection.Existing fire detection methods can efficiently recognize and locate the fire.However,the accurate flame boundary and shape information is hard to obtain by them,which makes it difficult to conduct automated fire region analysis,prediction,and early warning.To this end,we propose a fire semantic segmentation method based on Global Position Guidance(GPG)and Multi-path explicit Edge information Interaction(MEI).Specifically,to solve the problem of local segmentation errors in low-level feature space,a top-down global position guidance module is used to restrain the offset of low-level features.Besides,an MEI module is proposed to explicitly extract and utilize the edge information to refine the coarse fire segmentation results.We compare the proposed method with existing advanced semantic segmentation and salient object detection methods.Experimental results demonstrate that the proposed method achieves 94.1%,93.6%,94.6%,95.3%,and 95.9%Intersection over Union(IoU)on five test sets respectively which outperforms the suboptimal method by a large margin.In addition,in terms of accuracy,our approach also achieves the best score.
文摘真实场景点云不仅具有点云的空间几何信息,还具有三维物体的颜色信息,现有的网络无法有效利用真实场景的局部特征以及空间几何特征信息,因此提出了一种双通道特征融合的真实场景点云语义分割方法DCFNet(dual-channel feature fusion of real scene for point cloud semantic segmentation)可用于不同场景下的室内外场景语义分割。更具体地说,为了解决不能充分提取真实场景点云颜色信息的问题,该方法采用上下两个输入通道,通道均采用相同的特征提取网络结构,其中上通道的输入是完整RGB颜色和点云坐标信息,该通道主要关注于复杂物体对象场景特征,下通道仅输入点云坐标信息,该通道主要关注于点云的空间几何特征;在每个通道中为了更好地提取局部与全局信息,改善网络性能,引入了层间融合模块和Transformer通道特征扩充模块;同时,针对现有的三维点云语义分割方法缺乏关注局部特征与全局特征的联系,导致对复杂场景的分割效果不佳的问题,对上下两个通道所提取的特征通过DCFFS(dual-channel feature fusion segmentation)模块进行融合,并对真实场景进行语义分割。对室内复杂场景和大规模室内外场景点云分割基准进行了实验,实验结果表明,提出的DCFNet分割方法在S3DIS Area5室内场景数据集以及STPLS3D室外场景数据集上,平均交并比(MIOU)分别达到71.18%和48.87%,平均准确率(MACC)和整体准确率(OACC)分别达到77.01%与86.91%,实现了真实场景的高精度点云语义分割。