620 articles found
Multi-User Semantic Fusion for Semantic Communications over Degraded Broadcast Channels
1
Authors: Wu Tong, Chen Zhiyong, Tao Meixia, Xia Bin, Zhang Wenjun. China Communications (SCIE, CSCD), 2024, No. 10, pp. 86-100 (15 pages)
Degraded broadcast channels (DBC) are a typical multiuser communication scenario, yet semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on multi-user semantic fusion for wireless image transmission over DBC. The transmitter extracts semantic features for two users separately and then fuses them for broadcasting by leveraging semantic similarity. Unlike traditional allocation of time, power, or bandwidth, the semantic fusion scheme can dynamically control the weight of the semantic features of the two users to balance their performance. Considering the different channel state information (CSI) of the two users over DBC, a DBC-Aware method is developed that embeds the CSI of both users into the joint source-channel coding encoder and fusion module to adapt to the channel. Experimental results show that the proposed system outperforms traditional broadcasting schemes.
Keywords: channel adaptability, degraded broadcasting channels, semantic communications, semantic fusion
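The weight-controlled fusion described in this abstract can be pictured with a minimal sketch. It assumes a fixed scalar weight and a plain convex combination of two equal-length feature vectors; the paper's fusion module is learned and uses semantic similarity and CSI, none of which is modelled here. The function name `fuse_features` and its parameters are hypothetical.

```python
def fuse_features(feat_user1, feat_user2, weight):
    """Convex combination of two users' feature vectors.

    Illustrative assumption only: `weight` in [0, 1] is the share given
    to user 1; the paper's actual fusion module is a learned network.
    """
    if len(feat_user1) != len(feat_user2):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(feat_user1, feat_user2)]


# Equal weighting of the two users' features.
fused = fuse_features([1.0, 2.0], [3.0, 4.0], weight=0.5)
```

Raising `weight` favours user 1 at the expense of user 2, which is the kind of trade-off the abstract describes as balancing the two users' performance.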
FusionNN: A Semantic Feature Fusion Model Based on Multimodal for Web Anomaly Detection
2
Authors: Li Wang, Mingshan Xia, Hao Hu, Jianfang Li, Fengyao Hou, Gang Chen. Computers, Materials & Continua (SCIE, EI), 2024, No. 5, pp. 2991-3006 (16 pages)
With the rapid development of mobile communication and the Internet, previous web anomaly detection and identification models were built relying on security experts' empirical knowledge and attack features. Although this approach can achieve higher detection performance, it requires huge human labor and resources to maintain the feature library. In contrast, semantic feature engineering can dynamically discover new semantic features and optimize feature selection by automatically analyzing the semantic information contained in the data itself, thus reducing dependence on prior knowledge. However, current semantic features still suffer from semantic expression singularity, as they are extracted from a single semantic mode such as word segmentation, character segmentation, or arbitrary semantic feature extraction. This paper extracts features of web requests at dual semantic granularity and proposes a semantic feature fusion method to solve the above problems. The method first preprocesses web requests and extracts word-level and character-level semantic features of URLs via convolutional neural networks (CNN), respectively. Three loss functions are then constructed to reduce the losses between features, labels and categories. Experiments on the HTTP CSIC 2010, Malicious URLs and HttpParams datasets verify the proposed method. Results show that, compared with machine learning methods, deep learning methods and the BERT model, the proposed method has better detection performance, achieving its best detection rate of 99.16% on the HttpParams dataset.
Keywords: feature fusion, web anomaly detection, multimodal, convolutional neural network (CNN), semantic feature extraction
A Model for Detecting Fake News by Integrating Domain-Specific Emotional and Semantic Features
3
Authors: Wen Jiang, Mingshu Zhang, Xu’an Wang, Wei Bin, Xiong Zhang, Kelan Ren, Facheng Yan. Computers, Materials & Continua (SCIE, EI), 2024, No. 8, pp. 2161-2179 (19 pages)
With the rapid spread of Internet information and of fake news, fake news detection is becoming more and more important. Traditional detection methods often rely on a single emotional or semantic feature to identify fake news, but these methods have limitations when dealing with news in specific domains. To solve the problem of weak feature correlation between data from different domains, a model for detecting fake news by integrating domain-specific emotional and semantic features is proposed. This method makes full use of the attention mechanism, grasps the correlation between different features, and effectively improves the effect of feature fusion. The algorithm first extracts the semantic features of news text through a Bi-LSTM (Bidirectional Long Short-Term Memory) layer to capture the contextual relevance of the text. Senta-BiLSTM is then used to extract emotional features and predict the probability of positive and negative emotions in the text. Domain features are then used as an enhancement feature, together with an attention mechanism, to capture more fine-grained emotional features associated with that domain. Finally, the fused features are taken as the input of the fake news detection classifier, combined with the multi-task representation of information, and the MLP and Softmax functions are used for classification. The experimental results show that on the Chinese dataset Weibo21 the F1 value of this model is 0.958, 4.9% higher than that of the second-best model; on the English dataset FakeNewsNet the F1 value is 0.845, 1.8% higher than that of the second-best model, which shows that the model is advanced and feasible.
Keywords: fake news detection, domain-related emotional features, semantic features, feature fusion
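The final classification step mentioned in this abstract relies on the softmax function, which turns raw class scores into probabilities. Below is a minimal, numerically stable sketch in plain Python; the two-class scores are made-up illustrative values, and nothing here reproduces the paper's MLP or its trained weights.

```python
import math


def softmax(scores):
    """Map raw scores to probabilities that sum to 1.

    Subtracting the maximum score before exponentiating avoids overflow
    and does not change the result.
    """
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the classes (real, fake).
probs = softmax([1.2, 3.4])
predicted_class = probs.index(max(probs))  # index of the most likely class
```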
PowerDetector: Malicious PowerShell Script Family Classification Based on Multi-Modal Semantic Fusion and Deep Learning (Cited by: 1)
4
Authors: Xiuzhang Yang, Guojun Peng, Dongni Zhang, Yuhang Gao, Chenguang Li. China Communications (SCIE, CSCD), 2023, No. 11, pp. 202-224 (23 pages)
PowerShell has been widely deployed in fileless malware and advanced persistent threat (APT) attacks due to its high stealthiness and live-off-the-land technique. However, existing works mainly focus on deobfuscation and malicious detection, lacking malicious PowerShell family classification and behavior analysis. Moreover, the state-of-the-art methods fail to capture fine-grained features and semantic relationships, resulting in low robustness and accuracy. To this end, we propose PowerDetector, a novel malicious PowerShell script detector based on multi-modal semantic fusion and deep learning. Specifically, we design four feature extraction methods to extract key features from the character, token, abstract syntax tree (AST), and semantic knowledge graph. Then, we design four embeddings (i.e., Char2Vec, Token2Vec, AST2Vec, and Rela2Vec) and construct a multi-modal fusion algorithm to concatenate feature vectors from different views. Finally, we propose a combined model based on transformer and CNN-BiLSTM to implement PowerShell family detection. Our experiments with five types of PowerShell attacks show that PowerDetector can accurately detect various obfuscated and stealthy PowerShell scripts, with a precision of 0.9402, a recall of 0.9358, and an F1-score of 0.9374. Furthermore, through single-modal and multi-modal comparison experiments, we demonstrate that PowerDetector's multi-modal embedding and deep learning model can achieve better accuracy and even identify more unknown attacks.
Keywords: deep learning, malicious family detection, multi-modal semantic fusion, PowerShell
CSMCCVA: Framework of cross-modal semantic mapping based on cognitive computing of visual and auditory sensations (Cited by: 1)
5
Authors: Liu Yang (刘扬), Zheng Fengbin, Zuo Xianyu. High Technology Letters (EI, CAS), 2016, No. 1, pp. 90-98 (9 pages)
Cross-modal semantic mapping and cross-media retrieval are key problems for multimedia search engines. This study analyzes the hierarchy, the functionality, and the structure of the visual and auditory sensations of the cognitive system, and establishes a brain-like cross-modal semantic mapping framework based on cognitive computing of visual and auditory sensations. The framework considers the mechanism of visual-auditory multisensory integration, selective attention in the thalamo-cortical system, emotional control in the limbic system, and memory enhancement in the hippocampus. The algorithms of cross-modal semantic mapping are then given. Experimental results show that the framework can be effectively applied to cross-modal semantic mapping and is also of significance for brain-like computing on non-von Neumann structures.
Keywords: multimedia neural cognitive computing (MNCC), brain-like computing, cross-modal semantic mapping (CSM), selective attention, limbic system, multisensory integration, memory-enhancing mechanism
Bilateral U-Net semantic segmentation with spatial attention mechanism (Cited by: 2)
6
Authors: Guangzhe Zhao, Yimeng Zhang, Maoning Ge, Min Yu. CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, No. 2, pp. 297-307 (11 pages)
To address the poor segmentation performance of existing models on imbalanced data sets with small-scale samples, a bilateral U-Net network model with a spatial attention mechanism is designed. The model uses the lightweight MobileNetV2 as the backbone network for hierarchical feature extraction and proposes an Attentive Pyramid Spatial Attention (APSA) module, compared to the Attenuated Spatial Pyramid module, which can increase the receptive field and enhance the information. Finally, it adds a context fusion prediction branch that fuses high-semantic and low-semantic prediction results, and the model effectively improves the segmentation accuracy on small data sets. The experimental results on the CamVid data set show that, compared with some existing semantic segmentation networks, the algorithm has a better segmentation effect and segmentation accuracy, and its mIOU reaches 75.85%. Moreover, to verify the generality of the model and the effectiveness of the APSA module, experiments were conducted on the VOC 2012 data set, where the APSA module improved mIOU by about 12.2%.
Keywords: attention mechanism, receptive field, semantic fusion, semantic segmentation, spatial attention module, U-Net
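Several abstracts in this listing report mIOU (mean Intersection over Union). As a reminder of what the number measures, here is a minimal sketch over flat lists of per-pixel class labels. It assumes that classes absent from both prediction and ground truth are skipped, which is one common convention; the evaluation protocols of the listed papers may differ.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for flat label sequences.

    Assumption: a class that appears in neither `pred` nor `target`
    is excluded from the average.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: class 0 IoU is 1/2, class 1 IoU is 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```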
DuFNet: Dual Flow Network of Real-Time Semantic Segmentation for Unmanned Driving Application of Internet of Things (Cited by: 1)
7
Authors: Tao Duan, Yue Liu, Jingze Li, Zhichao Lian, Qianmu Li. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 7, pp. 223-239 (17 pages)
The application of unmanned driving in the Internet of Things is one of the concrete manifestations of the application of artificial intelligence technology. Image semantic segmentation can help the unmanned driving system by enabling road accessibility analysis. Semantic segmentation is also a challenging technology for image understanding and scene parsing. In this paper, we focus on the challenging task of real-time semantic segmentation and propose a novel fast architecture named DuFNet. Starting from the existing Bilateral Segmentation Network (BiSeNet), DuFNet proposes a novel Semantic Information Flow (SIF) structure for context information and a novel Fringe Information Flow (FIF) structure for spatial information. We also propose two kinds of SIF, with cascaded and paralleled structures, respectively. The SIF encodes the input stage by stage in the ResNet18 backbone and provides context information for the feature fusion module. Features from earlier stages usually contain rich low-level details, whereas later stages carry high-level semantics. The multiple convolutions embedded in the Parallel SIF aggregate the corresponding features among different stages and generate a powerful global context representation with less computational cost. The FIF consists of a pooling layer and an upsampling operator followed by a projection convolution layer. This concise component provides more spatial details for the network. Compared with BiSeNet, our work achieved faster speed and comparable performance, with 72.34% mIoU accuracy and 78 FPS on the Cityscapes dataset based on the ResNet18 backbone.
Keywords: real-time semantic segmentation, convolutional neural network, feature fusion, unmanned driving, fringe information flow
SuperFusion: A Versatile Image Registration and Fusion Network with Semantic Awareness (Cited by: 8)
8
Authors: Linfeng Tang, Yuxin Deng, Yong Ma, Jun Huang, Jiayi Ma. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2022, No. 12, pp. 2121-2137 (17 pages)
Image fusion aims to integrate complementary information in source images to synthesize a fused image comprehensively characterizing the imaging scene. However, existing image fusion algorithms are only applicable to strictly aligned source images and cause severe artifacts in the fusion results when input images have slight shifts or deformations. In addition, the fusion results typically only have a good visual effect but neglect the semantic requirements of high-level vision tasks. This study incorporates image registration, image fusion, and the semantic requirements of high-level vision tasks into a single framework and proposes a novel image registration and fusion method, named SuperFusion. Specifically, we design a registration network to estimate bidirectional deformation fields to rectify geometric distortions of input images under the supervision of both photometric and end-point constraints. The registration and fusion are combined in a symmetric scheme, in which mutual promotion can be achieved by optimizing the naive fusion loss and is further enhanced by the mono-modal consistent constraint on symmetric fusion outputs. In addition, the image fusion network is equipped with the global spatial attention mechanism to achieve adaptive feature integration. Moreover, the semantic constraint based on the pre-trained segmentation model and Lovasz-Softmax loss is deployed to guide the fusion network to focus more on the semantic requirements of high-level vision tasks. Extensive experiments on image registration, image fusion, and semantic segmentation tasks demonstrate the superiority of our SuperFusion compared to the state-of-the-art alternatives. The source code and pre-trained model are publicly available at https://github.com/Linfeng-Tang/SuperFusion.
Keywords: global spatial attention, image fusion, image registration, mutual promotion, semantic awareness
Adequate alignment and interaction for cross-modal retrieval
9
Authors: Mingkang WANG, Min MENG, Jigang LIU, Jigang WU. Virtual Reality & Intelligent Hardware (EI), 2023, No. 6, pp. 509-522 (14 pages)
Background: Cross-modal retrieval has attracted widespread attention in many cross-media similarity search applications, particularly image-text retrieval in the fields of computer vision and natural language processing. Recently, visual and semantic embedding (VSE) learning has shown promising improvements in image-text retrieval tasks. Most existing VSE models employ two unrelated encoders to extract features and then use complex methods to contextualize and aggregate these features into holistic embeddings. Despite recent advances, existing approaches still suffer from two limitations: (1) without considering intermediate interactions and adequate alignment between different modalities, these models cannot guarantee the discriminative ability of representations; and (2) existing feature aggregators are susceptible to certain noisy regions, which may lead to unreasonable pooling coefficients and affect the quality of the final aggregated features. Methods: To address these challenges, we propose a novel cross-modal retrieval model containing a well-designed alignment module and a novel multimodal fusion encoder that aims to learn the adequate alignment and interaction of aggregated features to effectively bridge the modality gap. Results: Experiments on the Microsoft COCO and Flickr30k datasets demonstrated the superiority of our model over state-of-the-art methods.
Keywords: cross-modal retrieval, visual semantic embedding, feature aggregation, Transformer
Semantic Segmentation Based Remote Sensing Data Fusion on Crops Detection (Cited by: 1)
10
Authors: Jose Pena, Yumin Tan, Wuttichai Boonpook. Journal of Computer and Communications, 2019, No. 7, pp. 53-64 (12 pages)
Data fusion is usually an important process in multi-sensor remotely sensed imagery integration environments, with the aim of enriching features lacking in the sensors involved in the fusion process. This technique has attracted considerable research interest, especially in the field of agriculture. Deep learning (DL) based semantic segmentation, for its part, shows high performance in remote sensing classification, and it requires large datasets for supervised learning. In this paper, a method of fusing multi-source remote sensing images with convolutional neural networks (CNN) for semantic segmentation is proposed and applied to identify crops. Venezuelan Remote Sensing Satellite-2 (VRSS-2) imagery and high-resolution Google Earth (GE) imagery have been used, and more than 1000 sample sets have been collected for the supervised learning process. The experimental results show that crop extraction achieves an average overall accuracy of more than 93%, which demonstrates that data fusion combined with DL is highly feasible for crop extraction from satellite images and GE imagery, and that deep learning techniques can serve as an invaluable tool for larger remote sensing data fusion frameworks, specifically for applications in precision farming.
Keywords: data fusion, crops detection, semantic segmentation, VRSS-2
ST-SIGMA: Spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting (Cited by: 2)
11
Authors: Yang Fang, Bei Luo, Ting Zhao, Dong He, Bingbing Jiang, Qilie Liu. CAAI Transactions on Intelligence Technology (SCIE, EI), 2022, No. 4, pp. 744-757 (14 pages)
Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving (AD) system. However, most proposed methods aim at addressing one of the two challenges mentioned above with a single model. To tackle this dilemma, this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting (ST-SIGMA), an efficient end-to-end method to jointly and accurately perceive the AD environment and forecast the trajectories of the surrounding traffic agents within a unified framework. ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird's-eye view (BEV) maps simultaneously. Specifically, an iterative aggregation network is first employed as the scene semantic encoder (SSE) to learn diverse scene information. To preserve dynamic interactions of traffic agents, ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder. Meanwhile, a simple yet efficient feature fusion method is designed to fuse semantic and interaction features into a unified feature space as the input to a novel hierarchical aggregation decoder for downstream prediction tasks. Extensive experiments on the nuScenes data set have demonstrated that the proposed ST-SIGMA achieves significant improvements compared to the state-of-the-art (SOTA) methods in terms of scene perception and trajectory forecasting, respectively. Therefore, the proposed approach outperforms SOTA in terms of model generalisation and robustness and is more feasible for deployment in real-world AD scenarios.
Keywords: feature fusion, graph interaction, hierarchical aggregation, scene perception, scene semantics, trajectory forecasting
Social network search based on semantic analysis and learning (Cited by: 12)
12
Authors: Feifei Kou, Junping Du, Yijiang He, Lingfei Ye. CAAI Transactions on Intelligence Technology, 2016, No. 4, pp. 293-302 (10 pages)
Because of everyone's involvement in social networks, social networks are full of massive multimedia data, and events are released and disseminated through social networks in the form of multi-modal and multi-attribute heterogeneous data. There have been numerous studies on social network search. Considering the spatio-temporal features of messages and the social relationships among users, we summarize an overall social network search framework from the perspective of semantics based on existing research. For social network search, the acquisition and representation of spatio-temporal data is the basis, the semantic analysis and modeling of social network cross-media big data is an important component, deep semantic learning of social networks is the key research field, and the indexing and ranking mechanism is the indispensable part. This paper reviews the current studies in these fields and then presents the main challenges of social network search. Finally, we give an outlook on the prospects and further work of social network search.
Keywords: semantic analysis, semantic learning, cross-modal, social network search
A NOVEL FRAMEWORK FOR SOCCER GOAL DETECTION BASED ON SEMANTIC RULE
13
Authors: Xie Wenjuan, Tong Ming. Journal of Electronics (China), 2011, No. 4, pp. 670-674 (5 pages)
Focusing on the problem of goal event detection in soccer videos, a novel method based on the Hidden Markov Model (HMM) and a semantic rule is proposed. Firstly, an HMM for a goal event is constructed. Then a Normalized Semantic Weighted Sum (NSWS) rule is established by defining a new feature of shots, the semantic observation weight. The test video is detected based on the HMM and the NSWS rule, respectively. Finally, a fusion scheme based on logic distance is proposed, and the detection results of the HMM and the NSWS rule are fused by optimal weights at the decision level, obtaining the final result. Experimental results indicate that the proposed method achieves 96.43% precision and 100% recall, which demonstrates the effectiveness of the proposed method.
Keywords: video semantic analysis, event detection, Hidden Markov Model (HMM), semantic rule, decision-level fusion
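Decision-level fusion, as named in this abstract, combines the outputs of separate detectors rather than their features. The sketch below shows one simple form: a weighted sum of two per-shot scores followed by a threshold. The logic-distance scheme and the optimal weights of the paper are not given in the abstract, so the weight `w_hmm`, the threshold, and the function name `fuse_decisions` are all illustrative assumptions.

```python
def fuse_decisions(hmm_scores, rule_scores, w_hmm=0.5, threshold=0.5):
    """Weighted decision-level fusion of two detectors.

    Assumption: both detectors emit scores in [0, 1] per video shot;
    a shot is labelled a goal event when the fused score reaches
    `threshold`. This is not the paper's logic-distance scheme.
    """
    fused = [w_hmm * h + (1.0 - w_hmm) * r
             for h, r in zip(hmm_scores, rule_scores)]
    return [score >= threshold for score in fused]


# Two shots: both detectors agree on the first, both reject the second.
labels = fuse_decisions([0.9, 0.2], [0.7, 0.4], w_hmm=0.5)
```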
Semantic-Based Video Retrieval Survey
14
Authors: Shaimaa Toriah Mohamed Toriah, Atef Zaki Ghalwash, Aliaa A. A. Youssif. Journal of Computer and Communications, 2018, No. 8, pp. 28-44 (17 pages)
Digital data are growing tremendously thanks to the rapid progress of the digital devices that capture them. Digital data include image, text, and video. Video represents a rich source of information. Thus, there is an urgent need to retrieve, organize, and automate videos. Video retrieval is a vital process in multimedia applications such as video search engines, digital museums, and video-on-demand broadcasting. In this paper, the different approaches to video retrieval are outlined and briefly categorized. Moreover, the different methods that bridge the semantic gap in video retrieval are discussed in more detail.
Keywords: semantic video retrieval, concept detectors, context-based concept fusion, semantic gap
Exploiting multi-context analysis in semantic image classification
15
Authors: Tian Yonghong (田永鸿), Huang Tiejun (黄铁军), Gao Wen (高文). Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2005, No. 11, pp. 1268-1283 (16 pages)
As the popularity of digital images is rapidly increasing on the Internet, semantic image classification has become an important research topic. However, the well-known content-based image classification methods do not overcome the so-called semantic gap problem, in which low-level visual features cannot represent the high-level semantic content of images. Image classification using visual and textual information often performs poorly since the extracted textual features are often too limited to accurately represent the images. In this paper, we propose a semantic image classification approach using multi-context analysis. For a given image, we model the relevant textual information as its multi-modal context, and regard the related images connected by hyperlinks as its link context. Two kinds of context analysis models, i.e., cross-modal correlation analysis and the link-based correlation model, are used to capture the correlation among different modalities of features and the topical dependency among images induced by the link structure. We propose a new collective classification model called the relational support vector classifier (RSVC), based on the well-known Support Vector Machines (SVMs) and the link-based correlation model. Experiments showed that the proposed approach significantly improved classification accuracy over that of SVM classifiers using visual and/or textual features.
Keywords: image classification, multi-context analysis, cross-modal correlation analysis, link-based correlation model, linkage semantic kernels, relational support vector classifier
Optimized Deep Learning Model for Fire Semantic Segmentation
16
Authors: Songbin Li, Peng Liu, Qiandong Yan, Ruiling Qian. Computers, Materials & Continua (SCIE, EI), 2022, No. 9, pp. 4999-5013 (15 pages)
Recent deep learning based on convolutional neural networks (CNNs) has significantly promoted fire detection. Existing fire detection methods can efficiently recognize and locate fire. However, accurate flame boundary and shape information is hard to obtain with them, which makes it difficult to conduct automated fire region analysis, prediction, and early warning. To this end, we propose a fire semantic segmentation method based on Global Position Guidance (GPG) and Multi-path explicit Edge information Interaction (MEI). Specifically, to solve the problem of local segmentation errors in the low-level feature space, a top-down global position guidance module is used to restrain the offset of low-level features. In addition, an MEI module is proposed to explicitly extract and utilize the edge information to refine the coarse fire segmentation results. We compare the proposed method with existing advanced semantic segmentation and salient object detection methods. Experimental results demonstrate that the proposed method achieves 94.1%, 93.6%, 94.6%, 95.3%, and 95.9% Intersection over Union (IoU) on five test sets, respectively, outperforming the second-best method by a large margin. In terms of accuracy, our approach also achieves the best score.
Keywords: fire semantic segmentation, local segmentation errors, global position guidance, multi-path explicit edge information interaction, feature fusion
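A Lightweight Crop and Weed Recognition Method Based on Improved DeepLabv3+ (Cited by: 1)
17
Authors: Qu Fuheng (曲福恒), Li Jinzhuang (李金状), Yang Yong (杨勇), Kang Zhennan (康镇南), Yan Xingwang (严兴旺). Journal of Shihezi University (Natural Science) (CAS, PKU Core), 2024, No. 1, pp. 117-125 (9 pages)
To identify field crops and weeds on devices with limited storage and computing power, this paper proposes a lightweight semantic segmentation network based on an improved DeepLabv3+. First, MobileNet v2 is used as the feature extraction backbone of DeepLabv3+; a dual-branch residual module is proposed to replace the inverted residual module, and the last two convolutional layers are removed to reduce the number of model parameters. Second, grouped pointwise convolution is introduced into the Atrous Spatial Pyramid Pooling (ASPP) module, depthwise dilated convolution replaces standard convolution, and the convolved feature maps undergo multi-scale feature fusion to strengthen the extraction of deep features of crops and weeds. Finally, the original nonlinear activation function is replaced with the Leaky ReLU activation function to improve segmentation accuracy. Experimental results show that the improved network reaches an mIOU of 86.75% with only 0.69M parameters and an FPS of 98; compared with the original DeepLabv3+ and three typical lightweight semantic segmentation networks, it has the fewest parameters and the highest segmentation accuracy among the lightweight networks compared.
Keywords: crop and weed recognition, lightweight, semantic segmentation, DeepLabv3+, MobileNet v2, multi-scale feature fusion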
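The Leaky ReLU activation named in this abstract differs from plain ReLU only in how it treats negative inputs: instead of zeroing them, it scales them by a small slope. A minimal scalar sketch follows; the slope value 0.01 is a commonly used default and an assumption here, since the abstract does not state the value used in the paper.

```python
def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: identity for x >= 0, a small linear slope otherwise.

    `negative_slope=0.01` is an assumed default, not taken from the paper.
    """
    return x if x >= 0 else negative_slope * x


# Positive inputs pass through; negative inputs are scaled down, not zeroed.
activations = [leaky_relu(v) for v in (-2.0, 0.0, 3.0)]
```

Keeping a non-zero gradient for negative inputs is the usual motivation for this choice over plain ReLU.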
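Multimodal Sentiment Analysis for Video Data
18
Authors: Wu Xing (武星), Yin Haoyu (殷浩宇), Yao Junfeng (姚骏峰), Li Weimin (李卫民), Qian Quan (钱权). Computer Engineering (CAS, CSCD, PKU Core), 2024, No. 6, pp. 218-227 (10 pages)
Multimodal sentiment analysis aims to extract and integrate semantic information from text, image, and audio data in order to identify the emotional state of speakers in online videos. Although multimodal fusion schemes have achieved some results in this field, existing methods still fall short in handling distribution differences between modalities and in fusing relational knowledge. To this end, a multimodal sentiment analysis method is proposed. A Multimodal Prompt Gate (MPG) module is designed that converts non-verbal information into prompts fused with the textual context and uses textual information to filter noise from the non-verbal signals, yielding prompts rich in semantic information and enhancing information integration across modalities. In addition, an instance-to-label contrastive learning framework is proposed that distinguishes different labels in the latent space at the semantic level to further optimize the model output. Experimental results on three large-scale sentiment analysis datasets show that, relative to the second-best model, the method improves binary classification accuracy by about 0.7% and three-class accuracy by more than 2.5%, reaching 0.671. The method can serve as a reference for introducing multimodal sentiment analysis into fields such as user profiling, video understanding, and AI interviews.
Keywords: multimodal sentiment analysis, semantic information, multimodal fusion, contextual representation, contrastive learning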
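Dual-Branch Real-Time Semantic Segmentation Network Based on Detail Enhancement
19
Authors: Zheng Qiumei (郑秋梅), Niu Weiwei (牛薇薇), Wang Fenghua (王风华), Zhao Dan (赵丹). Journal of Computer Applications (CSCD, PKU Core), 2024, No. 10, pp. 3058-3066 (9 pages)
Real-time semantic segmentation methods often use a dual-branch structure to preserve the shallow spatial information and the deep semantic information of an image separately. However, current dual-branch real-time semantic segmentation methods focus on mining semantic features and neglect the preservation of spatial features, so the network cannot accurately capture detail features such as object boundaries and textures, and the final segmentation quality suffers. To address these problems, a Detail Enhancement-based Dual-Branch real-time semantic segmentation Network (DEDBNet) is proposed, which enhances spatial detail information in multiple stages. First, a Detail-Enhanced Bidirectional Interaction Module (DEBIM) is proposed: during the interaction between branches, a lightweight spatial attention mechanism strengthens the ability of high-resolution feature maps to express detail information and promotes the flow of spatial detail features between the high- and low-resolution branches, improving the network's ability to learn detail information. Second, a Local Detail Attention Feature Fusion (LDAFF) module is designed, which models global semantic information and local spatial information simultaneously when fusing the features at the ends of the two branches, resolving the discontinuity of details between feature maps at different levels. In addition, a boundary loss is introduced to guide the shallow layers of the network to learn object boundary information without affecting model speed. The proposed network achieves a mean Intersection over Union (mIoU) of 78.2% at 92.3 frame/s on the Cityscapes validation set, and an mIoU of 79.2% at 202.8 frame/s on the CamVid test set; compared with the Deep Dual-Resolution Network (DDRNet-23-slim), the mIoU improves by 1.1 and 4.5 percentage points, respectively. Experimental results show that DEDBNet can segment scene images accurately while meeting real-time requirements.
Keywords: real-time semantic segmentation, dual-branch, detail enhancement, feature fusion, attention mechanism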
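Entity Relation Extraction Based on Heterogeneous Graph and Semantic Fusion
20
Authors: Tang Xianlun (唐贤伦), Ding Hechang (丁河长), Tang Yuze (唐瑜泽), Xie Tao (谢涛), Luo Hongping (罗洪平). Experimental Technology and Management (CAS, PKU Core), 2024, No. 8, pp. 22-29 (8 pages)
Relation extraction is an important task in information extraction; its goal is to extract all relational triples from unstructured text. However, handling this task effectively remains a challenge, especially for the relation overlap problem. To handle overlap effectively, this paper proposes an entity relation extraction method based on a heterogeneous graph and semantic fusion. A heterogeneous graph incorporates relation information as prior knowledge into the word representations, strengthening their representational power so that the model can effectively handle the single entity overlap problem. A semantic fusion module fuses features from different levels as the input to the relation classification model so that the model can effectively handle the entity pair overlap problem. The proposed method achieves the best results on the NYT and WebNLG datasets, and detailed experiments also show that it can handle complex scenarios.
Keywords: entity relation extraction, heterogeneous graph, semantic fusion, relation overlap, entity relation triples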