Journal Articles
1,332 articles found
Cross-Modal Consistency with Aesthetic Similarity for Multimodal False Information Detection
1
Authors: Weijian Fan, Ziwei Shi. Computers, Materials & Continua (SCIE, EI), 2024, Issue 5, pp. 2723-2741 (19 pages)
With the explosive growth of false information on social media platforms, the automatic detection of multimodal false information has received increasing attention. Recent research has significantly contributed to multimodal information exchange and fusion, and many methods attempt to integrate unimodal features to generate multimodal news representations. However, they have yet to fully explore the hierarchical and complex semantic correlations between the contents of different modalities, which severely limits their performance in detecting multimodal false information. This work proposes a two-stage detection framework for multimodal false information detection, called ASMFD, which uses image aesthetic similarity to partition and explore the consistency and inconsistency features of images and texts. Specifically, we first use the Contrastive Language-Image Pre-training (CLIP) model to learn the relationship between text and images through label awareness, and train an image aesthetic attribute scorer on an aesthetic attribute dataset. Then, we calculate the aesthetic similarity between the image and related images and use this similarity as a threshold to divide the multimodal correlation matrix into consistency and inconsistency matrices. Finally, a fusion module is designed to identify the features essential for detecting multimodal false information. In extensive experiments on four datasets, ASMFD outperforms state-of-the-art baseline methods.
Keywords: social media, false information detection, image aesthetic assessment, cross-modal consistency
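The matrix-splitting step described in this abstract can be pictured with a small sketch: a text-image correlation matrix is built from CLIP-style embeddings and split into consistency and inconsistency matrices by an aesthetic-similarity threshold. This is only an assumed reading of the mechanism; the tensor shapes, names, and threshold value are hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def split_correlation(text_emb, image_emb, aesthetic_sim):
    """Split a cross-modal correlation matrix by an aesthetic-similarity threshold.

    text_emb:  (num_text_tokens, dim)   CLIP-style text features (hypothetical shapes)
    image_emb: (num_image_patches, dim) CLIP-style image features
    aesthetic_sim: scalar in [0, 1] from an aesthetic-attribute scorer
    """
    # Cosine-similarity correlation matrix between every text token and image patch.
    corr = F.normalize(text_emb, dim=-1) @ F.normalize(image_emb, dim=-1).T

    # Entries above the aesthetic-similarity threshold are treated as consistent,
    # the rest as inconsistent; each part is kept as a separate matrix for fusion.
    consistent_mask = (corr >= aesthetic_sim).float()
    consistency_matrix = corr * consistent_mask
    inconsistency_matrix = corr * (1.0 - consistent_mask)
    return consistency_matrix, inconsistency_matrix

if __name__ == "__main__":
    text_emb = torch.randn(16, 512)    # toy stand-ins for CLIP features
    image_emb = torch.randn(49, 512)
    cons, incons = split_correlation(text_emb, image_emb, aesthetic_sim=0.2)
    print(cons.shape, incons.shape)    # torch.Size([16, 49]) twice
```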
Multimodal Sentiment Analysis Based on a Cross-Modal Multihead Attention Mechanism
2
Authors: Lujuan Deng, Boyi Liu, Zuhe Li. Computers, Materials & Continua (SCIE, EI), 2024, Issue 1, pp. 1157-1170 (14 pages)
Multimodal sentiment analysis aims to understand people's emotions and opinions from diverse data. Concatenating or multiplying the various modalities is the traditional multimodal fusion method, but it does not exploit the correlation information between modalities. To solve this problem, this paper proposes a model based on a multi-head attention mechanism. First, the original data are preprocessed. Then, the feature representation is converted into a sequence of word vectors, and positional encoding is introduced to better capture the semantic and sequential information in the input sequence. Next, the encoded input sequence is fed into the Transformer model for further processing and learning. At the Transformer layer, a cross-modal attention module consisting of a pair of multi-head attention blocks is employed to reflect the correlation between modalities. Finally, the processed results are passed through a feedforward neural network and a classification layer to obtain the emotional output. Through this processing flow, the model can capture semantic information and contextual relationships and achieve good results on various natural language processing tasks. The model was tested on the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) and Multimodal EmotionLines Dataset (MELD) benchmarks, achieving an accuracy of 82.04% and an F1 score of 80.59% on the former dataset.
Keywords: emotion analysis, deep learning, cross-modal attention mechanism
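The "pair of multi-head attention modules" mentioned in this abstract is a common pattern: each modality queries the other so that both representations pick up cross-modal correlations. Below is a minimal, generic sketch of that pattern in PyTorch; the dimensions, module names, and residual wiring are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """A pair of multi-head attention blocks: text attends to audio and audio attends to text."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.text_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text, audio):
        # Each modality is the query against the other modality's keys/values,
        # so the outputs carry cross-modal correlation information.
        text_enriched, _ = self.text_to_audio(query=text, key=audio, value=audio)
        audio_enriched, _ = self.audio_to_text(query=audio, key=text, value=text)
        return text_enriched + text, audio_enriched + audio  # residual connections

if __name__ == "__main__":
    txt = torch.randn(2, 20, 256)   # (batch, text length, dim) -- toy sizes
    aud = torch.randn(2, 50, 256)   # (batch, audio frames, dim)
    t, a = CrossModalAttention()(txt, aud)
    print(t.shape, a.shape)
```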
A Multi-Level Circulant Cross-Modal Transformer for Multimodal Speech Emotion Recognition (Cited by 1)
3
Authors: Peizhu Gong, Jin Liu, Zhongdai Wu, Bing Han, Y Ken Wang, Huihua He. Computers, Materials & Continua (SCIE, EI), 2023, Issue 2, pp. 4203-4220 (18 pages)
Speech emotion recognition, an important component of human-computer interaction technology, has received increasing attention. Recent studies have treated emotion recognition of speech signals as a multimodal task because it involves the semantic features of two different modalities, i.e., audio and text. However, existing methods often fail to represent features effectively and to capture cross-modal correlations. This paper presents a multi-level circulant cross-modal Transformer (MLCCT) for multimodal speech emotion recognition. The proposed model consists of three steps: feature extraction, interaction, and fusion. Self-supervised embedding models are introduced for feature extraction, giving a more powerful representation of the original data than spectrograms or audio features such as Mel-frequency cepstral coefficients (MFCCs) and low-level descriptors (LLDs). In particular, MLCCT contains two types of feature interaction: a bidirectional Long Short-Term Memory (Bi-LSTM) with a circulant interaction mechanism is proposed for low-level features, while a two-stream residual cross-modal Transformer block is applied when high-level features are involved. Finally, self-attention blocks are chosen for fusion and a fully connected layer makes the predictions. To evaluate the performance of the proposed model, comprehensive experiments are conducted on three widely used benchmark datasets: IEMOCAP, MELD, and CMU-MOSEI. The competitive results verify the effectiveness of the approach.
Keywords: speech emotion recognition, self-supervised embedding model, cross-modal transformer, self-attention
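One way to read the "circulant interaction mechanism" for low-level Bi-LSTM features is that each modality's feature vector is multiplied by a circulant matrix built from the other's, so every dimension of one modality sees every cyclic shift of the other. The toy sketch below encodes that reading with SciPy's circulant helper; it is an assumption about the mechanism, not the authors' code.

```python
import numpy as np
from scipy.linalg import circulant

def circulant_interaction(feat_a, feat_b):
    """Interact two same-length feature vectors through circulant matrices.

    Every cyclic shift of feat_b weights feat_a (and vice versa), so each
    dimension of one modality mixes with all dimensions of the other.
    """
    inter_a = circulant(feat_b) @ feat_a
    inter_b = circulant(feat_a) @ feat_b
    return inter_a, inter_b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio_feat = rng.standard_normal(8)    # toy low-level feature vectors
    text_feat = rng.standard_normal(8)
    a, b = circulant_interaction(audio_feat, text_feat)
    print(a.shape, b.shape)                # (8,) (8,)
```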
Mechanism of Cross-modal Information Influencing Taste (Cited by 1)
4
Authors: Pei LIANG, Jia-yu JIANG, Qiang LIU, Su-lin ZHANG, Hua-jing YANG. Current Medical Science (SCIE, CAS), 2020, Issue 3, pp. 474-479 (6 pages)
Studies on the integration of cross-modal information with taste perception have mostly been limited to the uni-modal level. Cross-modal sensory interaction and the neural networks of information processing and control have not been fully explored, and the mechanisms remain poorly understood. This mini-review examines the impact of uni-modal and multi-modal information on taste perception from the perspective of cognitive status, such as emotion, expectation, and attention, and discusses the hypothesis that cognitive status is the key step through which the visual sense influences taste. This work may help researchers better understand the mechanism of cross-modal information processing and further develop neurally based artificial intelligence (AI) systems.
Keywords: cross-modal information integration, cognitive status, taste perception
CSMCCVA: Framework of cross-modal semantic mapping based on cognitive computing of visual and auditory sensations (Cited by 1)
5
Authors: 刘扬, Zheng Fengbin, Zuo Xianyu. High Technology Letters (EI, CAS), 2016, Issue 1, pp. 90-98 (9 pages)
Cross-modal semantic mapping and cross-media retrieval are key problems for multimedia search engines. This study analyzes the hierarchy, functionality, and structure of the visual and auditory sensations of the cognitive system, and establishes a brain-like cross-modal semantic mapping framework based on cognitive computing of visual and auditory sensations. The framework considers the mechanisms of visual-auditory multisensory integration, selective attention in the thalamo-cortical system, emotional control in the limbic system, and memory enhancement in the hippocampus. The algorithms for cross-modal semantic mapping are then given. Experimental results show that the framework can be effectively applied to cross-modal semantic mapping and also carries significance for brain-like computing with non-von Neumann architectures.
Keywords: multimedia neural cognitive computing (MNCC), brain-like computing, cross-modal semantic mapping (CSM), selective attention, limbic system, multisensory integration, memory-enhancing mechanism
Use of sensory substitution devices as a model system for investigating cross-modal neuroplasticity in humans (Cited by 1)
6
Authors: Amy C. Nau, Matthew C. Murphy, Kevin C. Chan. Neural Regeneration Research (SCIE, CAS, CSCD), 2015, Issue 11, pp. 1717-1719 (3 pages)
Blindness provides an unparalleled opportunity to study plasticity of the nervous system in humans. Seminal work in this area examined the often dramatic modifications to the visual cortex that result when visual input is completely absent from birth or very early in life (Kupers and Ptito, 2014). More recent studies have explored what happens to the visual pathways in the context of acquired blindness. This is particularly relevant because the majority of diseases that cause vision loss occur in the elderly.
Keywords: sensory substitution devices, cross-modal neuroplasticity, BOLD
TECMH: Transformer-Based Cross-Modal Hashing for Fine-Grained Image-Text Retrieval
7
Authors: Qiqi Li, Longfei Ma, Zheng Jiang, Mingyong Li, Bo Jin. Computers, Materials & Continua (SCIE, EI), 2023, Issue 5, pp. 3713-3728 (16 pages)
In recent years, cross-modal hash retrieval has become a popular research field because of its high efficiency and low storage cost. Cross-modal retrieval technology can be applied to search engines, cross-modal medical processing, and other areas. The main existing approach uses a multi-label matching paradigm to perform the retrieval tasks. However, such methods do not use the fine-grained information in the multi-modal data, which may lead to suboptimal results. To prevent cross-modal matching from degenerating into label matching, this paper proposes an end-to-end fine-grained cross-modal hash retrieval method that focuses on the fine-grained semantic information of multi-modal data. First, the method refines the image features and no longer uses multiple labels to represent text features, processing text with BERT instead. Second, it uses the inference capability of the Transformer encoder to generate global fine-grained features. Finally, to better judge the effect of the fine-grained model, this paper uses datasets from the image-text matching field instead of the traditional label-matching datasets. Experiments on the Microsoft COCO (MS-COCO) and Flickr30K datasets, compared with previous classical methods, show that this method obtains stronger results in the cross-modal hash retrieval field.
Keywords: deep learning, cross-modal retrieval, hash learning, transformer
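The last step of hashing methods like the one above is usually to squash continuous encoder features into binary codes: a tanh relaxation during training and a sign threshold at retrieval time. The sketch below shows that generic pattern; the feature dimension, code length, and layer names are assumptions, not TECMH's actual head.

```python
import torch
import torch.nn as nn

class HashHead(nn.Module):
    """Map fused cross-modal features to K-bit hash codes."""

    def __init__(self, feat_dim=768, code_bits=64):
        super().__init__()
        self.fc = nn.Linear(feat_dim, code_bits)

    def forward(self, features):
        # tanh keeps the relaxed codes differentiable during training.
        return torch.tanh(self.fc(features))

    @torch.no_grad()
    def binarize(self, features):
        # At retrieval time the relaxed codes are thresholded to {-1, +1}.
        return torch.sign(self.forward(features))

if __name__ == "__main__":
    head = HashHead()
    feats = torch.randn(4, 768)          # e.g., BERT [CLS] or image-encoder outputs
    print(head(feats).shape, head.binarize(feats)[0, :8])
```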
Cross-Modal Hashing Retrieval Based on Deep Residual Network
8
Authors: Zhiyi Li, Xiaomian Xu, Du Zhang, Peng Zhang. Computer Systems Science & Engineering (SCIE, EI), 2021, Issue 2, pp. 383-405 (23 pages)
In the era of big data rich in We-Media, single-mode retrieval systems can no longer meet people's demands for information retrieval. This paper proposes a new solution to the problem of feature extraction and unified mapping across different modes: a Cross-Modal Hashing Retrieval algorithm based on a Deep Residual Network (CMHR-DRN). The model is constructed in two stages. The first stage extracts features from the different modal data: a Deep Residual Network (DRN) extracts the image features, TF-IDF combined with a fully connected network extracts the text features, and the resulting image and text features serve as the input of the second stage. In the second stage, the image and text features are mapped into hash functions by supervised learning and projected into a common binary Hamming space. During the mapping, the distance measures of the original space and of the common feature space are kept as consistent as possible to improve cross-modal retrieval accuracy. In training, adaptive moment estimation (Adam) computes an adaptive learning rate for each parameter and stochastic gradient descent (SGD) is used to minimize the loss function. The whole training process is carried out on the Caffe deep learning framework. Experiments show that the proposed CMHR-DRN algorithm achieves better retrieval performance and stronger advantages than the other cross-modal algorithms CMFH, CMDN, and CMSSH.
Keywords: deep residual network, cross-modal retrieval, hashing, cross-modal hashing retrieval based on deep residual network
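Once image and text codes live in a shared binary Hamming space, as described above, retrieval reduces to counting differing bits between a query code and the database codes. A minimal NumPy sketch with made-up codes follows; the code length and database size are illustrative only.

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database hash codes by Hamming distance to a query code.

    Codes are arrays of {-1, +1}; differing bits are counted directly.
    """
    distances = np.sum(query_code[None, :] != db_codes, axis=1)
    return np.argsort(distances), distances

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    db = rng.choice([-1, 1], size=(1000, 64))   # e.g., image codes from the image branch
    query = rng.choice([-1, 1], size=64)        # e.g., a text code from the text branch
    order, dist = hamming_rank(query, db)
    print(order[:5], dist[order[:5]])           # indices and distances of the top-5 matches
```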
ViT2CMH: Vision Transformer Cross-Modal Hashing for Fine-Grained Vision-Text Retrieval
9
Authors: Mingyong Li, Qiqi Li, Zheng Jiang, Yan Ma. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 8, pp. 1401-1414 (14 pages)
In recent years, the development of deep learning has further improved hash retrieval technology. Most existing hashing methods use Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to process image and text information, respectively. This subjects images or texts to local constraints, and inherent label matching cannot capture fine-grained information, often leading to suboptimal results. Driven by the development of the Transformer model, we propose a framework called ViT2CMH, based mainly on the Vision Transformer, to handle deep cross-modal hashing tasks instead of CNNs or RNNs. Specifically, we use a BERT network to extract text features and the Vision Transformer as the image network of the model. Finally, the features are transformed into hash codes for efficient and fast retrieval. We conduct extensive experiments on Microsoft COCO (MS-COCO) and Flickr30K, comparing against baseline hashing methods and image-text matching methods, and show that our method performs better.
Keywords: hash learning, cross-modal retrieval, fine-grained matching, transformer
Adequate alignment and interaction for cross-modal retrieval
10
Authors: Mingkang Wang, Min Meng, Jigang Liu, Jigang Wu. Virtual Reality & Intelligent Hardware (EI), 2023, Issue 6, pp. 509-522 (14 pages)
Background: Cross-modal retrieval has attracted widespread attention in many cross-media similarity search applications, particularly image-text retrieval in the fields of computer vision and natural language processing. Recently, visual and semantic embedding (VSE) learning has shown promising improvements on image-text retrieval tasks. Most existing VSE models employ two unrelated encoders to extract features and then use complex methods to contextualize and aggregate those features into holistic embeddings. Despite recent advances, existing approaches still suffer from two limitations: (1) without considering intermediate interactions and adequate alignment between the different modalities, these models cannot guarantee the discriminative ability of the representations; and (2) existing feature aggregators are susceptible to certain noisy regions, which may lead to unreasonable pooling coefficients and affect the quality of the final aggregated features. Methods: To address these challenges, we propose a novel cross-modal retrieval model containing a well-designed alignment module and a novel multimodal fusion encoder, which aims to learn adequate alignment and interaction among aggregated features to effectively bridge the modality gap. Results: Experiments on the Microsoft COCO and Flickr30k datasets demonstrate the superiority of our model over state-of-the-art methods.
Keywords: cross-modal retrieval, visual semantic embedding, feature aggregation, transformer
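A common form of the aggregation step this abstract discusses, where noisy regions can distort the pooling coefficients, is a learned softmax-weighted pooling of local features into one holistic embedding. The sketch below is a generic version of such an aggregator, not the paper's module; the dimensions and the number of regions are placeholders.

```python
import torch
import torch.nn as nn

class SoftPool(nn.Module):
    """Aggregate a set of local features into one embedding with learned pooling weights."""

    def __init__(self, dim=1024):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, local_feats):
        # local_feats: (batch, num_regions, dim); each region gets a pooling coefficient.
        weights = torch.softmax(self.scorer(local_feats), dim=1)  # (batch, num_regions, 1)
        return (weights * local_feats).sum(dim=1)                 # (batch, dim)

if __name__ == "__main__":
    regions = torch.randn(2, 36, 1024)   # e.g., 36 detected image regions per sample
    print(SoftPool()(regions).shape)     # torch.Size([2, 1024])
```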
Review of Visible-Infrared Cross-Modality Person Re-Identification
11
Author: Yinyin Zhang. Journal of New Media, 2023, Issue 1, pp. 23-31 (9 pages)
Person re-identification (ReID) is a sub-problem of image retrieval. It is a technology that uses computer vision to identify a specific pedestrian in a collection of pictures or videos, where the pedestrian images are captured across devices by surveillance cameras. At present, most ReID methods handle matching between visible and visible images, but with the continuous improvement of security monitoring systems, more and more infrared cameras are used for monitoring at night or in dim light. Because of the imaging differences between infrared and RGB cameras, there is a large visual gap between cross-modality images, so traditional ReID methods are difficult to apply in this scenario. In view of this, studying pedestrian matching between the visible and infrared modalities is particularly crucial. Visible-infrared person re-identification (VI-ReID) was first proposed in 2017; it has since attracted more and more attention, and many advanced methods have emerged.
Keywords: person re-identification, cross-modality
Response Strategies for China's Foreign Trade Enterprises amid the "Full-Hosting Wave" in Cross-Border E-Commerce
12
Author: 黄炫洲. 《西部学刊》, 2024, Issue 13, pp. 145-149 (5 pages)
The full-hosting (fully-managed) model of cross-border e-commerce has flourished over the past two years and has rapidly become the mainstream model for China's cross-border e-commerce exports; by restructuring the cross-border e-commerce value chain, it has had a comprehensive and far-reaching impact on China's foreign trade ecosystem. The new model greatly facilitates enterprises' expansion into overseas markets and has driven rapid growth in China's cross-border e-commerce export orders, but it has also deprived enterprises of most of their marketing autonomy, squeezed their profits, and intensified price competition, drawing mixed reviews in the industry. Foreign trade enterprises must promptly adjust their business layouts and product development concepts, broaden and innovate their marketing channels, emphasize digital-intelligent empowerment, and accelerate digital transformation and upgrading in order to respond effectively to the challenges and adverse effects of the full-hosting wave, turn disadvantages into advantages, and achieve high-quality development of foreign trade.
Keywords: full-hosting model, cross-border e-commerce, foreign trade enterprises
VMD-Based Generalized Cubic Cross-Correlation for Pipeline Leak Localization and Detection
13
Authors: 王冬梅, 童影力, 何壮, 路敬祎. 《压力容器》 (PKU Core), 2024, Issue 2, pp. 72-80 (9 pages)
To address the large errors of the quadratic cross-correlation time-delay estimation algorithm used in acoustic localization for natural gas pipeline leak detection, a time-delay estimation algorithm based on variational mode decomposition (VMD) combined with generalized cubic cross-correlation is proposed. The method first uses VMD to decompose the two sensor signals and reconstruct them; then an additional correlation is performed on top of the quadratic cross-correlation, and the Hilbert transform (HT) is introduced at the peak-detection stage to sharpen the peak, yielding a new generalized cubic cross-correlation time-delay estimation algorithm. Simulation tests on data collected from an oil and gas pipeline leak detection platform were used to analyze the accuracy of each algorithm. The tests show that, compared with quadratic cross-correlation, the improved generalized cubic cross-correlation algorithm clearly improves the average localization accuracy, with higher precision and better noise immunity, and therefore has broad application prospects in natural gas pipeline leak localization.
Keywords: pipeline leak detection, variational mode decomposition, generalized cubic cross-correlation, Hilbert transform (HT)
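The core of the localization method above is estimating the arrival-time difference between two sensors from the peak of a cross-correlation. The sketch below shows a plain cross-correlation time-delay estimate with a Hilbert-envelope step applied to the correlation before peak picking, on synthetic signals; it omits the VMD decomposition and the quadratic/cubic correlation stages, and all signal parameters are invented for illustration.

```python
import numpy as np
from scipy.signal import hilbert, correlate

def estimate_delay(x1, x2, fs):
    """Estimate the time delay of x2 relative to x1 from the cross-correlation peak."""
    corr = correlate(x2, x1, mode="full")
    envelope = np.abs(hilbert(corr))          # envelope of the correlation for peak picking
    lags = np.arange(-len(x1) + 1, len(x2))
    return lags[np.argmax(envelope)] / fs

if __name__ == "__main__":
    fs = 10_000                               # 10 kHz sampling, synthetic leak-like burst
    t = np.arange(0, 1.0, 1 / fs)
    rng = np.random.default_rng(0)
    source = np.exp(-((t - 0.3) ** 2) / 1e-4) * np.sin(2 * np.pi * 500 * t)
    true_delay = 0.012                        # 12 ms propagation difference between sensors
    x1 = source + 0.2 * rng.standard_normal(t.size)
    x2 = np.roll(source, int(true_delay * fs)) + 0.2 * rng.standard_normal(t.size)
    print(f"estimated delay: {estimate_delay(x1, x2, fs) * 1e3:.1f} ms (true 12.0 ms)")
```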
Uniaxial Mechanical Properties and Crack Propagation Failure of Sandstone Samples with Intersecting Fissures (Cited by 1)
14
Authors: 沈顺超, 肖桃李, 折海成, 陈祥. 《科学技术与工程》 (PKU Core), 2024, Issue 9, pp. 3766-3772 (7 pages)
Natural rock masses contain large numbers of intersecting fissures, which strongly affect the deformation, failure, and instability behavior of rock structures. Using sandstone-like material, samples with primary and secondary fissures at different inclination angles were fabricated, and uniaxial compression tests combined with digital image processing (DIC) imaging were used to study the influence of intersecting-fissure orientation on the mechanical properties and failure modes of the samples. The results show that, compared with samples containing intersecting fissures, single-fissure samples generally have higher peak strength and elastic modulus, indicating that the secondary fissure further weakens the mechanical properties of the samples. When α ≤ 45°, or α = 60° with β ≤ 45°, crack initiation at the secondary-fissure tips generally precedes that at the primary fissure; cracks at the secondary-fissure tips do not propagate to coalescence, while cracks at the primary-fissure tips coalesce. When α = 60° with β ≥ 60°, or α = 90°, crack initiation at the primary-fissure tips generally precedes that at the secondary fissure; cracks at the secondary-fissure tips propagate to coalescence, while cracks at the primary-fissure tips do not. Cracks propagate mainly in the vertical direction, with few horizontal cracks. Here α is the inclination of the primary fissure relative to the horizontal, and β is the inclination of the secondary fissure relative to the primary fissure. The samples fail mainly in tension: at α of 0° and 90° all samples show tensile failure, while at α of 30°, 45°, and 60° the failure mode changes from shear, to mixed tensile-shear, to tensile as β increases, and the dominant failure crack is controlled by whichever of the primary and secondary fissures has the longer horizontal projection.
Keywords: uniaxial compression, intersecting fissures, crack propagation, failure mode
Factors Influencing Natural Transition in Supersonic Boundary Layers
15
Authors: 樊佳坤, 谢露, 徐家宽, 乔磊, 白俊强. 《中国民航大学学报》 (CAS), 2024, Issue 5, pp. 28-35, 51 (9 pages)
To study how Mach number, Reynolds number, sweep angle, wall temperature, and pressure gradient affect the crossflow mode and the oblique T-S mode of natural transition in supersonic boundary layers, linear stability theory (LST) and the eN transition-prediction method were applied to two-dimensional and quasi-three-dimensional flows over flat plates, airfoils, and infinite-span swept wings, and parametric analyses were used to examine the effect of each factor on the N factors of the crossflow and oblique T-S modes. Under the freestream and wall conditions set in the test cases, the results show that the growth rate of the crossflow-mode N factor is positively correlated with both freestream Mach number and Reynolds number. As the sweep angle increases from 30° to 60°, the crossflow-mode N factor grows ever faster; beyond 60° the growth of the stationary-wave N factor begins to weaken, while the traveling-wave N factor changes little over 60° to 75°, which suggests that the critical sweep angle at which the traveling-wave N factor begins to decay is larger. The growth rate of the oblique T-S mode N factor is positively correlated with the freestream Reynolds number and with the ratio of wall temperature to freestream temperature, and it decreases as the freestream Mach number or the favorable pressure gradient increases. Hence the effect of freestream Mach number on natural transition depends on the type of instability mode analyzed, whereas the effect of freestream Reynolds number on the N factors of the crossflow and oblique T-S modes follows the same trend.
Keywords: crossflow mode, oblique T-S mode, Mach number, Reynolds number, sweep angle, wall temperature
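For orientation, the N factor that this abstract varies with Mach number, Reynolds number, and sweep angle comes from the standard eN method: the linear spatial growth rate of each instability mode is integrated downstream, and transition is predicted where the largest N factor reaches an empirically calibrated critical value. In the usual LST notation (the symbols below follow common usage and are not taken from the paper):

```latex
% Amplification factor of one instability mode (spatial theory); -\alpha_i is the local growth rate
N(x) \;=\; \ln\!\frac{A(x)}{A_{0}} \;=\; \int_{x_{0}}^{x} \bigl(-\alpha_{i}\bigr)\,\mathrm{d}x,
\qquad
\text{transition predicted where } \max_{\text{modes}} N(x) \;=\; N_{\mathrm{cr}}.
```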
HCCSMO-Based Sensorless Control of a Permanent Magnet Synchronous Linear Motor
16
Authors: 原东昇, 周扬, 尹忠刚, 白聪. 《电力电子技术》, 2024, Issue 6, pp. 32-36, 45 (6 pages)
Sensorless control of permanent magnet synchronous linear motors (PMSLM) based on a sliding mode observer (SMO) suffers from chattering and from harmonics in the estimated back-EMF, which cause speed and position estimation errors. To solve these problems, a harmonic cross-cancellation sliding mode observer (HCCSMO) is proposed. First, a sigmoid function replaces the sign function to suppress chattering; then a harmonic cross-cancellation second-order generalized integrator (SOGI) replaces the low-pass filter to process the estimated back-EMF, suppressing the harmonic components produced by the detent force and reducing the SMO's speed and position estimation errors. Simulation and experimental results both show that the proposed method effectively suppresses the SMO estimation errors and improves the accuracy of speed and position observation.
Keywords: motor, sensorless control, harmonic cross-cancellation sliding mode observer
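The chattering-reduction idea in this abstract, replacing the sign function with a sigmoid in the sliding-mode switching term, can be illustrated in a few lines. The gain and slope values below are arbitrary illustration values, and the SOGI back-EMF filter is not included.

```python
import numpy as np

def sign_switch(s, gain=50.0):
    """Conventional SMO switching term: discontinuous at s = 0, hence chattering."""
    return gain * np.sign(s)

def sigmoid_switch(s, gain=50.0, slope=200.0):
    """Sigmoid replacement: smooth near the sliding surface, so chattering is attenuated."""
    return gain * (2.0 / (1.0 + np.exp(-slope * s)) - 1.0)

if __name__ == "__main__":
    for s in (-0.01, -0.001, 0.0, 0.001, 0.01):   # current-estimation error (toy values)
        print(f"s={s:+.3f}  sign: {sign_switch(s):+6.1f}  sigmoid: {sigmoid_switch(s):+6.1f}")
```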
Exploration and Practice of an Interdisciplinary Teaching Model Based on Data Mining
17
Authors: 霍鑫, 孟姣, 鲁佳, 张华, 史维佳. 《现代信息科技》, 2024, Issue 14, pp. 190-193, 198 (5 pages)
To solve complex problems related to the national economy and people's livelihood in the big-data era, and in line with the strategic directions of education reform and Healthy China, strengthening the construction and development of interdisciplinary programs has become an inevitable trend of scientific and technological development. Integrating new engineering with new medicine creates a medicine-engineering interdisciplinary background, for which a teaching model is explored and a practical design developed: artificial-intelligence-related professional courses are used to mine hidden features in biological signals, enabling computer-aided diagnosis of diseases while providing a reference for biomarker research. The teaching model aims to leverage the leading role of digital science and technology, push medical engineering into the intelligent era, and cultivate more interdisciplinary talents for the new era.
Keywords: interdisciplinary integration, medicine-engineering crossover, teaching model, talent cultivation
Attention-Enhanced Voice Portrait Model Using Generative Adversarial Network
18
Authors: Jingyi Mao, Yuchen Zhou, Yifan Wang, Junyu Li, Ziqing Liu, Fanliang Bu. Computers, Materials & Continua (SCIE, EI), 2024, Issue 4, pp. 837-855 (19 pages)
Voice portrait technology explores and establishes the relationship between speakers' voices and their facial features, aiming to generate the corresponding facial characteristics from the voice of an unknown speaker. Owing to their powerful advantages in image generation, Generative Adversarial Networks (GANs) have been widely applied across various fields. Existing Voice2Face methods for voice portraits are primarily based on GANs trained on voice-face paired datasets. However, voice portrait models built solely on GANs face limitations in image generation quality and struggle to maintain facial similarity; in addition, the training process is relatively unstable, which affects the overall generative performance of the model. To overcome these challenges, we propose a novel deep Generative Adversarial Network model for audio-visual synthesis, named AVP-GAN (Attention-enhanced Voice Portrait model using a Generative Adversarial Network). The model is based on a convolutional attention mechanism and is capable of generating the corresponding facial image from the voice of an unknown speaker. First, to address training instability, we integrate convolutional neural networks with deep GANs; in the network architecture, spectral normalization is applied to constrain the variation of the discriminator and prevent issues such as mode collapse. Second, to enhance the model's ability to extract relevant features between the two modalities, we propose a voice portrait model based on convolutional attention that learns the mapping between voice and facial features in a common space, independently along the channel and spatial dimensions. Third, to improve the quality of the generated faces, we incorporate a degradation-removal module and use pretrained facial GANs as facial priors to repair and sharpen the generated facial images. Experimental results show that AVP-GAN achieves a cosine similarity of 0.511, outperforming the comparison model, and effectively generates high-quality facial images corresponding to a speaker's voice.
Keywords: cross-modal generation, GANs, voice portrait technology, face synthesis
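The training-stability point in this abstract, constraining the discriminator with spectral normalization, corresponds to a one-line wrapper around each layer in PyTorch. The discriminator below is a deliberately tiny stand-in to show where the wrapper goes, not the AVP-GAN architecture; all layer sizes are placeholders.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class TinyDiscriminator(nn.Module):
    """Minimal image discriminator whose conv/linear layers are spectrally normalized."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Conv2d(3, 32, 4, stride=2, padding=1)),   # 64x64 -> 32x32
            nn.LeakyReLU(0.2),
            spectral_norm(nn.Conv2d(32, 64, 4, stride=2, padding=1)),  # 32x32 -> 16x16
            nn.LeakyReLU(0.2),
            nn.Flatten(),
            spectral_norm(nn.Linear(64 * 16 * 16, 1)),                 # real/fake score
        )

    def forward(self, images):
        return self.net(images)

if __name__ == "__main__":
    fake_faces = torch.randn(2, 3, 64, 64)          # toy batch of generated faces
    print(TinyDiscriminator()(fake_faces).shape)    # torch.Size([2, 1])
```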
Fake News Detection Based on Text-Modal Dominance and Fusing Multiple Multi-Model Clues
19
Authors: Lifang Fu, Huanxin Peng, Changjin Ma, Yuhan Liu. Computers, Materials & Continua (SCIE, EI), 2024, Issue 3, pp. 4399-4416 (18 pages)
In recent years, it has become more and more challenging to identify multi-model fake news efficiently and accurately. First, multi-model data provide more evidence, but not all of it is equally important. Second, social-structure information has proven effective in fake news detection, and how to combine it while reducing the noisy information is critical. Unfortunately, existing approaches fail to handle these problems. This paper proposes a multi-model fake news detection framework based on Text-modal Dominance and fusing Multiple Multi-model Cues (TD-MMC), which utilizes three valuable multi-model clues: text-modal importance, text-image complementarity, and text-image inconsistency. TD-MMC is dominated by textual content and assisted by image information, while social-network information is used to enhance the text representation. To reduce interference from irrelevant social-structure information, a unidirectional cross-modal attention mechanism selectively learns the social-structure features. A cross-modal attention mechanism is adopted to obtain text-image cross-modal features while retaining the textual features, reducing the loss of important information. In addition, TD-MMC employs a new multi-model loss to improve the model's generalization ability. Extensive experiments on two public real-world English and Chinese datasets show that the proposed model outperforms state-of-the-art methods on classification evaluation metrics.
Keywords: fake news detection, cross-modal attention mechanism, multi-modal fusion, social network, transfer learning
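One reading of the "unidirectional" attention above is that only the dominant text features query the auxiliary source (social structure or image) and are updated, so noisy auxiliary information flows in one direction only. The sketch below encodes that reading; the dimensions, module names, and residual wiring are assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn

class TextDominantAttention(nn.Module):
    """Unidirectional attention: text queries an auxiliary modality, never the reverse."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text, auxiliary):
        # Only the text representation is updated; the auxiliary features are read-only,
        # which limits how much noisy auxiliary information can perturb the text branch.
        attended, _ = self.attn(query=text, key=auxiliary, value=auxiliary)
        return text + attended

if __name__ == "__main__":
    text = torch.randn(2, 30, 256)        # token features of the news text (toy sizes)
    graph = torch.randn(2, 12, 256)       # social-structure node features (toy sizes)
    print(TextDominantAttention()(text, graph).shape)
```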
Guided-YNet: Saliency Feature-Guided Interactive Feature Enhancement Lung Tumor Segmentation Network
20
Authors: Tao Zhou, Yunfeng Pan, Huiling Lu, Pei Dang, Yujie Guo, Yaxing Wang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 9, pp. 4813-4832 (20 pages)
Multimodal lung tumor medical images, such as Positron Emission Computed Tomography (PET), Computed Tomography (CT), and PET-CT, can provide anatomical and functional information for the same lesion. How to utilize the anatomical and functional information of the lesion effectively and improve network segmentation performance are key questions. To solve this problem, the Saliency Feature-Guided Interactive Feature Enhancement Lung Tumor Segmentation Network (Guide-YNet) is proposed in this paper. First, a double-encoder single-decoder U-Net is used as the backbone of the model; a single-encoder single-decoder U-Net generates a saliency-guided feature from the PET image and transmits it into the skip connections of the backbone, exploiting the high sensitivity of PET images to tumors to guide the network toward accurately locating lesions. Second, a Cross-Scale Feature Enhancement Module (CSFEM) is designed to extract multi-scale fused features after downsampling. Third, a Cross-Layer Interactive Feature Enhancement Module (CIFEM) is designed in the encoder to enhance spatial position information and semantic information. Finally, a Cross-Dimension Cross-Layer Feature Enhancement Module (CCFEM) is proposed in the decoder, which effectively extracts multimodal image features through global attention and multi-dimensional local attention. The proposed method is verified on multimodal lung medical image datasets; the reported Mean Intersection over Union (MIoU), Accuracy (Acc), Dice Similarity Coefficient (Dice), Volumetric Overlap Error (Voe), and Relative Volume Difference (Rvd) values for lung lesion segmentation are 87.27%, 93.08%, 97.77%, 95.92%, 89.28%, and 88.68%, respectively. This is of great significance for computer-aided diagnosis.
Keywords: medical image segmentation, U-Net, saliency feature guidance, cross-modal feature enhancement, cross-dimension feature enhancement
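The guidance step described above, feeding a PET-derived saliency feature into the backbone's skip connections, can be pictured as a concatenate-and-project fusion before decoding. The sketch below is one plausible form of such a guided skip connection under assumed channel counts and feature sizes; it is not the Guide-YNet module itself.

```python
import torch
import torch.nn as nn

class GuidedSkip(nn.Module):
    """Inject a PET-derived saliency feature map into a CT skip connection before decoding."""

    def __init__(self, ct_channels=64, pet_channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(ct_channels + pet_channels, ct_channels, kernel_size=1)

    def forward(self, ct_skip, pet_saliency):
        # Concatenate along channels, then project back so the decoder interface is unchanged;
        # the PET branch effectively tells the decoder where the lesion is likely to be.
        return self.fuse(torch.cat([ct_skip, pet_saliency], dim=1))

if __name__ == "__main__":
    ct_feat = torch.randn(1, 64, 56, 56)     # CT-encoder skip feature (toy sizes)
    pet_feat = torch.randn(1, 64, 56, 56)    # saliency-guided feature from the PET U-Net
    print(GuidedSkip()(ct_feat, pet_feat).shape)   # torch.Size([1, 64, 56, 56])
```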