期刊文献+
共找到15,455篇文章
< 1 2 250 >
每页显示 20 50 100
A Hand Features Based Fusion Recognition Network with Enhancing Multi-Modal Correlation
1
作者 Wei Wu Yuan Zhang +2 位作者 Yunpeng Li Chuanyang Li YanHao 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第7期537-555,共19页
Fusing hand-based features in multi-modal biometric recognition enhances anti-spoofing capabilities.Additionally,it leverages inter-modal correlation to enhance recognition performance.Concurrently,the robustness and ... Fusing hand-based features in multi-modal biometric recognition enhances anti-spoofing capabilities.Additionally,it leverages inter-modal correlation to enhance recognition performance.Concurrently,the robustness and recognition performance of the system can be enhanced through judiciously leveraging the correlation among multimodal features.Nevertheless,two issues persist in multi-modal feature fusion recognition:Firstly,the enhancement of recognition performance in fusion recognition has not comprehensively considered the inter-modality correlations among distinct modalities.Secondly,during modal fusion,improper weight selection diminishes the salience of crucial modal features,thereby diminishing the overall recognition performance.To address these two issues,we introduce an enhanced DenseNet multimodal recognition network founded on feature-level fusion.The information from the three modalities is fused akin to RGB,and the input network augments the correlation between modes through channel correlation.Within the enhanced DenseNet network,the Efficient Channel Attention Network(ECA-Net)dynamically adjusts the weight of each channel to amplify the salience of crucial information in each modal feature.Depthwise separable convolution markedly reduces the training parameters and further enhances the feature correlation.Experimental evaluations were conducted on four multimodal databases,comprising six unimodal databases,including multispectral palmprint and palm vein databases from the Chinese Academy of Sciences.The Equal Error Rates(EER)values were 0.0149%,0.0150%,0.0099%,and 0.0050%,correspondingly.In comparison to other network methods for palmprint,palm vein,and finger vein fusion recognition,this approach substantially enhances recognition performance,rendering it suitable for high-security environments with practical applicability.The experiments in this article utilized amodest sample database comprising 200 individuals.The subsequent phase involves preparing for the extension of the method to larger databases. 展开更多
关键词 BIOMETRICS multi-modal CORRELATION deep learning feature-level fusion
下载PDF
A Comprehensive Survey on Deep Learning Multi-Modal Fusion:Methods,Technologies and Applications
2
作者 Tianzhe Jiao Chaopeng Guo +2 位作者 Xiaoyue Feng Yuming Chen Jie Song 《Computers, Materials & Continua》 SCIE EI 2024年第7期1-35,共35页
Multi-modal fusion technology gradually become a fundamental task in many fields,such as autonomous driving,smart healthcare,sentiment analysis,and human-computer interaction.It is rapidly becoming the dominant resear... Multi-modal fusion technology gradually become a fundamental task in many fields,such as autonomous driving,smart healthcare,sentiment analysis,and human-computer interaction.It is rapidly becoming the dominant research due to its powerful perception and judgment capabilities.Under complex scenes,multi-modal fusion technology utilizes the complementary characteristics of multiple data streams to fuse different data types and achieve more accurate predictions.However,achieving outstanding performance is challenging because of equipment performance limitations,missing information,and data noise.This paper comprehensively reviews existing methods based onmulti-modal fusion techniques and completes a detailed and in-depth analysis.According to the data fusion stage,multi-modal fusion has four primary methods:early fusion,deep fusion,late fusion,and hybrid fusion.The paper surveys the three majormulti-modal fusion technologies that can significantly enhance the effect of data fusion and further explore the applications of multi-modal fusion technology in various fields.Finally,it discusses the challenges and explores potential research opportunities.Multi-modal tasks still need intensive study because of data heterogeneity and quality.Preserving complementary information and eliminating redundant information between modalities is critical in multi-modal technology.Invalid data fusion methods may introduce extra noise and lead to worse results.This paper provides a comprehensive and detailed summary in response to these challenges. 展开更多
关键词 multi-modal fusion REPRESENTATION TRANSLATION ALIGNMENT deep learning comparative analysis
下载PDF
Towards trustworthy multi-modal motion prediction:Holistic evaluation and interpretability of outputs
3
作者 Sandra Carrasco Limeros Sylwia Majchrowska +3 位作者 Joakim Johnander Christoffer Petersson MiguelÁngel Sotelo David Fernández Llorca 《CAAI Transactions on Intelligence Technology》 SCIE EI 2024年第3期557-572,共16页
Predicting the motion of other road agents enables autonomous vehicles to perform safe and efficient path planning.This task is very complex,as the behaviour of road agents depends on many factors and the number of po... Predicting the motion of other road agents enables autonomous vehicles to perform safe and efficient path planning.This task is very complex,as the behaviour of road agents depends on many factors and the number of possible future trajectories can be consid-erable(multi-modal).Most prior approaches proposed to address multi-modal motion prediction are based on complex machine learning systems that have limited interpret-ability.Moreover,the metrics used in current benchmarks do not evaluate all aspects of the problem,such as the diversity and admissibility of the output.The authors aim to advance towards the design of trustworthy motion prediction systems,based on some of the re-quirements for the design of Trustworthy Artificial Intelligence.The focus is on evaluation criteria,robustness,and interpretability of outputs.First,the evaluation metrics are comprehensively analysed,the main gaps of current benchmarks are identified,and a new holistic evaluation framework is proposed.Then,a method for the assessment of spatial and temporal robustness is introduced by simulating noise in the perception system.To enhance the interpretability of the outputs and generate more balanced results in the proposed evaluation framework,an intent prediction layer that can be attached to multi-modal motion prediction models is proposed.The effectiveness of this approach is assessed through a survey that explores different elements in the visualisation of the multi-modal trajectories and intentions.The proposed approach and findings make a significant contribution to the development of trustworthy motion prediction systems for autono-mous vehicles,advancing the field towards greater safety and reliability. 展开更多
关键词 autonomous vehicles EVALUATION INTERPRETABILITY multi-modal motion prediction ROBUSTNESS trustworthy AI
下载PDF
Individual consistency in spatiotemporal characteristics of migratory Whimbrels in the East Asian-Australasian Flyway
4
作者 Siwei An Fenliang Kuang +9 位作者 Wei Wu Chris J.Hassell Jonathan T.Coleman Zijing Gao Xuena Sun Yue Yuan Grace Maglio Kar-Sin K.Leung Xuesong Feng Zhijun Ma 《Avian Research》 SCIE CSCD 2024年第3期308-315,共8页
Many migratory birds exhibit interannual consistency in migration schedules,routes and stopover sites.Detecting the interannual consistency in spatiotemporal characteristics helps understand the maintenance of migrati... Many migratory birds exhibit interannual consistency in migration schedules,routes and stopover sites.Detecting the interannual consistency in spatiotemporal characteristics helps understand the maintenance of migration and enables the implementation of targeted conservation measures.We tracked the migration of Whimbrel(Numenius phaeopus)in the East Asian-Australasian Flyway and collected spatiotemporal data from individuals that were tracked for at least two years.Wilcoxon non-parametric tests were used to compare the interannual variations in the dates of departure from and arrival at breeding/nonbreeding sites,and the inter-annual variation in the longitudes when the same individual across the same latitudes.Whimbrels exhibited a high degree of consistency in the use of breeding,nonbreeding,and stopover sites between years.The variation of arrival dates at nonbreeding sites was significantly larger than that of the departure dates from nonbreeding and breeding sites.Repeatedly used stopover sites by the same individuals in multiple years were concentrated in the Yellow Sea coast during northward migration,but were more widespread during southward migration.The stopover duration at repeatedly used sites was significantly longer than that at sites used only once.When flying across the Yellow Sea,Whimbrels breeding in Sakha(Yakutia)exhibited the highest consistency in migration routes in both autumn and spring.Moreover,the consistency in migration routes of Yakutia breeding birds was generally higher than that of birds breeding in Chukotka.Our results suggest that the northward migration schedule of the Whimbrels is mainly controlled by endogenous factors,while the southward migration schedule is less affected by endogenous factors.The repeated use of stopover sites in the Yellow Sea coast suggests this region is important for the migration of Whimbrel,and thus has high conservation value. 展开更多
关键词 consistency Migration route Migration timing SHOREBIRD
下载PDF
Differentiating consistency of meningioma based on grey-level histogram analysis of T2WI
5
作者 ZHOU Fengyu HAN Tao +4 位作者 LIU Xianwang DONG Wenjie LU Ting WANG Yuanyuan ZHOU Junlin 《中国医学影像技术》 CSCD 北大核心 2024年第8期1151-1155,共5页
Objective To observe the value of grey-level histogram analysis based on T2WI for differentiating consistency of meningioma.Methods Data of 109 patients with meningioma were retrospectively analyzed.The patients were ... Objective To observe the value of grey-level histogram analysis based on T2WI for differentiating consistency of meningioma.Methods Data of 109 patients with meningioma were retrospectively analyzed.The patients were divided into hard group(n=71)and soft group(n=38)according to the consistency of tumors.Tumor ROI was outlined on axial T2WI showing the largest tumor section,gray levels were extracted and histogram analysis was performed.The value of each histogram parameter were compared between groups.Then receiver operating characteristic curve was drawn,and the area under the curve(AUC)was calculated to evaluate the efficiency for differentiating soft and hard meningioma.Results P 1,P 10,P 50,P 90,P 99 and the mean grey levels on T2WI in soft group were all higher than those in hard group(all P<0.05),while the variance,the kurtosis and the skewness were not significantly different between groups(all P>0.05).The differentiating efficiency of P 1,P 10,P 50,P 90,P 99 and the mean grey levels on T2WI were all fine,with AUC of 0.774 to 0.833,and no significant difference was found(all P>0.05).Conclusion Parameters of grey-level histogram analysis such as P 1,P 10,P 50,P 90,P 99 and the mean values based on T2WI were all valuable for differentiating soft and hard meningioma. 展开更多
关键词 MENINGIOMA consistency magnetic resonance imaging
下载PDF
Multi-modal knowledge graph inference via media convergence and logic rule
6
作者 Feng Lin Dongmei Li +5 位作者 Wenbin Zhang Dongsheng Shi Yuanzhou Jiao Qianzhong Chen Yiying Lin Wentao Zhu 《CAAI Transactions on Intelligence Technology》 SCIE EI 2024年第1期211-221,共11页
Media convergence works by processing information from different modalities and applying them to different domains.It is difficult for the conventional knowledge graph to utilise multi-media features because the intro... Media convergence works by processing information from different modalities and applying them to different domains.It is difficult for the conventional knowledge graph to utilise multi-media features because the introduction of a large amount of information from other modalities reduces the effectiveness of representation learning and makes knowledge graph inference less effective.To address the issue,an inference method based on Media Convergence and Rule-guided Joint Inference model(MCRJI)has been pro-posed.The authors not only converge multi-media features of entities but also introduce logic rules to improve the accuracy and interpretability of link prediction.First,a multi-headed self-attention approach is used to obtain the attention of different media features of entities during semantic synthesis.Second,logic rules of different lengths are mined from knowledge graph to learn new entity representations.Finally,knowledge graph inference is performed based on representing entities that converge multi-media features.Numerous experimental results show that MCRJI outperforms other advanced baselines in using multi-media features and knowledge graph inference,demonstrating that MCRJI provides an excellent approach for knowledge graph inference with converged multi-media features. 展开更多
关键词 logic rule media convergence multi-modal knowledge graph inference representation learning
下载PDF
Research on Multi-modal In-Vehicle Intelligent Personal Assistant Design
7
作者 WANG Jia-rou TANG Cheng-xin SHUAI Liang-ying 《印刷与数字媒体技术研究》 CAS 北大核心 2024年第4期136-146,共11页
Intelligent personal assistants play a pivotal role in in-vehicle systems,significantly enhancing life efficiency,driving safety,and decision-making support.In this study,the multi-modal design elements of intelligent... Intelligent personal assistants play a pivotal role in in-vehicle systems,significantly enhancing life efficiency,driving safety,and decision-making support.In this study,the multi-modal design elements of intelligent personal assistants within the context of visual,auditory,and somatosensory interactions with drivers were discussed.Their impact on the driver’s psychological state through various modes such as visual imagery,voice interaction,and gesture interaction were explored.The study also introduced innovative designs for in-vehicle intelligent personal assistants,incorporating design principles such as driver-centricity,prioritizing passenger safety,and utilizing timely feedback as a criterion.Additionally,the study employed design methods like driver behavior research and driving situation analysis to enhance the emotional connection between drivers and their vehicles,ultimately improving driver satisfaction and trust. 展开更多
关键词 Intelligent personal assistants multi-modal design User psychology In-vehicle interaction Voice interaction Emotional design
下载PDF
Generative Multi-Modal Mutual Enhancement Video Semantic Communications
8
作者 Yuanle Chen Haobo Wang +3 位作者 Chunyu Liu Linyi Wang Jiaxin Liu Wei Wu 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第6期2985-3009,共25页
Recently,there have been significant advancements in the study of semantic communication in single-modal scenarios.However,the ability to process information in multi-modal environments remains limited.Inspired by the... Recently,there have been significant advancements in the study of semantic communication in single-modal scenarios.However,the ability to process information in multi-modal environments remains limited.Inspired by the research and applications of natural language processing across different modalities,our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality videos.Specifically,we propose a deep learning-basedMulti-ModalMutual Enhancement Video Semantic Communication system,called M3E-VSC.Built upon a VectorQuantized Generative AdversarialNetwork(VQGAN),our systemaims to leverage mutual enhancement among different modalities by using text as the main carrier of transmission.With it,the semantic information can be extracted fromkey-frame images and audio of the video and performdifferential value to ensure that the extracted text conveys accurate semantic information with fewer bits,thus improving the capacity of the system.Furthermore,a multi-frame semantic detection module is designed to facilitate semantic transitions during video generation.Simulation results demonstrate that our proposed model maintains high robustness in complex noise environments,particularly in low signal-to-noise ratio conditions,significantly improving the accuracy and speed of semantic transmission in video communication by approximately 50 percent. 展开更多
关键词 Generative adversarial networks multi-modal mutual enhancement video semantic transmission deep learning
下载PDF
Cross-Modal Consistency with Aesthetic Similarity for Multimodal False Information Detection
9
作者 Weijian Fan Ziwei Shi 《Computers, Materials & Continua》 SCIE EI 2024年第5期2723-2741,共19页
With the explosive growth of false information on social media platforms, the automatic detection of multimodalfalse information has received increasing attention. Recent research has significantly contributed to mult... With the explosive growth of false information on social media platforms, the automatic detection of multimodalfalse information has received increasing attention. Recent research has significantly contributed to multimodalinformation exchange and fusion, with many methods attempting to integrate unimodal features to generatemultimodal news representations. However, they still need to fully explore the hierarchical and complex semanticcorrelations between different modal contents, severely limiting their performance detecting multimodal falseinformation. This work proposes a two-stage detection framework for multimodal false information detection,called ASMFD, which is based on image aesthetic similarity to segment and explores the consistency andinconsistency features of images and texts. Specifically, we first use the Contrastive Language-Image Pre-training(CLIP) model to learn the relationship between text and images through label awareness and train an imageaesthetic attribute scorer using an aesthetic attribute dataset. Then, we calculate the aesthetic similarity betweenthe image and related images and use this similarity as a threshold to divide the multimodal correlation matrixinto consistency and inconsistencymatrices. Finally, the fusionmodule is designed to identify essential features fordetectingmultimodal false information. In extensive experiments on four datasets, the performance of the ASMFDis superior to state-of-the-art baseline methods. 展开更多
关键词 Social media false information detection image aesthetic assessment cross-modal consistency
下载PDF
Decoding the inconsistency of six cropland maps in China
10
作者 Yifeng Cui Ronggao Liu +6 位作者 Zhichao Li Chao Zhang Xiao-Peng Song Jilin Yang Le Yu Mengxi Chen Jinwei Dong 《The Crop Journal》 SCIE CSCD 2024年第1期281-294,共14页
Accurate cropland information is critical for agricultural planning and production,especially in foodstressed countries like China.Although widely used medium-to-high-resolution satellite-based cropland maps have been... Accurate cropland information is critical for agricultural planning and production,especially in foodstressed countries like China.Although widely used medium-to-high-resolution satellite-based cropland maps have been developed from various remotely sensed data sources over the past few decades,considerable discrepancies exist among these products both in total area and in spatial distribution of croplands,impeding further applications of these datasets.The factors influencing their inconsistency are also unknown.In this study,we evaluated the consistency and accuracy of six cropland maps widely used in China in circa 2020,including three state-of-the-art 10-m products(i.e.,Google Dynamic World,ESRI Land Cover,and ESA WorldCover)and three 30-m ones(i.e.,GLC_FCS30,GlobeLand 30,and CLCD).We also investigated the effects of landscape fragmentation,climate,and agricultural management.Validation using a ground-truth sample revealed that the 10-m-resolution WorldCover provided the highest accuracy(92.3%).These maps collectively overestimated Chinese cropland area by up to 56%.Up to 37%of the land showed spatial inconsistency among the maps,concentrated mainly in mountainous regions and attributed to the varying accuracy of cropland maps,cropland fragmentation and management practices such as irrigation.Our work shed light on the promotion of future cropland mapping efforts,especially in highly inconsistent regions. 展开更多
关键词 consistency and accuracy 10-and 30 m Cropland mapping Agricultural management China
下载PDF
Identifying Brand Consistency by Product Differentiation Using CNN
11
作者 Hung-Hsiang Wang Chih-Ping Chen 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第7期685-709,共25页
This paper presents a new method of using a convolutional neural network(CNN)in machine learning to identify brand consistency by product appearance variation.In Experiment 1,we collected fifty mouse devices from the ... This paper presents a new method of using a convolutional neural network(CNN)in machine learning to identify brand consistency by product appearance variation.In Experiment 1,we collected fifty mouse devices from the past thirty-five years from a renowned company to build a dataset consisting of product pictures with pre-defined design features of their appearance and functions.Results show that it is a challenge to distinguish periods for the subtle evolution of themouse devices with such traditionalmethods as time series analysis and principal component analysis(PCA).In Experiment 2,we applied deep learning to predict the extent to which the product appearance variation ofmouse devices of various brands.The investigation collected 6,042 images ofmouse devices and divided theminto the Early Stage and the Late Stage.Results show the highest accuracy of 81.4%with the CNNmodel,and the evaluation score of brand style consistency is 0.36,implying that the brand consistency score converted by the CNN accuracy rate is not always perfect in the real world.The relationship between product appearance variation,brand style consistency,and evaluation score is beneficial for predicting new product styles and future product style roadmaps.In addition,the CNN heat maps highlight the critical areas of design features of different styles,providing alternative clues related to the blurred boundary.The study provides insights into practical problems for designers,manufacturers,and marketers in product design.It not only contributes to the scientific understanding of design development,but also provides industry professionals with practical tools and methods to improve the design process and maintain brand consistency.Designers can use these techniques to find features that influence brand style.Then,capture these features as innovative design elements and maintain core brand values. 展开更多
关键词 Machine learning product differentiation brand consistency principal component analysis convolutional neural network computer mouse
下载PDF
Unsupervised multi-modal image translation based on the squeeze-and-excitation mechanism and feature attention module
12
作者 胡振涛 HU Chonghao +1 位作者 YANG Haoran SHUAI Weiwei 《High Technology Letters》 EI CAS 2024年第1期23-30,共8页
The unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain.However,the multi-genera... The unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain.However,the multi-generator mechanism is employed among the advanced approaches available to model different domain mappings,which results in inefficient training of neural networks and pattern collapse,leading to inefficient generation of image diversity.To address this issue,this paper introduces a multi-modal unsupervised image translation framework that uses a generator to perform multi-modal image translation.Specifically,firstly,the domain code is introduced in this paper to explicitly control the different generation tasks.Secondly,this paper brings in the squeeze-and-excitation(SE)mechanism and feature attention(FA)module.Finally,the model integrates multiple optimization objectives to ensure efficient multi-modal translation.This paper performs qualitative and quantitative experiments on multiple non-paired benchmark image translation datasets while demonstrating the benefits of the proposed method over existing technologies.Overall,experimental results have shown that the proposed method is versatile and scalable. 展开更多
关键词 multi-modal image translation generative adversarial network(GAN) squeezeand-excitation(SE)mechanism feature attention(FA)module
下载PDF
M3SC:A Generic Dataset for Mixed Multi-Modal(MMM)Sensing and Communication Integration 被引量:3
13
作者 Xiang Cheng Ziwei Huang +6 位作者 Lu Bai Haotian Zhang Mingran Sun Boxun Liu Sijiang Li Jianan Zhang Minson Lee 《China Communications》 SCIE CSCD 2023年第11期13-29,共17页
The sixth generation(6G)of mobile communication system is witnessing a new paradigm shift,i.e.,integrated sensing-communication system.A comprehensive dataset is a prerequisite for 6G integrated sensing-communication ... The sixth generation(6G)of mobile communication system is witnessing a new paradigm shift,i.e.,integrated sensing-communication system.A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research.This paper develops a novel simulation dataset,named M3SC,for mixed multi-modal(MMM)sensing-communication integration,and the generation framework of the M3SC dataset is further given.To obtain multimodal sensory data in physical space and communication data in electromagnetic space,we utilize Air-Sim and WaveFarer to collect multi-modal sensory data and exploit Wireless InSite to collect communication data.Furthermore,the in-depth integration and precise alignment of AirSim,WaveFarer,andWireless InSite are achieved.The M3SC dataset covers various weather conditions,multiplex frequency bands,and different times of the day.Currently,the M3SC dataset contains 1500 snapshots,including 80 RGB images,160 depth maps,80 LiDAR point clouds,256 sets of mmWave waveforms with 8 radar point clouds,and 72 channel impulse response(CIR)matrices per snapshot,thus totaling 120,000 RGB images,240,000 depth maps,120,000 LiDAR point clouds,384,000 sets of mmWave waveforms with 12,000 radar point clouds,and 108,000 CIR matrices.The data processing result presents the multi-modal sensory information and communication channel statistical properties.Finally,the MMM sensing-communication application,which can be supported by the M3SC dataset,is discussed. 展开更多
关键词 multi-modal sensing RAY-TRACING sensing-communication integration simulation dataset
下载PDF
Multi-task Learning of Semantic Segmentation and Height Estimation for Multi-modal Remote Sensing Images 被引量:2
14
作者 Mengyu WANG Zhiyuan YAN +2 位作者 Yingchao FENG Wenhui DIAO Xian SUN 《Journal of Geodesy and Geoinformation Science》 CSCD 2023年第4期27-39,共13页
Deep learning based methods have been successfully applied to semantic segmentation of optical remote sensing images.However,as more and more remote sensing data is available,it is a new challenge to comprehensively u... Deep learning based methods have been successfully applied to semantic segmentation of optical remote sensing images.However,as more and more remote sensing data is available,it is a new challenge to comprehensively utilize multi-modal remote sensing data to break through the performance bottleneck of single-modal interpretation.In addition,semantic segmentation and height estimation in remote sensing data are two tasks with strong correlation,but existing methods usually study individual tasks separately,which leads to high computational resource overhead.To this end,we propose a Multi-Task learning framework for Multi-Modal remote sensing images(MM_MT).Specifically,we design a Cross-Modal Feature Fusion(CMFF)method,which aggregates complementary information of different modalities to improve the accuracy of semantic segmentation and height estimation.Besides,a dual-stream multi-task learning method is introduced for Joint Semantic Segmentation and Height Estimation(JSSHE),extracting common features in a shared network to save time and resources,and then learning task-specific features in two task branches.Experimental results on the public multi-modal remote sensing image dataset Potsdam show that compared to training two tasks independently,multi-task learning saves 20%of training time and achieves competitive performance with mIoU of 83.02%for semantic segmentation and accuracy of 95.26%for height estimation. 展开更多
关键词 multi-modal MULTI-TASK semantic segmentation height estimation convolutional neural network
下载PDF
PowerDetector:Malicious PowerShell Script Family Classification Based on Multi-Modal Semantic Fusion and Deep Learning 被引量:1
15
作者 Xiuzhang Yang Guojun Peng +2 位作者 Dongni Zhang Yuhang Gao Chenguang Li 《China Communications》 SCIE CSCD 2023年第11期202-224,共23页
Power Shell has been widely deployed in fileless malware and advanced persistent threat(APT)attacks due to its high stealthiness and live-off-theland technique.However,existing works mainly focus on deobfuscation and ... Power Shell has been widely deployed in fileless malware and advanced persistent threat(APT)attacks due to its high stealthiness and live-off-theland technique.However,existing works mainly focus on deobfuscation and malicious detection,lacking the malicious Power Shell families classification and behavior analysis.Moreover,the state-of-the-art methods fail to capture fine-grained features and semantic relationships,resulting in low robustness and accuracy.To this end,we propose Power Detector,a novel malicious Power Shell script detector based on multimodal semantic fusion and deep learning.Specifically,we design four feature extraction methods to extract key features from character,token,abstract syntax tree(AST),and semantic knowledge graph.Then,we intelligently design four embeddings(i.e.,Char2Vec,Token2Vec,AST2Vec,and Rela2Vec) and construct a multi-modal fusion algorithm to concatenate feature vectors from different views.Finally,we propose a combined model based on transformer and CNN-Bi LSTM to implement Power Shell family detection.Our experiments with five types of Power Shell attacks show that PowerDetector can accurately detect various obfuscated and stealth PowerShell scripts,with a 0.9402 precision,a 0.9358 recall,and a 0.9374 F1-score.Furthermore,through singlemodal and multi-modal comparison experiments,we demonstrate that PowerDetector’s multi-modal embedding and deep learning model can achieve better accuracy and even identify more unknown attacks. 展开更多
关键词 deep learning malicious family detection multi-modal semantic fusion POWERSHELL
下载PDF
A survey of multi-modal learning theory
16
作者 HUANG Yu HUANG Longbo 《中山大学学报(自然科学版)(中英文)》 CAS CSCD 北大核心 2023年第5期38-49,共12页
Deep multi-modal learning,a rapidly growing field with a wide range of practical applications,aims to effectively utilize and integrate information from multiple sources,known as modalities.Despite its impressive empi... Deep multi-modal learning,a rapidly growing field with a wide range of practical applications,aims to effectively utilize and integrate information from multiple sources,known as modalities.Despite its impressive empirical performance,the theoretical foundations of deep multi-modal learning have yet to be fully explored.In this paper,we will undertake a comprehensive survey of recent developments in multi-modal learning theories,focusing on the fundamental properties that govern this field.Our goal is to provide a thorough collection of current theoretical tools for analyzing multi-modal learning,to clarify their implications for practitioners,and to suggest future directions for the establishment of a solid theoretical foundation for deep multi-modal learning. 展开更多
关键词 multi-modal learning machine learning theory OPTIMIZATION GENERALIZATION
下载PDF
Multi-Modal Military Event Extraction Based on Knowledge Fusion
17
作者 Yuyuan Xiang Yangli Jia +1 位作者 Xiangliang Zhang Zhenling Zhang 《Computers, Materials & Continua》 SCIE EI 2023年第10期97-114,共18页
Event extraction stands as a significant endeavor within the realm of information extraction,aspiring to automatically extract structured event information from vast volumes of unstructured text.Extracting event eleme... Event extraction stands as a significant endeavor within the realm of information extraction,aspiring to automatically extract structured event information from vast volumes of unstructured text.Extracting event elements from multi-modal data remains a challenging task due to the presence of a large number of images and overlapping event elements in the data.Although researchers have proposed various methods to accomplish this task,most existing event extraction models cannot address these challenges because they are only applicable to text scenarios.To solve the above issues,this paper proposes a multi-modal event extraction method based on knowledge fusion.Specifically,for event-type recognition,we use a meticulous pipeline approach that integrates multiple pre-trained models.This approach enables a more comprehensive capture of the multidimensional event semantic features present in military texts,thereby enhancing the interconnectedness of information between trigger words and events.For event element extraction,we propose a method for constructing a priori templates that combine event types with corresponding trigger words.This approach facilitates the acquisition of fine-grained input samples containing event trigger words,thus enabling the model to understand the semantic relationships between elements in greater depth.Furthermore,a fusion method for spatial mapping of textual event elements and image elements is proposed to reduce the category number overload and effectively achieve multi-modal knowledge fusion.The experimental results based on the CCKS 2022 dataset show that our method has achieved competitive results,with a comprehensive evaluation value F1-score of 53.4%for the model.These results validate the effectiveness of our method in extracting event elements from multi-modal data. 展开更多
关键词 Event extraction multi-modal knowledge fusion pre-trained models
下载PDF
A Data Consistency Insurance Method for Smart Contract
18
作者 Jing Deng Xiaofei Xing +4 位作者 Guoqiang Deng Ning Hu Shen Su Le Wang Md Zakirul Alam Bhuiyan 《Computers, Materials & Continua》 SCIE EI 2023年第9期3783-3795,共13页
As one of the major threats to the current DeFi(Decentralized Finance)ecosystem,reentrant attack induces data inconsistency of the victim smart contract,enabling attackers to steal on-chain assets from DeFi projects,w... As one of the major threats to the current DeFi(Decentralized Finance)ecosystem,reentrant attack induces data inconsistency of the victim smart contract,enabling attackers to steal on-chain assets from DeFi projects,which could terribly do harm to the confidence of the blockchain investors.However,protecting DeFi projects from the reentrant attack is very difficult,since generating a call loop within the highly automatic DeFi ecosystem could be very practicable.Existing researchers mainly focus on the detection of the reentrant vulnerabilities in the code testing,and no method could promise the non-existent of reentrant vulnerabilities.In this paper,we introduce the database lock mechanism to isolate the correlated smart contract states from other operations in the same contract,so that we can prevent the attackers from abusing the inconsistent smart contract state.Compared to the existing resolutions of front-running,code audit,andmodifier,our method guarantees protection resultswith better flexibility.And we further evaluate our method on a number of de facto reentrant attacks observed from Etherscan.The results prove that our method could efficiently prevent the reentrant attack with less running cost. 展开更多
关键词 Blockchain smart contract data consistency reentrancy attack
下载PDF
A multi-modal clustering method for traditonal Chinese medicine clinical data via media convergence
19
作者 Jingna Si Ziwei Tian +6 位作者 Dongmei Li Lei Zhang Lei Yao Wenjuan Jiang Jia Liu Runshun Zhang Xiaoping Zhang 《CAAI Transactions on Intelligence Technology》 SCIE EI 2023年第2期390-400,共11页
Media convergence is a media change led by technological innovation.Applying media convergence technology to the study of clustering in Chinese medicine can significantly exploit the advantages of media fusion.Obtaini... Media convergence is a media change led by technological innovation.Applying media convergence technology to the study of clustering in Chinese medicine can significantly exploit the advantages of media fusion.Obtaining consistent and complementary information among multiple modalities through media convergence can provide technical support for clustering.This article presents an approach based on Media Convergence and Graph convolution Encoder Clustering(MCGEC)for traditonal Chinese medicine(TCM)clinical data.It feeds modal information and graph structure from media information into a multi-modal graph convolution encoder to obtain the media feature representation learnt from multiple modalities.MCGEC captures latent information from various modalities by fusion and optimises the feature representations and network architecture with learnt clustering labels.The experiment is conducted on real-world multimodal TCM clinical data,including information like images and text.MCGEC has improved clustering results compared to the generic single-modal clustering methods and the current more advanced multi-modal clustering methods.MCGEC applied to TCM clinical datasets can achieve better results.Integrating multimedia features into clustering algorithms offers significant benefits compared to single-modal clustering approaches that simply concatenate features from different modalities.It provides practical technical support for multi-modal clustering in the TCM field incorporating multimedia features. 展开更多
关键词 graph convolutional encoder media convergence multi-modal clustering traditional Chinese medicine
下载PDF
DCRL-KG: Distributed Multi-Modal Knowledge Graph Retrieval Platform Based on Collaborative Representation Learning
20
作者 Leilei Li Yansheng Fu +6 位作者 Dongjie Zhu Xiaofang Li Yundong Sun Jianrui Ding Mingrui Wu Ning Cao Russell Higgs 《Intelligent Automation & Soft Computing》 SCIE 2023年第6期3295-3307,共13页
The knowledge graph with relational abundant information has been widely used as the basic data support for the retrieval platforms.Image and text descriptions added to the knowledge graph enrich the node information,... The knowledge graph with relational abundant information has been widely used as the basic data support for the retrieval platforms.Image and text descriptions added to the knowledge graph enrich the node information,which accounts for the advantage of the multi-modal knowledge graph.In the field of cross-modal retrieval platforms,multi-modal knowledge graphs can help to improve retrieval accuracy and efficiency because of the abundant relational infor-mation provided by knowledge graphs.The representation learning method is sig-nificant to the application of multi-modal knowledge graphs.This paper proposes a distributed collaborative vector retrieval platform(DCRL-KG)using the multi-modal knowledge graph VisualSem as the foundation to achieve efficient and high-precision multimodal data retrieval.Firstly,use distributed technology to classify and store the data in the knowledge graph to improve retrieval efficiency.Secondly,this paper uses BabelNet to expand the knowledge graph through multi-ple filtering processes and increase the diversification of information.Finally,this paper builds a variety of retrieval models to achieve the fusion of retrieval results through linear combination methods to achieve high-precision language retrieval and image retrieval.The paper uses sentence retrieval and image retrieval experi-ments to prove that the platform can optimize the storage structure of the multi-modal knowledge graph and have good performance in multi-modal space. 展开更多
关键词 multi-modal retrieval distributed storage knowledge graph
下载PDF
上一页 1 2 250 下一页 到第
使用帮助 返回顶部