期刊文献+
共找到329,445篇文章
< 1 2 250 >
每页显示 20 50 100
Multi-Modal Data Analysis Based Game Player Experience Modeling Using LSTM-DNN 被引量:1
1
作者 Sehar Shahzad Farooq Mustansar Fiaz +4 位作者 Irfan Mehmood Ali Kashif Bashir Raheel Nawaz KyungJoong Kim Soon Ki Jung 《Computers, Materials & Continua》 SCIE EI 2021年第9期4087-4108,共22页
Game player modeling is a paradigm of computational models to exploit players’behavior and experience using game and player analytics.Player modeling refers to descriptions of players based on frameworks of data deri... Game player modeling is a paradigm of computational models to exploit players’behavior and experience using game and player analytics.Player modeling refers to descriptions of players based on frameworks of data derived from the interaction of a player’s behavior within the game as well as the player’s experience with the game.Player behavior focuses on dynamic and static information gathered at the time of gameplay.Player experience concerns the association of the human player during gameplay,which is based on cognitive and affective physiological measurements collected from sensors mounted on the player’s body or in the player’s surroundings.In this paper,player experience modeling is studied based on the board puzzle game“Candy Crush Saga”using cognitive data of players accessed by physiological and peripheral devices.Long Short-Term Memory-based Deep Neural Network(LSTM-DNN)is used to predict players’effective states in terms of valence,arousal,dominance,and liking by employing the concept of transfer learning.Transfer learning focuses on gaining knowledge while solving one problem and using the same knowledge to solve different but related problems.The homogeneous transfer learning approach has not been implemented in the game domain before,and this novel study opens a new research area for the game industry where the main challenge is predicting the significance of innovative games for entertainment and players’engagement.Relevant not only from a player’s point of view,it is also a benchmark study for game developers who have been facing problems of“cold start”for innovative games that strengthen the game industrial economy. 展开更多
关键词 Game player modeling experience modeling player analytics deep learning LSTM game play data Candy Crush Saga
下载PDF
A Hand Features Based Fusion Recognition Network with Enhancing Multi-Modal Correlation
2
作者 Wei Wu Yuan Zhang +2 位作者 Yunpeng Li Chuanyang Li YanHao 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第7期537-555,共19页
Fusing hand-based features in multi-modal biometric recognition enhances anti-spoofing capabilities.Additionally,it leverages inter-modal correlation to enhance recognition performance.Concurrently,the robustness and ... Fusing hand-based features in multi-modal biometric recognition enhances anti-spoofing capabilities.Additionally,it leverages inter-modal correlation to enhance recognition performance.Concurrently,the robustness and recognition performance of the system can be enhanced through judiciously leveraging the correlation among multimodal features.Nevertheless,two issues persist in multi-modal feature fusion recognition:Firstly,the enhancement of recognition performance in fusion recognition has not comprehensively considered the inter-modality correlations among distinct modalities.Secondly,during modal fusion,improper weight selection diminishes the salience of crucial modal features,thereby diminishing the overall recognition performance.To address these two issues,we introduce an enhanced DenseNet multimodal recognition network founded on feature-level fusion.The information from the three modalities is fused akin to RGB,and the input network augments the correlation between modes through channel correlation.Within the enhanced DenseNet network,the Efficient Channel Attention Network(ECA-Net)dynamically adjusts the weight of each channel to amplify the salience of crucial information in each modal feature.Depthwise separable convolution markedly reduces the training parameters and further enhances the feature correlation.Experimental evaluations were conducted on four multimodal databases,comprising six unimodal databases,including multispectral palmprint and palm vein databases from the Chinese Academy of Sciences.The Equal Error Rates(EER)values were 0.0149%,0.0150%,0.0099%,and 0.0050%,correspondingly.In comparison to other network methods for palmprint,palm vein,and finger vein fusion recognition,this approach substantially enhances recognition performance,rendering it suitable for high-security environments with practical applicability.The experiments in this article utilized amodest sample database comprising 200 individuals.The subsequent phase involves preparing for the extension of the method to larger databases. 展开更多
关键词 BIOMETRICS multi-modal CORRELATION deep learning feature-level fusion
下载PDF
M3SC:A Generic Dataset for Mixed Multi-Modal(MMM)Sensing and Communication Integration 被引量:3
3
作者 Xiang Cheng Ziwei Huang +6 位作者 Lu Bai Haotian Zhang Mingran Sun Boxun Liu Sijiang Li Jianan Zhang Minson Lee 《China Communications》 SCIE CSCD 2023年第11期13-29,共17页
The sixth generation(6G)of mobile communication system is witnessing a new paradigm shift,i.e.,integrated sensing-communication system.A comprehensive dataset is a prerequisite for 6G integrated sensing-communication ... The sixth generation(6G)of mobile communication system is witnessing a new paradigm shift,i.e.,integrated sensing-communication system.A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research.This paper develops a novel simulation dataset,named M3SC,for mixed multi-modal(MMM)sensing-communication integration,and the generation framework of the M3SC dataset is further given.To obtain multimodal sensory data in physical space and communication data in electromagnetic space,we utilize Air-Sim and WaveFarer to collect multi-modal sensory data and exploit Wireless InSite to collect communication data.Furthermore,the in-depth integration and precise alignment of AirSim,WaveFarer,andWireless InSite are achieved.The M3SC dataset covers various weather conditions,multiplex frequency bands,and different times of the day.Currently,the M3SC dataset contains 1500 snapshots,including 80 RGB images,160 depth maps,80 LiDAR point clouds,256 sets of mmWave waveforms with 8 radar point clouds,and 72 channel impulse response(CIR)matrices per snapshot,thus totaling 120,000 RGB images,240,000 depth maps,120,000 LiDAR point clouds,384,000 sets of mmWave waveforms with 12,000 radar point clouds,and 108,000 CIR matrices.The data processing result presents the multi-modal sensory information and communication channel statistical properties.Finally,the MMM sensing-communication application,which can be supported by the M3SC dataset,is discussed. 展开更多
关键词 multi-modal sensing RAY-TRACING sensing-communication integration simulation dataset
下载PDF
Multi-modal knowledge graph inference via media convergence and logic rule
4
作者 Feng Lin Dongmei Li +5 位作者 Wenbin Zhang Dongsheng Shi Yuanzhou Jiao Qianzhong Chen Yiying Lin Wentao Zhu 《CAAI Transactions on Intelligence Technology》 SCIE EI 2024年第1期211-221,共11页
Media convergence works by processing information from different modalities and applying them to different domains.It is difficult for the conventional knowledge graph to utilise multi-media features because the intro... Media convergence works by processing information from different modalities and applying them to different domains.It is difficult for the conventional knowledge graph to utilise multi-media features because the introduction of a large amount of information from other modalities reduces the effectiveness of representation learning and makes knowledge graph inference less effective.To address the issue,an inference method based on Media Convergence and Rule-guided Joint Inference model(MCRJI)has been pro-posed.The authors not only converge multi-media features of entities but also introduce logic rules to improve the accuracy and interpretability of link prediction.First,a multi-headed self-attention approach is used to obtain the attention of different media features of entities during semantic synthesis.Second,logic rules of different lengths are mined from knowledge graph to learn new entity representations.Finally,knowledge graph inference is performed based on representing entities that converge multi-media features.Numerous experimental results show that MCRJI outperforms other advanced baselines in using multi-media features and knowledge graph inference,demonstrating that MCRJI provides an excellent approach for knowledge graph inference with converged multi-media features. 展开更多
关键词 logic rule media convergence multi-modal knowledge graph inference representation learning
下载PDF
Generative Multi-Modal Mutual Enhancement Video Semantic Communications
5
作者 Yuanle Chen Haobo Wang +3 位作者 Chunyu Liu Linyi Wang Jiaxin Liu Wei Wu 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第6期2985-3009,共25页
Recently,there have been significant advancements in the study of semantic communication in single-modal scenarios.However,the ability to process information in multi-modal environments remains limited.Inspired by the... Recently,there have been significant advancements in the study of semantic communication in single-modal scenarios.However,the ability to process information in multi-modal environments remains limited.Inspired by the research and applications of natural language processing across different modalities,our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality videos.Specifically,we propose a deep learning-basedMulti-ModalMutual Enhancement Video Semantic Communication system,called M3E-VSC.Built upon a VectorQuantized Generative AdversarialNetwork(VQGAN),our systemaims to leverage mutual enhancement among different modalities by using text as the main carrier of transmission.With it,the semantic information can be extracted fromkey-frame images and audio of the video and performdifferential value to ensure that the extracted text conveys accurate semantic information with fewer bits,thus improving the capacity of the system.Furthermore,a multi-frame semantic detection module is designed to facilitate semantic transitions during video generation.Simulation results demonstrate that our proposed model maintains high robustness in complex noise environments,particularly in low signal-to-noise ratio conditions,significantly improving the accuracy and speed of semantic transmission in video communication by approximately 50 percent. 展开更多
关键词 Generative adversarial networks multi-modal mutual enhancement video semantic transmission deep learning
下载PDF
Applying Hybrid Clustering in Pulsar Candidate Sifting with Multi-modality for FAST Survey
6
作者 Zi-Yi You Yun-Rong Pan +11 位作者 Zhi Ma Li Zhang Shuo Xiao Dan-Dan Zhang Shi-Jun Dang Ru-Shuang Zhao Pei Wang Ai-Jun Dong Jia-Tao Jiang Ji-Bing Leng Wei-An Li Si-Yao Li 《Research in Astronomy and Astrophysics》 SCIE CAS CSCD 2024年第3期283-296,共14页
Pulsar search is always the basis of pulsar navigation,gravitational wave detection and other research topics.Currently,the volume of pulsar candidates collected by the Five-hundred-meter Aperture Spherical radio Tele... Pulsar search is always the basis of pulsar navigation,gravitational wave detection and other research topics.Currently,the volume of pulsar candidates collected by the Five-hundred-meter Aperture Spherical radio Telescope(FAST)shows an explosive growth rate that has brought challenges for its pulsar candidate filtering system.Particularly,the multi-view heterogeneous data and class imbalance between true pulsars and non-pulsar candidates have negative effects on traditional single-modal supervised classification methods.In this study,a multi-modal and semi-supervised learning based on a pulsar candidate sifting algorithm is presented,which adopts a hybrid ensemble clustering scheme of density-based and partition-based methods combined with a feature-level fusion strategy for input data and a data partition strategy for parallelization.Experiments on both High Time Resolution Universe SurveyⅡ(HTRU2)and actual FAST observation data demonstrate that the proposed algorithm could excellently identify pulsars:On HTRU2,the precision and recall rates of its parallel mode reach0.981 and 0.988 respectively.On FAST data,those of its parallel mode reach 0.891 and 0.961,meanwhile,the running time also significantly decreases with the increment of parallel nodes within limits.Thus,we can conclude that our algorithm could be a feasible idea for large scale pulsar candidate sifting for FAST drift scan observation. 展开更多
关键词 METHODS data analysis-surveys-methods numerical
下载PDF
Unsupervised multi-modal image translation based on the squeeze-and-excitation mechanism and feature attention module
7
作者 胡振涛 HU Chonghao +1 位作者 YANG Haoran SHUAI Weiwei 《High Technology Letters》 EI CAS 2024年第1期23-30,共8页
The unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain.However,the multi-genera... The unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain.However,the multi-generator mechanism is employed among the advanced approaches available to model different domain mappings,which results in inefficient training of neural networks and pattern collapse,leading to inefficient generation of image diversity.To address this issue,this paper introduces a multi-modal unsupervised image translation framework that uses a generator to perform multi-modal image translation.Specifically,firstly,the domain code is introduced in this paper to explicitly control the different generation tasks.Secondly,this paper brings in the squeeze-and-excitation(SE)mechanism and feature attention(FA)module.Finally,the model integrates multiple optimization objectives to ensure efficient multi-modal translation.This paper performs qualitative and quantitative experiments on multiple non-paired benchmark image translation datasets while demonstrating the benefits of the proposed method over existing technologies.Overall,experimental results have shown that the proposed method is versatile and scalable. 展开更多
关键词 multi-modal image translation generative adversarial network(GAN) squeezeand-excitation(SE)mechanism feature attention(FA)module
下载PDF
Visual Topic Semantic Enhanced Machine Translation for Multi-Modal Data Efficiency
8
作者 王超 蔡思佳 +1 位作者 史北祥 崇志宏 《Journal of Computer Science & Technology》 SCIE EI CSCD 2023年第6期1223-1236,共14页
The scarcity of bilingual parallel corpus imposes limitations on exploiting the state-of-the-art supervised translation technology.One of the research directions is employing relations among multi-modal data to enhanc... The scarcity of bilingual parallel corpus imposes limitations on exploiting the state-of-the-art supervised translation technology.One of the research directions is employing relations among multi-modal data to enhance perfor-mance.However,the reliance on manually annotated multi-modal datasets results in a high cost of data labeling.In this paper,the topic semantics of images is proposed to alleviate the above problem.First,topic-related images can be auto-matically collected from the Internet by search engines.Second,topic semantics is sufficient to encode the relations be-tween multi-modal data such as texts and images.Specifically,we propose a visual topic semantic enhanced translation(VTSE)model that utilizes topic-related images to construct a cross-lingual and cross-modal semantic space,allowing the VTSE model to simultaneously integrate the syntactic structure and semantic features.In the above process,topic similar texts and images are wrapped into groups so that the model can extract more robust topic semantics from a set of similar images and then further optimize the feature integration.The results show that our model outperforms competitive base-lines by a large margin on the Multi30k and the Ambiguous COCO datasets.Our model can use external images to bring gains to translation,improving data efficiency. 展开更多
关键词 multi-modal machine translation visual topic semantics data efficiency
原文传递
A multi-modal clustering method for traditonal Chinese medicine clinical data via media convergence
9
作者 Jingna Si Ziwei Tian +6 位作者 Dongmei Li Lei Zhang Lei Yao Wenjuan Jiang Jia Liu Runshun Zhang Xiaoping Zhang 《CAAI Transactions on Intelligence Technology》 SCIE EI 2023年第2期390-400,共11页
Media convergence is a media change led by technological innovation.Applying media convergence technology to the study of clustering in Chinese medicine can significantly exploit the advantages of media fusion.Obtaini... Media convergence is a media change led by technological innovation.Applying media convergence technology to the study of clustering in Chinese medicine can significantly exploit the advantages of media fusion.Obtaining consistent and complementary information among multiple modalities through media convergence can provide technical support for clustering.This article presents an approach based on Media Convergence and Graph convolution Encoder Clustering(MCGEC)for traditonal Chinese medicine(TCM)clinical data.It feeds modal information and graph structure from media information into a multi-modal graph convolution encoder to obtain the media feature representation learnt from multiple modalities.MCGEC captures latent information from various modalities by fusion and optimises the feature representations and network architecture with learnt clustering labels.The experiment is conducted on real-world multimodal TCM clinical data,including information like images and text.MCGEC has improved clustering results compared to the generic single-modal clustering methods and the current more advanced multi-modal clustering methods.MCGEC applied to TCM clinical datasets can achieve better results.Integrating multimedia features into clustering algorithms offers significant benefits compared to single-modal clustering approaches that simply concatenate features from different modalities.It provides practical technical support for multi-modal clustering in the TCM field incorporating multimedia features. 展开更多
关键词 graph convolutional encoder media convergence multi-modal clustering traditional Chinese medicine
下载PDF
Multi-task Learning of Semantic Segmentation and Height Estimation for Multi-modal Remote Sensing Images 被引量:1
10
作者 Mengyu WANG Zhiyuan YAN +2 位作者 Yingchao FENG Wenhui DIAO Xian SUN 《Journal of Geodesy and Geoinformation Science》 CSCD 2023年第4期27-39,共13页
Deep learning based methods have been successfully applied to semantic segmentation of optical remote sensing images.However,as more and more remote sensing data is available,it is a new challenge to comprehensively u... Deep learning based methods have been successfully applied to semantic segmentation of optical remote sensing images.However,as more and more remote sensing data is available,it is a new challenge to comprehensively utilize multi-modal remote sensing data to break through the performance bottleneck of single-modal interpretation.In addition,semantic segmentation and height estimation in remote sensing data are two tasks with strong correlation,but existing methods usually study individual tasks separately,which leads to high computational resource overhead.To this end,we propose a Multi-Task learning framework for Multi-Modal remote sensing images(MM_MT).Specifically,we design a Cross-Modal Feature Fusion(CMFF)method,which aggregates complementary information of different modalities to improve the accuracy of semantic segmentation and height estimation.Besides,a dual-stream multi-task learning method is introduced for Joint Semantic Segmentation and Height Estimation(JSSHE),extracting common features in a shared network to save time and resources,and then learning task-specific features in two task branches.Experimental results on the public multi-modal remote sensing image dataset Potsdam show that compared to training two tasks independently,multi-task learning saves 20%of training time and achieves competitive performance with mIoU of 83.02%for semantic segmentation and accuracy of 95.26%for height estimation. 展开更多
关键词 multi-modal MULTI-TASK semantic segmentation height estimation convolutional neural network
下载PDF
Method of Multi-Mode Sensor Data Fusion with an Adaptive Deep Coupling Convolutional Auto-Encoder
11
作者 Xiaoxiong Feng Jianhua Liu 《Journal of Sensor Technology》 2023年第4期69-85,共17页
To address the difficulties in fusing multi-mode sensor data for complex industrial machinery, an adaptive deep coupling convolutional auto-encoder (ADCCAE) fusion method was proposed. First, the multi-mode features e... To address the difficulties in fusing multi-mode sensor data for complex industrial machinery, an adaptive deep coupling convolutional auto-encoder (ADCCAE) fusion method was proposed. First, the multi-mode features extracted synchronously by the CCAE were stacked and fed to the multi-channel convolution layers for fusion. Then, the fused data was passed to all connection layers for compression and fed to the Softmax module for classification. Finally, the coupling loss function coefficients and the network parameters were optimized through an adaptive approach using the gray wolf optimization (GWO) algorithm. Experimental comparisons showed that the proposed ADCCAE fusion model was superior to existing models for multi-mode data fusion. 展开更多
关键词 multi-mode data Fusion Coupling Convolutional Auto-Encoder Adaptive Optimization Deep Learning
下载PDF
A survey of multi-modal learning theory
12
作者 HUANG Yu HUANG Longbo 《中山大学学报(自然科学版)(中英文)》 CAS CSCD 北大核心 2023年第5期38-49,共12页
Deep multi-modal learning,a rapidly growing field with a wide range of practical applications,aims to effectively utilize and integrate information from multiple sources,known as modalities.Despite its impressive empi... Deep multi-modal learning,a rapidly growing field with a wide range of practical applications,aims to effectively utilize and integrate information from multiple sources,known as modalities.Despite its impressive empirical performance,the theoretical foundations of deep multi-modal learning have yet to be fully explored.In this paper,we will undertake a comprehensive survey of recent developments in multi-modal learning theories,focusing on the fundamental properties that govern this field.Our goal is to provide a thorough collection of current theoretical tools for analyzing multi-modal learning,to clarify their implications for practitioners,and to suggest future directions for the establishment of a solid theoretical foundation for deep multi-modal learning. 展开更多
关键词 multi-modal learning machine learning theory OPTIMIZATION GENERALIZATION
下载PDF
基于re3data的中英科学数据仓储平台对比研究
13
作者 袁烨 陈媛媛 《数字图书馆论坛》 2024年第2期13-23,共11页
以re3data为数据获取源,选取中英两国406个科学数据仓储为研究对象,从分布特征、责任类型、仓储许可、技术标准及质量标准等5个方面、11个指标对两国科学数据仓储的建设情况进行对比分析,试图为我国数据仓储的可持续发展提出建议:广泛... 以re3data为数据获取源,选取中英两国406个科学数据仓储为研究对象,从分布特征、责任类型、仓储许可、技术标准及质量标准等5个方面、11个指标对两国科学数据仓储的建设情况进行对比分析,试图为我国数据仓储的可持续发展提出建议:广泛联结国内外异质机构,推进多学科领域的交流与合作,有效扩充仓储许可权限与类型,优化技术标准的应用现况,提高元数据使用的灵活性。 展开更多
关键词 科学数据 数据仓储平台 re3data 中国 英国
下载PDF
PowerDetector:Malicious PowerShell Script Family Classification Based on Multi-Modal Semantic Fusion and Deep Learning
14
作者 Xiuzhang Yang Guojun Peng +2 位作者 Dongni Zhang Yuhang Gao Chenguang Li 《China Communications》 SCIE CSCD 2023年第11期202-224,共23页
Power Shell has been widely deployed in fileless malware and advanced persistent threat(APT)attacks due to its high stealthiness and live-off-theland technique.However,existing works mainly focus on deobfuscation and ... Power Shell has been widely deployed in fileless malware and advanced persistent threat(APT)attacks due to its high stealthiness and live-off-theland technique.However,existing works mainly focus on deobfuscation and malicious detection,lacking the malicious Power Shell families classification and behavior analysis.Moreover,the state-of-the-art methods fail to capture fine-grained features and semantic relationships,resulting in low robustness and accuracy.To this end,we propose Power Detector,a novel malicious Power Shell script detector based on multimodal semantic fusion and deep learning.Specifically,we design four feature extraction methods to extract key features from character,token,abstract syntax tree(AST),and semantic knowledge graph.Then,we intelligently design four embeddings(i.e.,Char2Vec,Token2Vec,AST2Vec,and Rela2Vec) and construct a multi-modal fusion algorithm to concatenate feature vectors from different views.Finally,we propose a combined model based on transformer and CNN-Bi LSTM to implement Power Shell family detection.Our experiments with five types of Power Shell attacks show that PowerDetector can accurately detect various obfuscated and stealth PowerShell scripts,with a 0.9402 precision,a 0.9358 recall,and a 0.9374 F1-score.Furthermore,through singlemodal and multi-modal comparison experiments,we demonstrate that PowerDetector’s multi-modal embedding and deep learning model can achieve better accuracy and even identify more unknown attacks. 展开更多
关键词 deep learning malicious family detection multi-modal semantic fusion POWERSHELL
下载PDF
Multi-Modal Military Event Extraction Based on Knowledge Fusion
15
作者 Yuyuan Xiang Yangli Jia +1 位作者 Xiangliang Zhang Zhenling Zhang 《Computers, Materials & Continua》 SCIE EI 2023年第10期97-114,共18页
Event extraction stands as a significant endeavor within the realm of information extraction,aspiring to automatically extract structured event information from vast volumes of unstructured text.Extracting event eleme... Event extraction stands as a significant endeavor within the realm of information extraction,aspiring to automatically extract structured event information from vast volumes of unstructured text.Extracting event elements from multi-modal data remains a challenging task due to the presence of a large number of images and overlapping event elements in the data.Although researchers have proposed various methods to accomplish this task,most existing event extraction models cannot address these challenges because they are only applicable to text scenarios.To solve the above issues,this paper proposes a multi-modal event extraction method based on knowledge fusion.Specifically,for event-type recognition,we use a meticulous pipeline approach that integrates multiple pre-trained models.This approach enables a more comprehensive capture of the multidimensional event semantic features present in military texts,thereby enhancing the interconnectedness of information between trigger words and events.For event element extraction,we propose a method for constructing a priori templates that combine event types with corresponding trigger words.This approach facilitates the acquisition of fine-grained input samples containing event trigger words,thus enabling the model to understand the semantic relationships between elements in greater depth.Furthermore,a fusion method for spatial mapping of textual event elements and image elements is proposed to reduce the category number overload and effectively achieve multi-modal knowledge fusion.The experimental results based on the CCKS 2022 dataset show that our method has achieved competitive results,with a comprehensive evaluation value F1-score of 53.4%for the model.These results validate the effectiveness of our method in extracting event elements from multi-modal data. 展开更多
关键词 Event extraction multi-modal knowledge fusion pre-trained models
下载PDF
DCRL-KG: Distributed Multi-Modal Knowledge Graph Retrieval Platform Based on Collaborative Representation Learning
16
作者 Leilei Li Yansheng Fu +6 位作者 Dongjie Zhu Xiaofang Li Yundong Sun Jianrui Ding Mingrui Wu Ning Cao Russell Higgs 《Intelligent Automation & Soft Computing》 SCIE 2023年第6期3295-3307,共13页
The knowledge graph with relational abundant information has been widely used as the basic data support for the retrieval platforms.Image and text descriptions added to the knowledge graph enrich the node information,... The knowledge graph with relational abundant information has been widely used as the basic data support for the retrieval platforms.Image and text descriptions added to the knowledge graph enrich the node information,which accounts for the advantage of the multi-modal knowledge graph.In the field of cross-modal retrieval platforms,multi-modal knowledge graphs can help to improve retrieval accuracy and efficiency because of the abundant relational infor-mation provided by knowledge graphs.The representation learning method is sig-nificant to the application of multi-modal knowledge graphs.This paper proposes a distributed collaborative vector retrieval platform(DCRL-KG)using the multi-modal knowledge graph VisualSem as the foundation to achieve efficient and high-precision multimodal data retrieval.Firstly,use distributed technology to classify and store the data in the knowledge graph to improve retrieval efficiency.Secondly,this paper uses BabelNet to expand the knowledge graph through multi-ple filtering processes and increase the diversification of information.Finally,this paper builds a variety of retrieval models to achieve the fusion of retrieval results through linear combination methods to achieve high-precision language retrieval and image retrieval.The paper uses sentence retrieval and image retrieval experi-ments to prove that the platform can optimize the storage structure of the multi-modal knowledge graph and have good performance in multi-modal space. 展开更多
关键词 multi-modal retrieval distributed storage knowledge graph
下载PDF
Multi-Modal Scene Matching Location Algorithm Based on M2Det
17
作者 Jiwei Fan Xiaogang Yang +2 位作者 Ruitao Lu Qingge Li Siyu Wang 《Computers, Materials & Continua》 SCIE EI 2023年第10期1031-1052,共22页
In recent years,many visual positioning algorithms have been proposed based on computer vision and they have achieved good results.However,these algorithms have a single function,cannot perceive the environment,and ha... In recent years,many visual positioning algorithms have been proposed based on computer vision and they have achieved good results.However,these algorithms have a single function,cannot perceive the environment,and have poor versatility,and there is a certain mismatch phenomenon,which affects the positioning accuracy.Therefore,this paper proposes a location algorithm that combines a target recognition algorithm with a depth feature matching algorithm to solve the problem of unmanned aerial vehicle(UAV)environment perception and multi-modal image-matching fusion location.This algorithm was based on the single-shot object detector based on multi-level feature pyramid network(M2Det)algorithm and replaced the original visual geometry group(VGG)feature extraction network with the ResNet-101 network to improve the feature extraction capability of the network model.By introducing a depth feature matching algorithm,the algorithm shares neural network weights and realizes the design of UAV target recognition and a multi-modal image-matching fusion positioning algorithm.When the reference image and the real-time image were mismatched,the dynamic adaptive proportional constraint and the random sample consensus consistency algorithm(DAPC-RANSAC)were used to optimize the matching results to improve the correct matching efficiency of the target.Using the multi-modal registration data set,the proposed algorithm was compared and analyzed to verify its superiority and feasibility.The results show that the algorithm proposed in this paper can effectively deal with the matching between multi-modal images(visible image–infrared image,infrared image–satellite image,visible image–satellite image),and the contrast,scale,brightness,ambiguity deformation,and other changes had good stability and robustness.Finally,the effectiveness and practicability of the algorithm proposed in this paper were verified in an aerial test scene of an S1000 sixrotor UAV. 展开更多
关键词 Visual positioning multi-modal scene matching unmanned aerial vehicle
下载PDF
Data Secure Storage Mechanism for IIoT Based on Blockchain 被引量:1
18
作者 Jin Wang Guoshu Huang +2 位作者 R.Simon Sherratt Ding Huang Jia Ni 《Computers, Materials & Continua》 SCIE EI 2024年第3期4029-4048,共20页
With the development of Industry 4.0 and big data technology,the Industrial Internet of Things(IIoT)is hampered by inherent issues such as privacy,security,and fault tolerance,which pose certain challenges to the rapi... With the development of Industry 4.0 and big data technology,the Industrial Internet of Things(IIoT)is hampered by inherent issues such as privacy,security,and fault tolerance,which pose certain challenges to the rapid development of IIoT.Blockchain technology has immutability,decentralization,and autonomy,which can greatly improve the inherent defects of the IIoT.In the traditional blockchain,data is stored in a Merkle tree.As data continues to grow,the scale of proofs used to validate it grows,threatening the efficiency,security,and reliability of blockchain-based IIoT.Accordingly,this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data.To solve this problem,a new Vector Commitment(VC)structure,Partition Vector Commitment(PVC),is proposed by improving the traditional VC structure.Secondly,this paper uses PVC instead of the Merkle tree to store big data generated by IIoT.PVC can improve the efficiency of traditional VC in the process of commitment and opening.Finally,this paper uses PVC to build a blockchain-based IIoT data security storage mechanism and carries out a comparative analysis of experiments.This mechanism can greatly reduce communication loss and maximize the rational use of storage space,which is of great significance for maintaining the security and stability of blockchain-based IIoT. 展开更多
关键词 Blockchain IIoT data storage cryptographic commitment
下载PDF
Robust Symmetry Prediction with Multi-Modal Feature Fusion for Partial Shapes
19
作者 Junhua Xi Kouquan Zheng +3 位作者 Yifan Zhong Longjiang Li Zhiping Cai Jinjing Chen 《Intelligent Automation & Soft Computing》 SCIE 2023年第3期3099-3111,共13页
In geometry processing,symmetry research benefits from global geo-metric features of complete shapes,but the shape of an object captured in real-world applications is often incomplete due to the limited sensor resoluti... In geometry processing,symmetry research benefits from global geo-metric features of complete shapes,but the shape of an object captured in real-world applications is often incomplete due to the limited sensor resolution,single viewpoint,and occlusion.Different from the existing works predicting symmetry from the complete shape,we propose a learning approach for symmetry predic-tion based on a single RGB-D image.Instead of directly predicting the symmetry from incomplete shapes,our method consists of two modules,i.e.,the multi-mod-al feature fusion module and the detection-by-reconstruction module.Firstly,we build a channel-transformer network(CTN)to extract cross-fusion features from the RGB-D as the multi-modal feature fusion module,which helps us aggregate features from the color and the depth separately.Then,our self-reconstruction net-work based on a 3D variational auto-encoder(3D-VAE)takes the global geo-metric features as input,followed by a prediction symmetry network to detect the symmetry.Our experiments are conducted on three public datasets:ShapeNet,YCB,and ScanNet,we demonstrate that our method can produce reliable and accurate results. 展开更多
关键词 Symmetry prediction multi-modal feature fusion partial shapes
下载PDF
Enhanced prediction of anisotropic deformation behavior using machine learning with data augmentation 被引量:1
20
作者 Sujeong Byun Jinyeong Yu +3 位作者 Seho Cheon Seong Ho Lee Sung Hyuk Park Taekyung Lee 《Journal of Magnesium and Alloys》 SCIE EI CAS CSCD 2024年第1期186-196,共11页
Mg alloys possess an inherent plastic anisotropy owing to the selective activation of deformation mechanisms depending on the loading condition.This characteristic results in a diverse range of flow curves that vary w... Mg alloys possess an inherent plastic anisotropy owing to the selective activation of deformation mechanisms depending on the loading condition.This characteristic results in a diverse range of flow curves that vary with a deformation condition.This study proposes a novel approach for accurately predicting an anisotropic deformation behavior of wrought Mg alloys using machine learning(ML)with data augmentation.The developed model combines four key strategies from data science:learning the entire flow curves,generative adversarial networks(GAN),algorithm-driven hyperparameter tuning,and gated recurrent unit(GRU)architecture.The proposed model,namely GAN-aided GRU,was extensively evaluated for various predictive scenarios,such as interpolation,extrapolation,and a limited dataset size.The model exhibited significant predictability and improved generalizability for estimating the anisotropic compressive behavior of ZK60 Mg alloys under 11 annealing conditions and for three loading directions.The GAN-aided GRU results were superior to those of previous ML models and constitutive equations.The superior performance was attributed to hyperparameter optimization,GAN-based data augmentation,and the inherent predictivity of the GRU for extrapolation.As a first attempt to employ ML techniques other than artificial neural networks,this study proposes a novel perspective on predicting the anisotropic deformation behaviors of wrought Mg alloys. 展开更多
关键词 Plastic anisotropy Compression ANNEALING Machine learning data augmentation
下载PDF
上一页 1 2 250 下一页 到第
使用帮助 返回顶部