Transition metal phosphides (TMPs) have been regarded as alternative hydrogen evolution reaction (HER) and oxygen evolution reaction (OER) catalysts owing to their activity being comparable to that of noble metal-based catalysts. TMPs have been produced in various morphologies, including hollow and porous nanostructures, which are features deemed desirable for electrocatalytic materials. Templated synthesis routes are often responsible for such morphologies. This paper reviews the latest advances and existing challenges in the synthesis of TMP-based OER and HER catalysts through templated methods. A comprehensive review of the structure-property-performance relationships of TMP-based HER and OER catalysts prepared using different templates is presented. The discussion proceeds according to application, first by HER and further divided among the types of templates used, from hard templates, sacrificial templates, and soft templates to the emerging dynamic hydrogen bubble template. OER catalysts are then reviewed and grouped according to their morphology. Finally, prospective research directions for the synthesis of hollow and porous TMP-based catalysts are suggested, such as improvements in both the activity and stability of TMPs, the design of environmentally benign templates and processes, and analysis of the reaction mechanism through advanced material characterization techniques and theoretical calculations.
Road traffic monitoring is an important topic widely discussed among researchers. Systems used to monitor traffic frequently rely on cameras mounted on bridges or roadsides. However, aerial images provide the flexibility to use mobile platforms to detect the location and motion of vehicles over a larger area. To this end, different models have shown the ability to recognize and track vehicles. However, these methods are not mature enough to produce accurate results in complex road scenes. Therefore, this paper presents an algorithm that combines state-of-the-art techniques for identifying and tracking vehicles in conjunction with image bursts. The extracted frames were converted to grayscale, followed by the application of a georeferencing algorithm to embed coordinate information into the images. A masking technique eliminated irrelevant data and reduced the computational cost of the overall monitoring system. Next, Sobel edge detection combined with Canny edge detection and the Hough line transform was applied for noise reduction. After preprocessing, a blob detection algorithm helped detect the vehicles. Vehicles of varying sizes were detected by implementing a dynamic thresholding scheme. Detection was performed on the first image of every burst. Then, to track vehicles, a template of each vehicle was matched in the succeeding images using a template matching algorithm. To further improve tracking accuracy by incorporating motion information, Scale Invariant Feature Transform (SIFT) features were used to find the best possible match among multiple candidates. An accuracy of 87% for detection and 80% for tracking was achieved on the A1 Motorway Netherlands dataset. For the Vehicle Aerial Imaging from Drone (VAID) dataset, an accuracy of 86% for detection and 78% for tracking was achieved.
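As a rough illustration of the detect-then-match pipeline this abstract describes, the sketch below wires together grayscale conversion, Sobel/Canny edges, blob detection, and template matching with OpenCV; thresholds, parameters, and helper names are illustrative assumptions rather than the authors' implementation, and the SIFT refinement step is omitted.

```python
# Rough OpenCV sketch of the detect-then-track pipeline described above.
# Thresholds and parameters are illustrative assumptions, not the authors' code.
import cv2

def preprocess(frame_bgr, roi_mask=None):
    """Grayscale conversion, optional masking, and combined edge maps."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    if roi_mask is not None:
        gray = cv2.bitwise_and(gray, gray, mask=roi_mask)  # drop irrelevant areas
    sobel = cv2.Sobel(gray, cv2.CV_8U, 1, 0, ksize=3)      # horizontal gradients
    edges = cv2.Canny(gray, 50, 150)                       # Canny edge map
    return gray, cv2.bitwise_or(sobel, edges)

def detect_vehicles(gray):
    """Blob detection on the first frame of a burst."""
    params = cv2.SimpleBlobDetector_Params()
    params.filterByArea = True
    params.minArea, params.maxArea = 50, 5000              # dynamic in the paper; fixed here
    detector = cv2.SimpleBlobDetector_create(params)
    return detector.detect(gray)

def track_by_template(prev_gray, next_gray, keypoint, half=24):
    """Template matching of one detected vehicle in the next frame of the burst."""
    x, y = map(int, keypoint.pt)
    template = prev_gray[y - half:y + half, x - half:x + half]
    result = cv2.matchTemplate(next_gray, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(result)
    return top_left, score                                  # best match location and confidence
```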
As an ultra-wide bandgap semiconductor, diamond garners significant interest due to its exceptional physical properties [1–3]. These superior characteristics make diamonds highly promising for applications in power electronics [4], deep-ultraviolet detectors [5], high-energy particle detectors [6], and quantum devices based on color centers [7].
Fusing hand-based features in multi-modal biometric recognition enhances anti-spoofing capabilities. Additionally, it leverages inter-modal correlation to enhance recognition performance. Concurrently, the robustness and recognition performance of the system can be enhanced by judiciously leveraging the correlation among multi-modal features. Nevertheless, two issues persist in multi-modal feature fusion recognition. Firstly, efforts to improve recognition performance have not comprehensively considered the inter-modality correlations among distinct modalities. Secondly, during modal fusion, improper weight selection diminishes the salience of crucial modal features, thereby reducing the overall recognition performance. To address these two issues, we introduce an enhanced DenseNet multi-modal recognition network founded on feature-level fusion. The information from the three modalities is fused akin to RGB channels, and the fused input augments the correlation between modalities through channel correlation. Within the enhanced DenseNet network, the Efficient Channel Attention Network (ECA-Net) dynamically adjusts the weight of each channel to amplify the salience of crucial information in each modal feature. Depthwise separable convolution markedly reduces the training parameters and further enhances the feature correlation. Experimental evaluations were conducted on four multi-modal databases, comprising six unimodal databases, including multispectral palmprint and palm vein databases from the Chinese Academy of Sciences. The Equal Error Rate (EER) values were 0.0149%, 0.0150%, 0.0099%, and 0.0050%, respectively. In comparison with other network methods for palmprint, palm vein, and finger vein fusion recognition, this approach substantially enhances recognition performance, rendering it suitable for high-security environments with practical applicability. The experiments in this article utilized a modest sample database comprising 200 individuals; the subsequent phase involves extending the method to larger databases.
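The channel re-weighting step this abstract refers to can be illustrated with a minimal ECA-style module; the kernel size, tensor sizes, and placement in the network below are assumptions, not the paper's exact configuration.

```python
# Minimal PyTorch sketch of an ECA-style channel attention block of the kind
# used in the fusion network described above; sizes are assumptions.
import torch
import torch.nn as nn

class ECALayer(nn.Module):
    """Efficient Channel Attention: per-channel weights from a small 1D convolution."""
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                        # x: (B, C, H, W)
        y = self.avg_pool(x)                     # (B, C, 1, 1): squeeze spatial dims
        y = y.squeeze(-1).transpose(-1, -2)      # (B, 1, C)
        y = self.conv(y)                         # local cross-channel interaction
        y = self.sigmoid(y).transpose(-1, -2).unsqueeze(-1)  # (B, C, 1, 1)
        return x * y                             # re-weight each modal feature channel

# Example: three modalities stacked like an RGB image before a DenseNet backbone.
fused = torch.randn(2, 3, 128, 128)              # (palmprint, palm vein, finger vein) as channels
attended = ECALayer()(fused)
```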
Multi-modal fusion technology has gradually become a fundamental technique in many fields, such as autonomous driving, smart healthcare, sentiment analysis, and human-computer interaction. It is rapidly becoming a dominant research direction due to its powerful perception and judgment capabilities. Under complex scenes, multi-modal fusion technology utilizes the complementary characteristics of multiple data streams to fuse different data types and achieve more accurate predictions. However, achieving outstanding performance is challenging because of equipment performance limitations, missing information, and data noise. This paper comprehensively reviews existing methods based on multi-modal fusion techniques and provides a detailed and in-depth analysis. According to the data fusion stage, multi-modal fusion has four primary methods: early fusion, deep fusion, late fusion, and hybrid fusion. The paper surveys the three major multi-modal fusion technologies that can significantly enhance the effect of data fusion and further explores the applications of multi-modal fusion technology in various fields. Finally, it discusses the challenges and explores potential research opportunities. Multi-modal tasks still need intensive study because of data heterogeneity and quality. Preserving complementary information and eliminating redundant information between modalities is critical in multi-modal technology. Invalid data fusion methods may introduce extra noise and lead to worse results. This paper provides a comprehensive and detailed summary in response to these challenges.
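To make the fusion-stage distinction concrete, the toy sketch below contrasts early fusion (combining features before a joint model) with late fusion (combining per-modality decisions); the layer sizes and classifiers are placeholders and are not taken from any surveyed method.

```python
# Toy contrast of early vs. late fusion for two modalities; purely illustrative.
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate low-level features first, then learn a single joint model."""
    def __init__(self, dim_a, dim_b, n_classes):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim_a + dim_b, 64), nn.ReLU(),
                                  nn.Linear(64, n_classes))
    def forward(self, a, b):
        return self.head(torch.cat([a, b], dim=-1))

class LateFusion(nn.Module):
    """Run a model per modality, then combine the decisions (here: average logits)."""
    def __init__(self, dim_a, dim_b, n_classes):
        super().__init__()
        self.head_a = nn.Linear(dim_a, n_classes)
        self.head_b = nn.Linear(dim_b, n_classes)
    def forward(self, a, b):
        return 0.5 * (self.head_a(a) + self.head_b(b))

a, b = torch.randn(8, 32), torch.randn(8, 128)   # e.g. audio vs. image embeddings
print(EarlyFusion(32, 128, 5)(a, b).shape, LateFusion(32, 128, 5)(a, b).shape)
```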
Predicting the motion of other road agents enables autonomous vehicles to perform safe and efficient path planning. This task is very complex, as the behaviour of road agents depends on many factors and the number of possible future trajectories can be considerable (multi-modal). Most prior approaches proposed to address multi-modal motion prediction are based on complex machine learning systems that have limited interpretability. Moreover, the metrics used in current benchmarks do not evaluate all aspects of the problem, such as the diversity and admissibility of the output. The authors aim to advance towards the design of trustworthy motion prediction systems, based on some of the requirements for the design of Trustworthy Artificial Intelligence. The focus is on evaluation criteria, robustness, and interpretability of outputs. First, the evaluation metrics are comprehensively analysed, the main gaps of current benchmarks are identified, and a new holistic evaluation framework is proposed. Then, a method for the assessment of spatial and temporal robustness is introduced by simulating noise in the perception system. To enhance the interpretability of the outputs and generate more balanced results in the proposed evaluation framework, an intent prediction layer that can be attached to multi-modal motion prediction models is proposed. The effectiveness of this approach is assessed through a survey that explores different elements in the visualisation of the multi-modal trajectories and intentions. The proposed approach and findings make a significant contribution to the development of trustworthy motion prediction systems for autonomous vehicles, advancing the field towards greater safety and reliability.
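For readers unfamiliar with the benchmark metrics being critiqued here, the snippet below computes two standard multi-modal trajectory scores, minADE and minFDE; it is a generic illustration (numpy only), not part of the proposed framework, and the comment notes the very limitation the abstract raises.

```python
# Standard multi-modal trajectory metrics (minADE / minFDE); illustrative only.
import numpy as np

def min_ade_fde(predictions: np.ndarray, ground_truth: np.ndarray):
    """predictions: (K, T, 2) candidate trajectories; ground_truth: (T, 2)."""
    dists = np.linalg.norm(predictions - ground_truth[None], axis=-1)  # (K, T)
    ade = dists.mean(axis=1)        # average displacement error per mode
    fde = dists[:, -1]              # final displacement error per mode
    return ade.min(), fde.min()     # best mode only: ignores diversity/admissibility

preds = np.random.randn(6, 30, 2).cumsum(axis=1)   # 6 modes, 30 future steps
gt = np.random.randn(30, 2).cumsum(axis=0)
print(min_ade_fde(preds, gt))
```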
Mill vibration is a common problem in rolling production, which directly affects the thickness accuracy of the strip and may even lead to strip fracture accidents in serious cases. The existing vibration prediction models do not consider the features contained in the data, resulting in limited improvement of model accuracy. To address these challenges, this paper proposes a multi-dimensional multi-modal cold rolling vibration time series prediction model (MDMMVPM) based on the deep fusion of multi-level networks. In the model, the long-term and short-term modal features of multi-dimensional data are considered, and appropriate prediction algorithms are selected for different data features. Based on the established prediction model, the effects of tension and rolling force on mill vibration are analyzed. Taking the 5th stand of a cold mill in a steel mill as the research object, the innovative model is applied to predict the mill vibration for the first time. The experimental results show that the correlation coefficient (R²) of the model proposed in this paper is 92.5%, and the root-mean-square error (RMSE) is 0.0011, which significantly improves the modeling accuracy compared with the existing models. The proposed model is also suitable for the hot rolling process, which provides a new method for the prediction of strip rolling vibration.
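The two figures of merit reported above are computed in the usual way; the short sketch below shows that computation on synthetic data and is a generic evaluation helper, not the MDMMVPM model itself.

```python
# How the reported R² and RMSE figures are typically computed; generic sketch.
import numpy as np

def r2_and_rmse(y_true: np.ndarray, y_pred: np.ndarray):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # root-mean-square error
    return r2, rmse

y = np.sin(np.linspace(0, 6, 200))                   # stand-in for measured vibration
y_hat = y + 0.001 * np.random.randn(200)             # stand-in for model output
print(r2_and_rmse(y, y_hat))
```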
Media convergence works by processing information from different modalities and applying it across different domains. It is difficult for a conventional knowledge graph to utilise multi-media features because the introduction of a large amount of information from other modalities reduces the effectiveness of representation learning and makes knowledge graph inference less effective. To address this issue, an inference method based on the Media Convergence and Rule-guided Joint Inference model (MCRJI) has been proposed. The authors not only converge multi-media features of entities but also introduce logic rules to improve the accuracy and interpretability of link prediction. First, a multi-headed self-attention approach is used to obtain the attention of different media features of entities during semantic synthesis. Second, logic rules of different lengths are mined from the knowledge graph to learn new entity representations. Finally, knowledge graph inference is performed based on entity representations that converge multi-media features. Numerous experimental results show that MCRJI outperforms other advanced baselines in using multi-media features and knowledge graph inference, demonstrating that MCRJI provides an excellent approach for knowledge graph inference with converged multi-media features.
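The semantic-synthesis step mentioned above (multi-headed self-attention over an entity's media features) can be sketched as follows; the embedding dimension, number of heads, and mean pooling are assumptions for illustration, not MCRJI's actual settings.

```python
# Sketch of multi-headed self-attention over one entity's per-modality embeddings.
import torch
import torch.nn as nn

dim, n_heads = 128, 4
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=n_heads, batch_first=True)

# One entity with three media features: e.g. text, image, and audio embeddings.
media = torch.randn(1, 3, dim)                 # (batch, n_modalities, dim)
fused, weights = attn(media, media, media)     # attention among the modalities
entity_repr = fused.mean(dim=1)                # pooled multi-media entity vector
print(entity_repr.shape, weights.shape)        # (1, 128), (1, 3, 3)
```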
Intelligent personal assistants play a pivotal role in in-vehicle systems, significantly enhancing life efficiency, driving safety, and decision-making support. In this study, the multi-modal design elements of intelligent personal assistants were discussed within the context of visual, auditory, and somatosensory interactions with drivers. Their impact on the driver's psychological state through various modes, such as visual imagery, voice interaction, and gesture interaction, was explored. The study also introduced innovative designs for in-vehicle intelligent personal assistants, incorporating design principles such as driver-centricity, prioritizing passenger safety, and using timely feedback as a criterion. Additionally, the study employed design methods such as driver behavior research and driving situation analysis to enhance the emotional connection between drivers and their vehicles, ultimately improving driver satisfaction and trust.
Recently, there have been significant advancements in the study of semantic communication in single-modal scenarios. However, the ability to process information in multi-modal environments remains limited. Inspired by research and applications of natural language processing across different modalities, our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality videos. Specifically, we propose a deep learning-based Multi-Modal Mutual Enhancement Video Semantic Communication system, called M3E-VSC. Built upon a Vector-Quantized Generative Adversarial Network (VQGAN), our system aims to leverage mutual enhancement among different modalities by using text as the main carrier of transmission. With it, semantic information can be extracted from the key-frame images and audio of the video, and differential values are computed to ensure that the extracted text conveys accurate semantic information with fewer bits, thus improving the capacity of the system. Furthermore, a multi-frame semantic detection module is designed to facilitate semantic transitions during video generation. Simulation results demonstrate that our proposed model maintains high robustness in complex noise environments, particularly under low signal-to-noise ratio conditions, improving the accuracy and speed of semantic transmission in video communication by approximately 50 percent.
The joint entity relation extraction model that integrates the semantic information of relations is favored by researchers because of its effectiveness in handling overlapping entities, and the method of manually defining relation semantic templates is particularly prominent in extraction performance because it can capture the deep semantic information of relations. However, this method has some problems, such as reliance on expert experience and poor portability. Inspired by rule-based entity relation extraction methods, this paper proposes a joint entity relation extraction model based on automatically constructed relation semantic templates, abbreviated as RSTAC. This model refines the extraction rules of relation semantic templates from a relation corpus through dependency parsing and realizes the automatic construction of relation semantic templates. Based on the relation semantic templates, the processes of relation classification and triplet extraction are constrained, and finally, the entity relation triplets are obtained. The experimental results on three major Chinese datasets, DuIE, SanWen, and FinRE, show that the RSTAC model successfully obtains rich deep relation semantics, improves the extraction of entity relation triples, and increases the F1 scores by an average of 0.96% compared with classical joint extraction models such as CasRel, TPLinker, and RFBFN.
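As a toy illustration of mining a relation "template" from a dependency parse, in the spirit of the rule refinement described above, the snippet below uses spaCy's English model purely for illustration (the paper works on Chinese corpora); the sentence, entity roles, and template format are assumptions, and the model must be installed beforehand (`python -m spacy download en_core_web_sm`).

```python
# Toy dependency-parse template mining; illustrative only, not RSTAC's rules.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Alice founded Acme Corp in 2010.")

subj = obj = verb = None
for token in doc:
    if token.dep_ == "nsubj":
        subj, verb = token, token.head            # subject and its governing verb
    if token.dep_ in ("dobj", "obj") and verb is not None and token.head == verb:
        obj = token

if subj is not None and obj is not None:
    # A minimal lexical-syntactic template: <E1> --nsubj--> VERB <--dobj-- <E2>
    print(f"template: <E1> -[{subj.dep_}]-> {verb.lemma_} <-[{obj.dep_}]- <E2>")
```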
BACKGROUND: There is still considerable heterogeneity regarding which features of cryptoglandular anal fistula on magnetic resonance imaging (MRI) and endoanal ultrasound (EAUS) are relevant to surgical decision-making. As a consequence, the quality and completeness of the report are highly dependent on the training and experience of the examiners. AIM: To develop a structured MRI and EAUS template (SMART) reporting the minimum dataset of information for the treatment of anal fistulas. METHODS: This modified Delphi survey, based on the RAND-UCLA appropriateness method for consensus-building, was conducted between May and August 2023. One hundred and fifty-one articles selected from a systematic review of the literature formed the database used to generate the evidence-based statements for the Delphi study. Fourteen questions were anonymously voted on by a multidisciplinary group for a maximum of three iterative rounds. The degree of agreement was scored on a numeric 0-10 scale. Group consensus was defined as a score ≥8 for ≥80% of the panelists. RESULTS: Eleven scientific societies (3 radiological and 8 surgical) endorsed the study. After three rounds of voting, the experts (69 colorectal surgeons, 23 radiologists, 2 anatomists, and 1 gastroenterologist) achieved consensus for 12 of 14 statements (85.7%). Based on the results of the Delphi process, the following six features of anal fistulas were included in the SMART: primary tract, secondary extension, internal opening, presence of collection, coexisting lesions, and sphincter morphology. CONCLUSION: A structured template, SMART, was developed to standardize imaging reporting of fistula-in-ano in a simple, systematic, time-efficient way, providing the minimum dataset of information and a visual diagram useful to referring physicians.
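The consensus rule stated in the Methods (score ≥8 from ≥80% of panelists) can be written out as a small check; the example panel scores below are fabricated placeholders used only to show how the rule is applied.

```python
# The stated Delphi consensus rule, written as a small check; scores are placeholders.
def has_consensus(scores, threshold=8, quorum=0.80):
    """scores: one 0-10 rating per panelist for a single statement."""
    agreeing = sum(1 for s in scores if s >= threshold)
    return agreeing / len(scores) >= quorum

panel = [9, 8, 10, 7, 8, 9, 8, 10, 6, 9]   # 10 hypothetical panelists
print(has_consensus(panel))                 # 8/10 = 80% -> True
```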
Background: As the population age structure gradually ages, more and more elderly people are found to have pulmonary nodules during physical examinations. Most elderly people have underlying diseases of the heart, lungs, brain, and blood vessels and cannot tolerate surgery. Computed tomography (CT)-guided percutaneous core needle biopsy (CNB) is the first choice for pathological diagnosis and subsequent treatment with targeted drugs, immune drugs, or ablation. CT-guided percutaneous CNB requires clinicians with rich CNB experience to ensure high accuracy, but it can easily cause complications such as pneumothorax and hemorrhage. Three-dimensional (3D) printing coplanar template (PCT) combined with CT-guided percutaneous pulmonary CNB has been used in clinical practice, but there has been no prospective, randomized controlled study. Methods: Elderly patients with lung nodules admitted to the Department of Oncology of our hospital from January 2019 to January 2023 were selected. A total of 225 elderly patients were screened, and 30 patients were included after screening. They were randomly divided into an experimental group (Group A: 30 cases) and a control group (Group B: 30 cases). Group A underwent 3D-PCT combined with CT-guided percutaneous pulmonary CNB, and Group B underwent CT-guided percutaneous pulmonary CNB alone. The primary outcome measure of this study was the accuracy of diagnostic CNB, and the secondary outcome measures were CNB time, number of CNB needles, number of pathological tissue samples, and complications. Results: The diagnostic accuracy of group A and group B was 96.67% and 76.67%, respectively (P = 0.026). There were statistical differences between group A and group B in average CNB time (P = 0.001), number of CNB passes (1 vs more than 1, P = 0.029), and pathological tissue obtained by CNB (3 vs 1, P = 0.040). There was no statistical difference in the incidence of pneumothorax and hemorrhage between the two groups (P > 0.05). Conclusions: 3D-PCT combined with CT-guided percutaneous CNB can improve the puncture accuracy in elderly patients, shorten the puncture time, reduce the number of punctures, and increase the amount of pathological tissue obtained, without increasing pneumothorax and hemorrhage complications. We look forward to verifying this in a phase III randomized controlled clinical study.
Unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, the advanced approaches available employ a multi-generator mechanism to model different domain mappings, which results in inefficient training of neural networks and mode collapse, leading to poor generation of image diversity. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. Specifically, firstly, a domain code is introduced to explicitly control the different generation tasks. Secondly, this paper brings in the squeeze-and-excitation (SE) mechanism and a feature attention (FA) module. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. This paper performs qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets while demonstrating the benefits of the proposed method over existing technologies. Overall, the experimental results show that the proposed method is versatile and scalable.
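The squeeze-and-excitation mechanism this framework brings in can be illustrated with a minimal SE block; the reduction ratio and tensor sizes below are assumptions rather than the paper's configuration.

```python
# Minimal PyTorch sketch of a squeeze-and-excitation (SE) block; illustrative only.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                    # squeeze: global context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                                      # x: (B, C, H, W)
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)  # excitation: channel weights
        return x * w                                           # re-scale feature maps

print(SEBlock(64)(torch.randn(2, 64, 32, 32)).shape)           # torch.Size([2, 64, 32, 32])
```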
The sixth generation (6G) of mobile communication systems is witnessing a new paradigm shift, i.e., the integrated sensing-communication system. A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research. This paper develops a novel simulation dataset, named M3SC, for mixed multi-modal (MMM) sensing-communication integration, and the generation framework of the M3SC dataset is further given. To obtain multi-modal sensory data in physical space and communication data in electromagnetic space, we utilize AirSim and WaveFarer to collect multi-modal sensory data and exploit Wireless InSite to collect communication data. Furthermore, the in-depth integration and precise alignment of AirSim, WaveFarer, and Wireless InSite are achieved. The M3SC dataset covers various weather conditions, multiple frequency bands, and different times of the day. Currently, the M3SC dataset contains 1500 snapshots, including 80 RGB images, 160 depth maps, 80 LiDAR point clouds, 256 sets of mmWave waveforms with 8 radar point clouds, and 72 channel impulse response (CIR) matrices per snapshot, thus totaling 120,000 RGB images, 240,000 depth maps, 120,000 LiDAR point clouds, 384,000 sets of mmWave waveforms with 12,000 radar point clouds, and 108,000 CIR matrices. The data processing results present the multi-modal sensory information and the statistical properties of the communication channel. Finally, the MMM sensing-communication applications that can be supported by the M3SC dataset are discussed.
The development of economical, efficient, and robust electrocatalysts toward the hydrogen evolution reaction (HER) is highly imperative for the rapid advancement of renewable H2 energy-associated technologies. Extensive utilization of the heterointerface effect can endow the catalysts with remarkably boosted electrocatalytic performance due to the modified electronic state of active sites. Herein, we demonstrate deliberate crafting of CoP/CoO heterojunction porous nanotubes (abbreviated as CoP/CoO PNTs hereafter) using a self-sacrificial template-engaged strategy. Precise control over the Kirkendall diffusion process of the presynthesized cobalt-aspartic acid complex nanowires is indispensable for the formation of CoP/CoO heterostructures. The topochemical transformation strategy of the reactive templates enables uniform and maximized construction of CoP/CoO heterojunctions throughout all the porous nanotubes. The establishment of CoP/CoO heterojunctions could considerably modify the electronic configuration of the active sites and also improve the electric conductivity, which endows the resultant CoP/CoO PNTs with enhanced intrinsic activity. Simultaneously, the hollow and porous nanotube architectures allow sufficient accessibility of exterior/interior surfaces and molecular permeability, drastically promoting the reaction kinetics. Consequently, when used as HER electrocatalysts, the well-designed CoP/CoO PNTs show Pt-like activity, with an overpotential of only 61 mV at 10 mA cm⁻² and excellent stability in 1.0 M KOH medium, exceeding those of the vast majority of the previously reported nonprecious candidates. Density functional theory calculations further substantiate that the construction of CoP/CoO heterojunctions enables optimization of the Gibbs free energies for water adsorption and H adsorption, resulting in boosted HER intrinsic activity. The present study may provide in-depth insights into the fundamental mechanisms of heterojunction-induced electronic regulation, which may pave the way for the rational design of advanced Earth-abundant electrocatalysts in the future.
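For context on the hydrogen-adsorption descriptor mentioned in the DFT discussion, the commonly used relation is reproduced below (computational hydrogen electrode approach); this is the generic textbook form, not this paper's own computational setup.

```latex
% Standard hydrogen-adsorption descriptor; generic form, not taken from this paper.
\Delta G_{\mathrm{H}^{*}} = \Delta E_{\mathrm{H}^{*}} + \Delta E_{\mathrm{ZPE}} - T\,\Delta S_{\mathrm{H}},
\qquad \text{with optimal HER activity expected near } \left|\Delta G_{\mathrm{H}^{*}}\right| \approx 0 .
```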
Deep learning based methods have been successfully applied to semantic segmentation of optical remote sensing images. However, as more and more remote sensing data becomes available, it is a new challenge to comprehensively utilize multi-modal remote sensing data to break through the performance bottleneck of single-modal interpretation. In addition, semantic segmentation and height estimation in remote sensing data are two tasks with strong correlation, but existing methods usually study individual tasks separately, which leads to high computational resource overhead. To this end, we propose a Multi-Task learning framework for Multi-Modal remote sensing images (MM_MT). Specifically, we design a Cross-Modal Feature Fusion (CMFF) method, which aggregates complementary information from different modalities to improve the accuracy of semantic segmentation and height estimation. Besides, a dual-stream multi-task learning method is introduced for Joint Semantic Segmentation and Height Estimation (JSSHE), extracting common features in a shared network to save time and resources, and then learning task-specific features in two task branches. Experimental results on the public multi-modal remote sensing image dataset Potsdam show that, compared to training the two tasks independently, multi-task learning saves 20% of training time and achieves competitive performance, with an mIoU of 83.02% for semantic segmentation and an accuracy of 95.26% for height estimation.
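The dual-stream idea described above (one shared encoder feeding a segmentation branch and a height-regression branch) can be sketched schematically as follows; layer sizes, channel counts, and the input composition are placeholders, not the MM_MT architecture.

```python
# Schematic PyTorch sketch of a shared encoder with two task heads; illustrative only.
import torch
import torch.nn as nn

class JointSegHeightNet(nn.Module):
    def __init__(self, in_ch: int = 4, n_classes: int = 6):   # e.g. RGB + an extra modality
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, n_classes, 1)            # per-pixel class logits
        self.height_head = nn.Conv2d(64, 1, 1)                 # per-pixel height estimate

    def forward(self, x):
        feats = self.encoder(x)                                # shared features: saves compute
        return self.seg_head(feats), self.height_head(feats)

x = torch.randn(2, 4, 128, 128)
seg_logits, height = JointSegHeightNet()(x)
print(seg_logits.shape, height.shape)    # (2, 6, 128, 128), (2, 1, 128, 128)
```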
PowerShell has been widely deployed in fileless malware and advanced persistent threat (APT) attacks due to its high stealthiness and living-off-the-land technique. However, existing works mainly focus on deobfuscation and malicious detection, lacking classification of malicious PowerShell families and behavior analysis. Moreover, the state-of-the-art methods fail to capture fine-grained features and semantic relationships, resulting in low robustness and accuracy. To this end, we propose PowerDetector, a novel malicious PowerShell script detector based on multi-modal semantic fusion and deep learning. Specifically, we design four feature extraction methods to extract key features from characters, tokens, the abstract syntax tree (AST), and a semantic knowledge graph. Then, we design four embeddings (i.e., Char2Vec, Token2Vec, AST2Vec, and Rela2Vec) and construct a multi-modal fusion algorithm to concatenate feature vectors from the different views. Finally, we propose a combined model based on a Transformer and CNN-BiLSTM to implement PowerShell family detection. Our experiments with five types of PowerShell attacks show that PowerDetector can accurately detect various obfuscated and stealthy PowerShell scripts, with a precision of 0.9402, a recall of 0.9358, and an F1-score of 0.9374. Furthermore, through single-modal and multi-modal comparison experiments, we demonstrate that PowerDetector's multi-modal embedding and deep learning model achieve better accuracy and can even identify more unknown attacks.
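The fusion step described above, concatenating the four per-view embeddings before classification, is shown schematically below; the embedding sizes and the linear classifier are assumptions for illustration, not PowerDetector's exact model.

```python
# Sketch of concatenating four per-view embeddings before family classification.
import torch
import torch.nn as nn

char_vec = torch.randn(1, 64)    # Char2Vec-style embedding of the script
token_vec = torch.randn(1, 64)   # Token2Vec-style embedding
ast_vec = torch.randn(1, 64)     # AST2Vec-style embedding of the syntax tree
rela_vec = torch.randn(1, 64)    # Rela2Vec-style embedding of the knowledge graph

fused = torch.cat([char_vec, token_vec, ast_vec, rela_vec], dim=-1)  # (1, 256)
family_head = nn.Linear(fused.shape[-1], 5)      # five PowerShell attack families
print(family_head(fused).shape)                  # torch.Size([1, 5])
```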
Copper azide (CA), as a primary explosive with high energy density, has not been practically used so far because of its high electrostatic sensitivity. In this research, a Cu2O@HKUST-1 core-shell hybrid material was synthesized by the "bottle around ship" methodology by regulating the dissolution rate of Cu2O and the generation rate of the metal-organic framework (MOF) material. Cu2O@HKUST-1 was carbonized to form a CuO@porous carbon (CuO@PC) composite material. CuO@PC was converted into a copper azide (CA)@PC composite energetic material through a gas-solid phase in-situ azidation reaction. CA is encapsulated in the PC framework, which acts as a nanoscale Faraday cage, and its excellent electrical conductivity prevents electrostatic charges from accumulating on the energetic material's surface. The CA@PC composite energetic material has a CA content of 89.6%, and its electrostatic safety is nearly 30 times that of pure CA (1.47 mJ compared to 0.05 mJ). CA@PC delivers an outstanding balance of safety and energy density compared to similar materials.
Deep multi-modal learning, a rapidly growing field with a wide range of practical applications, aims to effectively utilize and integrate information from multiple sources, known as modalities. Despite its impressive empirical performance, the theoretical foundations of deep multi-modal learning have yet to be fully explored. In this paper, we will undertake a comprehensive survey of recent developments in multi-modal learning theories, focusing on the fundamental properties that govern this field. Our goal is to provide a thorough collection of current theoretical tools for analyzing multi-modal learning, to clarify their implications for practitioners, and to suggest future directions for the establishment of a solid theoretical foundation for deep multi-modal learning.
基金the support from the CIPHER Project(IIID 2018-008)funded by the Commission on Higher Education-Philippine California Advanced Research Institutes(CHED-PCARI)。
文摘Transition metal phosphides(TMPs)have been regarded as alternative hydrogen evolution reaction(HER)and oxygen evolution reaction(OER)catalysts owing to their comparable activity to those of noble metal-based catalysts.TMPs have been produced in various morphologies,including hollow and porous nanostructures,which are features deemed desirable for electrocatalytic materials.Templated synthesis routes are often responsible for such morphologies.This paper reviews the latest advances and existing challenges in the synthesis of TMP-based OER and HER catalysts through templated methods.A comprehensive review of the structure-property-performance of TMP-based HER and OER catalysts prepared using different templates is presented.The discussion proceeds according to application,first by HER and further divided among the types of templates used-from hard templates,sacrificial templates,and soft templates to the emerging dynamic hydrogen bubble template.OER catalysts are then reviewed and grouped according to their morphology.Finally,prospective research directions for the synthesis of hollow and porous TMP-based catalysts,such as improvements on both activity and stability of TMPs,design of environmentally benign templates and processes,and analysis of the reaction mechanism through advanced material characterization techniques and theoretical calculations,are suggested.
基金supported by a grant from the Basic Science Research Program through the National Research Foundation(NRF)(2021R1F1A1063634)funded by the Ministry of Science and ICT(MSIT),Republic of KoreaThe authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Group Funding Program Grant Code(NU/RG/SERC/13/40)+2 种基金Also,the authors are thankful to Prince Satam bin Abdulaziz University for supporting this study via funding from Prince Satam bin Abdulaziz University project number(PSAU/2024/R/1445)This work was also supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number(PNURSP2023R54)Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Road traffic monitoring is an imperative topic widely discussed among researchers.Systems used to monitor traffic frequently rely on cameras mounted on bridges or roadsides.However,aerial images provide the flexibility to use mobile platforms to detect the location and motion of the vehicle over a larger area.To this end,different models have shown the ability to recognize and track vehicles.However,these methods are not mature enough to produce accurate results in complex road scenes.Therefore,this paper presents an algorithm that combines state-of-the-art techniques for identifying and tracking vehicles in conjunction with image bursts.The extracted frames were converted to grayscale,followed by the application of a georeferencing algorithm to embed coordinate information into the images.The masking technique eliminated irrelevant data and reduced the computational cost of the overall monitoring system.Next,Sobel edge detection combined with Canny edge detection and Hough line transform has been applied for noise reduction.After preprocessing,the blob detection algorithm helped detect the vehicles.Vehicles of varying sizes have been detected by implementing a dynamic thresholding scheme.Detection was done on the first image of every burst.Then,to track vehicles,the model of each vehicle was made to find its matches in the succeeding images using the template matching algorithm.To further improve the tracking accuracy by incorporating motion information,Scale Invariant Feature Transform(SIFT)features have been used to find the best possible match among multiple matches.An accuracy rate of 87%for detection and 80%accuracy for tracking in the A1 Motorway Netherland dataset has been achieved.For the Vehicle Aerial Imaging from Drone(VAID)dataset,an accuracy rate of 86%for detection and 78%accuracy for tracking has been achieved.
基金supported by the National Key Research and Development Program of China(Grant No.2022YFB3608600)the Beijing Municipal Science and Technology Commission(Grant No.Z181100004418009)the National Natural Science Foundation of China(Grant No.61927806)。
文摘As an ultra-wide bandgap semiconductor,diamond garners significant interest due to its exceptional physical properties^([1–3]).These superior characteristics make diamonds highly promising for applications in power electronics^([4]),deep-ultraviolet detectors^([5]),high-energy particle detectors^([6]),and quantum devices based on color centers^([7]).
基金funded by the National Natural Science Foundation of China(61991413)the China Postdoctoral Science Foundation(2019M651142)+1 种基金the Natural Science Foundation of Liaoning Province(2021-KF-12-07)the Natural Science Foundations of Liaoning Province(2023-MS-322).
文摘Fusing hand-based features in multi-modal biometric recognition enhances anti-spoofing capabilities.Additionally,it leverages inter-modal correlation to enhance recognition performance.Concurrently,the robustness and recognition performance of the system can be enhanced through judiciously leveraging the correlation among multimodal features.Nevertheless,two issues persist in multi-modal feature fusion recognition:Firstly,the enhancement of recognition performance in fusion recognition has not comprehensively considered the inter-modality correlations among distinct modalities.Secondly,during modal fusion,improper weight selection diminishes the salience of crucial modal features,thereby diminishing the overall recognition performance.To address these two issues,we introduce an enhanced DenseNet multimodal recognition network founded on feature-level fusion.The information from the three modalities is fused akin to RGB,and the input network augments the correlation between modes through channel correlation.Within the enhanced DenseNet network,the Efficient Channel Attention Network(ECA-Net)dynamically adjusts the weight of each channel to amplify the salience of crucial information in each modal feature.Depthwise separable convolution markedly reduces the training parameters and further enhances the feature correlation.Experimental evaluations were conducted on four multimodal databases,comprising six unimodal databases,including multispectral palmprint and palm vein databases from the Chinese Academy of Sciences.The Equal Error Rates(EER)values were 0.0149%,0.0150%,0.0099%,and 0.0050%,correspondingly.In comparison to other network methods for palmprint,palm vein,and finger vein fusion recognition,this approach substantially enhances recognition performance,rendering it suitable for high-security environments with practical applicability.The experiments in this article utilized amodest sample database comprising 200 individuals.The subsequent phase involves preparing for the extension of the method to larger databases.
基金supported by the Natural Science Foundation of Liaoning Province(Grant No.2023-MSBA-070)the National Natural Science Foundation of China(Grant No.62302086).
文摘Multi-modal fusion technology gradually become a fundamental task in many fields,such as autonomous driving,smart healthcare,sentiment analysis,and human-computer interaction.It is rapidly becoming the dominant research due to its powerful perception and judgment capabilities.Under complex scenes,multi-modal fusion technology utilizes the complementary characteristics of multiple data streams to fuse different data types and achieve more accurate predictions.However,achieving outstanding performance is challenging because of equipment performance limitations,missing information,and data noise.This paper comprehensively reviews existing methods based onmulti-modal fusion techniques and completes a detailed and in-depth analysis.According to the data fusion stage,multi-modal fusion has four primary methods:early fusion,deep fusion,late fusion,and hybrid fusion.The paper surveys the three majormulti-modal fusion technologies that can significantly enhance the effect of data fusion and further explore the applications of multi-modal fusion technology in various fields.Finally,it discusses the challenges and explores potential research opportunities.Multi-modal tasks still need intensive study because of data heterogeneity and quality.Preserving complementary information and eliminating redundant information between modalities is critical in multi-modal technology.Invalid data fusion methods may introduce extra noise and lead to worse results.This paper provides a comprehensive and detailed summary in response to these challenges.
基金European Commission,Joint Research Center,Grant/Award Number:HUMAINTMinisterio de Ciencia e Innovación,Grant/Award Number:PID2020‐114924RB‐I00Comunidad de Madrid,Grant/Award Number:S2018/EMT‐4362 SEGVAUTO 4.0‐CM。
文摘Predicting the motion of other road agents enables autonomous vehicles to perform safe and efficient path planning.This task is very complex,as the behaviour of road agents depends on many factors and the number of possible future trajectories can be consid-erable(multi-modal).Most prior approaches proposed to address multi-modal motion prediction are based on complex machine learning systems that have limited interpret-ability.Moreover,the metrics used in current benchmarks do not evaluate all aspects of the problem,such as the diversity and admissibility of the output.The authors aim to advance towards the design of trustworthy motion prediction systems,based on some of the re-quirements for the design of Trustworthy Artificial Intelligence.The focus is on evaluation criteria,robustness,and interpretability of outputs.First,the evaluation metrics are comprehensively analysed,the main gaps of current benchmarks are identified,and a new holistic evaluation framework is proposed.Then,a method for the assessment of spatial and temporal robustness is introduced by simulating noise in the perception system.To enhance the interpretability of the outputs and generate more balanced results in the proposed evaluation framework,an intent prediction layer that can be attached to multi-modal motion prediction models is proposed.The effectiveness of this approach is assessed through a survey that explores different elements in the visualisation of the multi-modal trajectories and intentions.The proposed approach and findings make a significant contribution to the development of trustworthy motion prediction systems for autono-mous vehicles,advancing the field towards greater safety and reliability.
基金Project(2023JH26-10100002)supported by the Liaoning Science and Technology Major Project,ChinaProjects(U21A20117,52074085)supported by the National Natural Science Foundation of China+1 种基金Project(2022JH2/101300008)supported by the Liaoning Applied Basic Research Program Project,ChinaProject(22567612H)supported by the Hebei Provincial Key Laboratory Performance Subsidy Project,China。
文摘Mill vibration is a common problem in rolling production,which directly affects the thickness accuracy of the strip and may even lead to strip fracture accidents in serious cases.The existing vibration prediction models do not consider the features contained in the data,resulting in limited improvement of model accuracy.To address these challenges,this paper proposes a multi-dimensional multi-modal cold rolling vibration time series prediction model(MDMMVPM)based on the deep fusion of multi-level networks.In the model,the long-term and short-term modal features of multi-dimensional data are considered,and the appropriate prediction algorithms are selected for different data features.Based on the established prediction model,the effects of tension and rolling force on mill vibration are analyzed.Taking the 5th stand of a cold mill in a steel mill as the research object,the innovative model is applied to predict the mill vibration for the first time.The experimental results show that the correlation coefficient(R^(2))of the model proposed in this paper is 92.5%,and the root-mean-square error(RMSE)is 0.0011,which significantly improves the modeling accuracy compared with the existing models.The proposed model is also suitable for the hot rolling process,which provides a new method for the prediction of strip rolling vibration.
基金National College Students’Training Programs of Innovation and Entrepreneurship,Grant/Award Number:S202210022060the CACMS Innovation Fund,Grant/Award Number:CI2021A00512the National Nature Science Foundation of China under Grant,Grant/Award Number:62206021。
文摘Media convergence works by processing information from different modalities and applying them to different domains.It is difficult for the conventional knowledge graph to utilise multi-media features because the introduction of a large amount of information from other modalities reduces the effectiveness of representation learning and makes knowledge graph inference less effective.To address the issue,an inference method based on Media Convergence and Rule-guided Joint Inference model(MCRJI)has been pro-posed.The authors not only converge multi-media features of entities but also introduce logic rules to improve the accuracy and interpretability of link prediction.First,a multi-headed self-attention approach is used to obtain the attention of different media features of entities during semantic synthesis.Second,logic rules of different lengths are mined from knowledge graph to learn new entity representations.Finally,knowledge graph inference is performed based on representing entities that converge multi-media features.Numerous experimental results show that MCRJI outperforms other advanced baselines in using multi-media features and knowledge graph inference,demonstrating that MCRJI provides an excellent approach for knowledge graph inference with converged multi-media features.
文摘Intelligent personal assistants play a pivotal role in in-vehicle systems,significantly enhancing life efficiency,driving safety,and decision-making support.In this study,the multi-modal design elements of intelligent personal assistants within the context of visual,auditory,and somatosensory interactions with drivers were discussed.Their impact on the driver’s psychological state through various modes such as visual imagery,voice interaction,and gesture interaction were explored.The study also introduced innovative designs for in-vehicle intelligent personal assistants,incorporating design principles such as driver-centricity,prioritizing passenger safety,and utilizing timely feedback as a criterion.Additionally,the study employed design methods like driver behavior research and driving situation analysis to enhance the emotional connection between drivers and their vehicles,ultimately improving driver satisfaction and trust.
基金supported by the National Key Research and Development Project under Grant 2020YFB1807602Key Program of Marine Economy Development Special Foundation of Department of Natural Resources of Guangdong Province(GDNRC[2023]24)the National Natural Science Foundation of China under Grant 62271267.
文摘Recently,there have been significant advancements in the study of semantic communication in single-modal scenarios.However,the ability to process information in multi-modal environments remains limited.Inspired by the research and applications of natural language processing across different modalities,our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality videos.Specifically,we propose a deep learning-basedMulti-ModalMutual Enhancement Video Semantic Communication system,called M3E-VSC.Built upon a VectorQuantized Generative AdversarialNetwork(VQGAN),our systemaims to leverage mutual enhancement among different modalities by using text as the main carrier of transmission.With it,the semantic information can be extracted fromkey-frame images and audio of the video and performdifferential value to ensure that the extracted text conveys accurate semantic information with fewer bits,thus improving the capacity of the system.Furthermore,a multi-frame semantic detection module is designed to facilitate semantic transitions during video generation.Simulation results demonstrate that our proposed model maintains high robustness in complex noise environments,particularly in low signal-to-noise ratio conditions,significantly improving the accuracy and speed of semantic transmission in video communication by approximately 50 percent.
基金supported by the National Natural Science Foundation of China(Nos.U1804263,U1736214,62172435)the Zhongyuan Science and Technology Innovation Leading Talent Project(No.214200510019).
文摘The joint entity relation extraction model which integrates the semantic information of relation is favored by relevant researchers because of its effectiveness in solving the overlapping of entities,and the method of defining the semantic template of relation manually is particularly prominent in the extraction effect because it can obtain the deep semantic information of relation.However,this method has some problems,such as relying on expert experience and poor portability.Inspired by the rule-based entity relation extraction method,this paper proposes a joint entity relation extraction model based on a relation semantic template automatically constructed,which is abbreviated as RSTAC.This model refines the extraction rules of relation semantic templates from relation corpus through dependency parsing and realizes the automatic construction of relation semantic templates.Based on the relation semantic template,the process of relation classification and triplet extraction is constrained,and finally,the entity relation triplet is obtained.The experimental results on the three major Chinese datasets of DuIE,SanWen,and FinRE showthat the RSTAC model successfully obtains rich deep semantics of relation,improves the extraction effect of entity relation triples,and the F1 scores are increased by an average of 0.96% compared with classical joint extraction models such as CasRel,TPLinker,and RFBFN.
文摘BACKGROUND There is still considerable heterogeneity regarding which features of cryptoglandular anal fistula on magnetic resonance imaging(MRI)and endoanal ultrasound(EAUS)are relevant to surgical decision-making.As a con-sequence,the quality and completeness of the report are highly dependent on the training and experience of the examiners.AIM To develop a structured MRI and EAUS template(SMART)reporting the minimum dataset of information for the treatment of anal fistulas.METHODS This modified Delphi survey based on the RAND-UCLA appropriateness for consensus-building was conducted between May and August 2023.One hundred and fifty-one articles selected from a systematic review of the lite-rature formed the database to generate the evidence-based statements for the Delphi study.Fourteen questions were anonymously voted by an interdisciplinary multidisciplinary group for a maximum of three iterative rounds.The degree of agreement was scored on a numeric 0–10 scale.Group consensus was defined as a score≥8 for≥80%of the panelists.RESULTS Eleven scientific societies(3 radiological and 8 surgical)endorsed the study.After three rounds of voting,the experts(69 colorectal surgeons,23 radiologists,2 anatomists,and 1 gastroenterologist)achieved consensus for 12 of 14 statements(85.7%).Based on the results of the Delphi process,the six following features of anal fistulas were included in the SMART:Primary tract,secondary extension,internal opening,presence of collection,coexisting le-sions,and sphincters morphology.CONCLUSION A structured template,SMART,was developed to standardize imaging reporting of fistula-in-ano in a simple,systematic,time-efficient way,providing the minimum dataset of information and visual diagram useful to refer-ring physicians.
文摘Background: As the population age structure gradually ages, more and more elderly people were found to have pulmonary nodules during physical examinations. Most elderly people had underlying diseases such as heart, lung, brain and blood vessels and cannot tolerate surgery. Computed tomography (CT)-guided percutaneous core needle biopsy (CNB) was the first choice for pathological diagnosis and subsequent targeted drugs, immune drugs or ablation treatment. CT-guided percutaneous CNB requires clinicians with rich CNB experience to ensure high CNB accuracy, but it was easy to cause complications such as pneumothorax and hemorrhage. Three-dimensional (3D) printing coplanar template (PCT) combined with CT-guided percutaneous pulmonary CNB biopsy has been used in clinical practice, but there was no prospective, randomized controlled study. Methods: Elderly patients with lung nodules admitted to the Department of Oncology of our hospital from January 2019 to January 2023 were selected. A total of 225 elderly patients were screened, and 30 patients were included after screening. They were randomly divided into experimental group (Group A: 30 cases) and control group (Group B: 30 cases). Group A was given 3D-PCT combined with CT-guided percutaneous pulmonary CNB biopsy, Group B underwent CT-guided percutaneous pulmonary CNB. The primary outcome measure of this study was the accuracy of diagnostic CNB, and the secondary outcome measures were CNB time, number of CNB needles, number of pathological tissues and complications. Results: The diagnostic accuracy of group A and group B was 96.67% and 76.67%, respectively (P = 0.026). There were statistical differences between group A and group B in average CNB time (P = 0.001), number of CNB (1 vs more than 1, P = 0.029), and pathological tissue obtained by CNB (3 vs 1, P = 0.040). There was no statistical difference in the incidence of pneumothorax and hemorrhage between the two groups (P > 0.05). Conclusions: 3D-PCT combined with CT-guided percutaneous CNB can improve the puncture accuracy of elderly patients, shorten the puncture time, reduce the number of punctures, and increase the amount of puncture pathological tissue, without increasing pneumothorax and hemorrhage complications. We look forward to verifying this in a phase III randomized controlled clinical study. .
基金the National Natural Science Foundation of China(No.61976080)the Academic Degrees&Graduate Education Reform Project of Henan Province(No.2021SJGLX195Y)+1 种基金the Teaching Reform Research and Practice Project of Henan Undergraduate Universities(No.2022SYJXLX008)the Key Project on Research and Practice of Henan University Graduate Education and Teaching Reform(No.YJSJG2023XJ006)。
文摘The unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain.However,the multi-generator mechanism is employed among the advanced approaches available to model different domain mappings,which results in inefficient training of neural networks and pattern collapse,leading to inefficient generation of image diversity.To address this issue,this paper introduces a multi-modal unsupervised image translation framework that uses a generator to perform multi-modal image translation.Specifically,firstly,the domain code is introduced in this paper to explicitly control the different generation tasks.Secondly,this paper brings in the squeeze-and-excitation(SE)mechanism and feature attention(FA)module.Finally,the model integrates multiple optimization objectives to ensure efficient multi-modal translation.This paper performs qualitative and quantitative experiments on multiple non-paired benchmark image translation datasets while demonstrating the benefits of the proposed method over existing technologies.Overall,experimental results have shown that the proposed method is versatile and scalable.
Funding: This work was supported in part by the Ministry National Key Research and Development Project (Grant No. 2020AAA0108101), the National Natural Science Foundation of China (Grants No. 62125101, 62341101, 62001018, and 62301011), the Shandong Natural Science Foundation (Grant No. ZR2023YQ058), and the New Cornerstone Science Foundation through the XPLORER PRIZE. The authors would like to thank Mengyuan Lu and Zengrui Han for their help in constructing the electromagnetic space in the Wireless InSite simulation platform, and Weibo Wen, Qi Duan, and Yong Yu for their help in constructing the physical space in the AirSim simulation platform.
Abstract: The sixth generation (6G) of mobile communication systems is witnessing a new paradigm shift toward integrated sensing-communication systems, and a comprehensive dataset is a prerequisite for 6G integrated sensing-communication research. This paper develops a novel simulation dataset, named M3SC, for mixed multi-modal (MMM) sensing-communication integration, and further presents the generation framework of the M3SC dataset. To obtain multi-modal sensory data in physical space and communication data in electromagnetic space, AirSim and WaveFarer are utilized to collect multi-modal sensory data, and Wireless InSite is exploited to collect communication data. Furthermore, the in-depth integration and precise alignment of AirSim, WaveFarer, and Wireless InSite are achieved. The M3SC dataset covers various weather conditions, multiple frequency bands, and different times of day. Currently, the M3SC dataset contains 1500 snapshots, each including 80 RGB images, 160 depth maps, 80 LiDAR point clouds, 256 sets of mmWave waveforms with 8 radar point clouds, and 72 channel impulse response (CIR) matrices, thus totaling 120,000 RGB images, 240,000 depth maps, 120,000 LiDAR point clouds, 384,000 sets of mmWave waveforms with 12,000 radar point clouds, and 108,000 CIR matrices. The data processing results present the multi-modal sensory information and the statistical properties of the communication channel. Finally, MMM sensing-communication applications that can be supported by the M3SC dataset are discussed.
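The per-snapshot counts quoted above can be tallied against the dataset totals. The sketch below organizes those counts in a small Python structure purely for illustration; the dataclass and field names are assumptions, not the dataset's actual schema or loading API.

```python
# Illustrative bookkeeping for one M3SC snapshot, using only the per-snapshot
# counts quoted in the abstract; multiplying by 1500 snapshots reproduces the
# dataset-level totals.
from dataclasses import dataclass

@dataclass
class SnapshotIndex:
    rgb_images: int = 80
    depth_maps: int = 160
    lidar_point_clouds: int = 80
    mmwave_waveform_sets: int = 256
    radar_point_clouds: int = 8
    cir_matrices: int = 72

NUM_SNAPSHOTS = 1500
totals = {name: count * NUM_SNAPSHOTS
          for name, count in vars(SnapshotIndex()).items()}
print(totals)  # {'rgb_images': 120000, 'depth_maps': 240000, ..., 'cir_matrices': 108000}
```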
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 21972068, 21875112, and 22075290), the Nanjing IPE Institute of Green Manufacturing Industry, and the Beijing Natural Science Foundation (Grant No. Z200012).
Abstract: The development of economical, efficient, and robust electrocatalysts for the hydrogen evolution reaction (HER) is highly imperative for the rapid advancement of renewable H2 energy technologies. Extensive utilization of the heterointerface effect can endow catalysts with remarkably boosted electrocatalytic performance owing to the modified electronic state of the active sites. Herein, we demonstrate the deliberate crafting of CoP/CoO heterojunction porous nanotubes (abbreviated as CoP/CoO PNTs hereafter) using a self-sacrificial template-engaged strategy. Precise control over the Kirkendall diffusion of the presynthesized cobalt–aspartic acid complex nanowires is indispensable for the formation of CoP/CoO heterostructures. The topochemical transformation of the reactive templates enables uniform and maximized construction of CoP/CoO heterojunctions throughout the porous nanotubes. Establishing CoP/CoO heterojunctions considerably modifies the electronic configuration of the active sites and improves the electrical conductivity, endowing the resultant CoP/CoO PNTs with enhanced intrinsic activity. Simultaneously, the hollow and porous nanotube architecture allows sufficient accessibility of exterior/interior surfaces and molecular permeability, drastically promoting the reaction kinetics. Consequently, when used as HER electrocatalysts, the well-designed CoP/CoO PNTs show Pt-like activity, with an overpotential of only 61 mV at 10 mA cm^(−2) and excellent stability in 1.0 M KOH, exceeding the vast majority of previously reported nonprecious candidates. Density functional theory calculations further substantiate that constructing CoP/CoO heterojunctions optimizes the Gibbs free energies of water adsorption and hydrogen adsorption, resulting in boosted intrinsic HER activity. This study provides in-depth insights into the fundamental mechanisms of heterojunction-induced electronic regulation and may pave the way for the rational design of advanced Earth-abundant electrocatalysts.
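For context, the Gibbs free energy of hydrogen adsorption referenced above is conventionally evaluated in such DFT studies with the standard correction below, and values close to zero indicate near-optimal hydrogen binding for HER; the specific numbers computed for CoP/CoO are not reported in this abstract.

```latex
\Delta G_{\mathrm{H^*}} = \Delta E_{\mathrm{H^*}} + \Delta E_{\mathrm{ZPE}} - T\,\Delta S_{\mathrm{H}},
\qquad \left|\Delta G_{\mathrm{H^*}}\right| \approx 0 \ \text{for an ideal HER catalyst}
```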
Funding: National Key R&D Program of China (No. 2022ZD0118401).
Abstract: Deep learning based methods have been successfully applied to semantic segmentation of optical remote sensing images. However, as more and more remote sensing data become available, comprehensively utilizing multi-modal remote sensing data to break through the performance bottleneck of single-modal interpretation is a new challenge. In addition, semantic segmentation and height estimation from remote sensing data are two strongly correlated tasks, but existing methods usually study each task separately, which leads to high computational overhead. To this end, we propose a Multi-Task learning framework for Multi-Modal remote sensing images (MM_MT). Specifically, we design a Cross-Modal Feature Fusion (CMFF) method that aggregates complementary information from different modalities to improve the accuracy of both semantic segmentation and height estimation. Besides, a dual-stream multi-task learning method is introduced for Joint Semantic Segmentation and Height Estimation (JSSHE), extracting common features in a shared network to save time and resources and then learning task-specific features in two task branches. Experimental results on the public multi-modal remote sensing image dataset Potsdam show that, compared to training the two tasks independently, multi-task learning saves 20% of training time and achieves competitive performance, with an mIoU of 83.02% for semantic segmentation and an accuracy of 95.26% for height estimation.
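The dual-stream idea of sharing one feature extractor between segmentation and height regression can be sketched as below. This is a minimal PyTorch sketch under assumed layer sizes and head designs; it is not the MM_MT architecture itself and omits the cross-modal fusion module.

```python
# Minimal sketch: shared encoder computed once, with two task-specific heads
# for semantic segmentation (classification) and height estimation (regression).
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, in_ch: int = 3, feat_ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class MultiTaskModel(nn.Module):
    def __init__(self, num_classes: int, feat_ch: int = 64):
        super().__init__()
        self.encoder = SharedEncoder(feat_ch=feat_ch)
        self.seg_head = nn.Conv2d(feat_ch, num_classes, 1)   # semantic segmentation branch
        self.height_head = nn.Conv2d(feat_ch, 1, 1)          # per-pixel height regression branch

    def forward(self, x):
        shared = self.encoder(x)                              # common features, computed once
        return self.seg_head(shared), self.height_head(shared)

model = MultiTaskModel(num_classes=6)
seg_logits, height = model(torch.randn(2, 3, 128, 128))
loss = nn.CrossEntropyLoss()(seg_logits, torch.randint(0, 6, (2, 128, 128))) \
       + nn.L1Loss()(height, torch.rand(2, 1, 128, 128))     # joint training objective
print(seg_logits.shape, height.shape, float(loss))
```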
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 62172308, U1626107, 61972297, 62172144, and 62062019).
Abstract: PowerShell has been widely deployed in fileless malware and advanced persistent threat (APT) attacks due to its high stealthiness and living-off-the-land technique. However, existing works mainly focus on deobfuscation and malicious-script detection, lacking classification of malicious PowerShell families and behavior analysis. Moreover, state-of-the-art methods fail to capture fine-grained features and semantic relationships, resulting in low robustness and accuracy. To this end, we propose PowerDetector, a novel malicious PowerShell script detector based on multi-modal semantic fusion and deep learning. Specifically, we design four feature extraction methods to extract key features from characters, tokens, the abstract syntax tree (AST), and a semantic knowledge graph. Then, we design four embeddings (i.e., Char2Vec, Token2Vec, AST2Vec, and Rela2Vec) and construct a multi-modal fusion algorithm to concatenate the feature vectors from the different views. Finally, we propose a combined model based on a transformer and CNN-BiLSTM to implement PowerShell family detection. Our experiments with five types of PowerShell attacks show that PowerDetector can accurately detect various obfuscated and stealthy PowerShell scripts, with a precision of 0.9402, a recall of 0.9358, and an F1-score of 0.9374. Furthermore, through single-modal and multi-modal comparison experiments, we demonstrate that PowerDetector's multi-modal embedding and deep learning model achieve better accuracy and can even identify more unknown attacks.
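The fusion step, in which per-view embeddings are combined into one vector before classification, can be illustrated with the concatenation-based sketch below. Embedding dimensions, projection sizes, and view names are assumptions for illustration and do not reproduce PowerDetector's actual implementation.

```python
# Minimal sketch: project four per-view embeddings (character, token, AST,
# relation) to a common size and concatenate them into a single fused vector
# that a downstream classifier (e.g., transformer / CNN-BiLSTM) would consume.
import torch
import torch.nn as nn

class MultiModalFusion(nn.Module):
    def __init__(self, dims: dict, proj_dim: int = 128):
        super().__init__()
        # one linear projection per view
        self.proj = nn.ModuleDict({k: nn.Linear(d, proj_dim) for k, d in dims.items()})

    def forward(self, views: dict) -> torch.Tensor:
        parts = [torch.relu(self.proj[k](v)) for k, v in views.items()]
        return torch.cat(parts, dim=-1)          # fused multi-view feature vector

dims = {"char": 100, "token": 200, "ast": 150, "rela": 64}   # hypothetical embedding sizes
fusion = MultiModalFusion(dims)
views = {k: torch.randn(8, d) for k, d in dims.items()}       # batch of 8 scripts
print(fusion(views).shape)  # torch.Size([8, 512]), i.e., 4 views x 128 dims each
```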
Funding: Financial support by the Postgraduate Research & Practice Innovation Program from the Jiangsu Science and Technology Department under Grant No. KYCX19_0320.
Abstract: Copper azide (CA), a primary explosive with high energy density, has not been used in practice so far because of its high electrostatic sensitivity. In this work, a Cu2O@HKUST-1 core-shell hybrid material was synthesized by the "bottle around ship" methodology by regulating the dissolution rate of Cu2O and the formation rate of the metal-organic framework (MOF). Cu2O@HKUST-1 was carbonized to form a CuO@porous carbon (CuO@PC) composite, which was then converted into a copper azide (CA)@PC composite energetic material through a gas-solid in-situ azidation reaction. The CA is encapsulated in the PC framework, which acts as a nanoscale Faraday cage whose excellent electrical conductivity prevents electrostatic charge from accumulating on the surface of the energetic material. The CA@PC composite has a CA content of 89.6%, and its electrostatic safety is nearly 30 times that of pure CA (1.47 mJ compared to 0.05 mJ). CA@PC thus delivers an outstanding balance of safety and energy density compared to similar materials.
Funding: Supported by the Technology and Innovation Major Project of the Ministry of Science and Technology of China (2020AAA0108400, 2020AAA0108403) and the Tsinghua Precision Medicine Foundation (10001020109).
Abstract: Deep multi-modal learning, a rapidly growing field with a wide range of practical applications, aims to effectively utilize and integrate information from multiple sources, known as modalities. Despite its impressive empirical performance, the theoretical foundations of deep multi-modal learning have yet to be fully explored. In this paper, we undertake a comprehensive survey of recent developments in multi-modal learning theory, focusing on the fundamental properties that govern this field. Our goal is to provide a thorough collection of current theoretical tools for analyzing multi-modal learning, to clarify their implications for practitioners, and to suggest future directions toward establishing a solid theoretical foundation for deep multi-modal learning.