Funding: funded by the National Natural Science Foundation of China (61991413), the China Postdoctoral Science Foundation (2019M651142), the Natural Science Foundation of Liaoning Province (2021-KF-12-07), and the Natural Science Foundation of Liaoning Province (2023-MS-322).
Abstract: Fusing hand-based features in multi-modal biometric recognition enhances anti-spoofing capabilities and leverages inter-modal correlation to improve recognition performance. Judiciously exploiting the correlation among multimodal features can further improve the robustness and recognition performance of the system. Nevertheless, two issues persist in multi-modal feature fusion recognition. Firstly, efforts to improve recognition performance through fusion have not comprehensively considered the correlations among distinct modalities. Secondly, during modal fusion, improper weight selection diminishes the salience of crucial modal features and thereby degrades the overall recognition performance. To address these two issues, we introduce an enhanced DenseNet multimodal recognition network founded on feature-level fusion. The information from the three modalities is fused in the manner of RGB channels, and the input network strengthens the correlation between modalities through channel correlation. Within the enhanced DenseNet network, the Efficient Channel Attention Network (ECA-Net) dynamically adjusts the weight of each channel to amplify the salience of crucial information in each modal feature, and depthwise separable convolution markedly reduces the training parameters and further enhances the feature correlation. Experimental evaluations were conducted on four multimodal databases, built from six unimodal databases, including the multispectral palmprint and palm vein databases from the Chinese Academy of Sciences. The Equal Error Rate (EER) values were 0.0149%, 0.0150%, 0.0099%, and 0.0050%, respectively. In comparison to other network methods for palmprint, palm vein, and finger vein fusion recognition, this approach substantially improves recognition performance, rendering it suitable for high-security environments with practical applicability. The experiments in this article used a modest sample database comprising 200 individuals; the next phase is to prepare for extending the method to larger databases.
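The channel attention and depthwise separable convolution named in this abstract are standard building blocks; the PyTorch sketch below shows how they typically look. The kernel size, channel counts, and 128×128 three-channel input are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class ECALayer(nn.Module):
    """ECA-style channel attention: per-channel weights from a 1-D conv over
    the pooled channel descriptor, with no dimensionality reduction."""
    def __init__(self, kernel_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                   # x: (B, C, H, W)
        y = self.pool(x)                                     # (B, C, 1, 1)
        y = y.squeeze(-1).transpose(-1, -2)                  # (B, 1, C)
        y = self.conv(y)                                     # local cross-channel interaction
        y = self.sigmoid(y).transpose(-1, -2).unsqueeze(-1)  # (B, C, 1, 1)
        return x * y                                         # re-weight each channel

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv followed by a 1x1 pointwise conv; far fewer
    parameters than a standard 3x3 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Hypothetical input: three hand modalities stacked like RGB channels.
x = torch.randn(2, 3, 128, 128)
feat = DepthwiseSeparableConv(3, 64)(x)
out = ECALayer()(feat)
print(out.shape)   # torch.Size([2, 64, 128, 128])
```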
Funding: supported by the Natural Science Foundation of Liaoning Province (Grant No. 2023-MSBA-070) and the National Natural Science Foundation of China (Grant No. 62302086).
Abstract: Multi-modal fusion technology has gradually become a fundamental task in many fields, such as autonomous driving, smart healthcare, sentiment analysis, and human-computer interaction. It is rapidly becoming a dominant research direction due to its powerful perception and judgment capabilities. In complex scenes, multi-modal fusion technology exploits the complementary characteristics of multiple data streams to fuse different data types and achieve more accurate predictions. However, achieving outstanding performance is challenging because of equipment performance limitations, missing information, and data noise. This paper comprehensively reviews existing methods based on multi-modal fusion techniques and provides a detailed and in-depth analysis. According to the data fusion stage, multi-modal fusion has four primary methods: early fusion, deep fusion, late fusion, and hybrid fusion. The paper surveys the three major multi-modal fusion technologies that can significantly enhance the effect of data fusion and further explores the applications of multi-modal fusion technology in various fields. Finally, it discusses the challenges and explores potential research opportunities. Multi-modal tasks still need intensive study because of data heterogeneity and quality. Preserving complementary information and eliminating redundant information between modalities is critical in multi-modal technology. Invalid data fusion methods may introduce extra noise and lead to worse results. This paper provides a comprehensive and detailed summary in response to these challenges.
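As a concrete illustration of two of the fusion stages discussed in this survey, the sketch below contrasts early fusion (concatenating modality features before a joint predictor) with late fusion (averaging per-modality decisions). The feature dimensions and the three example modalities are purely hypothetical.

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate modality features at the input and learn a joint predictor."""
    def __init__(self, dims, num_classes):
        super().__init__()
        self.head = nn.Linear(sum(dims), num_classes)

    def forward(self, feats):                 # feats: list of (B, d_i) tensors
        return self.head(torch.cat(feats, dim=-1))

class LateFusion(nn.Module):
    """Independent per-modality predictors whose outputs are averaged."""
    def __init__(self, dims, num_classes):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d, num_classes) for d in dims])

    def forward(self, feats):
        logits = [h(f) for h, f in zip(self.heads, feats)]
        return torch.stack(logits).mean(dim=0)

# Hypothetical camera / audio / text feature vectors
feats = [torch.randn(4, 256), torch.randn(4, 128), torch.randn(4, 64)]
print(EarlyFusion([256, 128, 64], 10)(feats).shape)   # torch.Size([4, 10])
print(LateFusion([256, 128, 64], 10)(feats).shape)    # torch.Size([4, 10])
```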
Funding: European Commission, Joint Research Center, Grant/Award Number: HUMAINT; Ministerio de Ciencia e Innovación, Grant/Award Number: PID2020-114924RB-I00; Comunidad de Madrid, Grant/Award Number: S2018/EMT-4362 SEGVAUTO 4.0-CM.
Abstract: Predicting the motion of other road agents enables autonomous vehicles to perform safe and efficient path planning. This task is very complex, as the behaviour of road agents depends on many factors and the number of possible future trajectories can be considerable (multi-modal). Most prior approaches proposed to address multi-modal motion prediction are based on complex machine learning systems that have limited interpretability. Moreover, the metrics used in current benchmarks do not evaluate all aspects of the problem, such as the diversity and admissibility of the output. The authors aim to advance towards the design of trustworthy motion prediction systems, based on some of the requirements for the design of Trustworthy Artificial Intelligence. The focus is on evaluation criteria, robustness, and interpretability of outputs. First, the evaluation metrics are comprehensively analysed, the main gaps in current benchmarks are identified, and a new holistic evaluation framework is proposed. Then, a method for the assessment of spatial and temporal robustness is introduced by simulating noise in the perception system. To enhance the interpretability of the outputs and generate more balanced results in the proposed evaluation framework, an intent prediction layer that can be attached to multi-modal motion prediction models is proposed. The effectiveness of this approach is assessed through a survey that explores different elements in the visualisation of the multi-modal trajectories and intentions. The proposed approach and findings make a significant contribution to the development of trustworthy motion prediction systems for autonomous vehicles, advancing the field towards greater safety and reliability.
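The abstract does not specify the authors' exact noise model for the perception system, so the sketch below is only a generic stand-in for that kind of robustness test: Gaussian position jitter plus occasional dropped detections applied to an observed track before it is fed to a predictor. All parameters and the straight-line track are assumptions.

```python
import numpy as np

def perturb_track(track, sigma_xy=0.2, drop_prob=0.1, rng=None):
    """Simulate perception noise on an observed trajectory: Gaussian jitter on
    positions plus random missed detections (previous position held)."""
    rng = rng or np.random.default_rng(0)
    noisy = track + rng.normal(0.0, sigma_xy, track.shape)
    for t in range(1, len(noisy)):
        if rng.random() < drop_prob:
            noisy[t] = noisy[t - 1]   # missed detection: repeat previous position
    return noisy

track = np.cumsum(np.ones((20, 2)) * 0.5, axis=0)   # hypothetical straight path
print(perturb_track(track)[:3])
```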
Funding: Project (2023JH26-10100002) supported by the Liaoning Science and Technology Major Project, China; Projects (U21A20117, 52074085) supported by the National Natural Science Foundation of China; Project (2022JH2/101300008) supported by the Liaoning Applied Basic Research Program Project, China; Project (22567612H) supported by the Hebei Provincial Key Laboratory Performance Subsidy Project, China.
Abstract: Mill vibration is a common problem in rolling production, which directly affects the thickness accuracy of the strip and may even lead to strip fracture accidents in serious cases. Existing vibration prediction models do not consider the features contained in the data, resulting in limited improvement of model accuracy. To address these challenges, this paper proposes a multi-dimensional multi-modal cold rolling vibration time series prediction model (MDMMVPM) based on the deep fusion of multi-level networks. In the model, the long-term and short-term modal features of multi-dimensional data are considered, and appropriate prediction algorithms are selected for different data features. Based on the established prediction model, the effects of tension and rolling force on mill vibration are analyzed. Taking the 5th stand of a cold mill in a steel mill as the research object, the innovative model is applied to predict the mill vibration for the first time. The experimental results show that the coefficient of determination (R²) of the model proposed in this paper is 92.5%, and the root-mean-square error (RMSE) is 0.0011, which significantly improves the modeling accuracy compared with existing models. The proposed model is also suitable for the hot rolling process, which provides a new method for the prediction of strip rolling vibration.
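The two reported metrics are standard regression measures; a short sketch of how they are computed is given below, on made-up numbers rather than the paper's data.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical vibration amplitudes and model predictions
y_true = np.array([0.012, 0.015, 0.011, 0.018, 0.014])
y_pred = np.array([0.013, 0.014, 0.012, 0.017, 0.015])
print(rmse(y_true, y_pred), r2_score(y_true, y_pred))
```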
Funding: National College Students' Training Programs of Innovation and Entrepreneurship, Grant/Award Number: S202210022060; the CACMS Innovation Fund, Grant/Award Number: CI2021A00512; the National Natural Science Foundation of China, Grant/Award Number: 62206021.
Abstract: Media convergence works by processing information from different modalities and applying it to different domains. It is difficult for the conventional knowledge graph to utilise multi-media features because the introduction of a large amount of information from other modalities reduces the effectiveness of representation learning and makes knowledge graph inference less effective. To address the issue, an inference method based on the Media Convergence and Rule-guided Joint Inference model (MCRJI) has been proposed. The authors not only converge multi-media features of entities but also introduce logic rules to improve the accuracy and interpretability of link prediction. First, a multi-headed self-attention approach is used to obtain the attention of different media features of entities during semantic synthesis. Second, logic rules of different lengths are mined from the knowledge graph to learn new entity representations. Finally, knowledge graph inference is performed based on representing entities that converge multi-media features. Numerous experimental results show that MCRJI outperforms other advanced baselines in using multi-media features and knowledge graph inference, demonstrating that MCRJI provides an excellent approach for knowledge graph inference with converged multi-media features.
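The first step, multi-headed self-attention over an entity's media features, can be sketched with PyTorch's built-in attention module. The embedding size, number of heads, and the three-modality "token sequence" are assumptions for illustration, not MCRJI's actual design.

```python
import torch
import torch.nn as nn

# Hypothetical per-entity media features: text, image, and audio embeddings
# projected to a common dimension and treated as a length-3 sequence.
d_model, heads = 128, 4
media_feats = torch.randn(2, 3, d_model)                 # (batch, n_modalities, d_model)

attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
fused, weights = attn(media_feats, media_feats, media_feats)   # self-attention over modalities
entity_repr = fused.mean(dim=1)                          # converged multi-media entity embedding
print(entity_repr.shape, weights.shape)                  # torch.Size([2, 128]) torch.Size([2, 3, 3])
```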
Abstract: Intelligent personal assistants play a pivotal role in in-vehicle systems, significantly enhancing life efficiency, driving safety, and decision-making support. In this study, the multi-modal design elements of intelligent personal assistants in the context of visual, auditory, and somatosensory interactions with drivers were discussed. Their impact on the driver's psychological state through various modes, such as visual imagery, voice interaction, and gesture interaction, was explored. The study also introduced innovative designs for in-vehicle intelligent personal assistants, incorporating design principles such as driver-centricity, prioritizing passenger safety, and using timely feedback as a criterion. Additionally, the study employed design methods such as driver behavior research and driving situation analysis to enhance the emotional connection between drivers and their vehicles, ultimately improving driver satisfaction and trust.
Funding: supported by the National Key Research and Development Project under Grant 2020YFB1807602, the Key Program of the Marine Economy Development Special Foundation of the Department of Natural Resources of Guangdong Province (GDNRC[2023]24), and the National Natural Science Foundation of China under Grant 62271267.
Abstract: Recently, there have been significant advancements in the study of semantic communication in single-modal scenarios. However, the ability to process information in multi-modal environments remains limited. Inspired by the research and applications of natural language processing across different modalities, our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality videos. Specifically, we propose a deep learning-based Multi-Modal Mutual Enhancement Video Semantic Communication system, called M3E-VSC. Built upon a Vector Quantized Generative Adversarial Network (VQGAN), our system aims to leverage mutual enhancement among different modalities by using text as the main carrier of transmission. With it, semantic information can be extracted from key-frame images and audio of the video, and differential values can be computed to ensure that the extracted text conveys accurate semantic information with fewer bits, thus improving the capacity of the system. Furthermore, a multi-frame semantic detection module is designed to facilitate semantic transitions during video generation. Simulation results demonstrate that our proposed model maintains high robustness in complex noise environments, particularly in low signal-to-noise ratio conditions, significantly improving the accuracy and speed of semantic transmission in video communication by approximately 50 percent.
Funding: supported by the National Natural Science Foundation of China (Grant No. 62302086), the Natural Science Foundation of Liaoning Province (Grant No. 2023-MSBA-070), and the Fundamental Research Funds for the Central Universities (Grant No. N2317005).
Abstract: Multi-modal 3D object detection has achieved remarkable progress, but it is often limited in practical industrial production because of its high cost and low efficiency. The multi-view camera-based method provides a feasible solution due to its low cost. However, camera data lack geometric depth, and obtaining high accuracy from camera data alone is challenging. This paper proposes a multi-modal Bird's-Eye-View (BEV) distillation framework (MMDistill) to make a trade-off between them. MMDistill is a carefully crafted two-stage distillation framework based on teacher and student models for learning cross-modal knowledge and generating multi-modal features. It can improve the performance of unimodal detectors without introducing additional costs during inference. Specifically, our method can effectively bridge the cross-modal gap caused by the heterogeneity between data. Furthermore, we propose a Light Detection and Ranging (LiDAR)-guided geometric compensation module, which can assist the student model in obtaining effective geometric features and reduce the gap between different modalities. Our proposed method generally requires fewer computational resources and offers faster inference speed than traditional multi-modal models. This advancement enables multi-modal technology to be applied more widely in practical scenarios. Through experiments, we validate the effectiveness and superiority of MMDistill on the nuScenes dataset, achieving an improvement of 4.1% mean Average Precision (mAP) and 4.6% NuScenes Detection Score (NDS) over the baseline detector. In addition, we also present detailed ablation studies to validate our method.
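MMDistill's exact two-stage objective is not spelled out in this abstract, so the sketch below shows only the generic teacher-student distillation pattern such frameworks build on: imitating intermediate (e.g. BEV) features of a frozen teacher plus matching its softened predictions. All shapes, weights, and the temperature are chosen arbitrarily.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                      alpha=0.5, tau=2.0):
    """Two-term distillation objective: feature imitation + softened-logit KL."""
    feat_loss = F.mse_loss(student_feat, teacher_feat.detach())
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits.detach() / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    return alpha * feat_loss + (1 - alpha) * kd_loss

# Hypothetical BEV feature maps (B, C, H, W) and class logits
s_feat = torch.randn(2, 64, 50, 50, requires_grad=True)
t_feat = torch.randn(2, 64, 50, 50)
s_logits = torch.randn(2, 10, requires_grad=True)
t_logits = torch.randn(2, 10)
loss = distillation_loss(s_feat, t_feat, s_logits, t_logits)
loss.backward()   # gradients flow only into the student tensors
```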
Abstract: This paper is centered on the extent to which contemporary Chinese science fiction is related to ancient Chinese mythologies according to previous scholarly discussion, and on how these ancient mythologies are utilized specifically in the futuristic narratives of modern Chinese science fiction. By referring to academic dialogues, this paper argues that ancient mythologies can be recreated in modern science fiction and create modern mythologies in futuristic narratives to present or deal with modern human fears. Based on this argument, this paper then continues to explore what kinds of modern mythologies science fiction might deliver. The Chinese film The Wandering Earth (2019) will be discussed in terms of its mythological symbols and metaphors. This paper proposes a new approach through which to reconnect past stories with futuristic narratives and builds a frame in which to contextualize ancient mythologies in contemporary Chinese culture.
Abstract: This article employs the theory of conceptual metaphor to conduct a thorough analysis and exploration of translation strategies for the metaphors in Song Ci. Ci, a treasure in ancient Chinese literature, makes frequent use of metaphor. Its unique form of expression and abundant artistic effects often create difficulties in translation. This study selects some representative metaphor examples from Song Ci to explore the patterns and strategies in the translation process. The study reveals that, when faced with the metaphors in Song Ci, translators need to comprehensively apply strategies such as retaining the metaphor, replacing the metaphor, and translating the literal meaning to ensure the quality of translation while restoring the aesthetic charm of the original metaphor.
Abstract: Poetry, as a crucial form of literary expression, often employs metaphor as a common rhetorical device. From the perspective of cognitive metaphor theory, metaphor transcends its traditional linguistic boundaries and is recognized as a profound cognitive mechanism, manifesting as a broader cognitive phenomenon. This article is based on metaphorical examples of “flowers” in Chinese and English poetry, carefully selecting representative cases for in-depth analysis. The aim is to compare the imagery of “flowers” in Chinese and English poetry and observe their similarities and differences, thereby fostering a better understanding of poetry in both languages. Through this study, we not only delve into the intricacies of metaphor within poetry but also shed light on the distinct interpretations of the symbol of “flowers” in different cultural contexts, expanding our appreciation for the cultural diversity inherent in poetry.
Abstract: Using multimodal metaphor theory, this article studies the multimodal metaphor of emotion. Emotions can be divided into positive emotions and negative emotions. Positive emotion metaphors include happiness metaphors and love metaphors, while negative emotion metaphors include anger metaphors, fear metaphors, and sadness metaphors. They intuitively represent the source domain through physical signs, sensory effects, orientation dynamics, and physical presentation close to actual life, and the emotional multimodal metaphors in emojis have narrative and social functions.
Abstract: Based on Forceville’s theory of multi-modal metaphor, this paper adopts qualitative and quantitative research methods to analyze 60 social safety ads from China and America, seeking to demonstrate the similarities and differences between the chosen social safety ads in their use of multi-modal metaphor and to discuss the factors that caused these differences.
Funding: the National Natural Science Foundation of China (No. 61976080), the Academic Degrees & Graduate Education Reform Project of Henan Province (No. 2021SJGLX195Y), the Teaching Reform Research and Practice Project of Henan Undergraduate Universities (No. 2022SYJXLX008), and the Key Project on Research and Practice of Henan University Graduate Education and Teaching Reform (No. YJSJG2023XJ006).
Abstract: Unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, the advanced approaches available employ a multi-generator mechanism to model different domain mappings, which results in inefficient training of neural networks and pattern collapse, leading to poor generation of image diversity. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. Specifically, first, a domain code is introduced to explicitly control the different generation tasks. Second, the paper brings in the squeeze-and-excitation (SE) mechanism and a feature attention (FA) module. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple non-paired benchmark image translation datasets demonstrate the benefits of the proposed method over existing technologies. Overall, the experimental results show that the proposed method is versatile and scalable.
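The squeeze-and-excitation (SE) mechanism mentioned here is a standard block; a minimal PyTorch version follows. The reduction ratio and tensor sizes are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global-pool each channel, pass the descriptor
    through a small bottleneck MLP, and re-scale the feature map channel-wise."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                         # x: (B, C, H, W)
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c))      # channel weights in (0, 1)
        return x * w.view(b, c, 1, 1)

x = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(x).shape)                       # torch.Size([2, 64, 32, 32])
```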
Abstract: Understanding modern poetry is a big problem for average readers because of the many rhetorical devices that fill it, especially metaphors, which are hard for readers to understand clearly. Thus, grasping metaphors provides a possible way to enhance the understanding of modern poetry. Motivated by the need to better understand modern poetry, the paper explores in detail, with examples, 11 mechanisms of metaphor in modern poetry, i.e., blending, mapping, frame shifting, image schema, conceptual integration, contextual grounding, inter-subjectivity, embodied cognition, recursiveness, juxtaposition, and shape-moulding, borrowing concepts from cognitive linguistics, literary studies, and rhetoric, which can be adopted as means and methods to understand modern poems.
Abstract: In contrast to traditional linguistics, which views metaphor primarily as a rhetorical device, cognitive linguistics sees it as a fundamental cognitive process inherent to human thought. The conceptual metaphor theory proposed by Lakoff and Johnson continues to be a subject of intense scholarly debate. Numerous studies have examined conceptual metaphor from various perspectives, but research specifically focusing on advertising discourse is still limited. Advertising is constantly present in our lives, whether we are conscious of it or not, and it holds a significant influence on our daily lives and communication. Advertisers tend to make use of metaphors, and an analysis of their usage can provide valuable insights into their role in advertising discourse. The purpose of this paper is to extend the application of conceptual metaphor theory to the unique context of Macao’s casino advertising. Through a small-scale study, the author aims to identify the types of metaphors used in casino advertising and gain an understanding of how they are strategically employed to attract customers.
Abstract: The application of conceptual metaphor theories to the teaching of English reading and vocabulary provides a new field of research for foreign language teaching. This paper analyzes the possibility of metaphor teaching through a small-scale case study. It shows that individual differences should be emphasized in metaphor teaching. This paper also provides empirical evidence for further research in the future.
Abstract: Metaphors can serve as cognitive instruments. In the process of conceptualization, metaphors reflect the way people think and affect the way they perceive things. Conceptual metaphors are often used to help laymen better understand abstract and complex political issues. The author studies Martin Luther King, Jr's famous speech "I Have a Dream" from the perspective of cognitive linguistics and mainly analyzes the conceptual metaphors in this political speech.
文摘"Metaphors We Live By"was written by George Lakoff and Mark Johnson in Chicago in 1980 published by University of Chicago Press. Lakoff and Johnson's"Metaphors We Live By"(1980) is an important contribution to the study of metaphor.
Abstract: This thesis aims to examine English idioms with metaphors. English idioms are one of the special forms of English; they are among the most popular ways of expression in the English language and have distinctive characteristics. The research is therefore conducted to achieve a better understanding of them.