Knowledge Enhanced Pre-Training Model for Vision-Language-Navigation Task (cited by: 1)

Abstract: The Vision-Language-Navigation (VLN) task is a cross-modality task that combines natural language processing and computer vision. It requires the agent to move autonomously to the destination according to a natural language instruction and the visual information observed from its surroundings. To make the best decision at every navigation step, the agent should pay particular attention to objects, object attributes, and object relationships. However, most current methods treat all received textual and visual information equally. This paper therefore integrates more detailed semantic connections between visual and textual information through three pre-training tasks (object prediction, object attribute prediction, and object relationship prediction). The model learns a better fused representation and alignment between the two types of information, improving the success rate (SR) and generalization. Experiments show that, compared with the previous baseline models, the SR on the unseen validation set (Val Unseen) increased by 7% and the SR weighted by path length (SPL) increased by 7%; on the test set (Test), the SR increased by 4% and the SPL increased by 3%.
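The two metrics reported in the abstract, SR and SPL, can be illustrated with a minimal sketch. It assumes the commonly used definition of SPL, in which each successful episode contributes the ratio of the shortest-path length to the larger of the shortest-path length and the length of the path actually taken; the abstract does not spell out the formula, and the function and parameter names below are hypothetical.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the goal."""
    if not successes:
        raise ValueError("need at least one episode")
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by path length (assumed standard definition).

    Each successful episode contributes l / max(p, l), where l is the
    shortest-path length and p is the length of the path the agent took.
    Failed episodes contribute 0. The result is averaged over all episodes.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")
    if not successes:
        raise ValueError("need at least one episode")

    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if ok:
            total += shortest / max(taken, shortest)
    return total / len(successes)


if __name__ == "__main__":
    # Three made-up episodes, for illustration only (not data from the paper).
    ok = [True, False, True]
    shortest = [10.0, 8.0, 5.0]
    taken = [10.0, 12.0, 10.0]
    print(f"SR  = {success_rate(ok):.3f}")
    print(f"SPL = {spl(ok, shortest, taken):.3f}")
```

Under this definition SPL can never exceed SR, because every successful episode is scaled by a factor of at most 1. A detour therefore lowers SPL while leaving SR unchanged.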

Authors: HUANG Jitao, ZENG Guohui, HUANG Bo, GAO Yongbin, LIU Jin, SHI Zhicai

Affiliations: College of Electrical and Electronic Engineering; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security

Source: Wuhan University Journal of Natural Sciences (武汉大学学报（自然科学英文版）), indexed in CAS and CSCD, 2021, No. 2, pp. 147-155 (9 pages)

Funding: Supported by the National Natural Science Foundation of China (62006150), the Songjiang District Science and Technology Research Project (19SJKJGG83), and the Shanghai Young Science and Technology Talents Sailing Program (19YF1418400).

Keywords: pre-training; cross-modality; deep learning; scene graph

Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]

