Fine-Grained Features for Image Captioning

Abstract: Image captioning involves two major modalities (image and sentence) and converts a given image into language that adheres to its visual semantics. Almost all methods first extract image features to reduce the difficulty of visual semantic embedding and then use a caption model to generate fluent sentences. A Convolutional Neural Network (CNN) is often used to extract image features in image captioning, and the use of object detection networks to extract region features has achieved great success. However, the region features extracted this way are object-level and, because of the detection model's limitations, do not capture fine-grained details. To address this issue, we propose an approach that generates more accurate captions by fusing fine-grained features with region features. First, we extract fine-grained features using a panoramic segmentation algorithm. Second, we propose two fusion methods and compare their results. Both fusion methods are built on the X-Linear Attention Network (X-LAN). Experimental results on the COCO dataset show that the two-branch fusion approach is superior. Notably, on the COCO Karpathy test split, CIDEr increases to 134.3% compared with the baseline, demonstrating the effectiveness and feasibility of our method.

Authors: Mengyue Shao, Jie Feng, Jie Wu, Haixiang Zhang, Yayu Zheng

Affiliations: Zhejiang Sci-Tech University; Zhejiang University of Technology

Source: Computers, Materials & Continua (SCIE, EI), 2023, Issue 6, pp. 4697-4712 (16 pages)

Funding: Supported in part by the National Natural Science Foundation of China (NSFC) under Grant 6150140 and in part by the Youth Innovation Project (21032158-Y) of Zhejiang Sci-Tech University.

Keywords: image captioning; region features; fine-grained features; fusion

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
