
PCATNet: Position-Class Awareness Transformer for Image Captioning

Abstract: Existing image captioning models usually build the relation between visual information and words to generate captions, but they lack spatial information and object-class awareness. To address this issue, we propose a novel Position-Class Awareness Transformer (PCAT) network that serves as a bridge between visual features and captions by embedding spatial information and awareness of object classes. We construct the PCAT network by proposing a novel Grid Mapping Position Encoding (GMPE) method and refining the encoder-decoder framework. First, GMPE maps the regions of objects to grids, calculates the relative distances among objects, and quantizes them; we also modify the self-attention mechanism to accommodate GMPE. Then, we propose a Classes Semantic Quantization strategy that extracts semantic information from the object classes, which is employed to facilitate feature embedding and to refine the encoder-decoder framework. To capture the interaction between multi-modal features, we propose Object Classes Awareness (OCA) to refine the encoder and decoder, yielding OCAE and OCAD, respectively. Finally, we combine GMPE, OCAE, and OCAD in various configurations to form the complete PCAT. We evaluate the performance of our method on the MSCOCO dataset, and the results demonstrate that PCAT outperforms other competitive methods.
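
The sketch below is a minimal, hedged illustration of the grid-based relative position idea summarized in the abstract: object boxes are mapped to grid cells, their pairwise distances are quantized, and the quantized buckets index a learned bias added to self-attention. It assumes PyTorch; the function names, grid size, and bucket count are illustrative choices and not taken from the paper's implementation.

```python
# Illustrative sketch (assumed PyTorch); names and hyperparameters are hypothetical.
import torch
import torch.nn.functional as F

def grid_centers(boxes, grid_size=16):
    """Map object boxes (x1, y1, x2, y2 in [0, 1]) to grid cell coordinates."""
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    gx = torch.clamp((cx * grid_size).long(), 0, grid_size - 1)
    gy = torch.clamp((cy * grid_size).long(), 0, grid_size - 1)
    return gx, gy

def quantized_relative_distance(boxes, grid_size=16, num_buckets=8):
    """Pairwise Chebyshev distance between grid cells, quantized into buckets."""
    gx, gy = grid_centers(boxes, grid_size)
    dx = (gx[:, None] - gx[None, :]).abs()
    dy = (gy[:, None] - gy[None, :]).abs()
    dist = torch.maximum(dx, dy).float()                       # (N, N)
    return torch.clamp((dist / grid_size * num_buckets).long(), 0, num_buckets - 1)

class PositionBiasedSelfAttention(torch.nn.Module):
    """Single-head self-attention with a learned bias indexed by the distance bucket."""
    def __init__(self, dim, num_buckets=8):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)
        self.bias = torch.nn.Embedding(num_buckets, 1)  # one scalar bias per bucket
        self.scale = dim ** -0.5

    def forward(self, x, buckets):
        q, k, v = self.qkv(x).chunk(3, dim=-1)                  # each (N, dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale           # (N, N)
        attn = attn + self.bias(buckets).squeeze(-1)            # add spatial bias
        return F.softmax(attn, dim=-1) @ v

# Toy usage: 5 detected objects with normalized boxes and 512-d region features.
boxes = torch.rand(5, 4).sort(dim=-1).values                    # ensures x1 <= x2, y1 <= y2
feats = torch.randn(5, 512)
buckets = quantized_relative_distance(boxes)
out = PositionBiasedSelfAttention(512)(feats, buckets)
print(out.shape)  # torch.Size([5, 512])
```
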
Source: Computers, Materials & Continua (SCIE, EI), 2023, Issue 6, pp. 6007-6022 (16 pages).
Funding: Supported by the National Key Research and Development Program of China [No. 2021YFB2206200].