Funding: Supported by the National Key Research and Development Program of China [No. 2021YFB2206200].
Abstract: Existing image captioning models usually generate captions by relating visual information to words, and as a result they lack spatial information and awareness of object classes. To address this issue, we propose a novel Position-Class Awareness Transformer (PCAT) network, which serves as a bridge between visual features and captions by embedding spatial information and awareness of object classes. We construct the PCAT network by proposing a novel Grid Mapping Position Encoding (GMPE) method and refining the encoder-decoder framework. First, GMPE maps the regions of objects to grids, calculates the relative distance among objects, and quantizes the result. We also modify the self-attention to accommodate GMPE. Then, we propose a Classes Semantic Quantization strategy to extract semantic information from the object classes, which is used to facilitate feature embedding and to refine the encoder-decoder framework. To capture the interaction between multi-modal features, we propose Object Classes Awareness (OCA) to refine the encoder and the decoder, yielding OCAE and OCAD, respectively. Finally, we apply GMPE, OCAE, and OCAD in various combinations to complete the entire PCAT. We evaluate the performance of our method on the MSCOCO dataset. The results demonstrate that PCAT outperforms other competitive methods.
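The three GMPE steps named in the abstract (mapping object regions to grids, computing relative distances among objects, and quantizing) can be illustrated with a rough sketch. The abstract does not give the grid size, the distance measure, or the quantization rule, so everything below is an assumption for illustration only: the function names are hypothetical, each box is assigned to the grid cell containing its center, and the relative offset between two cells is quantized into a single integer index that a learned position embedding could look up.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; this box format is an assumption.
Box = Tuple[float, float, float, float]


def box_to_grid_cell(box: Box, image_w: float, image_h: float,
                     grid_size: int) -> Tuple[int, int]:
    """Map a box to the (row, col) of the grid cell containing its center."""
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    # Clamp so a center on the right or bottom edge stays inside the grid.
    col = min(int(cx / image_w * grid_size), grid_size - 1)
    row = min(int(cy / image_h * grid_size), grid_size - 1)
    return row, col


def relative_position_indices(boxes: List[Box], image_w: float,
                              image_h: float,
                              grid_size: int = 8) -> List[List[int]]:
    """Return an N x N table of quantized relative-position indices.

    Entry [i][j] encodes the grid offset from object i to object j as one
    integer in [0, (2 * grid_size - 1) ** 2).
    """
    cells = [box_to_grid_cell(b, image_w, image_h, grid_size) for b in boxes]
    span = 2 * grid_size - 1  # number of distinct offsets along one axis
    table = []
    for row_i, col_i in cells:
        indices = []
        for row_j, col_j in cells:
            # Shift offsets from [-(G-1), G-1] into [0, 2G-2].
            d_row = row_j - row_i + grid_size - 1
            d_col = col_j - col_i + grid_size - 1
            indices.append(d_row * span + d_col)
        table.append(indices)
    return table


if __name__ == "__main__":
    example_boxes = [(0, 0, 100, 100), (300, 300, 400, 400)]
    print(relative_position_indices(example_boxes, 400, 400, grid_size=4))
```

In a Transformer encoder, indices like these would typically select learned bias terms or embeddings added inside self-attention; how PCAT actually adapts self-attention to GMPE is not specified in the abstract.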
Abstract: The creation of the classic 1960s film Liu Sanjie was a complex process, and its intentions and meanings were quite varied. In essence, the localized, marginalized, and decentralized folk legend of Liu Sanjie was discovered and transformed into a representation of the Guangxi area and even of China proper, a product of its era's emphasis on folk culture and of the state-sponsored policies of the 1950s concerning the production of art and literature. Liu Sanjie underwent a series of adaptations, from folk legend to Guangxi caidiao opera, to music and dance opera, and finally to film. It was not just a product of those policies; it also became a model for later revolutionary operas. This paper applies a textual narrative strategy to examine how artists, guided by the literature and art policies, incorporated, adapted, and reiterated the legend of Liu Sanjie to express gender awareness and class struggle.