Cyclic Autoencoder for Multimodal Data Alignment Using Custom Datasets


Abstract: This paper addresses subtitle recognition under multimodal data fusion, which aims to recognize text lines from image and audio data. Most existing multimodal fusion methods rely on pre-fusion or post-fusion, which is hard to justify and difficult to interpret. We believe that fusing images and audio before the decision layer, i.e., intermediate fusion, exploits the complementary nature of multimodal data and benefits text line recognition. To this end, we propose: (i) A novel cyclic autoencoder based on a convolutional neural network. The feature dimensions of the two modalities are aligned while keeping the compressed image features stable, so that the high-dimensional features of the different modalities are fused at a shallow level of the model. (ii) A residual attention mechanism that improves recognition performance. Regions of interest in the image are enhanced and regions of no interest are weakened, so the features of text regions can be extracted without further increasing the depth of the model. (iii) A fully convolutional network for video subtitle recognition. We choose DenseNet-121 as the backbone network for feature extraction, which effectively enables the recognition of video subtitles against complex backgrounds. The experiments are performed on our custom datasets, and the automatic and manual evaluation results show that our method reaches state-of-the-art performance.
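The residual attention idea in item (ii) can be illustrated with a minimal NumPy sketch. It assumes the common formulation in which a soft mask in (0, 1) re-weights the features while an identity path is kept, i.e. output = (1 + mask) * features; the abstract does not state the exact formula used in the paper, and the names `residual_attention`, `features`, and `mask_logits` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function, mapping scores into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def residual_attention(features, mask_logits):
    """Re-weight features with a soft mask while keeping an identity path.

    features:    array of shape (H, W, C), e.g. convolutional feature maps
    mask_logits: array of shape (H, W, 1), unnormalized attention scores
                 (assumed to come from some mask branch, not shown here)
    """
    mask = sigmoid(mask_logits)      # soft attention mask in (0, 1)
    return (1.0 + mask) * features   # identity path plus attended path


# Toy input: uniform features and a mask that strongly favors one position.
features = np.ones((2, 2, 3))
mask_logits = np.array([[[10.0], [-10.0]],
                        [[0.0], [0.0]]])
out = residual_attention(features, mask_logits)
```

Under this assumed formulation, positions with a high mask score are amplified by up to a factor of two, while positions with a low score pass through almost unchanged, so they are weakened only relative to the attended regions.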

Authors: Zhenyu Tang, Jin Liu, Chao Yu, Y. Ken Wang

Affiliations: College of Information Engineering; Division of Management and Education

Source: Computer Systems Science & Engineering (SCIE, EI), 2021, Issue 10, pp. 37-54 (18 pages)

Funding: This work is supported by the National Natural Science Foundation of China (61872231).

Keywords: deep learning; convolutional neural network; multimodal; text recognition

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

