Abstract
To quickly generate sentences that accurately describe image content, an image description method combining semantic segmentation with a convolutional neural network (CNN) is proposed. An image classification model and a semantic segmentation model are combined to form the encoder, strengthening the use of image semantic information, and a CNN replaces the long short-term memory (LSTM) network as the decoder that generates complete descriptive sentences. Comparative experiments against five mainstream algorithms on the MSCOCO dataset show that using a CNN as the decoder greatly increases decoding speed and that the enhanced semantic information effectively improves accuracy, verifying the effectiveness and feasibility of the method.
Authors
LI Yong-sheng (李永生)
YAN Bing-yong (颜秉勇)
ZHOU Jia-le (周家乐)
College of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal
2023, Issue 1, pp. 210-217 (8 pages)
Funding
National Natural Science Foundation of China, Young Scientists Fund (61906068).
Keywords
image description
semantic segmentation
convolutional neural network
encoder
semantic information
long short-term memory network
decoding speed