Abstract: The visual information of an input image cannot be adjusted dynamically at each decoding step. To address this, and to improve the accuracy and generalization ability of image captioning models, an image captioning model is proposed that combines guided decoding and a visual attention mechanism on a two-layer long short-term memory (LSTM) network. The extracted visual and object features of the image are modeled by a guiding network and fed into the LSTM network at every time step, enabling end-to-end training. In addition, a visual attention mechanism based on image channel features is designed, which improves the model's description of image details. The model was trained and tested on the MSCOCO and Flickr30k datasets, and the results show that its performance improves on all evaluation metrics.
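To make the decoding step concrete, the following is a minimal PyTorch sketch of one time step of a two-layer LSTM decoder whose visual input is re-weighted channel by channel before being injected at every step. The class name, feature dimensions, and the exact attention formula are illustrative assumptions, not the paper's definitions; the guiding network is reduced here to the channel-pooled CNN feature vector.

```python
import torch
import torch.nn as nn

class ChannelAttentionDecoderStep(nn.Module):
    """Illustrative sketch (not the paper's exact model): channel-wise
    attention over pooled CNN features feeding a two-layer LSTM decoder."""
    def __init__(self, feat_channels=2048, embed_size=256, hidden_size=512):
        super().__init__()
        self.att = nn.Linear(hidden_size + feat_channels, feat_channels)
        self.lstm1 = nn.LSTMCell(embed_size + feat_channels, hidden_size)
        self.lstm2 = nn.LSTMCell(hidden_size, hidden_size)

    def forward(self, word_embed, channel_feats, state1, state2):
        h1, c1 = state1
        h2, c2 = state2
        # score every feature channel from the previous top-layer hidden state
        scores = self.att(torch.cat([channel_feats, h2], dim=1))
        beta = torch.softmax(scores, dim=1)            # channel attention weights
        visual = beta * channel_feats                  # re-weighted visual input
        # visual guidance is injected at every decoding step
        h1, c1 = self.lstm1(torch.cat([word_embed, visual], dim=1), (h1, c1))
        h2, c2 = self.lstm2(h1, (h2, c2))
        return h2, (h1, c1), (h2, c2)

# toy usage with random tensors (hypothetical dimensions)
step = ChannelAttentionDecoderStep()
B = 4
word = torch.randn(B, 256)
feats = torch.randn(B, 2048)                           # channel-pooled CNN features
s1 = (torch.zeros(B, 512), torch.zeros(B, 512))
s2 = (torch.zeros(B, 512), torch.zeros(B, 512))
out, s1, s2 = step(word, feats, s1, s2)
```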
Funding: Supported by the National Natural Science Foundation of China (Nos. 61401132 and 61372157) and the Zhejiang Provincial Natural Science Foundation of China (No. LY12F01007).
Abstract: A new depth resampling method for multi-view coding is proposed in this paper. First, the depth video is downsampled by median filtering before encoding. After decoding, the classified edge pixels, including credible edges and probable edges identified from the aligned texture image and the depth image, are interpolated using the selected diagonal pair whose intensity difference is the minimum among the four diagonal pairs around the edge pixel. Depending on the edge category, the intensity difference is measured either as a real depth difference or as a percentage depth difference, without any parameter setting. Finally, the resampled depth video and the decoded full-resolution texture video are synthesized into virtual views for performance evaluation. Experiments on the multi-view high efficiency video coding (HEVC) platform demonstrate that the proposed method is superior to the comparison methods in terms of visual quality and rate-distortion (RD) performance.
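The core upsampling idea is a direction-selective interpolation at edge pixels. Below is a simplified NumPy stand-in: a missing interior sample is filled by averaging the opposing neighbour pair with the smallest absolute difference. The function name and the choice of four opposing pairs are assumptions for illustration; the paper's edge classification (credible vs. probable) and its real/percentage difference measures are not reproduced here.

```python
import numpy as np

def fill_by_best_pair(depth, y, x):
    """Fill the missing depth value at an interior edge pixel (y, x) by
    averaging the opposing neighbour pair with the smallest absolute
    intensity difference (illustrative stand-in for diagonal-pair selection)."""
    pairs = [
        ((y - 1, x - 1), (y + 1, x + 1)),  # main diagonal
        ((y - 1, x + 1), (y + 1, x - 1)),  # anti-diagonal
        ((y - 1, x),     (y + 1, x)),      # vertical
        ((y,     x - 1), (y,     x + 1)),  # horizontal
    ]
    a, b = min(pairs, key=lambda p: abs(float(depth[p[0]]) - float(depth[p[1]])))
    return (float(depth[a]) + float(depth[b])) / 2.0

# toy usage: the centre sample of a 3x3 patch sits on a sharp vertical depth edge
patch = np.array([[10, 10, 80],
                  [10,  0, 80],   # centre value is missing and will be filled
                  [10, 10, 80]], dtype=np.float32)
print(fill_by_best_pair(patch, 1, 1))  # -> 10.0, the vertical pair preserves the edge
```

Averaging across the pair with the smallest difference keeps the interpolated value on one side of the depth discontinuity, which is what prevents the blurred edges that plain bilinear upsampling would introduce.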