
End-to-end dilated convolution network for document image semantic segmentation (cited by: 8)

Chinese title: End-to-end document semantic segmentation based on a dilated convolution network
Abstract: Semantic segmentation is a crucial step in document understanding. In this paper, an NVIDIA Jetson Nano-based platform is used to implement semantic segmentation as a vehicle for teaching artificial intelligence concepts and programming. To extract semantic structures from document images, we present an end-to-end dilated convolution network architecture. Dilated convolutions have the well-known advantage of extracting multi-scale context information without losing spatial resolution. Our model combines dilated convolutions with a residual network to represent image features and predict pixel labels. The convolution part works as a feature extractor that obtains multidimensional, hierarchical image features. The subsequent deconvolution part produces a full-resolution segmentation prediction. The probability of each pixel determines its predefined semantic class label. To understand segmentation granularity, we compare performance at three different levels. From the fine-grained class level to the coarse class level, the proposed dilated convolution network architecture is evaluated on three document datasets. The experimental results show that both the imbalance of the semantic data distribution and the network depth are important factors that influence document semantic segmentation performance. The model can be tested on the NVIDIA Jetson Nano, an artificial intelligence education platform. The research aims to offer an educational resource for teaching artificial intelligence concepts and techniques.
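The abstract's claim that dilated convolutions gather multi-scale context without losing spatial resolution can be illustrated with a minimal one-dimensional sketch. This is not the authors' network: the function names `dilated_conv1d` and `receptive_field`, the kernel values, and the dilation rates 1, 2, 4 are all assumptions chosen for illustration, and the paper's model operates on 2-D images with residual connections.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1-D dilated convolution (cross-correlation form) with zero 'same' padding.

    Hypothetical helper for illustration; assumes an odd kernel size and stride 1.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    # Padding dilation * (k // 2) zeros on each side keeps output length == input length.
    pad = dilation * (k // 2)
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    out = []
    for i in range(len(signal)):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(sum(kernel[j] * padded[i + j * dilation] for j in range(k)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = dilated_conv1d(x, [1, 1, 1], dilation=2)
    print(len(x), len(y))                 # same length: resolution is preserved
    print(receptive_field(3, [1, 1, 1]))  # three ordinary 3-tap layers
    print(receptive_field(3, [1, 2, 4]))  # three dilated 3-tap layers (assumed rates)
```

With the same number of layers and weights, the dilated stack covers a wider input window than the undilated one, while the output keeps the input's length. That is the property the abstract relies on for dense, per-pixel prediction.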
Authors: XU Can-hui (许灿辉), SHI Cao (史操), CHEN Yi-nong (陈以农) (School of Information Sciences and Technology, Qingdao University of Science and Technology, Qingdao 266061, China; School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, AZ 85287-8809, USA)
Source: Journal of Central South University (English edition), indexed in SCIE, EI, CAS, CSCD; 2021, Issue 6, pp. 1765-1774 (10 pages)
Funding: Project (61806107) supported by the National Natural Science Foundation of China; Project supported by the Shandong Key Laboratory of Wisdom Mine Information Technology, China; Project supported by the Opening Project of State Key Laboratory of Digital Publishing Technology, China.
Keywords: semantic segmentation; document images; deep learning; NVIDIA Jetson Nano