Abstract
Semantic segmentation is a crucial step in document understanding. In this paper, an NVIDIA Jetson Nano-based platform is used to implement semantic segmentation for teaching artificial intelligence concepts and programming. To extract semantic structures from document images, we present an end-to-end dilated convolution network architecture. Dilated convolutions have the well-known advantage of extracting multi-scale context information without losing spatial resolution. Our model combines dilated convolutions with a residual network to represent image features and predict a label for every pixel. The convolutional part works as a feature extractor that obtains multidimensional, hierarchical image features, and the consecutive deconvolution layers produce a full-resolution segmentation prediction. Each pixel's predicted class probabilities determine its predefined semantic class label. To understand segmentation granularity, we compare performance at three levels, from fine-grained to coarse classes, evaluating the proposed network on three document datasets. The experimental results show that both the imbalanced distribution of semantic classes in the data and the network depth are important factors influencing document semantic segmentation performance. The model can be tested on the NVIDIA Jetson Nano AI education platform, and the research aims to offer an educational resource for teaching artificial intelligence concepts and techniques.
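The abstract relies on two mechanisms without showing them: dilated convolutions that enlarge the context window while keeping the output at full resolution, and assigning each pixel a label from its class probabilities. The minimal sketch below illustrates both in plain Python. It is not the authors' network (there are no residual blocks, deconvolution layers, or training), the function names `dilated_conv1d`, `receptive_field`, and `pixel_labels` are hypothetical, a 1-D list stands in for an image row, and the label rule assumes the common highest-probability (argmax) decision.

```python
# Illustrative sketch only; not the paper's implementation.
# All function names are hypothetical; a 1-D list stands in for an image.


def dilated_conv1d(signal, kernel, dilation):
    """'Same'-padded 1-D dilated convolution (cross-correlation, as in CNN layers).

    Kernel taps are spaced `dilation` samples apart, so the window widens
    without adding weights, and zero padding keeps the output the same
    length as the input, i.e. no loss of spatial resolution.
    """
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    pad_left = span // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - pad_left + j * dilation
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def pixel_labels(prob_maps):
    """Assign each pixel the class with the highest predicted probability.

    prob_maps[c][i] is the probability of class c at pixel i
    (assumed argmax rule; pixels are flattened to a 1-D list).
    """
    n_classes = len(prob_maps)
    n_pixels = len(prob_maps[0])
    return [max(range(n_classes), key=lambda c: prob_maps[c][i])
            for i in range(n_pixels)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    print(dilated_conv1d(impulse, [1, 1, 1], dilation=2))
    # [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]  (same length as the input)
    print(receptive_field(3, [1, 2, 4]), receptive_field(3, [1, 1, 1]))  # 15 7
    print(pixel_labels([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]))  # [0, 1, 2]
```

With the same three 3-tap layers, growing the dilation as 1, 2, 4 widens the receptive field from 7 to 15 samples while every layer still outputs one value per input position, which is the resolution-preserving property the abstract attributes to dilated convolutions.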
Authors
XU Can-hui
SHI Cao
CHEN Yi-nong
许灿辉; 史操; 陈以农 (School of Information Sciences and Technology, Qingdao University of Science and Technology, Qingdao 266061, China; School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, AZ 85287-8809, USA)
Funding
Project (61806107) supported by the National Natural Science Foundation of China
Project supported by the Shandong Key Laboratory of Wisdom Mine Information Technology, China
Project supported by the Opening Project of State Key Laboratory of Digital Publishing Technology, China