End-to-end dilated convolution network for document image semantic segmentation (cited by: 8)

Chinese title (translated): End-to-end document semantic segmentation based on a dilated convolution network


Abstract: Semantic segmentation is a crucial step for document understanding. In this paper, an NVIDIA Jetson Nano-based platform is applied for implementing semantic segmentation for teaching artificial intelligence concepts and programming. To extract semantic structures from document images, we present an end-to-end dilated convolution network architecture. Dilated convolutions have well-known advantages for extracting multi-scale context information without losing spatial resolution. Our model uses dilated convolutions with a residual network to represent image features and predict pixel labels. The convolution part works as a feature extractor to obtain multidimensional and hierarchical image features. The subsequent deconvolution part produces a full-resolution segmentation prediction. The predicted probability at each pixel determines its predefined semantic class label. To understand segmentation granularity, we compare performance at three different levels. From fine-grained to coarse class levels, the proposed dilated convolution network architecture is evaluated on three document datasets. The experimental results show that both semantic data distribution imbalance and network depth are important factors that influence document semantic segmentation performance. The research is aimed at offering an educational resource for teaching artificial intelligence concepts and techniques.
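Two ideas in the abstract can be illustrated with a small sketch: a dilated convolution widens the context each output sees while keeping the output the same size as the input, and the final label at each pixel is the class with the highest predicted probability. The code below is only an illustration under stated assumptions, not the authors' implementation: it uses 1D signals and zero padding for brevity (the paper works on 2D document images), and the names `dilated_conv1d`, `receptive_field`, and `label_pixels` are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Stride-1, zero-padded 1D dilated convolution (cross-correlation form).

    The output has the same length as the input, so no resolution is lost.
    """
    span = dilation * (len(kernel) - 1)  # extent covered by the dilated kernel
    left = span // 2
    right = span - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            acc += w * padded[i + j * dilation]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += d * (kernel_size - 1)
    return rf


def label_pixels(probs):
    """Pick the most probable class at each pixel.

    probs[y][x] is a list of per-class probabilities (assumed layout).
    """
    return [[max(range(len(p)), key=p.__getitem__) for p in row]
            for row in probs]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = dilated_conv1d(x, [1, 1, 1], dilation=2)
    print(len(y) == len(x))               # True: resolution is preserved
    print(receptive_field(3, [1, 1, 1]))  # 7 for three ordinary 3-tap layers
    print(receptive_field(3, [1, 2, 4]))  # 15 for dilations 1, 2, 4
    print(label_pixels([[[0.1, 0.9], [0.7, 0.3]]]))  # [[1, 0]]
```

With the same three 3-tap layers, growing the dilation from 1 to 2 to 4 roughly doubles the receptive field (15 samples instead of 7) without any pooling or striding, which is the multi-scale context property the abstract refers to.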

Authors: XU Can-hui (许灿辉), SHI Cao (史操), CHEN Yi-nong (陈以农) (School of Information Sciences and Technology, Qingdao University of Science and Technology, Qingdao 266061, China; School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, AZ 85287-8809, USA)

Affiliations: School of Information Sciences and Technology; School of Computing

Source: Journal of Central South University (中南大学学报, English edition), 2021, Issue 6, pp. 1765-1774 (10 pages). Indexed in: SCIE, EI, CAS, CSCD.

Funding: Project (61806107) supported by the National Natural Science Foundation of China; project supported by the Shandong Key Laboratory of Wisdom Mine Information Technology, China; project supported by the Opening Project of State Key Laboratory of Digital Publishing Technology, China.

Keywords: semantic segmentation; document images; deep learning; NVIDIA Jetson Nano

Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]

