Invoice document digitization is crucial for efficient management in industry. Scanned invoice images are often noisy for various reasons, which degrades OCR (optical character recognition) accuracy. In this paper, letter images extracted from invoice images are denoised using a modified autoencoder-based deep learning method. A stacked denoising autoencoder (SDAE) is implemented with two hidden layers in each of the encoder and decoder networks. To capture the most salient features of the training samples, an undercomplete autoencoder is designed with non-linear encoder and decoder functions. This autoencoder is regularized for denoising using a combined loss function that considers both mean squared error and binary cross-entropy. A dataset of 59,119 letter images, covering English letters (upper and lower case) and digits (0 to 9), is prepared from many scanned invoice images and Windows TrueType (.ttf) font files and used to train the neural network. Performance is analyzed in terms of signal-to-noise ratio (SNR), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and universal image quality index (UQI), and compared with other filtering techniques such as the non-local means filter, anisotropic diffusion filter, Gaussian filter, and mean filter. The denoising performance of the proposed SDAE is also compared, in terms of SNR and PSNR, with an existing SDAE that uses a single loss function. Results show that the proposed SDAE method outperforms the compared methods.
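The abstract says only that the loss "considers both" mean squared error and binary cross-entropy, and that results are reported as SNR and PSNR. The NumPy sketch below shows one plausible way to combine the two loss terms and compute both metrics for images scaled to [0, 1]. The weight `alpha`, the clipping constant `eps`, and the SNR definition (clean-signal energy over residual-error energy) are assumptions for illustration, not the authors' stated formulas.

```python
import numpy as np


def combined_loss(clean, reconstructed, alpha=0.5, eps=1e-7):
    """Weighted sum of MSE and binary cross-entropy for pixels in [0, 1].

    alpha is a hypothetical weight; the abstract does not say how the two terms are combined.
    """
    clean = np.asarray(clean, dtype=float)
    recon = np.asarray(reconstructed, dtype=float)
    mse = np.mean((clean - recon) ** 2)
    # Clip only for the log terms so the cross-entropy stays finite at 0 and 1.
    p = np.clip(recon, eps, 1.0 - eps)
    bce = -np.mean(clean * np.log(p) + (1.0 - clean) * np.log(1.0 - p))
    return alpha * mse + (1.0 - alpha) * bce


def psnr(clean, denoised, max_val=1.0):
    """Peak signal-to-noise ratio in dB; max_val is the largest possible pixel value."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(denoised, float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


def snr(clean, denoised):
    """SNR in dB, assumed here to be clean-signal energy over residual-error energy."""
    clean = np.asarray(clean, float)
    residual = clean - np.asarray(denoised, float)
    noise_energy = np.sum(residual ** 2)
    if noise_energy == 0:
        return float("inf")
    return 10.0 * np.log10(np.sum(clean ** 2) / noise_energy)


# Toy usage on a random binary "glyph" with additive Gaussian noise (not data from the paper).
rng = np.random.default_rng(0)
clean_img = (rng.random((28, 28)) > 0.5).astype(float)
noisy_img = np.clip(clean_img + 0.2 * rng.standard_normal(clean_img.shape), 0.0, 1.0)
print(combined_loss(clean_img, noisy_img), psnr(clean_img, noisy_img), snr(clean_img, noisy_img))
```

In a real training loop, the same weighted sum would be expressed in the deep learning framework's own loss primitives so that it can be differentiated. The NumPy version only makes the arithmetic explicit.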
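Objective: To use molecular topology to explore the characteristics of the "imprinting template" of constituents of Chinese medicines attributed to the Lung and Large Intestine meridians, to verify these characteristics experimentally, and to identify the possible material basis of tropism to the Lung and Large Intestine meridians. Methods: The national planning textbook for general higher education under the 13th Five-Year Plan, Chinese Materia Medica (《中药学》), was used as the reference. To exclude the influence of mixed meridian tropism involving other meridians, such as the Liver and Kidney meridians, 443 Chinese medicines (excluding appended medicines) were screened and only those attributed to the Lung and Large Intestine meridians were retained. The chemical constituents of these medicines were then compiled by searching CNKI and the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP); duplicate constituents were collated, removed, and merged, and the molecular connectivity index (MCI) of each constituent was calculated. The cosine (included-angle cosine) method was used to calculate the similarity between each constituent's MCI and the overall mean MCI, in order to determine constituent fractions and reference substances. A relationship between similarity and retention time was then established, and the HPLC fingerprints of the medicines and the reference substances were compared for imprinting, so as to determine the chemical structural features that may characterize the Lung and Large Intestine meridians. Results: A total of 886 chemical constituents were obtained from 11 Chinese medicines. Flavonoids, anthraquinones, and tannins showed relatively high similarity and were concentrated in the crude drugs, so the seven constituents ranked 5th, 8th, 10th, 16th, 40th, 44th, and 49th in similarity were selected: emodin-1-O-β-D-glucoside, chrysophanol-8-O-β-D-glucoside, emodin-8-O-β-D-glucopyranoside, 3,3'-di-O-methylellagic acid-4'-O-glucoside, kaempferitrin, rutin, and kaempferol-3-O-rutinoside. Their similarities to the mean MCI ranged from 0.99508 to 0.99920, and their retention times from 39.63 to 60.14 min. Based on the principles of quantitative structure-property relationship/quantitative structure-retention relationship (QSPR/QSRR), a linear regression of retention time on similarity gave a correlation coefficient of R = 0.8662 (P < 0.01). The 15% ranges of the first moment of the total statistical moments for the 7 constituents and the 9 crude drugs were [44.46 min, 46.50 min] and [39.70 min, 47.08 min], respectively. Because the two ranges overlap, there is 85% confidence that the 7 reference constituents can represent the "imprinting template" characteristics of the constituents of Chinese medicines attributed to the Lung and Large Intestine meridians. Conclusion: The "imprinting template" characteristics of constituents of Chinese medicines attributed to the Lung and Large Intestine meridians can be represented by flavonoid, anthraquinone, and tannic acid constituents.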
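The screening step above combines three computations: a molecular connectivity index per constituent, the cosine similarity of each constituent's MCI to the overall mean MCI, and a correlation between similarity and HPLC retention time. The sketch below assumes details the abstract does not give. It represents each compound by a hypothetical vector of the simple (non-valence) indices [0χ, 1χ], computed from a hydrogen-suppressed graph given as a bond list, and it reports the Pearson r, which equals the magnitude of R for a simple linear regression. The toy molecules are alkane skeletons chosen for illustration, not compounds from the study.

```python
import math
from collections import defaultdict


def connectivity_indices(bonds):
    """Return [0chi, 1chi], the zeroth- and first-order simple connectivity
    indices of a hydrogen-suppressed graph given as (atom_i, atom_j) bonds.

    Atom degree (delta) = number of heavy-atom neighbours; isolated atoms are ignored.
    """
    degree = defaultdict(int)
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    chi0 = sum(1.0 / math.sqrt(d) for d in degree.values())
    chi1 = sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in bonds)
    return [chi0, chi1]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_to_mean(vectors):
    """Cosine similarity of each MCI vector to the element-wise mean vector."""
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    return [cosine_similarity(v, mean) for v in vectors]


def pearson_r(x, y):
    """Pearson correlation coefficient, e.g. similarity vs. retention time."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy graphs (not compounds from the study): n-butane and isobutane carbon skeletons.
butane = connectivity_indices([(0, 1), (1, 2), (2, 3)])
isobutane = connectivity_indices([(0, 1), (0, 2), (0, 3)])
print(butane, isobutane)
print(similarity_to_mean([butane, isobutane]))
```

Real glycosides such as rutin contain heteroatoms. For those molecules, valence-corrected indices and higher-order path terms would normally be added to the vector, and a cheminformatics toolkit would derive the graph from a structure file rather than from a hand-written bond list.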