Chemical structure recognition method based on attention mechanism and encoder-decoder architecture

Abstract

Objective: Chemical structure recognition is an important problem in chemistry and computer vision. Traditional optical chemical structure recognition techniques are prone to information loss or misrecognition on complex chemical structures, and the structural diversity of chemical substances often leaves them unable to parse a structure at all, so recognition performance is poor. Deep learning models, in turn, typically suffer from high network complexity, loss of contextual information, and low recognition rates. To address this, a chemical structure recognition method that combines an attention mechanism with an encoder-decoder architecture is proposed.

Method: First, an improved ResNet50 (residual network) is used as the feature extractor to capture representation information. Second, a BLSTM (bi-directional long short-term memory) network serves as a row encoder that strengthens the spatial information in the representations extracted by ResNet50. Finally, a de-padding module and an LSTM (long short-term memory) network with a coverage-attention mechanism serve as the decoder, which decodes the encoded chemical structure image into a SMILES (simplified molecular input line entry system) sequence.

Result: On the Indigo, ChemDraw, CLEF (Conference and Labs of the Evaluation Forum), JPO (Japanese Patent Office), UOB (University of Birmingham), USPTO (United States Patent and Trademark Office), Staker, ACS (American Chemical Society), CASIA-CSDB (Institute of Automation, Chinese Academy of Sciences, Chemical Structure Database), and Mini CASIA-CSDB datasets, the proposed method achieves recognition accuracies of 71.1%, 70.21%, 45.8%, 30.3%, 53.02%, 58.21%, 43.39%, 46.3%, 84.42%, and 85.78%, respectively, higher than the scores of the SwinOCSR, Image2Mol, and ChemPix models.

Conclusion: Compared with other models, the proposed method attains high recognition accuracy with a small training set.

Extended abstract

Objective: Emerging digital and intelligent technologies have ushered in a new era of text recognition and interpretation. These advances make it much easier to recognize and understand text from a variety of sources, including paper documents, photographs, and diverse real-world contexts. One noteworthy application is chemical structure image recognition, where portable devices such as mobile phones and tablet PCs have become indispensable tools for converting hand-drawn chemical structure images into machine-readable formats. Recognition models translate these intricate structures into readable representations and can also surface the relevant physical properties, chemical characteristics, and elemental compositions. Such models serve as a bridge between hand-drawn representations and machine-interpretable data, which makes it feasible to document complex scenarios electronically, such as those encountered in classrooms and academic meetings. Ongoing research on encoder-decoder methods for mathematical expression recognition has shown promising results. However, the quality and quantity of training data strongly shape the performance of deep neural networks, and no comprehensive, high-quality dataset tailored to chemical structure image recognition is currently available. This data deficiency hinders the optimization, generalization, and robustness of the models. The computational demands of real-time offline recognition on mobile devices also remain a practical limitation.

Method: To address these issues, we developed a chemical structure recognition model based on an encoder-decoder architecture. The model generates a character representation, such as SMILES, from a given chemical structure image. In image-related tasks, how well the encoder extracts features from the image and how well the decoder decodes the resulting feature sequence directly determine recognition performance. The encoder should model the input image efficiently: it should extract diverse features comprehensively, obtain accurate feature distributions, and encode them into feature maps. We therefore designed a feature extraction network based on ResNet-50 for the encoder, which captures the two-dimensional structural information of chemical structure images. To make the information in the feature maps more effective, we added a row encoder based on bi-directional long short-term memory (BLSTM), which reinforces the spatial feature distribution weights by encoding the feature maps row by row. The decoder should accurately decode the sequence information in the encoder's output. To align the input sequence information with the output characters and to improve the model's memory and decoding capability on long sequences, we incorporated a coverage-attention mechanism into the decoder. The resulting model generates the corresponding representation from an input chemical structure image.

Result: For an objective evaluation of our model, we trained the Image2Mol and ChemPix models on the CASIA-CSDB (Institute of Automation, Chinese Academy of Sciences, Chemical Structure Database) dataset. We then tested performance on a range of datasets: Indigo, ChemDraw, Conference and Labs of the Evaluation Forum (CLEF), Japanese Patent Office (JPO), University of Birmingham (UOB), United States Patent and Trademark Office (USPTO), Staker, American Chemical Society (ACS), CASIA-CSDB, and Mini CASIA-CSDB. The results show that our model achieves higher recognition accuracy when trained on small datasets and generalizes robustly. We also compared our model with models that could not be retrained, such as SwinOCSR, MSE-DUDL, ChemGrapher, Image2Graph, and MolScribe. Our model remained competitive with these models, which were trained on millions of images.

Conclusion: A chemical structure recognition method based on an encoder-decoder architecture is introduced. The method generates SMILES strings from given chemical structure images. Experimental results demonstrate that the model achieves higher recognition accuracy when trained on small datasets and exhibits strong generalization capability.
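
The abstract names a coverage-attention mechanism in the decoder but does not give its equations. The following is a minimal sketch of one decoding step of additive attention with a coverage term, under assumptions that are not from the paper: coverage is taken to be the running sum of past attention weights at each encoder position, it enters the score through a single weight vector, and all parameter names (`W_a`, `W_h`, `w_c`, `v`) are hypothetical. It uses plain Python lists rather than a deep learning framework so that it runs on its own.

```python
import math


def coverage_attention_step(annotations, state, coverage, W_a, W_h, w_c, v):
    """One decoding step of additive attention with a coverage term.

    annotations: list of L encoder vectors, each of length D
    state:       decoder hidden state, length H
    coverage:    list of L floats, sum of attention weights from past steps
    W_a (K x D), W_h (K x H), w_c (length K), v (length K) are
    hypothetical parameters introduced for illustration only.
    """
    def matvec(matrix, vector):
        return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

    proj_state = matvec(W_h, state)  # shared by every encoder position
    scores = []
    for a_i, c_i in zip(annotations, coverage):
        proj_a = matvec(W_a, a_i)
        hidden = [math.tanh(pa + ps + wc * c_i)
                  for pa, ps, wc in zip(proj_a, proj_state, w_c)]
        scores.append(sum(vk * hk for vk, hk in zip(v, hidden)))

    # Numerically stable softmax over encoder positions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]

    # Context vector: attention-weighted sum of the encoder annotations.
    dim = len(annotations[0])
    context = [sum(w * a[d] for w, a in zip(alpha, annotations))
               for d in range(dim)]

    # Coverage accumulates attention so later steps can see what was attended.
    new_coverage = [c + w for c, w in zip(coverage, alpha)]
    return alpha, context, new_coverage


# Toy call: two identical encoder positions, the first already heavily attended.
alpha, context, coverage = coverage_attention_step(
    annotations=[[1.0], [1.0]], state=[0.0], coverage=[2.0, 0.0],
    W_a=[[1.0]], W_h=[[1.0]], w_c=[-1.0], v=[1.0])
```

With a negative coverage weight, as in the toy call, a position that has already received attention scores lower than an otherwise identical position that has not, which is the behaviour that helps a decoder avoid re-reading or skipping parts of a long sequence.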

Authors: Zeng Shuiling; Li Zhaoxian; Zhang Jiaxiong; Ding Longfei; Zhao Cairong

Affiliations: School of Communication and Electronic Engineering, Jishou University, Jishou 416000, China; Key Laboratory of Image and Video Understanding for Social Safety, Nanjing University of Science and Technology, Nanjing 210094, China; College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China

Source: Journal of Image and Graphics (中国图象图形学报), CSCD, Peking University Core Journal, 2024, No. 7, pp. 1960-1969 (10 pages)

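Funding: National Natural Science Foundation of China (61966014); Natural Science Foundation of Hunan Province (2024JJ7413); Open Project of the Jiangsu Key Laboratory of Image and Video Understanding for Social Safety (202212); Jishou University Research Projects (JGY2023071, Jdy23042); Hunan Provincial Postgraduate Research Innovation Projects (QL20230255, CX20221107).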

Keywords: chemical structure recognition; encoder-decoder; attention mechanism; residual network; SMILES (simplified molecular input line entry system)

Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]
