Abstract
Recognizing optical chemical structures in scientific publications is an essential part of rediscovering the properties of chemical structures and is of great significance for drug development and natural product research. Existing optical chemical structure recognition methods suffer from problems such as low recognition rates. To improve recognition performance on this task, this paper proposes a deep learning method for optical chemical structure recognition (DeepOCSR). Built on an encoder-decoder architecture, the method introduces Transformer and ResNeSt models to convert chemical structure images from publications into SMILES sequences. To train and verify the proposed method, two new chemical structure datasets are constructed, one of which contains substituents commonly found in the chemical literature. The proposed method is compared experimentally with other existing deep-learning approaches, and the results show that it outperforms them on key metrics such as similarity and validity.
Authors
YANG Zhaopeng (杨赵朋); LI Jianhua (李建华)
School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
Source
Journal of East China University of Science and Technology (Natural Science Edition) (《华东理工大学学报(自然科学版)》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2023, No. 1, pp. 135-143 (9 pages)
Funding
National Science and Technology Major Project of China (2018ZX09735002).