Abstract
Chemical bond recognition is an important sub-task of chemical structure recognition. Single, double, and triple bonds in a chemical structure are all composed of line segments, and detecting these segments with the Hough transform tends to produce redundant and interfering data. To address this, a line segment clustering algorithm for chemical bonds is proposed: it clusters the line segments detected by the Hough transform and merges the redundant ones in the process. Specifically, based on an analysis of the spatial relationships between line segments, a relative similarity measure and an interval similarity measure are defined, and a clustering method based on line segment merging is built on these two measures. Experimental results show that the proposed similarity measures comprehensively describe the similarity between line segments, and that the algorithm obtains good clustering results while accurately restoring the true positions of the line segments that make up chemical bonds. It is therefore an effective preprocessing method for chemical structure images.
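The abstract does not give the definitions of the relative and interval similarity measures, so the following Python sketch only illustrates the general idea of merge-based clustering of redundant Hough segments. The angle, perpendicular-offset, and gap tests, the `merge` rule, and the threshold names `max_angle`, `max_offset`, and `max_gap` are all illustrative assumptions, not the paper's method.

```python
import math

# A segment is ((x1, y1), (x2, y2)); zero-length segments are assumed absent.

def _length(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)

def angle_diff(a, b):
    """Smallest angle between the undirected directions of a and b."""
    def ang(seg):
        (x1, y1), (x2, y2) = seg
        return math.atan2(y2 - y1, x2 - x1) % math.pi
    d = abs(ang(a) - ang(b))
    return min(d, math.pi - d)

def offset(a, b):
    """Largest perpendicular distance from b's endpoints to the line through a."""
    (x1, y1), (x2, y2) = a
    n = _length(a)
    return max(abs((x2 - x1) * (y1 - py) - (x1 - px) * (y2 - y1)) / n
               for px, py in b)

def gap(a, b):
    """Gap between a and b along a's direction (0 if the projections overlap)."""
    (x1, y1), (x2, y2) = a
    n = _length(a)
    ux, uy = (x2 - x1) / n, (y2 - y1) / n
    lo, hi = sorted((px - x1) * ux + (py - y1) * uy for px, py in b)
    return max(0.0, max(0.0, lo) - min(n, hi))

def similar(a, b, max_angle, max_offset, max_gap):
    # Assumed stand-in for the paper's two similarity measures.
    return (angle_diff(a, b) <= max_angle
            and offset(a, b) <= max_offset
            and gap(a, b) <= max_gap)

def merge(a, b):
    """Span all four endpoints along the longer segment's direction."""
    ref = a if _length(a) >= _length(b) else b
    (x1, y1), (x2, y2) = ref
    n = _length(ref)
    ux, uy = (x2 - x1) / n, (y2 - y1) / n
    pts = list(a) + list(b)
    cx = sum(p[0] for p in pts) / 4
    cy = sum(p[1] for p in pts) / 4
    ts = [(px - cx) * ux + (py - cy) * uy for px, py in pts]
    return ((cx + min(ts) * ux, cy + min(ts) * uy),
            (cx + max(ts) * ux, cy + max(ts) * uy))

def cluster_segments(segments, max_angle=math.radians(5), max_offset=2.0, max_gap=3.0):
    """Greedily merge similar pairs until no similar pair remains."""
    segs = list(segments)
    changed = True
    while changed:
        changed = False
        for i in range(len(segs)):
            for j in range(i + 1, len(segs)):
                if similar(segs[i], segs[j], max_angle, max_offset, max_gap):
                    new = merge(segs[i], segs[j])
                    segs = [s for k, s in enumerate(segs) if k not in (i, j)]
                    segs.append(new)
                    changed = True
                    break
            if changed:
                break
    return segs

# Two overlapping detections of one stroke, plus a parallel stroke 5 px away
# (as in a double bond), which should stay separate.
detected = [((0, 0), (10, 0)), ((8, 0.5), (20, 0.5)), ((0, 5), (20, 5))]
result = cluster_segments(detected)
```

In this sketch the perpendicular-offset threshold is what keeps the two parallel strokes of a double bond apart while still merging duplicate detections of a single stroke; suitable threshold values would depend on image resolution and stroke width.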
Authors
ZHU Zhe-qing (朱哲清)
GENG Hai-jun (耿海军)
QIAN Yu-hua (钱宇华)
School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China; Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2022, No. 5, pp. 113-119 (7 pages)
Funding
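National Natural Science Foundation of China (61672332)
Key Research and Development Program of Shanxi Province (201903D421003)
Scientific and Technological Achievements Transformation and Cultivation Project of the Shanxi Provincial Department of Education (2020CG001)
Applied Basic Research Program of Shanxi Province (20210302123444)
China University Industry-University-Research Innovation Fund (2021FNA02009)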
Keywords
Chemical structure recognition
Hough transform
Chemical bond
Clustering of line segments