Abstract
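This paper proposes a keyword-based retrieval method for Chinese document images that searches for keywords directly on the image features of Chinese characters, without OCR (Optical Character Recognition). The document image is first segmented into individual Chinese character images; stroke feature data are then extracted from each character image; finally, similarity between the feature data is measured with the WMHD (Weighted Modified Hausdorff Distance). The method is unaffected by font size and has some robustness to font variation, and experiments show that it retrieves effectively.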
A keyword-based retrieval method for Chinese document images is proposed, which retrieves Chinese characters directly from character images without OCR (Optical Character Recognition). First, individual Chinese character images are segmented from the document image. Then, stroke feature data are extracted from each character image. Finally, the similarity between character images is measured by the weighted modified Hausdorff distance (WMHD) between their feature data. The method is robust to character size and, to some extent, to font. Experimental results show good retrieval performance.
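The abstract does not give the WMHD formula or the weighting scheme, so the following is only a minimal sketch of the similarity step under stated assumptions: stroke features are taken to be 2D points, the distance is the standard modified Hausdorff distance (the larger of the two directed mean nearest-neighbour distances), and the per-point weights are a hypothetical stand-in for whatever weighting the paper uses. The names `normalize`, `directed_wmhd`, and `wmhd` are illustrative, not the authors'. Scaling each point set to a unit bounding box is one plausible way to obtain the size invariance the abstract mentions.

```python
import math


def normalize(points):
    """Translate and scale points so the longer bounding-box side is 1.

    Assumption: size invariance is obtained by normalization; the paper
    may achieve it differently.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((x - min_x) / span, (y - min_y) / span) for x, y in points]


def directed_wmhd(a, b, weights=None):
    """Weighted mean distance from each point of a to its nearest point in b."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    if weights is None:
        weights = [1.0] * len(a)  # unweighted case: plain modified Hausdorff
    if len(weights) != len(a):
        raise ValueError("need exactly one weight per point in a")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    acc = 0.0
    for p, w in zip(a, weights):
        acc += w * min(math.dist(p, q) for q in b)
    return acc / total


def wmhd(a, b, wa=None, wb=None):
    """Symmetric distance: the larger of the two directed distances."""
    return max(directed_wmhd(a, b, wa), directed_wmhd(b, a, wb))


if __name__ == "__main__":
    small = [(0, 0), (2, 0), (0, 2)]
    large = [(0, 0), (4, 0), (0, 4)]  # same shape at twice the size
    print(wmhd(normalize(small), normalize(large)))
```

In a retrieval setting, each character image of the query keyword would be compared against candidate character images in the document, and candidates whose distance falls below a chosen threshold would count as matches; the threshold itself is not stated in the abstract.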
Source
《中文信息学报》
CSCD
PKU Core (Peking University Core Journals)
2007, No. 4, pp. 61-64, 72 (5 pages in total)
Journal of Chinese Information Processing
Funding
National Development and Reform Commission CNGI funded project (CNGI-04-12-2A)