Abstract
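Bi-level image coding is widely used in text storage and image retrieval. To improve the compression ratio of bi-level images, this paper proposes a JBIG2 (joint bi-level image group) encoding algorithm that makes use of OCR results. When compressing bi-level text images by pattern matching, the algorithm uses both the OCR recognition results and their recognition confidence. This improves glyph template reconstruction and pattern matching, and therefore the overall performance of JBIG2 coding. Every character in the image whose recognition result is credible is replaced by a reconstructed template, so the encoder only needs to encode the character positions. Experimental results show that the algorithm outperforms previous JBIG2 algorithms. It achieves higher image quality than previous lossy compression algorithms and, on the test images, a compression ratio 14.3% higher than previous lossless compression algorithms.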
Bi-level image coding is useful for document storage and archiving, for image search on the Internet, and for digital libraries. The JBIG2 (joint bi-level image group) standard for lossless and lossy coding of bi-level images is very flexible, allowing researchers to design their own encoders. OCR processing of text images yields both recognition results and measurable confidence values. We propose a lossy JBIG2 encoding method that uses these OCR results to improve pattern-matching-based text image compression. All credibly recognized characters in the image are replaced by representative character images, so the encoder only needs to mark the positions of these characters. Experimental results show that this method outperforms previous JBIG2 encoding methods, requiring 14.3% less storage than previous lossless methods while preserving relatively good text image quality.
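The core idea (gate each character on its OCR confidence, build one representative template per recognized label, and code credible characters only by position) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The confidence threshold `CONF_THRESHOLD`, the `Glyph` record, the pixel-wise majority vote used to build templates, and the top-left alignment of bitmaps are all hypothetical choices made for illustration, since the abstract does not specify how templates are reconstructed or how credibility is decided.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # rows of 0/1 pixels (1 = black)

# Hypothetical threshold: the abstract does not state how "credible" is decided.
CONF_THRESHOLD = 0.9


@dataclass
class Glyph:
    bitmap: Bitmap     # cropped character image
    x: int             # position of the glyph on the page
    y: int
    label: str         # OCR recognition result
    confidence: float  # OCR confidence in [0, 1]


def majority_template(bitmaps: List[Bitmap]) -> Bitmap:
    """Pixel-wise majority vote (an assumed reconstruction rule).
    Bitmaps are aligned at their top-left corner and implicitly
    zero-padded to the largest height and width."""
    h = max(len(b) for b in bitmaps)
    w = max(len(row) for b in bitmaps for row in b)
    votes = [[0] * w for _ in range(h)]
    for b in bitmaps:
        for r, row in enumerate(b):
            for c, px in enumerate(row):
                votes[r][c] += px
    n = len(bitmaps)
    return [[1 if 2 * v > n else 0 for v in row] for row in votes]


def encode_page(
    glyphs: List[Glyph],
) -> Tuple[Dict[str, Bitmap], List[Tuple[int, int, int]], List[Glyph]]:
    """Split glyphs by OCR confidence.

    Returns (templates keyed by label, placements as (template_id, x, y),
    low-confidence glyphs left for ordinary pattern matching)."""
    credible: Dict[str, List[Glyph]] = defaultdict(list)
    residual: List[Glyph] = []
    for g in glyphs:
        if g.confidence >= CONF_THRESHOLD:
            credible[g.label].append(g)
        else:
            residual.append(g)
    labels = sorted(credible)  # template_id = index in this sorted list
    templates = {lab: majority_template([g.bitmap for g in credible[lab]]) for lab in labels}
    placements = [(i, g.x, g.y) for i, lab in enumerate(labels) for g in credible[lab]]
    return templates, placements, residual
```

In this sketch each template would be coded once (for example as a JBIG2 symbol dictionary entry), each credible character then costs only a template index and a position, and the low-confidence residue would fall back to conventional pattern matching. Raising the threshold trades compression for fidelity, because fewer characters are substituted.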
Source
Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》)
EI
CAS
CSCD
Peking University Core Journal
2006, No. 7, pp. 1247-1249, 1253 (4 pages)
Funding
Project supported by the National Natural Science Foundation of China (60472002)
Keywords
pattern recognition
bi-level image coding
text image compression
OCR
pattern matching