
A Chinese Document Image Retrieval Method by Keywords (cited by: 5)
Abstract: This paper proposes a keyword-based retrieval method for Chinese document images. It searches for keywords directly from the image features of Chinese characters, without OCR (Optical Character Recognition). The document image is first segmented into individual Chinese character images. Stroke feature data are then extracted from each character image. Finally, the similarity between the feature data of two character images is measured with the WMHD (Weighted Modified Hausdorff Distance). The method is unaffected by font size and is robust to font variation to some degree. Experiments show good retrieval performance.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2007, No. 4, pp. 61-64, 72 (5 pages)
Funding: CNGI project of the National Development and Reform Commission (CNGI-04-12-2A)
Keywords: computer application; Chinese information processing; Chinese document image; retrieval by keywords; Weighted Modified Hausdorff Distance (WMHD)
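The similarity step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's WMHD formula, its weighting scheme, or its stroke features, so everything below is an assumption: stroke features are taken to be 2D points, the weights are hypothetical per-point positive numbers, the distance is the common modified Hausdorff form (mean of nearest-point distances, maximum over both directions), and `normalize` is one plausible way to obtain insensitivity to character size.

```python
import math


def normalize(points):
    """Scale points into a unit box (assumed way to remove size dependence)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Use the larger side so the aspect ratio is preserved; avoid dividing by 0.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((x - min_x) / span, (y - min_y) / span) for x, y in points]


def directed_weighted_mhd(a, b, weights=None):
    """Weighted mean, over points of a, of the distance to the nearest point of b."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    if weights is None:
        weights = [1.0] * len(a)  # unweighted case: plain modified Hausdorff
    total = 0.0
    for (ax, ay), w in zip(a, weights):
        nearest = min(math.hypot(ax - bx, ay - by) for bx, by in b)
        total += w * nearest
    return total / sum(weights)


def weighted_mhd(a, b, weights_a=None, weights_b=None):
    """Symmetric distance: the larger of the two directed distances."""
    return max(
        directed_weighted_mhd(a, b, weights_a),
        directed_weighted_mhd(b, a, weights_b),
    )


if __name__ == "__main__":
    # Hypothetical feature points of a query character and a larger candidate.
    query = [(0, 0), (0, 10), (10, 0), (10, 10)]
    candidate = [(0, 0), (0, 20), (20, 0), (20, 20)]
    print(weighted_mhd(normalize(query), normalize(candidate)))
```

A smaller value means the two characters are more alike, so a keyword match could be declared when every character of the keyword falls under some threshold; the threshold itself would have to be tuned on real data.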
