Funding: Supported by the National Natural Science Foundation of China (Nos. 60673088, 61070066, and 61103099), the China Postdoctoral Science Foundation (No. 20110491781), the Major National Science and Technology Special Project of China (No. 2010ZX01042-002-003), and the China Academic Digital Associative Library Project.
Abstract: An efficient search method is needed for calligraphic characters because of the explosive growth of calligraphy works in digital libraries. However, traditional optical character recognition (OCR) and handwritten character recognition (HCR) technologies are not suitable for calligraphic character retrieval. This paper proposes a novel shape descriptor, SC-HoG, that integrates global and local features for greater discriminability; a gradient descent algorithm is used to learn the optimal combining parameter. Two efficient methods, a keypoint-based method and a locality sensitive hashing (LSH) based method, are then proposed to accelerate retrieval by reducing the feature set and by converting the feature set to a feature vector. Finally, a re-ranking method is described for practical use: it first filters out characters dissimilar to the query using the LSH-based method to obtain candidates, and then re-ranks the candidates using the keypoint- or sample-based method. Experimental results demonstrate that the approaches are effective and efficient for calligraphic character retrieval.
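The filter-then-re-rank idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which hash family or matching cost is used, so everything concrete here is an assumption: random-hyperplane hashing stands in for the LSH step, squared Euclidean distance stands in for the keypoint- or sample-based re-ranking, and the function names (`make_hyperplanes`, `lsh_key`, `build_index`, `retrieve`) are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumed hash family, not the paper's)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in planes)


def build_index(vectors, planes):
    """Group database vectors into buckets by their hash key."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[lsh_key(vec, planes)].append(idx)
    return buckets


def squared_distance(a, b):
    """Placeholder re-ranking cost; the paper uses shape matching instead."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve(query, vectors, buckets, planes, top_k=3):
    """Filter by hash bucket first, then re-rank the surviving candidates."""
    candidates = buckets.get(lsh_key(query, planes), [])
    ranked = sorted(candidates, key=lambda i: squared_distance(query, vectors[i]))
    return ranked[:top_k]


if __name__ == "__main__":
    rng = random.Random(1)
    database = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(200)]
    planes = make_hyperplanes(dim=8, n_bits=4)
    buckets = build_index(database, planes)
    print(retrieve(database[5], database, buckets, planes))
```

Only the vectors sharing the query's bucket are scored by the more expensive cost, which is where the speed-up comes from; a single hash table can miss true neighbours, so practical systems usually query several tables.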
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 60533090, 60525108), the National Grand Fundamental Research 973 Program of China (Grant No. 2002CB312101), the Science and Technology Project of Zhejiang Province (2005C13032, 2005C11001-05), and the China-US Million Book Digital Library Project (www.cadal.zju.edu.cn).
Abstract: As historical Chinese calligraphy works are digitized, retrieval becomes a new challenge. Currently, no OCR technique can convert calligraphy character images into text, nor do existing handwriting character recognition approaches work for them. This paper proposes a novel approach to efficiently retrieving Chinese calligraphy characters on the basis of similarity: each calligraphy character image is represented by a collection of discriminative features, and high retrieval speed with reasonable effectiveness is achieved. First, calligraphy characters that cannot be similar to the query are filtered out step by step by comparing character complexity, stroke density, and stroke protrusion. Then, similar calligraphy characters are retrieved and ranked according to the matching cost produced by approximate shape matching. To speed up retrieval, a high-dimensional data structure, the PK-tree, is employed. Finally, the efficiency of the algorithm is demonstrated by a preliminary experiment with 3012 calligraphy character images.
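The step-by-step filtering followed by cost-based ranking can be sketched as below. The abstract does not define how complexity, stroke density, or stroke protrusion are computed, so this sketch assumes each is already available as a single number per character; the tolerances, the dictionary layout, and the function names (`cascade_filter`, `rank_by_cost`) are hypothetical, and the matching cost is passed in as a placeholder rather than being the paper's approximate shape match. The PK-tree index is not modelled.

```python
def cascade_filter(query, database, tolerances):
    """Apply cheap feature checks one after another, keeping only survivors.

    tolerances is an ordered list of (feature_name, max_abs_difference).
    """
    survivors = list(database)
    for name, tol in tolerances:
        target = query[name]
        survivors = [c for c in survivors if abs(c[name] - target) <= tol]
    return survivors


def rank_by_cost(query, candidates, matching_cost):
    """Sort the surviving candidates by ascending matching cost."""
    return sorted(candidates, key=lambda c: matching_cost(query, c))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    query = {"id": "q", "complexity": 10.0, "stroke_density": 0.5, "stroke_protrusion": 2.0}
    database = [
        {"id": "a", "complexity": 10.5, "stroke_density": 0.5, "stroke_protrusion": 2.0},
        {"id": "b", "complexity": 20.0, "stroke_density": 0.5, "stroke_protrusion": 2.0},
        {"id": "c", "complexity": 10.0, "stroke_density": 0.9, "stroke_protrusion": 2.0},
    ]
    tolerances = [("complexity", 1.0), ("stroke_density", 0.1), ("stroke_protrusion", 0.5)]

    def placeholder_cost(q, c):
        # Stand-in for the shape-matching cost used in the paper.
        return abs(q["complexity"] - c["complexity"])

    kept = cascade_filter(query, database, tolerances)
    print([c["id"] for c in rank_by_cost(query, kept, placeholder_cost)])
```

Ordering the checks from cheapest to most expensive means the costly shape matching runs only on the few characters that pass every earlier stage.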