期刊文献+
共找到2篇文章
< 1 >
每页显示 20 50 100
Optical Character Recognition for printed Tamil text using Unicode 被引量:1
1
作者 SEETHALAKSHMI R. SREERANJANI T.R. +4 位作者 BALACHANDAR T. Abnikant Singh Markandey Singh Ritwaj Ratan Sarvesh Kumar 《Journal of Zhejiang University-Science A(Applied Physics & Engineering)》 SCIE EI CAS CSCD 2005年第11期1297-1305,共9页
Optical Character Recognition (OCR) refers to the process of converting printed Tamil text documents into software translated Unicode Tamil Text. The printed documents available in the form of books, papers, magazines... Optical Character Recognition (OCR) refers to the process of converting printed Tamil text documents into software translated Unicode Tamil Text. The printed documents available in the form of books, papers, magazines, etc. are scanned using standard scanners which produce an image of the scanned document. As part of the preprocessing phase the image file is checked for skewing. If the image is skewed, it is corrected by a simple rotation technique in the appropriate direction. Then the image is passed through a noise elimination phase and is binarized. The preprocessed image is segmented using an algorithm which de- composes the scanned text into paragraphs using special space detection technique and then the paragraphs into lines using vertical histograms, and lines into words using horizontal histograms, and words into character image glyphs using horizontal histograms. Each image glyph is comprised of 32×32 pixels. Thus a database of character image glyphs is created out of the segmentation phase. Then all the image glyphs are considered for recognition using Unicode mapping. Each image glyph is passed through various routines which extract the features of the glyph. The various features that are considered for classification are the character height, character width, the number of horizontal lines (long and short), the number of vertical lines (long and short), the hori- zontally oriented curves, the vertically oriented curves, the number of circles, number of slope lines, image centroid and special dots. The glyphs are now set ready for classification based on these features. The extracted features are passed to a Support Vector Machine (SVM) where the characters are classified by Supervised Learning Algorithm. These classes are mapped onto Unicode for recognition. Then the text is reconstructed using Unicode fonts. 展开更多
关键词 光学性质识别 支撑向量 人工神经网络 编码系统 软件翻译
下载PDF
A sustainable development OCR system in CADAL application 被引量:1
2
作者 黄晨 赵继海 胡晓 《Journal of Zhejiang University-Science A(Applied Physics & Engineering)》 SCIE EI CAS CSCD 2005年第11期1312-1317,共6页
This paper briefly introduces the main ideas of a sustainable development OCR system based on open architecture techniques and then describes the construction of an optical character recognition (OCR) center built on ... This paper briefly introduces the main ideas of a sustainable development OCR system based on open architecture techniques and then describes the construction of an optical character recognition (OCR) center built on computer clusters, for the purpose of dynamically improving the recognition precision of the digitized texts of a million volumes of books produced by the China-US Million Books Digital Library (CADAL) Project. The practice of this center will provide helpful reference for other digital library projects. 展开更多
关键词 Sustainable Development Digital Library optical character recognition (OCR) China-US Million Books Digital Library (CADAL)
下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部