Abstract
Large volumes of paper vouchers now have to be scanned into computers, and classifying and retrieving the resulting images has become a major problem. With the development of OCR technology, software products already exist that can recognize and manage scanned documents. In many cases, however, the scans only need to be classified and indexed, and full OCR of the entire bill is unnecessary. This paper proposes a fast and effective bill classification method based on template matching, and then uses the open-source software Tesseract to recognize digits and letters, which completes the classification and indexing functions. The proposed method is simple and efficient, and it effectively reduces enterprise costs. In addition, to improve the recognition rate, images are preprocessed according to the characteristics of the objects to be recognized. Experiments show that this preprocessing greatly improves the recognition rate, and the approach may also serve as a reference for professional OCR software.
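The abstract does not give the details of the matching step, so the following is only a minimal sketch of template-matching classification in general, not the authors' implementation. It assumes a fixed region of each scan has already been cropped to the template size, and it scores that region against each template with normalized cross-correlation. The names `ncc`, `classify`, `TEMPLATES`, the labels, and the threshold of 0.6 are all hypothetical. The subsequent Tesseract recognition of digits and letters is not shown.

```python
from math import sqrt


def ncc(a, b):
    """Normalized cross-correlation of two equal-sized grayscale patches.

    Patches are lists of rows of pixel intensities. The result lies in
    [-1, 1]; a flat patch (zero variance) yields 0.0.
    """
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("patches must have the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def classify(patch, templates, threshold=0.6):
    """Return the label of the best-matching template.

    Returns None when no template scores at least `threshold`
    (an assumed cut-off, not a value from the paper).
    """
    best_label, best_score = None, threshold
    for label, tmpl in templates.items():
        score = ncc(patch, tmpl)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical 3x3 templates standing in for distinctive bill regions.
TEMPLATES = {
    "invoice": [[0, 255, 0], [255, 255, 255], [0, 255, 0]],
    "receipt": [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
}

# A noisy scan of the "invoice" pattern and a blank (uniform) region.
noisy_patch = [[10, 240, 5], [250, 245, 235], [0, 250, 20]]
blank_patch = [[128, 128, 128]] * 3

print(classify(noisy_patch, TEMPLATES))
print(classify(blank_patch, TEMPLATES))
```

Normalized cross-correlation is used here because it is insensitive to uniform brightness and contrast changes between scans; other similarity measures would fit the same structure.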
Source
《计算机与现代化》
2010, No. 7, pp. 132-135 (4 pages)
Computer and Modernization