Abstract
Codification of patent documents, meaning the conversion of scanned pages into machine-readable text, is essential for paperless patent examination, patent analysis, patent search, and patent management. This paper presents an OCR proofreading framework for patent documents and a spelling-proofreading method for the Chinese text that OCR produces from them. The approach draws on an OCR proofreading dictionary for patent documents and on features of each document's technical field. It applies Chinese word segmentation and a hidden Markov model (HMM). The method reduces labor costs and improves both the efficiency and the quality of patent document codification. Finally, an experimental system and its results are presented.
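The abstract describes HMM-based proofreading without further detail. The sketch below illustrates one common way an HMM can correct OCR output, offered as an assumption rather than as the paper's method. The hidden states are the true characters and the observations are the characters OCR emitted. Each observed character has a hypothetical confusion set of visually similar candidates with emission probabilities P(observed | true), and a hypothetical character bigram model supplies transition probabilities. Viterbi decoding then picks the most likely true sequence. The paper combines the HMM with word segmentation and a domain dictionary, whereas this sketch works at the character level only. The names `viterbi_proofread`, `confusion`, and `bigram`, and all probability values, are illustrative.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical model types: confusion[observed] = {true_char: P(observed | true_char)}
# and bigram[(prev, cur)] = P(cur | prev). Neither comes from the paper.
Confusion = Dict[str, Dict[str, float]]
Bigram = Dict[Tuple[str, str], float]


def viterbi_proofread(observed: str, confusion: Confusion, bigram: Bigram,
                      floor: float = 1e-6) -> str:
    """Return the most likely true character sequence for an OCR string."""
    if not observed:
        return ""

    def candidates(ch: str) -> Dict[str, float]:
        # Assumption: a character with no confusion entry was read correctly.
        return confusion.get(ch, {ch: 1.0})

    def trans(prev: str, cur: str) -> float:
        # Unseen bigrams receive a small floor probability instead of zero.
        return math.log(bigram.get((prev, cur), floor))

    # Initial scores: "<s>" marks the sentence start.
    scores = {c: trans("<s>", c) + math.log(p)
              for c, p in candidates(observed[0]).items()}
    back: List[Dict[str, str]] = []

    for ch in observed[1:]:
        new_scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur, emit in candidates(ch).items():
            # Best predecessor for this candidate under transition + prior score.
            best_prev = max(scores, key=lambda prev: scores[prev] + trans(prev, cur))
            new_scores[cur] = scores[best_prev] + trans(best_prev, cur) + math.log(emit)
            pointers[cur] = best_prev
        scores = new_scores
        back.append(pointers)

    # Trace back from the highest-scoring final state.
    state = max(scores, key=scores.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Illustrative data: OCR often confuses the look-alike characters 己 and 已.
confusion: Confusion = {"己": {"己": 0.6, "已": 0.4}}
bigram: Bigram = {("<s>", "已"): 0.01, ("<s>", "己"): 0.01, ("已", "经"): 0.2}

print(viterbi_proofread("己经", confusion, bigram))  # prints 已经
```

Here the bigram model outweighs the emission bias toward 己, so the misread "己经" is corrected to "已经". In a fuller system, transitions would plausibly be estimated from patent text in the relevant technical field and confusion sets from the proofreading dictionary, but those details are assumptions here.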
Source
Journal of Intelligence (《情报杂志》), 2011, No. 3, pp. 182-184, 190 (4 pages)
Indexed in: CSSCI; Peking University Core Journals (北大核心)