Abstract
As the number of standards and the scope of their application grow, so does the demand for querying standards information. Starting from the content of the standards themselves, this paper proposes an architecture for a full-text retrieval system for standards that combines OCR text recognition, Chinese word segmentation, and full-text search. It also designs a text analysis pipeline suited to the characteristics of standards documents. The approach helps improve the recall and accuracy of standards information queries and supports the development of standards informatization.
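The abstract does not detail the text analysis pipeline. As a rough illustration only, the sketch below shows how segmented text feeds a full-text index. It splits Chinese text into overlapping character bigrams, builds an inverted index (token to document IDs), and answers queries with AND semantics. Bigram splitting is a common dictionary-free stand-in for word segmentation and is not the segmenter the paper uses. The function names and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample "standards" (IDs and titles are illustrative only).
SAMPLE_DOCS = {
    "std-001": "信息技术 全文检索 系统要求",
    "std-002": "文字识别 OCR 系统测试方法",
    "std-003": "中文分词 评测规范",
}

CJK_RUN = r"[\u4e00-\u9fff]+"


def bigrams(text: str) -> list[str]:
    """Split CJK runs into overlapping character bigrams; keep ASCII words whole.

    Hypothetical stand-in for the Chinese word segmentation step.
    """
    tokens: list[str] = []
    # Extract runs of CJK ideographs and runs of ASCII letters/digits.
    for run in re.findall(CJK_RUN + r"|[A-Za-z0-9]+", text):
        if re.fullmatch(CJK_RUN, run):
            if len(run) == 1:
                tokens.append(run)  # a lone character has no bigram
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run.lower())  # case-fold Latin terms such as "OCR"
    return tokens


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index mapping each token to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in bigrams(text):
            index[tok].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents that contain every query token (AND semantics)."""
    tokens = bigrams(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for tok in tokens[1:]:
        result &= index.get(tok, set())
    return result


index = build_index(SAMPLE_DOCS)
print(sorted(search(index, "全文检索")))  # ['std-001']
print(sorted(search(index, "系统")))      # ['std-001', 'std-002']
print(sorted(search(index, "ocr")))       # ['std-002']
```

Because only bigrams are indexed here, a single-character query such as 词 returns nothing. One common remedy is to also index unigrams, trading a larger index for higher recall. A dictionary-based segmenter instead yields fewer, more meaningful terms, at the cost of missing words absent from its dictionary.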
Source
Standard Science (《标准科学》), 2017, No. 1, pp. 19-23 (5 pages)
Keywords
standard, full-text retrieval, search engine, index, word segmentation