Abstract
The claims of a Chinese patent form a semi-structured text. To support a range of retrieval needs, word segmentation of Chinese patent claims is urgently required. After summarizing the characteristics of Chinese patent claims, this paper proposes a Chinese word segmentation model for them that combines a domain dictionary with rules, and describes how the dictionary and the rules are constructed. In closed testing the method achieved good segmentation results: it splits the text into meaningful entities and recognizes out-of-vocabulary words well.
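The abstract does not say which matching algorithm or which rules the model uses. As a rough illustration of how a domain dictionary and rules can be combined, the sketch below pairs forward maximum matching (a common dictionary-based technique, substituted here as an assumption) with a single regular-expression rule. The dictionary entries, the `CLAIM_REF` rule, and the `segment` function are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical toy domain dictionary; the paper's actual dictionary is not shown in the abstract.
DOMAIN_DICT = {"权利要求", "所述", "装置", "其特征在于", "传感器", "包括"}
MAX_WORD_LEN = max(len(word) for word in DOMAIN_DICT)

# Hypothetical rule: a claim reference such as "权利要求1" is kept as a single token.
CLAIM_REF = re.compile(r"权利要求\d+")


def segment(text):
    """Segment text with a rule first, then forward maximum matching."""
    tokens = []
    i = 0
    while i < len(text):
        # Rule step: try the claim-reference pattern at the current position.
        match = CLAIM_REF.match(text, i)
        if match:
            tokens.append(match.group())
            i = match.end()
            continue
        # Dictionary step: take the longest dictionary word starting at i,
        # falling back to a single character when nothing matches.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in DOMAIN_DICT:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    print(segment("根据权利要求1所述的装置其特征在于包括传感器"))
```

Characters that neither the rule nor the dictionary covers fall out as single-character tokens. A fuller system would need a further step to recognize out-of-vocabulary words, which this sketch does not attempt.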
Source
Journal of Intelligence (《情报杂志》)
CSSCI
PKU Core Journal (北大核心)
2011, No. 11, pp. 152-155 (4 pages)
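Funding
Beijing Natural Science Foundation project "Research on an Information Service Platform for an Intellectual Property Early-Warning Mechanism" (No. 9092002)
Beijing Municipal Education Commission science and technology project "Research on Key Technologies of a MAS-Based Patent Early-Warning System" (No. KM200910005027); this paper is one of the research outputs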
Keywords
Chinese word segmentation
domain dictionary
Chinese patent claims