Abstract
An independent claim in a Chinese patent consists of a preamble portion and a characterizing portion. The patent invalidity search model proposed in this paper exploits that structure. Forty split words were collected manually from a patent database and are used to divide independent claims into the two portions automatically. Because computing similarity against the whole database is impractical, retrieval proceeds in two steps. In the first step, a Boolean query is applied to improve recall. In the second step, a vector space model computes the similarity between the patent application (the query) and each prior patent (document) returned by the first step, separately for the preamble portions and for the characterizing portions; the two similarities are then combined into an overall score that is used to rank the results and improve precision. The experimental data set comes from SIPO. The experiments compare retrieval with and without claim splitting, and compare different term-weighting methods applied after splitting. The evaluation results show that the model is effective.
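The two-step procedure described in the abstract can be illustrated with a minimal sketch. Several parts of it are assumptions rather than the paper's method: the two entries in `SPLIT_WORDS` stand in for the paper's list of 40 split words, which is not reproduced here; claims are assumed to be already segmented into term lists; term weights are raw term frequencies; the Boolean step is a simple OR query; and the linear weight `alpha` is a hypothetical way of combining the two similarities, since the abstract does not state the combination rule.

```python
import math
from collections import Counter

# Hypothetical split phrases; the paper's actual list of 40 is not given here.
SPLIT_WORDS = ["其特征在于", "其特征是"]


def split_claim(claim):
    """Cut a claim at the earliest split phrase into (preamble, characterizing)."""
    hits = [(claim.find(w), w) for w in SPLIT_WORDS if w in claim]
    if not hits:
        return claim, ""  # no split phrase found: treat everything as preamble
    pos, word = min(hits)
    return claim[:pos], claim[pos + len(word):]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def boolean_filter(query_terms, docs):
    """Step 1 (assumed OR query): keep documents sharing any term with the query."""
    q = set(query_terms)
    return [d for d in docs if q & (set(d["preamble"]) | set(d["characterizing"]))]


def combined_score(query, doc, alpha=0.5):
    """Step 2: score each portion separately, then combine linearly (assumption)."""
    pre = cosine(Counter(query["preamble"]), Counter(doc["preamble"]))
    cha = cosine(Counter(query["characterizing"]), Counter(doc["characterizing"]))
    return alpha * pre + (1 - alpha) * cha


def search(query, docs, alpha=0.5):
    """Run the Boolean filter, then rank the survivors by combined similarity."""
    terms = query["preamble"] + query["characterizing"]
    candidates = boolean_filter(terms, docs)
    return sorted(candidates, key=lambda d: combined_score(query, d, alpha), reverse=True)


if __name__ == "__main__":
    query = {"preamble": ["cup"], "characterizing": ["handle"]}
    docs = [
        {"id": "A", "preamble": ["cup"], "characterizing": ["handle"]},
        {"id": "B", "preamble": ["cup"], "characterizing": ["lid"]},
        {"id": "C", "preamble": ["chair"], "characterizing": ["leg"]},
    ]
    print([d["id"] for d in search(query, docs)])
```

Scoring the two portions separately lets a match in the characterizing portion be weighted differently from a match in the preamble by changing `alpha`; the sketch does not attempt to reproduce the term-weighting variants that the paper compares.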
Source
《计算机应用研究》
CSCD
PKU Core Journal (北大核心)
2008, No. 7, pp. 2068-2070 (3 pages)
Application Research of Computers
Funding
Key Program of the National Natural Science Foundation of China (70031010)
National "985" Project funding