
A Similarity-based Model for Mapping Between Patent and Industrial Classifications: Mapping Between the International Patent Classification and the Industrial Classification for National Economic Activities
Cited by: 16
Abstract [Purpose/significance] This paper proposes a similarity-based model for mapping between patent and industrial classifications. The model is accurate, easily extensible, and efficient, and can serve as a reference for further research. [Method/process] After reviewing existing methods for mapping between patent and industrial classifications, the authors design a mapping model and run a mapping experiment, taking the International Patent Classification and the Industrial Classification for National Economic Activities as an example. Cosine similarity results are processed with Z-score standardization to complete a partial mapping between subclasses of the International Patent Classification and subclasses of the Industrial Classification for National Economic Activities. The model is then evaluated against the trial-version concordance released by SIPO (the State Intellectual Property Office). [Result/conclusion] The model combines the standardized, concise wording of the official patent classification annotations with the broad coverage of large volumes of patent data, and uses natural language processing to derive mappings between patent and industrial categories automatically. Compared with existing methods, it saves substantial labor cost while maintaining accuracy, allows the granularity of the mapped categories to be adjusted easily, and applies to other patent and industrial classifications that meet the model's data format requirements. Possible improvements to the model and future application areas are also briefly discussed.
Authors: Tian Chuang (田创), Zhao Yajuan (赵亚娟)
Source: Library and Information Service (《图书情报工作》), CSSCI, PKU Core Journal, 2016, No. 20, pp. 123-131 (9 pages)
Keywords: patent classification; industry classification; classification mapping; mapping methods
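The abstract names two computational steps: cosine similarity between category texts, and Z-score standardization of the resulting scores. The sketch below illustrates how those two steps could fit together; it is not the authors' implementation. The bag-of-words vectors, the whitespace tokenizer, the `z_threshold` cutoff, the function names, and the category codes `P1`, `I1` to `I4` are all assumptions introduced for illustration, since the record does not specify them.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


def map_categories(patent_texts, industry_texts, z_threshold=1.0):
    """For each patent category, keep the industry categories whose
    standardized similarity exceeds z_threshold (an assumed cutoff)."""
    # Assumption: plain whitespace tokenization and raw term counts.
    industry_vecs = {
        code: Counter(text.lower().split())
        for code, text in industry_texts.items()
    }
    codes = list(industry_vecs)
    mapping = {}
    for p_code, p_text in patent_texts.items():
        p_vec = Counter(p_text.lower().split())
        sims = [cosine(p_vec, industry_vecs[c]) for c in codes]
        zs = z_scores(sims)
        mapping[p_code] = [c for c, z in zip(codes, zs) if z > z_threshold]
    return mapping


# Toy data with hypothetical category codes and descriptions.
patent_texts = {"P1": "engine fuel combustion piston"}
industry_texts = {
    "I1": "engine piston manufacturing",
    "I2": "textile weaving cotton",
    "I3": "dairy milk processing",
    "I4": "fuel combustion engine parts",
}
result = map_categories(patent_texts, industry_texts, z_threshold=0.5)
```

Standardizing the scores per patent category makes the cutoff relative to that category's own similarity distribution, so a single threshold can be reused across categories with very different raw similarity ranges. Raising or lowering `z_threshold` trades the number of mapped categories against their strictness.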











