
Fundamental Research Questions in Patent Text Categorization (cited by: 15)
Abstract This paper studies fundamental problems in patent text categorization: the suitability of terms as classification features, the categorization of the main-claim field, and the effect of classes with closely related topics on categorization results. The experiments use five classifiers (two Naive Bayes classifiers, kNN, Rocchio, and a support vector machine), evaluated mainly by cross-validation. The results show that, under the same settings, terms used as features outperform general feature words; that training a classifier on abstracts helps improve the categorization of main claims; and that classes with closely related topics lower precision, so a hierarchical classifier design is needed for categorization experiments. The results provide reference data for research and practice in patent text categorization, and can inform information analysis and other work that applies patent text categorization techniques.
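Authors Qu Peng (屈鹏), Wang Huilin (王惠临)
Source New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, PKU Core Journal, 2013, No. 3, pp. 38-44 (7 pages)
Funding One of the research outputs of the 51st batch of the China Postdoctoral Science Foundation General Program, first-class grant, "Term Extraction and Term-Based Classification and Clustering in Scientific and Technical Text Information Resources" (Grant No. 2012M510040), and of the Institute of Scientific and Technical Information of China discipline-building project "Natural Language Processing" (Grant No. XK2012-6)
Keywords Patent; Text categorization; Text mining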
