Applying Graph Representations to Automatic Extraction of Semantic Information from Chinese Patent Text (运用图示法自动提取中文专利文本的语义信息)

Cited by: 9
Abstract [Purpose/significance] This paper proposes a graph-representation-based approach to automatically extract semantic information from Chinese patent texts, so as to provide semantic support for content-based intelligent patent analysis. [Method/process] Two graph-structured models are designed: (1) a keyword-based text graph model and (2) a dependency-tree-based text graph model. The first model is defined by computing the similarity relations between keywords; the second is defined by the syntactic relations extracted from sentences. In the case study, a frequent subgraph mining algorithm is applied to the constructed graph models, and the mined subgraphs are used as features to build text classifiers that test the expressiveness and effectiveness of the graph models. [Result/conclusion] The graph-model-based text classifiers were applied to patent text datasets from four different technology domains and compared with a classic text classifier. While using markedly fewer features, the graph-based classifiers improve classification performance by 2.1% to 10.5% over the classic classifier. It can therefore be inferred that the semantic information extracted from patent texts by combining graph representations with graph mining techniques is effective and facilitates further patent text analysis.
Author: Jiang Chuntao (姜春涛)
Source: Library and Information Service (《图书情报工作》), CSSCI, PKU Core Journal, 2015, No. 21, pp. 115-122 (8 pages)
Keywords: graph representations; patent information extraction; frequent subgraph mining; patent classification
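
The keyword-based text graph model mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure the paper uses, so the Jaccard similarity over sentence co-occurrence below is an assumption, as are the function name `build_keyword_graph`, the `threshold` parameter, and the toy data. It is an illustration of the general idea, not the author's implementation.

```python
from itertools import combinations


def build_keyword_graph(sentences, keywords, threshold=0.5):
    """Build an undirected keyword graph as {keyword: set of neighbours}.

    Assumption: two keywords are linked when the Jaccard similarity of
    the sets of sentences containing them reaches `threshold`.
    """
    # For each keyword, record the indices of the sentences that contain it.
    occurrences = {
        kw: {i for i, tokens in enumerate(sentences) if kw in tokens}
        for kw in keywords
    }
    graph = {kw: set() for kw in keywords}
    for a, b in combinations(keywords, 2):
        union = occurrences[a] | occurrences[b]
        if not union:
            continue  # neither keyword occurs anywhere; no evidence for an edge
        jaccard = len(occurrences[a] & occurrences[b]) / len(union)
        if jaccard >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Toy tokenized sentences (hypothetical data, already word-segmented).
sentences = [
    ["battery", "electrode", "coating"],
    ["battery", "electrode"],
    ["coating", "polymer"],
]
graph = build_keyword_graph(
    sentences, ["battery", "electrode", "coating", "polymer"]
)
```

Graphs built this way, one per patent text, would then be the input to a frequent subgraph miner, whose recurring patterns serve as classification features.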