Abstract
[Purpose/significance] This paper proposes a graph-representation-based approach for automatically extracting semantic information from Chinese patent texts, in order to provide semantic support for content-based intelligent patent analysis. [Method/process] Two graph models are devised: (1) a keyword-based text graph model and (2) a dependency-tree-based text graph model. The first is defined by computing the similarity between keywords; the second is defined by the syntactic relations extracted from sentences. In the case study, a frequent subgraph mining algorithm is applied to the graph models to discover frequent subgraph patterns, and these subgraphs are used as features to build text classifiers that test the expressivity and effectiveness of the graph models. [Result/conclusion] The graph-model-based text classifiers were applied to patent text datasets from four different technology domains and compared with a classic text classifier. The results show that the graph-based classifiers, while using markedly fewer features, improve classification performance by 2.1%-10.5% over the classic classifier. It can thus be inferred that the semantic information extracted from patent texts using graph representations combined with graph mining techniques is effective and can support further patent text analysis.
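As an illustration of the keyword-based text graph model mentioned in the abstract, the following is a minimal sketch only. The abstract does not state which similarity measure, threshold, or keyword extraction method the author used; the cosine similarity over sentence co-occurrence vectors, the threshold value of 0.6, the function names, and the toy data are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def keyword_graph(sentences, keywords, threshold=0.6):
    """Build an undirected keyword graph as {keyword: set of neighbours}.

    Assumption: two keywords are linked when the cosine similarity of
    their sentence-occurrence vectors reaches `threshold`.
    """
    # One binary vector per keyword: 1 if the keyword occurs in the sentence.
    vectors = {k: [1 if k in s else 0 for s in sentences] for k in keywords}
    graph = {k: set() for k in keywords}
    for a, b in combinations(keywords, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical toy input: each sentence is already reduced to a set of tokens.
sentences = [
    {"battery", "electrode", "lithium"},
    {"battery", "electrode"},
    {"lithium", "coating"},
]
keywords = ["battery", "electrode", "lithium", "coating"]
graph = keyword_graph(sentences, keywords)
```

Graphs of this form, one per patent document, are the kind of input a frequent subgraph mining step would consume; the mining and classification stages are not sketched here.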
Source
《图书情报工作》
CSSCI
PKU Core Journal
2015, Issue 21, pp. 115-122 (8 pages)
Library and Information Service
Keywords
graph representations
patent information extraction
frequent subgraph mining
patent classification