Abstract
[Purpose/Significance] Existing methods for patent technical topic analysis suffer from three problems: low topic discriminability, ambiguous topic words, and an inability to identify the technical "problems" and corresponding "solutions" described in patents. This paper aims to address them. [Method/Process] We first extract subject-action-object (SAO) structures from patent text and identify the "problem and solution" (P&S) patterns embodied in those structures. Under a "bag of P&S" assumption, we then build an SAO-based LDA topic model that identifies and analyzes the topic structure of patent documents at the concept level. [Result/Conclusion] A case study shows that the proposed method effectively identifies topic distributions and has considerable advantages over the traditional LDA model in topic discriminability and semantic disambiguation.
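The "bag of P&S" idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how SAO structures are extracted or how P&S patterns are recognized, so everything below is an assumption made for illustration: the sample triples, the `SOLUTION_VERBS` list, and the helper names `to_ps_token`, `is_problem_solution`, and `bag_of_ps` are all hypothetical. The sketch only shows how whole SAO triples, rather than single words, could serve as the vocabulary units of a document-term matrix.

```python
from collections import Counter

# Hypothetical, hand-written SAO triples (subject, action, object) per patent.
# A real pipeline would obtain these from a parser; none is specified here.
patents = {
    "patent_A": [("filter", "remove", "noise"), ("coating", "prevent", "corrosion")],
    "patent_B": [("filter", "remove", "noise"), ("sensor", "detect", "leak")],
}

# Assumption: a triple counts as a P&S pattern when its action verb is in this list.
SOLUTION_VERBS = {"remove", "prevent", "reduce"}


def to_ps_token(sao):
    """Join one SAO triple into a single vocabulary token."""
    subject, action, obj = sao
    return f"{subject}|{action}|{obj}"


def is_problem_solution(sao):
    """Crude stand-in for P&S pattern recognition (verb lookup only)."""
    return sao[1] in SOLUTION_VERBS


def bag_of_ps(triples):
    """Count the P&S tokens of one document, ignoring their order."""
    return Counter(to_ps_token(t) for t in triples if is_problem_solution(t))


bags = {pid: bag_of_ps(triples) for pid, triples in patents.items()}

# Shared vocabulary and a document-term count matrix (rows sorted by patent id).
vocab = sorted({token for bag in bags.values() for token in bag})
matrix = [[bags[pid][token] for token in vocab] for pid in sorted(bags)]

print(vocab)
print(matrix)
```

A count matrix of this shape could then be passed to any standard LDA implementation, so that each topic is a distribution over P&S triples instead of isolated words. Keeping the triple intact is what would let "filter|remove|noise" stay distinct from other uses of "filter".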
Source
《图书情报工作》
CSSCI
PKU Core (Peking University Core Journals)
2017, No. 3, pp. 86-96 (11 pages)
Library and Information Service
Funding
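National Natural Science Foundation of China, General Program: "Research on Forecasting Innovation Paths of Emerging Technologies Based on Semantic TRIZ" (Grant No. 71373019)
One of the research outputs of the National High Technology Research and Development Program of China project "Big Data Intelligent Service System for Government Management and Its Application Demonstration" (Grant No. 2014AA015105)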