Abstract
Existing methods for identifying technical topics in patents lack semantic context, are weakly interpretable, and draw blurred boundaries between topics. To address these problems, this paper proposes a technical topic identification and analysis method that fuses patent structural data with text semantics. The method uses patent IPC codes as structural data to improve plain-text topic modeling, obtaining topic word vectors guided by the IPC and by expert classification opinions, and uses a word2vec model to obtain semantic word vectors of the patent text. The two sets of vectors are concatenated to yield precise, easily interpretable technical topics that meet the requirements of fine-grained analysis. Finally, an empirical study in the field of non-small cell lung cancer treatment confirms that the method is scientifically sound, effective, and practical.
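The concatenation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the vectors are scaled or combined beyond "concatenation", so the L2 normalization, the function names `l2_normalize`, `fuse`, and `cosine`, and all numeric values below are assumptions made purely for illustration; they are not the authors' implementation or data.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(topic_vecs, semantic_vecs):
    """Concatenate topic and semantic vectors for terms present in both inputs.

    Normalizing each part first (an assumption, not stated in the abstract)
    keeps either source from dominating purely because of its scale.
    """
    fused = {}
    for term, topic_vec in topic_vecs.items():
        if term in semantic_vecs:
            fused[term] = l2_normalize(topic_vec) + l2_normalize(semantic_vecs[term])
    return fused


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical term-by-topic weights, standing in for an IPC-guided topic model.
topic_vecs = {
    "kinase": [0.7, 0.2, 0.1],
    "inhibitor": [0.6, 0.3, 0.1],
    "antibody": [0.1, 0.1, 0.8],
}

# Hypothetical 2-dimensional embeddings, standing in for word2vec output.
semantic_vecs = {
    "kinase": [0.9, 0.1],
    "inhibitor": [0.8, 0.3],
    "antibody": [-0.2, 0.9],
}

fused = fuse(topic_vecs, semantic_vecs)
sim_close = cosine(fused["kinase"], fused["inhibitor"])
sim_far = cosine(fused["kinase"], fused["antibody"])
print(len(fused["kinase"]), sim_close > sim_far)
```

In this toy setup each fused vector has five dimensions (three topic weights plus two embedding components), and terms that agree on both the topic side and the semantic side end up closer together than terms that differ on both.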
Authors
Shen Manzhu (沈漫竹)
Yu Huixian (于慧娴)
Li Qian (李倩)
Yuan Hongmei (袁红梅)
School of Business Administration, Shenyang Pharmaceutical University, Shenyang 110016, China
Source
Science and Technology Management Research (《科技管理研究》)
CSSCI
Peking University Core Journal (北大核心)
2022, Issue 13, pp. 131-137 (7 pages)
Funding
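Discipline construction project of the School of Business Administration, Shenyang Pharmaceutical University: "Technical Intelligence Analysis of the Manufacturing Industry Based on Patent Data" (2021-sygsxk-01).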