Abstract
Technical similarity is an important part of technology intelligence analysis for enterprises, organizations, and countries, because it provides accurate and effective information for identifying potential competitors and partners. To address the problem that the traditional LDA (latent Dirichlet allocation) topic model ignores semantic associations between the contexts of patent texts, this paper proposes a method for measuring and visualizing technical similarity based on word2vec and the LDA topic model. First, a word2vec model learns the contextual information of feature words in the patent document collection. Second, the LDA topic model is used to construct a three-layer probability distribution over patentees, patents, and technology topics, and the two are fused to generate topic vectors, patent document vectors, and patentee vectors at the "word granularity" level. Third, a vector similarity index is used to calculate the semantic similarity between patentees, and on this basis a two-mode network is built that directly reflects the relationships between patentees and technology topics. Finally, an empirical study in the field of NEDD (nano enabled drug delivery) shows that the model performs well in measuring and analyzing technical similarity.
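The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact fusion formula, so the probability-weighted averaging used here is an assumption, not the authors' published method. All names (`word_vecs`, `topic_word`, `doc_topic`, `patentee_docs`) and all numbers are hypothetical toy values standing in for word2vec embeddings and LDA output.

```python
import math

def weighted_sum(weights, vectors):
    """Return sum_i weights[i] * vectors[i] as a list of floats."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy inputs (made up for illustration only).
# word2vec-style embeddings for a 4-word vocabulary.
word_vecs = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
# LDA-style P(word | topic) for 2 topics.
topic_word = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
# LDA-style P(topic | patent) for 3 patents.
doc_topic = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
# Indices of the patents held by each (hypothetical) patentee.
patentee_docs = {"A": [0], "B": [1], "C": [2]}

# Assumed fusion: topic vector = word vectors weighted by P(word | topic).
topic_vecs = [weighted_sum(p, word_vecs) for p in topic_word]
# Patent vector = topic vectors weighted by P(topic | patent).
doc_vecs = [weighted_sum(p, topic_vecs) for p in doc_topic]
# Patentee vector = mean of the patentee's patent vectors.
patentee_vecs = {
    name: mean([doc_vecs[i] for i in docs])
    for name, docs in patentee_docs.items()
}

# Semantic similarity between patentees.
sim_ab = cosine(patentee_vecs["A"], patentee_vecs["B"])
sim_ac = cosine(patentee_vecs["A"], patentee_vecs["C"])

# Two-mode (patentee-topic) edge weights: mean topic share over a patentee's patents.
two_mode = {
    name: mean([doc_topic[i] for i in docs])
    for name, docs in patentee_docs.items()
}

print(sim_ab > sim_ac)
```

In this toy setup, patentees A and B concentrate on the same topic, so their similarity exceeds that of A and C. In practice the embeddings and distributions would come from trained word2vec and LDA models rather than hand-written lists.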
Authors
席笑文
郭颖
宋欣娜
王瑾
Xi Xiaowen; Guo Ying; Song Xinna; Wang Jin (Archives of Chinese Academy of Sciences, Beijing 100190; Business School, China University of Political Science and Law, Beijing 100088; School of Management & Economics, Beijing Institute of Technology, Beijing 100081)
Source
《情报学报》
CSSCI
CSCD
PKU Core (Peking University Core Journals)
2021, No. 9, pp. 974-983 (10 pages)
Journal of the China Society for Scientific and Technical Information
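Funding
National Natural Science Foundation of China, General Program: "Crossing the 'Valley of Death': Measurement Methods and Evolution Prediction for Multilayer Complex Networks of Industry-University-Research Collaboration" (71874013)
Qian Duansheng Distinguished Scholar Support Program, China University of Political Science and Law.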