Research on Patent Technology Topic Clustering Based on Sentence-BERT: Taking the Field of Artificial Intelligence as an Example
Abstract [Research purpose] This study applies the Sentence-BERT model to patent technology topic clustering. It addresses the sparse semantic features of word vectors that arise because patent documents often use unique technical terms to emphasize novelty. [Research method] The experimental data are 22,370 patents in the field of artificial intelligence from 2015 to 2019. First, the Sentence-BERT algorithm is used to produce vector representations of the patent abstract texts. Second, the dimensionality of the resulting vector matrix is reduced, and HDBSCAN is used to find high-density clusters in the original data. Finally, the topic features of each cluster's text collection are identified and the topics are presented. [Research conclusion] Compared with methods such as the LDA topic model, K-means, and doc2vec, the experimental results show finer granularity and higher precision in topic division, along with better topic consistency. How to use a fine-tuning strategy to further improve the model's performance is a direction for future exploration of this method.
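The three steps described in the abstract (embedding, dimensionality reduction plus density clustering, topic-term extraction) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the pretrained checkpoint, the reduction method, the parameter values, or the way topic features are scored, so the multilingual model name, UMAP, all parameter values, and the simple class-based TF-IDF scoring below are assumptions. The whitespace tokenizer is also a placeholder; Chinese patent text would need a proper word segmenter.

```python
import math
from collections import Counter, defaultdict


def embed_and_cluster(abstracts, min_cluster_size=15):
    """Embed abstracts, reduce dimensionality, and cluster by density.

    Assumptions (not stated in the abstract): the multilingual checkpoint,
    UMAP as the reduction method, and every parameter value below.
    Requires: pip install sentence-transformers umap-learn hdbscan
    """
    # Imported lazily so the topic-term helper below works without them.
    from sentence_transformers import SentenceTransformer
    import umap
    import hdbscan

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    embeddings = model.encode(abstracts)  # shape: (n_docs, embedding_dim)

    reduced = umap.UMAP(n_components=5, n_neighbors=15, metric="cosine",
                        random_state=42).fit_transform(embeddings)

    # HDBSCAN labels low-density points as -1 (noise).
    return hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(reduced)


def top_terms_per_cluster(abstracts, labels, k=3):
    """Rank terms per cluster with a simple class-based TF-IDF score."""
    term_counts = defaultdict(Counter)
    for text, label in zip(abstracts, labels):
        if label == -1:  # skip noise points
            continue
        term_counts[label].update(text.lower().split())

    n_clusters = len(term_counts)
    # Number of clusters in which each term occurs.
    cluster_freq = Counter(t for counts in term_counts.values() for t in counts)

    topics = {}
    for label, counts in term_counts.items():
        total = sum(counts.values())
        scores = {t: (c / total) * math.log(1 + n_clusters / cluster_freq[t])
                  for t, c in counts.items()}
        # Sort by score descending, then alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        topics[label] = ranked[:k]
    return topics
```

In use, `labels = embed_and_cluster(abstracts)` followed by `top_terms_per_cluster(abstracts, labels)` would give a short term list per cluster. Terms shared across many clusters are down-weighted, which is one way to surface the distinctive vocabulary of each technology topic.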
Authors: Ruan Guangce (阮光册); Zhou Mengwei (周萌葳) (Faculty of Economics and Management, East China Normal University, Shanghai 200241)
Source: Journal of Intelligence (《情报杂志》), PKU Core Journal, 2024, No. 2, pp. 110-117 (8 pages)
Keywords: Sentence-BERT; patent text; topic identification; text clustering