
Automatic Selection of Domain-Specific Stopwords in Topic Model of Patent Text
Times cited: 4
Abstract: [Purpose/significance] The automatic selection of domain-specific stopwords for topic modeling of patent text has not yet been studied sufficiently. This paper proposes a new method for automatically selecting domain-specific stopwords for patent topic model analysis, in order to improve the discriminative power and modeling quality of patent topic models.
[Method/process] In essence, domain-specific stopwords are words that carry relatively little information and therefore discriminate poorly between patent texts of different categories. The method introduces an auxiliary multi-category patent text collection, measures how each word is distributed across the categories by its category entropy, ranks the words by category entropy, and selects the words with the highest category entropy as domain-specific stopwords.
[Result/conclusion] Experiments on patent text data verify the feasibility and effectiveness of the method, showing that it effectively improves the discriminative power and quality of topic models for patent text analysis.
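The entropy-based selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the category entropy of a word w is the Shannon entropy of its distribution over the K categories of the auxiliary corpus, H(w) = -Σ_c p(c|w) log2 p(c|w), with p(c|w) estimated from raw term counts. The abstract does not say whether the paper uses term counts, document frequencies, or per-category normalization. The function names `category_entropy` and `select_domain_stopwords` and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def category_entropy(docs_by_category):
    """Map each word to its entropy over categories.

    docs_by_category: dict of category label -> list of tokenized documents.
    Assumption (hypothetical): p(c|w) is estimated from raw term counts of
    w in category c, with no normalization for category size.
    """
    counts = defaultdict(Counter)  # word -> Counter(category -> term count)
    for cat, docs in docs_by_category.items():
        for doc in docs:
            for tok in doc:
                counts[tok][cat] += 1

    entropy = {}
    for word, by_cat in counts.items():
        total = sum(by_cat.values())
        h = 0.0
        for n in by_cat.values():  # only categories where the word occurs
            p = n / total
            h -= p * math.log2(p)
        entropy[word] = h
    return entropy


def select_domain_stopwords(docs_by_category, top_n):
    """Return the top_n words with the highest category entropy."""
    ent = category_entropy(docs_by_category)
    # Sort by descending entropy; ties are broken alphabetically.
    ranked = sorted(ent, key=lambda w: (-ent[w], w))
    return ranked[:top_n]


# Hypothetical toy auxiliary corpus with three patent categories.
example_corpus = {
    "A": [["device", "method", "battery"], ["device", "electrode"]],
    "B": [["device", "method", "engine"], ["piston"]],
    "C": [["method", "antenna"]],
}

print(select_domain_stopwords(example_corpus, 2))  # prints ['method', 'device']
```

A word spread evenly over all categories (here `method`, once in each of A, B and C) reaches the maximum entropy log2 K and is selected first, while a word confined to a single category (such as `battery`) has entropy 0 and is kept as a potentially discriminative term. With unbalanced categories, raw counts bias p(c|w) toward the larger categories, which is one reason a real implementation might normalize per category.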
Authors: Yu Yan (俞琰); Zhao Naixuan (赵乃瑄) (Information Service Department, Nanjing Tech University, Nanjing 211816; Computer Science Department, Southeast University Chengxian College, Nanjing 211816)
Source: Library and Information Service (图书情报工作; CSSCI-indexed; Peking University Core Journal), 2018, no. 11, pp. 120-126 (7 pages)
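Funding: This paper is one of the research outputs of the Ministry of Education Humanities and Social Sciences Planning Project "Research on the Construction of Skill Knowledge Graphs in the Big Data Era" (project no. 16YJAZH073) and the National Social Science Fund of China general project "Research on Multi-Dimensional, Multi-Level Patent Text Mining Supporting Innovative Design in the Big Data Era" (project no. 17BTQ059).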
Keywords: patent text; topic model; domain-specific stopword; automatic selection