Topic discovery and evolution in scientific literature based on content and citations (Cited by: 5)

Abstract Researchers have become increasingly interested in how important research topics evolve over time within the scientific literature. In a dataset of scientific articles, each document can be regarded as comprising both its own words and its citations of other documents. In this paper, we propose a citation-content latent Dirichlet allocation (LDA) topic discovery method that accounts for both document citation relations and document content via a probabilistic generative model. The citation-content-LDA model is a two-level topic model that uses citation information for 'father' topics and text information for sub-topics. The model parameters are estimated by a collapsed Gibbs sampling algorithm. We also propose a topic evolution algorithm that runs in two steps: topic segmentation and topic dependency relation calculation. We tested the proposed citation-content-LDA model and topic evolution algorithm on two online datasets, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and IEEE Computer Society (CS), to demonstrate that the algorithm effectively discovers important topics and reflects the evolution of important research themes. According to our evaluation metrics, citation-content-LDA outperforms both content-LDA and citation-LDA.
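The abstract states that the model parameters are estimated by collapsed Gibbs sampling but gives no update equations. As a rough illustration of that estimation technique only, the sketch below runs collapsed Gibbs sampling for plain single-level LDA over word ids. It is not the two-level citation-content-LDA model; the function name `gibbs_lda`, the hyperparameter defaults, and the toy corpus are all assumptions made for the example.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=100, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative; not the paper's model).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic mixtures and per-topic word distributions.
    """
    rng = random.Random(seed)
    topics = list(range(n_topics))

    # Count tables that the sampler keeps in sync with the topic assignments.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Full conditional p(z = k | all other assignments), unnormalised.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in topics
                ]

                # Resample the topic and restore the counts.
                z = rng.choices(topics, weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Posterior mean estimates from the final counts.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in topics
    ]
    return theta, phi


# Toy corpus (hypothetical): two documents over words 0-2, two over words 3-5.
docs = [[0, 1, 0, 1, 2], [0, 0, 1, 2], [3, 4, 5, 3, 4], [4, 5, 5, 3]]
theta, phi = gibbs_lda(docs, n_topics=2, vocab_size=6)
```

Each row of `theta` is a document's estimated topic mixture and each row of `phi` is a topic's estimated word distribution. A model in the spirit of the one described would additionally need a citation-level layer for the 'father' topics, which this sketch does not attempt.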

Authors Hou-kui ZHOU, Hui-min YU, Roland HU

Affiliations College of Information Science & Electronic Engineering; State Key Lab of CAD & CG; School of Information Engineering; Zhejiang Provincial Key Laboratory of Forestry Intelligent Monitoring and Information Technology

Source Frontiers of Information Technology & Electronic Engineering (信息与电子工程前沿（英文版）), indexed in SCIE, EI, CSCD; 2017, No. 10, pp. 1511-1524 (14 pages)

Funding Supported by the National Basic Research Program (973) of China (No. 2012CB316400)

Keywords Topic extraction; Topic evolution; Evaluation method

Classification G353.1 [Cultural Sciences: Information Science]; TP391.1 [Automation and Computer Technology: Computer Application Technology]

