Citation-based recommendations of relevant research papers can be generated primarily with the help of three citation models: (1) Bibliographic Coupling, (2) Co-Citation, and (3) Direct Citations. Millions of new scholarly articles are published every year. This flux of scientific information has made it challenging to devise techniques that help researchers find the papers most relevant to the paper at hand. In this study, we deploy an in-text citation analysis that extends the Direct Citation Model to discover the nature of the relationship, i.e., the degree of relevancy, among scientific papers. For this purpose, the relationship between citing and cited articles is classified into three categories: weak, medium, and strong. As an experiment, around 5,000 research papers were crawled from CiteSeerX. These papers were parsed to identify in-text citation frequencies. Subsequently, 0.1 million references of those articles were extracted, and their in-text citation frequencies were computed. A comprehensive benchmark dataset was established based on a user study. Afterwards, the results were validated using least-squares approximation by a quadratic polynomial. It was found that the degree of relevancy between scientific papers is a quadratic polynomial that increases or decreases with the increase or decrease in the in-text citation frequency of a cited article. Furthermore, the results of the proposed model were compared with state-of-the-art techniques using a well-known measure, the normalized Discounted Cumulative Gain (nDCG). The proposed method achieved an nDCG score of 0.89, whereas the state-of-the-art Content-based, Bibliographic-Coupling-based, and Metadata-based models achieved nDCG values of 0.65, 0.54, and 0.51, respectively. These results indicate that the proposed mechanism may be applied in future information retrieval systems for better results.
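As an illustration of the nDCG measure named in the abstract, the following is a minimal sketch and not the authors' evaluation code. It assumes the common linear-gain form of DCG with a log2 position discount; the abstract does not state which variant was used, and the grade values and the names `dcg`, `ndcg`, and `ranked_grades` are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevance scores."""
    # 0-based position i is discounted by log2(i + 2), so the top rank has discount 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering of the same grades."""
    ideal = dcg(sorted(relevances, reverse=True))
    # If every grade is zero there is nothing to rank, so return 0.0 by convention.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical grades for five recommended papers (assumption, not data from the study):
# 2 = strong, 1 = medium, 0 = weak relationship to the paper at hand.
ranked_grades = [2, 1, 0, 2, 0]
score = ndcg(ranked_grades)
print(f"nDCG = {score:.3f}")
```

A ranking that already lists the strongest papers first scores 1.0; placing a strongly related paper lower in the list reduces the score.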
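Objective: To explore the correlation of altmetric and bibliometric indicators with citation counts for highly cited papers in synthetic biology. Methods: Papers related to synthetic biology were retrieved from the Web of Science database on June 21, 2023, and highly cited papers (top 1% by citation count) were selected. The Journal Citation Reports (JCR) impact factor was obtained from the same database and, together with each paper's citation count, served as the bibliometric indicators; seven indicators, including the Altmetric score and Twitters, were obtained from the Altmetric website as altmetric indicators. Indicators with more than 20% missing data were excluded. The annual trend in the number of highly cited synthetic biology papers was analyzed, and multivariate and univariate regression models were used to analyze the correlation between each indicator and citation count. Results: A total of 100 highly cited papers and 6 indicators were included [2 bibliometric indicators (citation count and JCR impact factor) and 4 altmetric indicators (Altmetric score, Twitters, Mendeley, and Patents)]. From 1999 to 2021, the number of highly cited synthetic biology papers first rose and then fell, with publication concentrated in 2011 to 2015 (51 papers). Multivariate regression analysis showed that the Altmetric score, Mendeley, and Patents were significantly positively correlated with citation count, whereas the JCR impact factor and Twitters were not significantly correlated with it. Univariate regression analysis showed that all variables except Twitters were significantly correlated with citation count; the correlation of the JCR impact factor was weak, and Mendeley and Patents explained 90.0% and 85.6% of the variation in citation count, respectively. Conclusion: Altmetric indicators can reflect a paper's impact to some extent; the JCR impact factor is weakly correlated or uncorrelated with a paper's impact.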
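The "share of variation explained" reported for the univariate models corresponds to the coefficient of determination (R²) of a one-predictor linear regression. The sketch below shows how such a value can be computed with ordinary least squares. It is an assumption-laden illustration: the abstract does not specify the exact model form, and the function name `r_squared` and the sample numbers are hypothetical, not data from the study.

```python
def r_squared(x, y):
    """R^2 of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form OLS estimates for a single predictor.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical values (assumption): reader counts and citation counts for six papers.
readers = [120, 340, 560, 800, 1500, 2100]
citations = [90, 260, 400, 650, 1100, 1700]
r2 = r_squared(readers, citations)
print(f"R^2 = {r2:.3f}")
```

An R² of 0.900 would mean that the predictor accounts for 90.0% of the variance in citation counts under this linear model.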