Funding: Supported by the Major Projects of the National Social Science Fund (No. 17ZDA291), the Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF201704), and the Qing Lan Project.
Abstract: Purpose: This study aims to build an automatic survey generation tool, named CitationAS, based on citation content, represented by the set of citing sentences in the original articles. Design/methodology/approach: First, we apply LDA to analyse the topic distribution of citation content. Second, in CitationAS, we use bisecting K-means, Lingo and STC to cluster the retrieved citation content. Word2Vec, WordNet and a combination of the two are then applied to generate cluster labels. Next, we employ TF-IDF and MMR, together with sentence location information, to extract important sentences, which are used to generate surveys. Finally, we evaluate the generated surveys manually. Findings: In the experiments, we choose 20 high-frequency phrases as search terms. Results show that Lingo-Word2Vec, STC-WordNet and bisecting K-means-Word2Vec have better clustering effects. On a five-point evaluation scale, the survey quality scores obtained by the designed methods are close to 3, indicating that the surveys are within acceptable limits. Taking sentence location information into account improves survey quality. Combining Lingo and Word2Vec with TF-IDF or MMR yields higher survey quality. Research limitations: The manual evaluation method may be somewhat subjective. We use a simple linear function to combine Word2Vec and WordNet, which may not bring out their strengths. The generated surveys may miss newly created knowledge in some articles, which may be concentrated in sentences that contain no citations. Practical implications: The CitationAS tool can automatically generate a comprehensive, detailed and accurate survey according to a user’s search terms. It can also help researchers learn about the research status of a given field. Originality/value: The CitationAS tool is practical. It merges cluster labels at the semantic level to improve clustering results. The tool also considers sentence location information when calculating sentence scores with TF-IDF and MMR.
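The sentence-extraction step named in this abstract (TF-IDF weighting combined with MMR) can be illustrated with a minimal sketch. This is not the CitationAS implementation: the function names, the smoothed IDF formula, the trade-off parameter `lam`, and the sample sentences are all assumptions made for illustration, and sentence location is left out because the abstract does not say how it enters the score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF (an assumed variant)."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in doc.items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_select(sentences, query, k=2, lam=0.7):
    """Pick k sentences by Maximal Marginal Relevance.

    Each step chooses the candidate maximizing
    lam * sim(candidate, query) - (1 - lam) * max sim(candidate, already selected).
    """
    vectors = tfidf_vectors(sentences + [query])
    query_vec = vectors.pop()
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], query_vec)
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


if __name__ == "__main__":
    # Hypothetical citing sentences; the second nearly repeats the first.
    sample = [
        "topic models cluster citation sentences",
        "topic models cluster citation sentences well",
        "neural networks improve image recognition",
    ]
    for sentence in mmr_select(sample, "topic models citation", k=2, lam=0.3):
        print(sentence)
```

A lower `lam` penalizes redundancy more heavily, so a near-duplicate of an already selected sentence is passed over in favour of a less similar one; `lam=1.0` reduces the procedure to plain relevance ranking.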
Funding: An outcome of the key project “Research on Discipline Construction of Information Science and Future Development Path of Information Work” (No. 17ZDA291), supported by the National Social Science Foundation of China.
Abstract: The purpose is to analyze citing behaviors toward books from the perspective of citation content, and to overcome the deficiencies of traditional book impact evaluation based on citation frequencies and book reviews, so as to improve the accuracy and scientific rigor of book impact evaluation. We collected Chinese books in five disciplines (computer science, law, medicine, literature and sport science) from Amazon.cn. We then manually extracted the citation contents about these Chinese books from each piece of citing literature and built a corpus of 2,288 citation contents. Finally, we analyzed citation behaviors toward these Chinese books by mining citation locations, citation intensities, citation lengths and citation sentiments. The experimental results showed that: 1) when citing Chinese books, authors from the five disciplines had different preferences for citation locations; 2) citation intensities mainly ranged from 1 to 3, and citations in the discipline of literature more often had high citation intensities; 3) citation lengths were concentrated between 20 and 160; 4) regarding citation sentiments toward Chinese books, more than 80% of citations were neutral, and positive citations outnumbered negative ones.
Funding: This work is supported by the Programs for the Young Talents of National Science Library, Chinese Academy of Sciences (Grant No. 2019QNGR003).
Abstract: Purpose: The study of research dynamics has long been of research interest. It is a macro-perspective tool for discovering temporal research trends in a discipline or subject. A micro perspective on research dynamics, however, concerning a single researcher or a highly cited paper in terms of their citations and “citations of citations” (forward chaining), remains unexplored. Design/methodology/approach: In this paper, we use a cross-collection topic model to reveal the research dynamics of topic disappearance, topic inheritance, and topic innovation in each generation of forward chaining. Findings: For highly cited work, scientific influence exists in indirect citations. Topic modeling can reveal how long this influence persists in forward chaining, as well as its character. Research limitations: This paper measures scientific influence and indirect scientific influence only if the relevant words or phrases are borrowed or used in direct or indirect citations. Paraphrases or semantically similar concepts may be neglected in this research. Practical implications: This paper demonstrates, through its analysis of forward chaining, that scientific influence exists in indirect citations. This can serve as an inspiration for how to evaluate research influence adequately. Originality: The main contributions of this paper are threefold. First, besides the research dynamics of topic inheritance and topic innovation, we model topic disappearance by using a cross-collection topic model. Second, we explore the length and character of the research impact through content analysis of “citations of citations”. Finally, we analyze the research dynamics of artificial intelligence researcher Geoffrey Hinton’s publications and the topic dynamics of forward chaining.
Abstract: Citation Context Analysis (CCA) is a typical data-driven research field based on full-text information. It breaks the limitations of traditional citation analysis, which uses only bibliographic data, and benefits further studies on various citation behaviors and the core issues behind them, such as citation motivation, citation function and citation sentiment. A corpus for CCA is the most important guarantee and support for these issues. This paper discusses corpus construction and mining for CCA in order to comprehensively review the research significance, research status and existing deficiencies in this area. The two main sections of our paper are: 1) corpus construction for CCA, in which its three building tasks (citation sentence extraction, citation-reference mapping and citation context extraction) are discussed; 2) corpus mining and utilization for CCA, in which the following related topics are explored: classification of citation motivation (or behavior) and citation sentiment, citation-based indexing and retrieval, citation recommendation and evaluation, automatic citation-based abstracting and review generation, and domain knowledge metrics. Finally, some suggestions and future research directions are briefly listed.
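As a rough illustration of the first corpus-building task named in this abstract, citation sentence extraction, the sketch below finds sentences that contain an in-text citation marker. It is only an assumption-laden example, not a method from the paper: the marker pattern covers just numeric brackets and a simple author-year form, the sentence splitter is naive, and all names and the sample text are hypothetical.

```python
import re

# Assumed marker styles: [3], [3, 4], [5-7], (Smith, 2010), (Lee et al., 2015).
CITATION_MARKER = re.compile(
    r"\[\d+(?:\s*[-,]\s*\d+)*\]"
    r"|\([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}[a-z]?\)"
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_citation_sentences(text):
    """Return (sentence, markers) pairs for sentences with at least one marker."""
    results = []
    for sentence in split_sentences(text):
        markers = CITATION_MARKER.findall(sentence)
        if markers:
            results.append((sentence, markers))
    return results


if __name__ == "__main__":
    # Hypothetical passage used only to exercise the functions above.
    passage = (
        "Topic models are widely used [3, 4]. "
        "This sentence has no marker. "
        "Earlier work reported similar results (Lee et al., 2015)."
    )
    for sentence, markers in extract_citation_sentences(passage):
        print(markers, sentence)
```

The extracted markers are what a later citation-reference mapping step would resolve against the reference list; real full-text corpora need a more robust splitter and a wider range of marker styles than this sketch handles.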