Mazu is the most famous goddess of canal transport in China and one of the three folk beliefs in China. Japan is China's neighbor across the sea, and as early as 1,000 years ago it was influenced by Mazu ceremonial culture. Using big data analysis, this study counted, screened, and analyzed the records of Mazu culture in Diaolong, the full-text database of Chinese and Japanese ancient books. It also explored the hot topics of concern and the emotional attitudes expressed, and then analyzed the important role of Mazu culture in cultural exchange and mutual learning between China and Japan in the new era, with a view to fulfilling the contemporary task of building a "people-to-people bond" and achieving common development.
Purpose: In the open science era, project-generated scientific data are typically shared by depositing them in an open and accessible database, while scientific publications are preserved in a digital library archive. It is challenging to identify the data usage mentioned in the literature and to associate it with its source. Here, we investigated the data usage of a government-funded cancer genomics project, The Cancer Genome Atlas (TCGA), via a full-text literature analysis. Design/methodology/approach: We focused on identifying articles that use the TCGA dataset and on constructing linkages between the articles and the specific TCGA dataset. First, we collected 5,372 TCGA-related articles from PubMed Central (PMC). Second, we constructed a benchmark set of 25 full-text articles that truly used TCGA data in their studies, and we summarized the key features of the benchmark set. Third, we applied the key features to the remaining full-text articles collected from PMC. Findings: The number of publications that use TCGA data has increased significantly since 2011, although the TCGA project was launched in 2005. Additionally, the critical areas of focus in studies that use TCGA data were glioblastoma multiforme, lung cancer, and breast cancer, and data from the RNA-sequencing (RNA-seq) platform were the most frequently preferred. Research limitations: The current workflow for identifying articles that truly used TCGA data is labor-intensive. An automatic method is expected to improve the performance. Practical implications: This study will help cancer genomics researchers determine the latest advancements in cancer molecular therapy, and it will promote data sharing and data-intensive scientific discovery. Originality/value: Few studies have investigated data usage by government-funded projects/programs since their launch. In this preliminary study, we extracted articles that use TCGA data from PMC, and we created a link between the full-text articles and the source data.
With the emergence and further development of the digital library, approaches to information acquisition have changed considerably. This paper presents a statistical analysis of journal downloading and citation behaviors in the digital environment built by the National Science Library (NSL), Chinese Academy of Sciences (CAS). The analysis shows that the development of digital resources has influenced scientific research behaviors. For example, the large volume of full-text downloads is expected to persist; the trend in journal downloading behavior is basically the same as the trend in journal citation behavior; and journals with many full-text downloads also tend to have high citation counts, and vice versa. Furthermore, the authors perform a linear regression analysis with journal download counts as the independent variable and journal citation counts as the dependent variable. They also use Pearson's correlation coefficient to demonstrate the positive correlation between journal downloading and citation behaviors.
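The regression and correlation steps described in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' analysis: the function names `pearson_r` and `fit_line` are hypothetical, and the download and citation counts are invented toy numbers, not data from the NSL study.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return (sxx, syy, sxy, mean_x, mean_y) for two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxx, syy, sxy, mean_x, mean_y


def pearson_r(x, y):
    """Pearson's correlation coefficient between x and y."""
    sxx, syy, sxy, _, _ = _centered_sums(x, y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    sxx, _, sxy, mean_x, mean_y = _centered_sums(x, y)
    if sxx == 0:
        raise ValueError("cannot fit a line when x is constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Toy numbers for illustration only (hypothetical journals).
    downloads = [100, 200, 300, 400]   # independent variable
    citations = [12, 19, 33, 40]       # dependent variable
    slope, intercept = fit_line(downloads, citations)
    print("r =", pearson_r(downloads, citations))
    print("citations ~", slope, "* downloads +", intercept)
```

A coefficient close to +1 would correspond to the positive relationship the abstract reports; the sketch says nothing about the strength actually observed in the study.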
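Academic full texts contain many kinds of knowledge elements, and mining and organizing them can effectively improve the efficiency with which academic resources are used. Building an academic knowledge graph that links the implicit "knowledge elements" in papers not only saves researchers time in locating knowledge points but also allows knowledge points to be expanded through the network communities within the graph. Based on a systematic and comprehensive literature survey, this paper identifies 18 key knowledge elements in academic papers along three dimensions (macro, meso, and micro), treats the descriptive information in academic full texts as knowledge element objects, and designs a conceptual framework for an academic knowledge graph. It then selects 515 full-text articles from the Journal of the Association for Information Science and Technology (JASIST), manually annotates the key knowledge elements in each paper, and studies knowledge element extraction based on deep learning. The study examines whether each type of knowledge element causes problems during manual annotation and whether automatic extraction reaches the expected performance, and uses the answers to screen the knowledge elements that take part in graph construction. Nine knowledge elements are finally retained: mathematical formulas, software tools, data sources, specific models, tables, figures, research outlook, research questions, and research results. Together with the knowledge elements in the bibliographic data, they are used to generate triples consisting of a head knowledge element, a relation, and a tail knowledge element, which are stored in a graph database. Finally, the graph is evaluated through visualization and a knowledge element retrieval study, which demonstrates its feasibility and extensibility. The results show that some knowledge elements in academic full texts are suitable for large-scale automatic annotation, and that the various knowledge elements can link to one another to form dense knowledge communities and support functions such as knowledge element search.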
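The triple structure mentioned in this abstract (head knowledge element, relation, tail knowledge element) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class names `Triple` and `TripleStore`, the relation labels, and the node identifiers are hypothetical, and an in-memory dictionary stands in for the graph database used in the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One (head, relation, tail) statement about knowledge elements."""
    head: str
    relation: str
    tail: str


class TripleStore:
    """In-memory stand-in for a graph database, indexed by node."""

    def __init__(self):
        self._by_head = defaultdict(list)
        self._by_tail = defaultdict(list)

    def add(self, head, relation, tail):
        """Store a triple and index it under both of its end nodes."""
        triple = Triple(head, relation, tail)
        self._by_head[head].append(triple)
        self._by_tail[tail].append(triple)
        return triple

    def search(self, node):
        """Return every triple in which `node` is the head or the tail."""
        return self._by_head.get(node, []) + self._by_tail.get(node, [])


if __name__ == "__main__":
    store = TripleStore()
    # Hypothetical identifiers and relation names, not taken from the paper.
    store.add("paper:0001", "uses_data_source", "data_source:example-corpus")
    store.add("paper:0002", "uses_data_source", "data_source:example-corpus")
    store.add("paper:0001", "uses_software_tool", "software_tool:example-tool")
    for triple in store.search("data_source:example-corpus"):
        print(triple)
```

Searching on a shared tail node returns every paper linked to it, which is the sense in which linked knowledge elements can form communities and support knowledge element search.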
Algorithms play an increasingly important role in scientific work, especially in data-driven research. Investigating how algorithms are mentioned in full-text papers helps us understand the use and development of algorithms in a specific domain. Current research on algorithm mentions is limited to academic papers in a single language, which makes it hard to investigate the use of algorithms comprehensively. For example, are algorithms mentioned in Chinese conference papers in the same way as in English conference papers? To answer this question, this paper takes NLP as an example and compares the mention frequency, mention location, and mention time of the top 10 data-mining algorithms in papers from a well-known international conference, the Annual Meeting of the Association for Computational Linguistics (ACL), and a Chinese conference, the China National Conference on Computational Linguistics (CCL). The results show that, compared with ACL, the mention frequency of the top 10 data-mining algorithms in CCL is slightly lower and the mention time is slightly delayed, while the distribution of mention locations is similar. This study can provide a reference for research on the mention, citation, and evaluation of knowledge entities.
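Two of the quantities compared in this abstract, mention frequency and mention time, can be sketched with a simple string-matching pass. This is an illustrative assumption, not the authors' method: the function name `mention_stats`, the algorithm names passed to it, and the toy papers are hypothetical, and plain case-insensitive matching is a stand-in for whatever mention extraction the study used.

```python
import re


def mention_stats(papers, algorithms):
    """Count papers mentioning each algorithm and find the earliest year.

    papers: iterable of (year, full_text) pairs.
    algorithms: iterable of algorithm names to look for.
    Returns {name: (number_of_papers_mentioning, earliest_year_or_None)}.
    """
    papers = list(papers)
    stats = {}
    for name in algorithms:
        # Match the name only when it is not embedded in a longer word.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE
        )
        years = [year for year, text in papers if pattern.search(text)]
        stats[name] = (len(years), min(years) if years else None)
    return stats


if __name__ == "__main__":
    # Toy corpus for illustration only; not drawn from ACL or CCL.
    toy_corpus = [
        (2001, "We train an SVM classifier on the features."),
        (1999, "k-means and SVM are compared on this task."),
        (2005, "This paper describes a new annotation scheme."),
    ]
    print(mention_stats(toy_corpus, ["SVM", "k-means"]))
```

Running the same function on two corpora and comparing the per-algorithm counts and earliest years gives the kind of side-by-side view the abstract describes; mention location would additionally require section boundaries, which this sketch does not model.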
Funding: Supported by the National Population and Health Scientific Data Sharing Program of China, the Knowledge Centre for Engineering Sciences and Technology (Medical Centre), and the Fundamental Research Funds for the Central Universities (Grant No. 13R0101).
Funding: Supported by the National Natural Science Foundation of China (Grant No. 72074113).