Funding: Supported by the National Population and Health Scientific Data Sharing Program of China; the Knowledge Centre for Engineering Sciences and Technology (Medical Centre); and the Fundamental Research Funds for the Central Universities (Grant No.: 13R0101).
Abstract: Purpose: In the open science era, it is typical to share project-generated scientific data by depositing it in an open and accessible database. Moreover, scientific publications are preserved in a digital library archive. It is challenging to identify the data usage that is mentioned in the literature and associate it with its source. Here, we investigated the data usage of a government-funded cancer genomics project, The Cancer Genome Atlas (TCGA), via a full-text literature analysis. Design/methodology/approach: We focused on identifying articles that use the TCGA dataset and constructing linkages between the articles and the specific TCGA dataset. First, we collected 5,372 TCGA-related articles from PubMed Central (PMC). Second, we constructed a benchmark set of 25 full-text articles that truly used the TCGA data in their studies, and we summarized the key features of the benchmark set. Third, the key features were applied to the remaining full-text articles collected from PMC. Findings: The number of publications that use TCGA data has increased significantly since 2011, although the TCGA project was launched in 2005. Additionally, we found that the critical areas of focus in the studies that use the TCGA data were glioblastoma multiforme, lung cancer, and breast cancer; meanwhile, data from the RNA-sequencing (RNA-seq) platform was the most preferred. Research limitations: The current workflow to identify articles that truly used TCGA data is labor-intensive. An automatic method is expected to improve the performance. Practical implications: This study will help cancer genomics researchers determine the latest advancements in cancer molecular therapy, and it will promote data sharing and data-intensive scientific discovery. Originality/value: Few studies have been conducted to investigate data usage by government-funded projects/programs since their launch. In this preliminary study, we extracted articles that use TCGA data from PMC, and we created a link between the full-text articles and the source data.
Abstract: Text visualization is concerned with the representation of text in a graphical form to facilitate comprehension of large textual data. Its aim is to improve the ability to understand and utilize the wealth of text-based information available. An essential task in any scientific research is the study and review of previous works in the specified domain, a process that is referred to as the literature survey process. This process involves the identification of prior work and evaluating its relevance to the research question. With the enormous number of published studies available online in digital form, this becomes a cumbersome task for the researcher. This paper presents the design and implementation of a tool that aims to facilitate this process by identifying relevant work and suggesting clusters of articles by conceptual modeling, thus providing different options that enable the researcher to visualize a large number of articles in a graphical, easy-to-analyze form. The tool helps the researcher in analyzing and synthesizing the literature and building a conceptual understanding of the designated research area. The evaluation of the tool shows that researchers have found it useful and that it supported the process of relevant work analysis given a specific research question, and 70% of the evaluators of the tool found it very useful.
Funding: Supported by NIH grants R01LM010817 and P01AG039347.
Abstract: Purpose: The late Don R. Swanson was well appreciated during his lifetime as Dean of the Graduate Library School at University of Chicago, as winner of the American Society for Information Science Award of Merit for 2000, and as author of many seminal articles. In this informal essay, I will give my personal perspective on Don's contributions to science, and outline some current and future directions in literature-based discovery that are rooted in concepts that he developed. Design/methodology/approach: Personal recollections and literature review. Findings: The Swanson A-B-C model of literature-based discovery has been successfully used by laboratory investigators analyzing their findings and hypotheses. It continues to be a fertile area of research in a wide range of application areas including text mining, drug repurposing, studies of scientific innovation, knowledge discovery in databases, and bioinformatics. Recently, additional modes of discovery that do not follow the A-B-C model have also been proposed and explored (e.g., so-called storytelling, gaps, analogies, link prediction, negative consensus, outliers, and revival of neglected or discarded research questions). Research limitations: This paper reflects the opinions of the author and is neither a comprehensive nor a technically based review of literature-based discovery. Practical implications: The general scientific public is still not aware of the availability of tools for literature-based discovery. Our Arrowsmith project site maintains a suite of discovery tools that are free and open to the public (http://arrowsmith.psych.uic.edu), as do BITOLA, which is maintained by Dmitar Hristovski (http://ibmi.mf.uni-lj.si/bitola), and Epiphanet, which is maintained by Trevor Cohen (http://epiphanet.uth.tme.edu/). Bringing user-friendly tools to the public should be a high priority, since even more than advancing basic research in informatics, it is vital that we ensure that scientists actually use discovery tools and that these are actually able to help them make experimental discoveries in the lab and in the clinic. Originality/value: This paper discusses problems and issues which were inherent in Don's thoughts during his life, including those which have not yet been fully taken up and studied systematically.
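As a rough illustration of the A-B-C model mentioned in the abstract above, the following minimal sketch looks for concepts C that never appear together with a starting concept A but are connected to it through shared intermediate concepts B. The toy records, the function names `cooccurrence` and `abc_candidates`, and the use of plain co-occurrence are assumptions made for this example; they do not describe how Arrowsmith, BITOLA, or Epiphanet work. The concept labels echo Swanson's well-known fish oil and Raynaud's disease example.

```python
from collections import defaultdict

# Toy, hand-made "literature": each record is the set of concepts one paper mentions.
# These records are illustrative only, not real bibliographic data.
papers = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"blood viscosity", "raynaud disease"},
    {"platelet aggregation", "raynaud disease"},
    {"fish oil", "triglycerides"},
]

def cooccurrence(records):
    """Map each concept to the set of concepts it shares at least one paper with."""
    neighbors = defaultdict(set)
    for concepts in records:
        for concept in concepts:
            neighbors[concept] |= concepts - {concept}
    return neighbors

def abc_candidates(records, a_term):
    """Return {C: sorted bridging B terms} for every C never co-mentioned with A."""
    neighbors = cooccurrence(records)
    b_terms = neighbors[a_term]          # concepts directly linked to A
    bridges = defaultdict(set)
    for b in b_terms:
        for c in neighbors[b]:
            # Keep C only if it is not A itself and has no direct link to A.
            if c != a_term and c not in b_terms:
                bridges[c].add(b)
    return {c: sorted(bs) for c, bs in bridges.items()}

print(abc_candidates(papers, "fish oil"))
```

With these toy records, the only candidate for "fish oil" is "raynaud disease", bridged by "blood viscosity" and "platelet aggregation". A practical system would rank candidates and filter uninformative B terms, which this sketch omits.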
Abstract: Over a century-long time span, a remarkable array of Czech literary works has been introduced to China, marked by selections of different writers, genres, and themes. This study attempts to discuss what Czech literary works have been translated in China since such translation began in 1921. Besides the selection of texts for introduction, attention has also been given to the formats of their publication, the approaches adopted for introduction, the choice of intermediate languages, the numbers of different versions, and the frequency of their publication, as well as the textual and paratextual features of the final Chinese products.
Abstract: This paper presents an analysis of the text Arrivals, which serves as the first chapter of Rana Dasgupta's novel Tokyo Cancelled. The analysis mainly focuses on the role of literary devices including foregrounding, metaphor, and voice. It considers both the advantages and the limitations of these devices. Nevertheless, the skillful use of these literary techniques enhances the charm of the text throughout.
Funding: Supported by the National Natural Science Foundation (NSFC) Programs of China (Grant Nos.: 72011540408 and 72032006); the National Research Foundation of Korea (Grant No.: NRF-2020K2A9A2A06069972); the Youth Innovation Team of Shaanxi Universities "Big data and Business Intelligent Innovation Team"; and the Shaanxi Superiority Funding Project for Scientific and Technological Activities of Overseas Scholars (Grant No.: 2018017).
Abstract: Digitization, informatization, and Internet penetration have led to a significant rise in cross-border e-commerce (CBEC), attracting considerable interest from academia, government, and industry. This study employed a novel method combining automatic text generation technology and traditional bibliometric analysis to summarize and categorize the research on CBEC evolution from 2000 to 2021. Articles were selected and examined with a focus on four dimensions: customer, risk, supply chain, and platform. Contradictions in these dimensions were found to result in two major obstacles to CBEC development, namely, dataset sharing and platform scalability. These obstacles prevent research on cross-border platforms from moving beyond theory-based studies. Further research needs to examine how soft computing can be used to accelerate and remodel the global trade ecosystem.
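Abstract: Handwritten text recognition can transcribe handwritten documents into editable digital documents. However, because of widely varying writing styles, highly variable document structures, and low accuracy in character segmentation and recognition, neural-network-based recognition of handwritten English text still faces many challenges. To address these problems, a handwritten English text recognition model based on a convolutional neural network (CNN) and a Transformer is proposed. First, the CNN extracts features from the input image; the features are then fed into a Transformer encoder to obtain a prediction for each frame of the feature sequence; finally, a connectionist temporal classification (CTC) decoder produces the final prediction. Extensive experiments on the public IAM (Institut für Angewandte Mathematik) handwritten English word dataset show that the model achieves a character error rate (CER) of 3.60% and a word error rate (WER) of 12.70%, verifying the feasibility of the proposed model.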
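The abstract above reports CER and WER without saying how they were computed. The sketch below assumes the common definition, in which each metric is the Levenshtein edit distance between the recognized text and the reference, divided by the reference length (in characters for CER, in whitespace-separated words for WER). The function names and the sample strings are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))      # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]                        # deleting i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference string."""
    return edit_distance(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

# Hypothetical recognizer output with one missing character in one word.
reference = "the quick brown fox"
hypothesis = "the quick brwn fox"
print(f"CER = {cer(reference, hypothesis):.4f}, WER = {wer(reference, hypothesis):.4f}")
```

One dropped character affects 1 of 19 reference characters but 1 of 4 reference words, which shows why WER is typically much higher than CER for the same output. Evaluation scripts differ in details such as case and punctuation handling, so values computed this way are only comparable when those choices match.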