Abstract: The development of next-generation sequencing (NGS) platforms has produced an enormous volume of data. This explosion in data has exposed new scalability challenges for existing bioinformatics tools. The analysis of metagenomic sequences using bioinformatics pipelines is complicated by the substantial complexity of these data. In this article, we review several commonly used online tools for metagenomics data analysis with respect to the quality and detail of their analysis, using simulated metagenomics data. At least a dozen such software tools are presently available in the public domain. Among them, MGRAST, IMG/M, and METAVIR are the best known, according to the number of citations in peer-reviewed scientific media up to mid-2015. Here, we describe 12 online tools with respect to their web link, annotation pipelines, clustering methods, online user support, and availability of data storage. We also rate each tool to identify the most promising ones, and we evaluate the five best tools using a synthetic metagenome. The article comprehensively covers the contemporary problems and prospects of metagenomics from a bioinformatics viewpoint.
Funding: the National Natural Science Foundation of China (No. 61872232).
Abstract: A large number of techniques and web resources are available for software engineering practice, and this number continues to grow. Discovering semantically similar or related technical terms and web resources offers the opportunity to design appealing services that facilitate information retrieval and information discovery. In this study, we extract technical terms and web resources from community question-and-answer (Q&A) discussions and propose an approach based on a neural language model to learn the semantic representations of technical terms and web resources in a joint low-dimensional vector space. Our approach maps technical terms and web resources to a semantic vector space based only on the technical terms and web resources that surround a given technical term (or web resource) in a discussion thread, without the need to mine the text content of the discussion. We apply our approach to the Stack Overflow data dump of March 2018. Through both quantitative and qualitative analyses of clustering, search, and semantic reasoning tasks, we show that the learnt technical-term and web-resource vector representations capture the semantic relatedness of technical terms and web resources, and that they can be exploited to support various search and semantic reasoning tasks by means of simple K-nearest neighbor search and simple algebraic operations on the learnt vector representations in the embedding space.
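The K-nearest neighbor search and the algebraic operations mentioned in this abstract can be illustrated with a minimal sketch. The vectors, the term names, and the helper functions `cosine`, `nearest`, and `analogy` below are hypothetical and hand-made for illustration; they are not the learnt representations or the code from the paper, and cosine similarity is an assumed choice of distance.

```python
import math

# Hypothetical toy embeddings. In the paper the vectors are learnt from
# discussion threads; these values are made up for illustration only.
EMBEDDINGS = {
    "java":   [1.0, 0.0, 0.0],
    "maven":  [0.9, 0.1, 0.0],
    "junit":  [1.0, 1.0, 0.0],
    "python": [0.0, 0.0, 1.0],
    "pytest": [0.0, 1.0, 1.0],
}

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(vector, k, exclude=()):
    """Return the k terms whose vectors are most similar to `vector`."""
    scored = [
        (cosine(vector, candidate), name)
        for name, candidate in EMBEDDINGS.items()
        if name not in exclude
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' with the vector b - a + c."""
    target = [
        y - x + z
        for x, y, z in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])
    ]
    return nearest(target, 1, exclude=(a, b, c))[0]

neighbours = nearest(EMBEDDINGS["java"], 2, exclude=("java",))
answer = analogy("java", "junit", "python")
```

With these toy vectors, `neighbours` lists the two terms closest to "java", and `answer` is the term closest to the vector `junit - java + python`.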
Funding: Supported by the Ministry of Science and Technology of China (Grant Nos. 2013CB910801, 2012AA020201, 2012AA020409, and 2014DFB30010), the National Natural Science Foundation of China (Grant Nos. 21105121, 21475150, and 61303073), and the Beijing Municipal Natural Science Foundation of China (Grant No. 5122013).
Abstract: With the development of high-resolution and high-throughput mass spectrometry (MS) technology, a large quantity of proteomic data is continually being generated. Collecting and sharing these data is a challenge that requires immense and sustained human effort. In this report, we classify important web resources for MS-based proteomics and rate them according to whether raw data are stored, whether data submission is supported, and whether data analysis pipelines are provided. These web resources are important for biologists involved in proteomics research.
Funding: Supported by the National High-tech R&D Program of China (863 Program, Grant No. 2014AA021501) and the National Scientific-Basic Special Fund from the Ministry of Science and Technology of China (Grant No. 2014FY110500).
Abstract: A multitude of web resources are useful to the microbial research community. Here, we briefly introduce some of the most notable microbial web resources and evaluate them based on our own user experience.
Funding: Supported by the National Social Science Fund of China (Grant No. 08CTQ015).
Abstract: This paper draws on 998 articles published from 2003 to 2007 in four Chinese core journals in the field of Library and Information Science. Selected aspects of their reference citations, particularly citations of web resources, are analyzed and discussed. These include the number of web citations, the number of web citations per article, the distribution of domain names among web citations, and certain aspects of the authors' institutional and geographical affiliations. The central theme of the study is the evolving use of online networked academic information resources in China. The paper includes 3 figures, 6 tables, and 18 references.
Funding: Supported by the National High-Technology Research and Development Program of China (2002AA1Z2308, 2002AA118030) and the Natural Science Foundation of Liaoning Province (20022027).
Abstract: This paper introduces a novel architecture for a metadata management system based on an intelligent cache, called the Metadata Intelligent Cache Controller (MICC). By using an intelligent cache to control the metadata system, MICC can handle scenarios such as splitting queries into sub-queries and merging them against the metadata sets that are available locally, in order to reduce the access time of remote queries. An application can obtain part of its results from the local cache and fetch the remaining portion of the metadata from remote locations. By using the existing metadata, MICC not only enhances the fault tolerance and load balancing of the system effectively, but also improves access efficiency while ensuring access quality.
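The idea of answering a query partly from the local cache and partly from remote locations can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe MICC's data structures or interfaces, so the dictionary cache and the functions `split_query`, `resolve`, and `fake_remote` are hypothetical.

```python
def split_query(keys, local_cache):
    """Split requested keys into local hits and keys that must go remote."""
    local_hits = {key: local_cache[key] for key in keys if key in local_cache}
    missing = [key for key in keys if key not in local_cache]
    return local_hits, missing

def resolve(keys, local_cache, fetch_remote):
    """Answer a query from the cache where possible and fetch only the rest.

    `fetch_remote` is a hypothetical callable that takes a list of keys and
    returns a dict of metadata for them.
    """
    local_hits, missing = split_query(keys, local_cache)
    fetched = fetch_remote(missing) if missing else {}
    local_cache.update(fetched)  # keep fetched metadata for later queries
    return {**local_hits, **fetched}

# Usage with a stand-in remote store that records which keys were requested.
remote_calls = []

def fake_remote(keys):
    remote_calls.append(list(keys))
    return {key: "remote:" + key for key in keys}

cache = {"file_a": "local:file_a"}
first = resolve(["file_a", "file_b"], cache, fake_remote)
second = resolve(["file_a", "file_b"], cache, fake_remote)
```

In this sketch the first call sends only `file_b` to the remote side, and the second call is served entirely from the cache, which is the access-time saving the abstract describes in general terms.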