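To investigate the diversity of the Bacillus community in high-temperature Daqu from the Xiangyang region of Hubei, this study downloaded sequencing data and selected the high-temperature Daqu samples in which Bacillus accounted for more than 2.0% relative abundance. Bacillus diversity in these samples was then analyzed by combining Illumina MiSeq high-throughput sequencing with traditional pure-culture techniques. Twelve high-temperature Daqu samples had a Bacillus relative abundance >2.0%: 8 white Daqu samples and 4 yellow Daqu samples. Together they yielded 77,825 representative sequences and 1,197 operational taxonomic units (OTUs). Diversity analysis showed that the Chao1 index and the number of observed species of Bacillus were significantly higher in white Daqu than in yellow Daqu (P<0.05). OTU analysis showed that OTU5321 and OTU5291 were present in all samples and together accounted for 49.43% of total sequences in white Daqu and 41.32% in yellow Daqu. Pure-culture analysis showed that Bacillus amyloliquefaciens was the dominant culturable Bacillus species in high-temperature Daqu. These results indicate that the Bacillus community in white Daqu has higher richness and that B. amyloliquefaciens is the main culturable Bacillus species in high-temperature Daqu. The findings can guide the future application and strain breeding of Bacillus from high-temperature Daqu.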
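The screening step (Bacillus relative abundance >2.0%) and the two richness measures compared between white and yellow Daqu (observed species and Chao1) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The OTU-table layout, the function names, and the set of Bacillus-assigned OTU IDs are all hypothetical. Chao1 is written in its classic form S_obs + F1²/(2·F2), falling back to the bias-corrected form when there are no doubletons.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical OTU table layout: sample ID -> {OTU ID -> read count}.
OtuTable = Dict[str, Dict[str, int]]


def screen_samples(table: OtuTable, bacillus_otus: Set[str],
                   threshold: float = 0.02) -> List[str]:
    """Return IDs of samples whose Bacillus relative abundance exceeds `threshold` (2.0% by default)."""
    kept = []
    for sample_id, otu_counts in table.items():
        total = sum(otu_counts.values())
        if total == 0:
            continue  # empty sample: relative abundance is undefined
        bacillus_reads = sum(n for otu, n in otu_counts.items() if otu in bacillus_otus)
        if bacillus_reads / total > threshold:
            kept.append(sample_id)
    return kept


def observed_species(counts: Sequence[int]) -> int:
    """S_obs: number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts: Sequence[int]) -> float:
    """Chao1 richness estimate from per-OTU read counts.

    F1 = number of singleton OTUs (exactly one read),
    F2 = number of doubleton OTUs (exactly two reads).
    Uses S_obs + F1^2 / (2 * F2) when F2 > 0, otherwise the
    bias-corrected form S_obs + F1 * (F1 - 1) / 2.
    """
    s_obs = observed_species(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2
```

For example, a sample with per-OTU counts `[5, 1, 1, 2, 3, 1, 0]` has 6 observed OTUs, 3 singletons, and 1 doubleton, so its Chao1 estimate is 6 + 9/2 = 10.5. Chao1 exceeds the observed count whenever many OTUs are seen only once or twice, which is why the two measures are usually reported together.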
Corpora are an important resource for knowledge acquisition in natural language processing (NLP). However, few biomedical corpora to date focus on the semantic associations among all tokens in a sentence. We propose an annotation scheme, based on feature structure theory, for enriching biomedical corpora with token semantic association (TSA). A total of 227 documents from the BioNLP GE ST training data were annotated to form the TSA corpus. Each annotated item in this corpus represents one token semantic association, expressed as a triple. TSA annotation has the potential to significantly advance biomedical text mining by supplying rich token-level semantic information to NLP systems. This is especially true for sophisticated information extraction (IE) systems, such as bio-event extraction systems.
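The abstract states only that each TSA item is a triple. A minimal sketch of how such triples might be stored and queried follows. It assumes a (head token, relation, dependent token) layout over token positions. The field names, the placeholder relation labels, and the helper functions are hypothetical and are not the paper's actual annotation format or tag set.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class TokenAssociation(NamedTuple):
    """One annotated item, assumed here to be a (head, relation, dependent) triple."""
    head: int        # position of the first token in the sentence
    relation: str    # placeholder label, not the paper's tag set
    dependent: int   # position of the second token


def index_by_head(triples: List[TokenAssociation]) -> Dict[int, List[TokenAssociation]]:
    """Group triples by head token so all associations of a token can be retrieved at once."""
    index: Dict[int, List[TokenAssociation]] = defaultdict(list)
    for t in triples:
        index[t.head].append(t)
    return dict(index)


def render(tokens: List[str], t: TokenAssociation) -> str:
    """Show a triple with token strings in place of positions."""
    return f"{tokens[t.head]} -{t.relation}-> {tokens[t.dependent]}"
```

A downstream IE component, such as an event extractor, could then look up every association of a candidate trigger token with `index_by_head(triples).get(position, [])`.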
Funding: Supported by the National Natural Science Foundation of China (61202304, 61173095, 61173062, 61202193)