Background: Restricted Boltzmann machines (RBMs) have the universal power to model (binary) joint distributions. Moreover, because of their restricted network structure, training RBMs encounters fewer difficulties with approximation and inference. However, little work has been done to fully exploit the capacity of these models for analyzing cancer data, e.g., cancer genomic, transcriptomic, proteomic and epigenomic data. In addition, in cancer data analysis the number of features/predictors is usually much larger than the sample size, which is known as the "p ≫ N" problem and is also ubiquitous in other areas of bioinformatics and computational biology. The "p ≫ N" problem makes the bias-variance trade-off even more crucial when designing statistical learning methods. To date, however, few RBM models have been designed specifically to address this issue. Methods: We propose a novel RBM model, called elastic restricted Boltzmann machines (eRBMs), which incorporates an elastic regularization term into the likelihood function to balance model complexity and sensitivity. Building on the classic contrastive divergence (CD) algorithm, we develop the elastic contrastive divergence (eCD) algorithm, which can train eRBMs efficiently. Results: We obtain several theoretical results on the rationality and properties of our model. We further evaluate the power of our model on a challenging task: predicting dichotomized survival time from the molecular profiling of tumors. The test results show that the prediction performance of eRBMs is substantially better than that of state-of-the-art methods. Conclusions: The proposed eRBMs are capable of dealing with "p ≫ N" problems and have superior modeling performance over traditional methods. Our novel model is a promising method for future cancer data analysis.
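The abstract above does not give the exact form of the elastic regularization term or of the eCD update. As a rough illustration only, the following sketch assumes an elastic-net penalty (an L1 term plus a squared L2 term on the weights) subtracted from the log-likelihood, and folds its gradient into a standard one-step contrastive divergence (CD-1) update. The function names, the penalty strengths `l1` and `l2`, the learning rate and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_layer(probs, rng):
    # Draw one Bernoulli sample per unit.
    return [1 if rng.random() < p else 0 for p in probs]

def hidden_probs(v, W, c):
    # W[i][j] connects visible unit i to hidden unit j; c holds hidden biases.
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    # b holds visible biases.
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def elastic_penalty_grad(w, l1, l2):
    # Assumed penalty: l1 * |w| + 0.5 * l2 * w**2.
    # Its (sub)gradient is l1 * sign(w) + l2 * w, with sign(0) taken as 0.
    sign = (w > 0) - (w < 0)
    return l1 * sign + l2 * w

def ecd1_step(v0, W, b, c, lr=0.1, l1=0.01, l2=0.01, rng=random):
    """One CD-1 update on a single binary sample, with the assumed
    elastic-net penalty applied to the weights only. Updates in place."""
    ph0 = hidden_probs(v0, W, c)    # positive phase
    h0 = sample_layer(ph0, rng)
    pv1 = visible_probs(h0, W, b)   # one Gibbs step: reconstruction
    v1 = sample_layer(pv1, rng)
    ph1 = hidden_probs(v1, W, c)    # negative phase
    for i in range(len(b)):
        for j in range(len(c)):
            cd_grad = v0[i] * ph0[j] - v1[i] * ph1[j]
            W[i][j] += lr * (cd_grad - elastic_penalty_grad(W[i][j], l1, l2))
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])

# Toy usage: two binary patterns, 6 visible and 3 hidden units.
rng = random.Random(0)
n_visible, n_hidden = 6, 3
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
for _ in range(200):
    for v in data:
        ecd1_step(v, W, b, c, rng=rng)
```

In this sketch the L1 part pushes small weights toward zero while the L2 part shrinks large ones, which is one common way to trade model complexity against sensitivity when features far outnumber samples.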
Background: A main goal of metagenomics is the taxonomic characterization of microbial communities. Although sequence comparison has been the main method for taxonomic classification, there is no clear agreement on similarity calculation and similarity thresholds, especially at higher taxonomic levels such as phylum and class. Thus, taxonomic classification of novel metagenomic sequences without close homologs in the biological databases poses a challenge. Methods: In this study, we propose to use the co-abundant associations between taxa/operational taxonomic units (OTUs) across complex and diverse communities to assist taxonomic classification. We developed a Markov Random Field model to predict the taxa of unknown microorganisms using co-abundant associations. Results: Although such associations are intrinsically functional associations, we demonstrate that they are strongly correlated with taxonomic associations and can be combined with sequence comparison methods to predict the taxonomic origins of unknown microorganisms at the phylum and class levels. Conclusions: With the ever-increasing accumulation of sequence data from microbial communities, we take a first step toward exploring these associations for taxonomic identification beyond sequence similarity. Availability and Implementation: The source code of TACO is freely available at https://github.com/baharvand/OTU-Taxonomy-Identification. It is implemented in C++ and supported on Linux and MS Windows.
Funding: Funded by Merck and the Yale Cancer Center; the Department of Defense through the Lung Cancer Research Program, X81XWH-15-1-0203 (S. Goldberg, PI) and W81XWH-16-1-0160 (K. Schalper, PI); NIH grants Yale SPORE in Lung Cancer P50CA196530 (R. Herbst, PI), R01 CA158167 (H. Kluger and G. Desir, PIs), K24CA172123 (H. Kluger, PI), Yale SPORE in Skin Cancer P50 CA121974 (M. Bosenberg and H. Kluger, PIs) and R01 CA204002 (L. Jilaveanu, PI); the Lung Cancer Research Foundation-LUNGevity and Melanoma Research Alliance, Award 308721 (L. Jilaveanu, PI); Stand Up To Cancer-American Cancer Society Lung Cancer Dream Team Translational Research Grants SU2C-AACR-DT17-15 (P. Janne, A. Shaw, J. Wolchok, PIs) and SU2C-AACR-DT22-17 (L. Diaz, PI); and the J. Aron Charitable Foundation (S. Goldberg).
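Abstract: Background and objective: We conducted a phase II trial of the efficacy and safety of pembrolizumab in patients with non-small cell lung cancer (NSCLC) or melanoma and untreated brain metastases, with the aim of assessing the activity of programmed cell death 1 (PD-1) inhibitors in the central nervous system (CNS). Interim results have been published; here we report an updated analysis of the NSCLC cohort. Methods: This was an open-label, single-center, phase II trial. Inclusion criteria were: age ≥18 years; a diagnosis of advanced NSCLC with at least one brain metastasis measuring 5 mm to 20 mm that was either previously untreated or had progressed after prior radiotherapy; no neurological symptoms; no requirement for corticosteroid treatment; and an Eastern Cooperative Oncology Group (ECOG) performance status <2. Patients received pembrolizumab (10 mg/kg) every 2 weeks. Cohort 1 comprised patients with programmed cell death ligand 1 (PD-L1) expression ≥1%, and cohort 2 comprised patients with PD-L1 expression <1% or not assessed. The primary endpoint was the proportion of patients achieving a brain metastasis response. All treated patients were included in the analyses of the efficacy and safety endpoints. The study is closed to accrual and is registered at ClinicalTrials.gov, number NCT02085070. Results: Between March 31, 2014, and May 21, 2018, 42 patients were treated. The median follow-up was 8.3 months (IQR: 4.5 to 26.2 months). Of the 37 patients in cohort 1, 11 had a brain metastasis response [29.7% (95% CI: 15.9% to 47.0%)]. No responses were observed in cohort 2. Treatment-related grade 3 to 4 adverse events (AEs) included 2 cases of pneumonitis and 1 case each of constitutional symptoms, colitis, adrenal insufficiency, hyperglycemia and hypokalemia. Six patients (14%) had treatment-related serious adverse events, including pneumonitis, acute kidney injury, hypokalemia and adrenal insufficiency. No treatment-related deaths were observed. Conclusions: Pembrolizumab is active in patients with NSCLC and brain metastases whose tumors have PD-L1 expression ≥1%, and it is safe in all enrolled patients with untreated brain metastases. Further investigation of immunotherapy in NSCLC with CNS metastases is needed.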