摘要
Background: Restricted Boltzmann machines (RBMs) are endowed with the universal power of modeling (binary) joint distributions. Meanwhile, as a result of their confining network structure, training RBMs confronts less difficulties when dealing with approximation and inference issues. But little work has been developed to fully exploit the capacity of these models to analyze cancer data, e.g., cancer genomic, transcriptomic, proteomic and epigenomic data. On the other hand, in the cancer data analysis task, the number of features/predictors is usually much larger than the sample size, which is known as the '~ 〉〉 N" problem and is also ubiquitous in other bioinformatics and computational biology fields. The "p 〉〉 N" problem puts the bias-variance trade-off in a more crucial place when designing statistical learning methods. However, to date, few RBM models have been particularly designed to address this issue. Methods: We propose a novel RBMs model, called elastic restricted Boltzmann machines (eRBMs), which incorporates the elastic regularization term into the likelihood function, to balance the model complexity and sensitivity. Facilitated by the classic contrastive divergence (CD) algorithm, we develop the elastic contrastive divergence (eCD) algorithm which can train eRBMs efficiently. Results: We obtain several theoretical results on the rationality and properties of our model. We further evaluate the power of our model based on a challenging task -- predicting dichotomized survival time using the molecular profiling of tumors. The test results show that the prediction performance of eRBMs is much superior to that of the state-of-the-art methods. Conclusions: The proposed eRBMs are capable of dealing with the "p 〉〉 N" problems and have superior modeling performance over traditional methods. Our novel model is a promising method for future cancer data analysis.
Background: Restricted Boltzmann machines (RBMs) are endowed with the universal power of modeling (binary) joint distributions. Meanwhile, as a result of their confining network structure, training RBMs confronts less difficulties when dealing with approximation and inference issues. But little work has been developed to fully exploit the capacity of these models to analyze cancer data, e.g., cancer genomic, transcriptomic, proteomic and epigenomic data. On the other hand, in the cancer data analysis task, the number of features/predictors is usually much larger than the sample size, which is known as the '~ 〉〉 N" problem and is also ubiquitous in other bioinformatics and computational biology fields. The "p 〉〉 N" problem puts the bias-variance trade-off in a more crucial place when designing statistical learning methods. However, to date, few RBM models have been particularly designed to address this issue. Methods: We propose a novel RBMs model, called elastic restricted Boltzmann machines (eRBMs), which incorporates the elastic regularization term into the likelihood function, to balance the model complexity and sensitivity. Facilitated by the classic contrastive divergence (CD) algorithm, we develop the elastic contrastive divergence (eCD) algorithm which can train eRBMs efficiently. Results: We obtain several theoretical results on the rationality and properties of our model. We further evaluate the power of our model based on a challenging task -- predicting dichotomized survival time using the molecular profiling of tumors. The test results show that the prediction performance of eRBMs is much superior to that of the state-of-the-art methods. Conclusions: The proposed eRBMs are capable of dealing with the "p 〉〉 N" problems and have superior modeling performance over traditional methods. Our novel model is a promising method for future cancer data analysis.