Latent factor(LF) models are highly effective in extracting useful knowledge from High-Dimensional and Sparse(HiDS) matrices which are commonly seen in various industrial applications. An LF model usually adopts itera...Latent factor(LF) models are highly effective in extracting useful knowledge from High-Dimensional and Sparse(HiDS) matrices which are commonly seen in various industrial applications. An LF model usually adopts iterative optimizers,which may consume many iterations to achieve a local optima,resulting in considerable time cost. Hence, determining how to accelerate the training process for LF models has become a significant issue. To address this, this work proposes a randomized latent factor(RLF) model. It incorporates the principle of randomized learning techniques from neural networks into the LF analysis of HiDS matrices, thereby greatly alleviating computational burden. It also extends a standard learning process for randomized neural networks in context of LF analysis to make the resulting model represent an HiDS matrix correctly.Experimental results on three HiDS matrices from industrial applications demonstrate that compared with state-of-the-art LF models, RLF is able to achieve significantly higher computational efficiency and comparable prediction accuracy for missing data.I provides an important alternative approach to LF analysis of HiDS matrices, which is especially desired for industrial applications demanding highly efficient models.展开更多
Emergence of drug resistant bacteria is one of the serious problems in today’s public health. However, the relationship between genomic mutation of bacteria and the phenotypic difference of them is still unclear. In ...Emergence of drug resistant bacteria is one of the serious problems in today’s public health. However, the relationship between genomic mutation of bacteria and the phenotypic difference of them is still unclear. In this paper, based on the mutation information in whole genome sequences of 96 MRSA strains, two kinds of phenotypes (pathogenicity and drug resistance) were learnt and predicted by machine learning algorithms. As a result of effective feature selection by cross entropy based sparse logistic regression, these phenotypes could be predicted in sufficiently high accuracy (100% and 97.87%, respectively) with less than 10 features. It means that we could develop a novel rapid test method in the future for checking MRSA phenotypes.展开更多
This paper proposes a novel method for testing the equality of high-dimensional means using a multiple hypothesis test. The proposed method is based on the maximum of standardized partial sums of logarithmic p-values ...This paper proposes a novel method for testing the equality of high-dimensional means using a multiple hypothesis test. The proposed method is based on the maximum of standardized partial sums of logarithmic p-values statistic. Numerical studies show that the method performs well for both normal and non-normal data and has a good power performance under both dense and sparse alternative hypotheses. For illustration, a real data analysis is implemented.展开更多
基金supported in part by the National Natural Science Foundation of China (6177249391646114)+1 种基金Chongqing research program of technology innovation and application (cstc2017rgzn-zdyfX0020)in part by the Pioneer Hundred Talents Program of Chinese Academy of Sciences
文摘Latent factor(LF) models are highly effective in extracting useful knowledge from High-Dimensional and Sparse(HiDS) matrices which are commonly seen in various industrial applications. An LF model usually adopts iterative optimizers,which may consume many iterations to achieve a local optima,resulting in considerable time cost. Hence, determining how to accelerate the training process for LF models has become a significant issue. To address this, this work proposes a randomized latent factor(RLF) model. It incorporates the principle of randomized learning techniques from neural networks into the LF analysis of HiDS matrices, thereby greatly alleviating computational burden. It also extends a standard learning process for randomized neural networks in context of LF analysis to make the resulting model represent an HiDS matrix correctly.Experimental results on three HiDS matrices from industrial applications demonstrate that compared with state-of-the-art LF models, RLF is able to achieve significantly higher computational efficiency and comparable prediction accuracy for missing data.I provides an important alternative approach to LF analysis of HiDS matrices, which is especially desired for industrial applications demanding highly efficient models.
文摘Emergence of drug resistant bacteria is one of the serious problems in today’s public health. However, the relationship between genomic mutation of bacteria and the phenotypic difference of them is still unclear. In this paper, based on the mutation information in whole genome sequences of 96 MRSA strains, two kinds of phenotypes (pathogenicity and drug resistance) were learnt and predicted by machine learning algorithms. As a result of effective feature selection by cross entropy based sparse logistic regression, these phenotypes could be predicted in sufficiently high accuracy (100% and 97.87%, respectively) with less than 10 features. It means that we could develop a novel rapid test method in the future for checking MRSA phenotypes.
基金supported by a grant from the University Grants Council of Hong Kong, National Natural Science Foundation of China (Grant No. 11471335)the Ministry of Education Project of Key Research Institute of Humanities and Social Sciences at Universities (Grant No. 16JJD910002)Fund for Building World-Class Universities (Disciplines) of Renmin University of China
文摘This paper proposes a novel method for testing the equality of high-dimensional means using a multiple hypothesis test. The proposed method is based on the maximum of standardized partial sums of logarithmic p-values statistic. Numerical studies show that the method performs well for both normal and non-normal data and has a good power performance under both dense and sparse alternative hypotheses. For illustration, a real data analysis is implemented.