Funding: The research is supported by the National Natural Science Foundation of China under Grant Nos. 11701409 and 11571171, the Natural Science Foundation of Jiangsu Province of China under Grant BK20170591, and the Natural Science Foundation of Jiangsu Higher Education Institutions of China under Grant 17KJB110018.
Abstract: The generalized singular value decomposition (GSVD) of two matrices with the same number of columns is a very useful tool in many practical applications. However, the GSVD may suffer from heavy computational time and memory requirements when the matrices are very large. In this paper, we use random projections to capture most of the action of the matrices and propose randomized algorithms for computing a low-rank approximation of the GSVD. Several error bounds for the approximation are also presented for the proposed randomized algorithms. Finally, experimental results show that the proposed randomized algorithms achieve good accuracy with less computational cost and storage.
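The abstract's core idea, using a random projection to capture most of the action of a matrix before factorizing, can be illustrated with the standard randomized range-finder. This is only a minimal single-matrix sketch of that general technique, not the paper's GSVD algorithm (which operates on a matrix pair); the function names and oversampling parameter `p` are my own choices.

```python
import numpy as np

def randomized_range_finder(A, k, p=5, rng=None):
    """Capture the dominant action of A with a random projection.

    Returns an orthonormal basis Q whose span approximates the
    range of A for a target rank k, with p columns of oversampling.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    Omega = rng.standard_normal((n, k + p))  # random test matrix
    Y = A @ Omega                            # sample the range of A
    Q, _ = np.linalg.qr(Y)                   # orthonormalize the samples
    return Q

def randomized_svd(A, k, p=5, rng=None):
    """Rank-k SVD approximation built on the range finder."""
    Q = randomized_range_finder(A, k, p, rng)
    B = Q.T @ A                              # small (k+p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub                               # lift back to the big space
    return U[:, :k], s[:k], Vt[:k]
```

The expensive factorization is applied only to the small sketched matrix `B`, which is what yields the reduced computational cost and storage the abstract refers to.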
Abstract: This study uses an empirical analysis to quantify the downstream analysis effects of data pre-processing choices. Bootstrap data simulation is used to measure the bias-variance decomposition of an empirical risk function, mean square error (MSE). Results of the risk function decomposition are used to measure the effects of model development choices on model bias, variance, and irreducible error. Measurements of bias and variance are then applied as diagnostic procedures for model pre-processing and development. The best-performing model-normalization-data-structure combinations were identified to illustrate the downstream analysis effects of these model development choices. In addition, results from the simulations were verified and extended to additional data characteristics (imbalanced, sparse) by testing on benchmark datasets from the UCI Machine Learning Repository. Normalization results on benchmark data were consistent with those found using simulations, while also illustrating that more complex and/or non-linear models perform better on datasets with additional complexities. Finally, applying the findings from the simulation experiments to previously tested applications led to equivalent or improved results with less model development overhead and processing time.
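The bootstrap-based bias-variance measurement described above can be sketched as follows. This is a generic illustration of the technique, not the study's actual procedure: the helper `bias_variance_mse` and its interface are hypothetical, and the decomposition is estimated pointwise at a single test input `x0` whose noiseless target `y0_true` is assumed known (as in a simulation).

```python
import numpy as np

def bias_variance_mse(fit_predict, X, y, x0, y0_true, n_boot=200, rng=None):
    """Bootstrap estimate of the bias^2 and variance parts of MSE at x0.

    fit_predict(Xb, yb, x0) trains a model on one bootstrap resample
    and returns its prediction at x0; y0_true is the noiseless target.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    preds = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)              # resample with replacement
        preds[b] = fit_predict(X[idx], y[idx], x0)
    bias_sq = (preds.mean() - y0_true) ** 2      # squared bias of the mean prediction
    variance = preds.var()                       # spread across resampled fits
    return bias_sq, variance
```

Comparing `bias_sq` and `variance` across different pre-processing or normalization choices is the kind of diagnostic comparison the study describes; the irreducible-error term is the noise variance of the data-generating process and does not depend on the model.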