Abstract: Hyperparameter tuning is a key step in developing high-performing machine learning models, but searching large hyperparameter spaces requires extensive computation using standard sequential methods. This work analyzes the performance gains from parallel versus sequential hyperparameter optimization. Using scikit-learn's RandomizedSearchCV, this project tuned a Random Forest classifier for fake news detection via randomized search. Setting n_jobs to -1 enabled full parallelization across CPU cores. Results show the parallel implementation achieved over 5× faster CPU times and 3× faster total run times compared to sequential tuning. However, test accuracy slightly dropped from 99.26% sequentially to 99.15% with parallelism, indicating a trade-off between evaluation efficiency and model performance. Still, the significant computational gains allow more extensive hyperparameter exploration within reasonable timeframes, outweighing the small accuracy decrease. Further analysis could better quantify this trade-off across different models, tuning techniques, tasks, and hardware.
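The parallel tuning setup described in this abstract can be sketched with a short scikit-learn snippet. The estimator settings, parameter ranges, and synthetic data below are illustrative stand-ins (the original fake news dataset and exact search space are not given); only the n_jobs=-1 flag mirrors the parallelization setting compared in the study.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the feature matrix of the fake news corpus.
X, y = make_classification(n_samples=5000, n_features=100, random_state=0)

# Illustrative search space; the abstract does not specify the actual ranges.
param_distributions = {
    "n_estimators": randint(100, 500),
    "max_depth": randint(5, 50),
    "min_samples_split": randint(2, 20),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=20,
    cv=5,
    n_jobs=-1,   # -1 uses all available CPU cores (the parallel configuration)
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Setting n_jobs=1 instead reproduces the sequential baseline, so the two runs can be timed and compared as in the reported experiment.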
Abstract: This paper presents a generic procedure for implementing a scalable, high-performance data analysis framework for large-scale scientific simulation within an in-situ infrastructure. It demonstrates a unique capability for global Earth system simulations using advanced computing technologies (i.e., automated code analysis and instrumentation), in-situ infrastructure (i.e., ADIOS) and big data analysis engines (i.e., scikit-learn). This paper also includes a use case that analyzes a global Earth system simulation through the integration of a scalable in-situ infrastructure and an advanced data-processing package. The in-situ data analysis framework can provide new insights for scientific discovery in multiscale modeling paradigms.
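As a rough illustration of the analysis stage such a framework could delegate to scikit-learn, the sketch below clusters grid cells of one simulation field into regimes. It is a minimal sketch under stated assumptions: the ADIOS in-situ read step is replaced by a synthetic NumPy array, and the variable names and the choice of K-means are illustrative, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one time step of a gridded Earth system field
# (e.g., two variables per grid cell) that the in-situ layer would
# normally hand over from the running simulation via ADIOS.
rng = np.random.default_rng(0)
n_cells, n_vars = 10000, 2
field = rng.normal(size=(n_cells, n_vars))

# Illustrative analysis step: cluster grid cells into regimes with scikit-learn.
features = StandardScaler().fit_transform(field)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)

# In an in-situ setting, the compact labels (or per-cluster summaries) would be
# written out instead of the full raw field, reducing I/O for the simulation.
print(np.bincount(labels))
```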