Background Data-driven event analysis has gradually become the backbone of modern competitive sports analysis, and its tasks increasingly rely on computer vision and machine-learning models. Existing sports visualization systems focus on player–team data; they are not intuitive for team season win–loss data or game time-series data, and they neglect the prediction of all-star players. Methods This study designed NPIPVis, an interactive visualization system built with Parallel Aggregated Ordered Hypergraph (PAOH) dynamic hypergraphs, Calliope data-story visualization technology, and iStoryline narrative visualization technology, to visualize the regular statistics and game-time data of players and teams. NPIPVis includes dynamic hypergraphs of a team's wins and losses and game-plot narrative visualization components. In addition, an ensemble learning-based all-star player prediction model, SRR-voting, was proposed. Starting from the existing minority and majority samples, it uses the synthetic minority oversampling technique (SMOTE) and RandomUnderSampler to generate and eliminate samples of a certain size, balancing the number of all-star and average players in the datasets. Next, a random forest algorithm was introduced to extract and construct player features and was combined with a voting ensemble model to predict the all-star players; GridSearchCV was used to optimize the hyperparameters of each model in the ensemble, together with five-fold cross-validation to improve the generalization ability of the model. Finally, SHapley Additive exPlanations (SHAP) was introduced to enhance the interpretability of the model.
Results In experiments comparing the SRR-voting model with six common models, the accuracy, F1-score, and recall metrics improved significantly, which verifies the effectiveness and practicality of the SRR-voting model. Conclusions This study combines data visualization and machine learning to design a National Basketball Association data visualization system that helps a general audience visualize game data and predict all-star players; the approach can also be extended to other sports events or related fields.
Funding: Supported by the National Natural Science Foundation of China (61862018) and the Subject of the Training Plan for Thousands of Young and Middle-aged Backbone Teachers in Guangxi Colleges and Universities (2020QGRW017).
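The class-balancing and voting steps named in Methods can be illustrated with a minimal sketch. This is not the authors' SRR-voting implementation: it is a pure-Python stand-in for the ideas behind SMOTE, random undersampling, and hard voting, written under stated assumptions. The function names (`smote_oversample`, `random_undersample`, `hard_vote`), the two-feature player tuples, and the target class size of 12 are all hypothetical, and the data are made up. The random forest feature construction, GridSearchCV tuning, five-fold cross-validation, and SHAP steps are omitted.

```python
import math
import random
from collections import Counter


def smote_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of the majority class, sampled without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


def hard_vote(predictions):
    """Return the label chosen by most base models (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]


# Made-up (points, assists) per game; NOT data from the study.
rng = random.Random(42)
all_stars = [(25.0, 7.0), (27.5, 6.5), (30.1, 8.2), (24.3, 9.0)]
average = [(rng.uniform(2, 15), rng.uniform(0, 4)) for _ in range(40)]

target = 12  # hypothetical class size after balancing
synthetic_stars = smote_oversample(all_stars, target - len(all_stars), rng=rng)
balanced_minority = all_stars + synthetic_stars
balanced_majority = random_undersample(average, target, rng=rng)

print(len(balanced_minority), len(balanced_majority))  # 12 12
print(hard_vote(["all-star", "average", "all-star"]))  # all-star
```

In practice one would more likely reach for the `SMOTE` and `RandomUnderSampler` classes in imbalanced-learn and a voting ensemble from scikit-learn; the hand-rolled version above only makes the sample generation and elimination visible.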