With the widespread adoption of smartphones, a growing number of phones can estimate the energy (calories) users expend in daily activities. Such fitness and health apps rely mainly on activity data recorded by the smartphone throughout the day to calculate energy expenditure. Effectively classifying and analyzing these motion data, however, remains a challenge. This study classifies and analyzes motion data collected from experimental participants, with the aim of improving the accuracy of data processing and classification. The research comprises three parts:
1) Data preprocessing: the data were cleaned and standardized, time-domain and frequency-domain features were extracted, and hierarchical clustering was applied to group the participants' motion data without supervision. A dendrogram (hierarchical tree diagram) was generated to show the hierarchical relationships among data points.
2) Classification model evaluation: a Random Forest classifier was trained and tested on motion data from 10 participants. The model achieved an overall accuracy of 65%; category 8 was classified best, while categories 2 and 3 were classified poorly.
3) Data difference analysis: the data were merged, and a multivariate analysis of variance (MANOVA) was used to test for significant differences in sensor data among participants. The results showed no significant differences in sensor data across participants. In addition, correlation coefficients between the sensor data and participant characteristics (age, height, weight) were computed using correlation and regression analysis, and a correlation matrix was plotted.
The proposed classification and analysis methods effectively identified characteristics of the participants' motion data. The study also summarizes the problems encountered in data classification and analysis and offers recommendations for further model optimization and data processing to improve classification accuracy.
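The preprocessing step names time-domain and frequency-domain features and standardization without listing which features were used. The sketch below assumes a common choice (mean, standard deviation, RMS and range in the time domain; dominant frequency and a simple spectral-energy measure from a DFT in the frequency domain) followed by z-score standardization of the feature table. The function names, the test signal and the 50 Hz sampling rate are hypothetical, and the O(n^2) DFT loop is only there to stay dependency-free; a real pipeline would normally use an FFT such as `numpy.fft.rfft`.

```python
import cmath
import math
from statistics import mean, pstdev


def time_features(window):
    """Common time-domain features of one window of sensor samples (hypothetical choice)."""
    return {
        "mean": mean(window),
        "std": pstdev(window),
        "rms": math.sqrt(sum(x * x for x in window) / len(window)),
        "range": max(window) - min(window),
    }


def freq_features(window, fs):
    """Frequency-domain features via a naive DFT (O(n^2), fine for short windows)."""
    n = len(window)
    mu = mean(window)
    centered = [x - mu for x in window]  # remove the constant (DC) offset first
    # Magnitudes of the non-negative frequency bins k = 0 .. n // 2.
    mags = [
        abs(sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    k_peak = max(range(1, len(mags)), key=lambda k: mags[k])  # skip the DC bin
    return {
        "dominant_freq_hz": k_peak * fs / n,
        # Sum of squared magnitudes over positive-frequency bins, divided by n.
        "spectral_energy": sum(m * m for m in mags[1:]) / n,
    }


def zscore_columns(rows):
    """Standardize each column of a feature table to zero mean and unit variance."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in rows
    ]


if __name__ == "__main__":
    fs = 50.0  # assumed sampling rate in Hz
    sig = [math.sin(2 * math.pi * 2.0 * t / fs) for t in range(100)]  # 2 s of a 2 Hz wave
    print(time_features(sig)["rms"], freq_features(sig, fs)["dominant_freq_hz"])
```

Constant columns (zero standard deviation) are mapped to 0.0 instead of dividing by zero; whether to drop such features instead is a design choice the source does not address.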
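Hierarchical (agglomerative) clustering starts with every feature vector as its own cluster and repeatedly merges the two closest clusters; the recorded merge distances are what a dendrogram draws. The source does not state the linkage criterion or distance metric, so this sketch assumes average linkage with Euclidean distance, and `average_linkage` and `cut` are hypothetical helper names. It runs in roughly O(n^3) time and only illustrates the mechanics; SciPy's `scipy.cluster.hierarchy.linkage` and `dendrogram` are the usual efficient route.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (members_a, members_b, distance) tuples,
    i.e. the information a dendrogram visualizes.
    """
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest mean pairwise distance.
        best = None
        for a, b in combinations(clusters, 2):
            d = sum(euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


def cut(points, merges, k):
    """Replay the first n - k merges to obtain k flat clusters (sorted index lists)."""
    groups = [{i} for i in range(len(points))]
    for members_a, members_b, _ in merges[: len(points) - k]:
        ga = next(g for g in groups if members_a[0] in g)
        gb = next(g for g in groups if members_b[0] in g)
        groups.remove(ga)
        groups.remove(gb)
        groups.append(ga | gb)
    return sorted(sorted(g) for g in groups)


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [10, 10], [10, 11]]  # toy feature vectors
    history = average_linkage(pts)
    print(history)
    print(cut(pts, history, 2))
```

Cutting the tree at different heights (different `k`) yields coarser or finer groupings, which is how a dendrogram is usually read.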
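The reported results (65% overall accuracy, category 8 best, categories 2 and 3 worst) are summaries of a classifier's predictions on test data. The sketch below shows how overall accuracy, per-class accuracy (recall) and a confusion matrix can be computed from true and predicted labels. It does not train the Random Forest itself (in Python this is commonly done with scikit-learn's `RandomForestClassifier`), and the labels in the usage example are toy values, not the study's data.

```python
from collections import Counter, defaultdict


def evaluate(y_true, y_pred):
    """Overall accuracy plus per-class recall (share of each true class predicted correctly)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            hits[t] += 1
    overall = sum(hits.values()) / len(y_true)
    per_class = {c: hits[c] / n for c, n in sorted(support.items())}
    return overall, per_class


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, in the order of `labels`."""
    index = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


if __name__ == "__main__":
    y_true = [2, 2, 3, 3, 8, 8]  # toy labels
    y_pred = [3, 2, 2, 3, 8, 8]
    print(evaluate(y_true, y_pred))
    print(confusion_matrix(y_true, y_pred, [2, 3, 8]))
```

The off-diagonal cells of the confusion matrix show which classes are being confused with each other (here 2 and 3), which is usually more actionable than the overall accuracy alone.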
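A one-way MANOVA with participant as the grouping factor compares mean feature vectors across participants while accounting for correlations between features. One common test statistic is Wilks' lambda, Λ = det(W) / det(W + B), where W is the within-group and B the between-group sum-of-squares-and-cross-products (SSCP) matrix; W + B equals the total SSCP matrix T. The source does not say which statistic was reported, so using Wilks' lambda here is an assumption, and the helper names are hypothetical. The sketch computes only Λ; turning it into a p-value needs an F approximation, which packages such as statsmodels (`statsmodels.multivariate.manova.MANOVA`) provide.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return result


def sscp(vectors, center):
    """Sum-of-squares-and-cross-products matrix of vectors around `center`."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for v in vectors:
        d = [x - c for x, c in zip(v, center)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def wilks_lambda(groups):
    """Wilks' lambda = det(W) / det(T) for a one-way MANOVA.

    groups: one list of p-dimensional feature vectors per group (e.g. per participant).
    Values near 1 mean the group mean vectors barely differ; values near 0 mean they differ strongly.
    """
    rows = [v for g in groups for v in g]
    p = len(rows[0])
    grand = [sum(v[i] for v in rows) / len(rows) for i in range(p)]
    W = [[0.0] * p for _ in range(p)]
    for g in groups:
        g_mean = [sum(v[i] for v in g) / len(g) for i in range(p)]
        S = sscp(g, g_mean)  # within-group scatter of this group
        for i in range(p):
            for j in range(p):
                W[i][j] += S[i][j]
    T = sscp(rows, grand)  # total scatter, equal to W + B
    return det(W) / det(T)


if __name__ == "__main__":
    g1 = [[0, 0], [1, 0], [0, 1], [1, 1]]        # toy data
    g2 = [[10, 10], [11, 10], [10, 11], [11, 11]]
    print(wilks_lambda([g1, g2]), wilks_lambda([g1, g1]))
```

A result like "no significant difference between participants" corresponds to a Λ close to 1 whose approximate p-value exceeds the chosen significance level.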
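Correlating sensor data with age, height and weight needs one value per participant on each side, so the sketch assumes each participant's sensor feature has first been aggregated (for example, to a per-participant mean). It computes a Pearson correlation matrix over named columns, which is the table a correlation-matrix plot displays; the column names and values are hypothetical, and the regression analysis mentioned in the source is not covered.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: dict mapping a name to one value per participant.

    Returns (names, matrix) where matrix[i][j] = pearson(names[i], names[j]).
    """
    names = list(columns)
    return names, [[pearson(columns[a], columns[b]) for b in names] for a in names]


if __name__ == "__main__":
    toy = {  # hypothetical per-participant values
        "mean_acc": [1.1, 1.3, 0.9, 1.2],
        "age": [20, 30, 40, 25],
        "height": [160, 175, 168, 181],
        "weight": [55, 70, 80, 65],
    }
    names, m = correlation_matrix(toy)
    for name, row in zip(names, m):
        print(name, [round(r, 2) for r in row])
```

A column with zero variance makes the denominator zero and raises `ZeroDivisionError`; such columns should be removed before computing the matrix.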