This study aimed to analyze and predict the survival time of lung cancer patients using random survival forests and the Cox proportional hazards model. The data were taken from the cancer dataset in the R package survival. First, a random forest was used for variable selection, and sex and performance status emerged as the key variables with a significant effect on survival time. Next, a Cox proportional hazards model was used to analyze the effects of these variables on survival time in more detail. A higher performance status score was associated with a higher risk of death. Female patients had relatively longer survival, although this effect was less statistically significant. The Cox model showed good ability to discriminate survival times and was significant overall. Survival curves were plotted to visualize the difference in survival probability between risk groups, and the high-risk group had a significantly lower survival probability than the low-risk group. ROC curves and the corresponding AUC values showed that the model has moderate ability to distinguish high-risk from low-risk patients. In addition, bootstrap resampling confirmed the stability of the model: the coefficients for sex and performance status were estimated consistently across resamples. For model interpretation, Shapley values further confirmed that sex and performance status are important predictors of survival time, supporting their key role in the model. In summary, this study combined systematic variable selection, model analysis and multiple evaluation methods. It showed that sex and performance status significantly affect the survival time of lung cancer patients, and it verified the robustness and validity of the model. The results provide an important reference for predicting patient prognosis in clinical practice.
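The abstract does not describe how the Cox model was fitted or evaluated. The sketch below is an illustration only, written in pure Python with the standard library. It maximizes the Cox partial log-likelihood by Newton-Raphson, handling tied times with the Breslow method. It measures discrimination with Harrell's concordance index and gauges coefficient stability as the spread of bootstrap refits. All function names are hypothetical, and so is the simulated data (sex coded 1 = female, an integer performance-status score from 0 to 3, and the chosen effect sizes). None of it reproduces the study's results.

```python
import math
import random
import statistics

def solve(A, b):
    """Solve A x = b for a small dense system (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_cox(times, events, X, max_iter=30, tol=1e-8):
    """Maximize the Cox partial log-likelihood (Breslow ties) by Newton-Raphson."""
    n, p = len(times), len(X[0])
    order = sorted(range(n), key=lambda i: -times[i])  # latest time first
    beta = [0.0] * p
    for _ in range(max_iter):
        w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        s0, s1 = 0.0, [0.0] * p
        s2 = [[0.0] * p for _ in range(p)]
        k = 0
        while k < n:
            # Grow the risk set {j : t_j >= t} with every subject tied at time t.
            t, tied = times[order[k]], []
            while k < n and times[order[k]] == t:
                j = order[k]
                tied.append(j)
                k += 1
                s0 += w[j]
                for a in range(p):
                    s1[a] += w[j] * X[j][a]
                    for c in range(p):
                        s2[a][c] += w[j] * X[j][a] * X[j][c]
            # Each event at time t adds to the score vector and information matrix.
            for i in tied:
                if events[i]:
                    for a in range(p):
                        score[a] += X[i][a] - s1[a] / s0
                        for c in range(p):
                            info[a][c] += s2[a][c] / s0 - s1[a] * s1[c] / s0 ** 2
        step = solve(info, score)  # Newton step: info^-1 * score
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta

def harrell_c(times, events, risk):
    """Harrell's C: share of comparable pairs in which the earlier death has higher risk."""
    num = pairs = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[j] > times[i]:  # j outlived i's event, so the pair is comparable
                pairs += 1
                num += 1.0 if risk[i] > risk[j] else 0.5 if risk[i] == risk[j] else 0.0
    return num / pairs

def bootstrap_sd(times, events, X, n_boot=20, seed=1):
    """Standard deviation of each coefficient across refits on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(times)
    fits = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        fits.append(fit_cox([times[i] for i in idx], [events[i] for i in idx],
                            [X[i] for i in idx]))
    return [statistics.stdev([f[a] for f in fits]) for a in range(len(X[0]))]

def simulate(n=300, seed=0):
    """Hypothetical data: sex (1 = female) lowers, performance status raises the hazard."""
    rng = random.Random(seed)
    times, events, X = [], [], []
    for _ in range(n):
        sex, ps = rng.randint(0, 1), rng.randint(0, 3)
        t_event = rng.expovariate(0.01 * math.exp(-0.7 * sex + 0.6 * ps))
        t_cens = rng.expovariate(0.005)
        times.append(min(t_event, t_cens))
        events.append(t_event <= t_cens)
        X.append([sex, ps])
    return times, events, X
```

In R the usual route is survival::coxph, whose summary also reports the concordance. Note that coxph handles ties with the Efron method by default, so its estimates can differ slightly from this Breslow sketch when event times are tied.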
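The abstract does not state how patients were assigned to risk groups. The sketch below assumes, hypothetically, a split at the median of the Cox linear predictor. The group survival curves are then Kaplan-Meier product-limit estimates:

S(t) = product over event times t_i <= t of (1 - d_i / n_i)

Here d_i is the number of deaths and n_i the number at risk at t_i. This is a minimal sketch, and the function names are illustrative.

```python
import statistics

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(event_time, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censorings at t both leave the risk set
    return curve

def survival_at(curve, t):
    """Step-function lookup: S at the last event time <= t (1.0 before any event)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s

def km_by_risk_group(times, events, risk):
    """Split at the median risk score (a hypothetical rule) and fit KM per group."""
    cut = statistics.median(risk)
    groups = {"high": [i for i, r in enumerate(risk) if r > cut],
              "low": [i for i, r in enumerate(risk) if r <= cut]}
    return {name: kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
            for name, idx in groups.items()}
```

Plotting each returned curve as a step function gives the usual survival-curve figure. In R, survival::survfit produces the same estimator.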
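The abstract does not specify which AUC was computed. One simple variant, used here as an assumption, is a cumulative/dynamic AUC at a fixed horizon. Cases are patients who died by the horizon, and controls are those still under observation after it. The AUC is the probability that a case has a higher risk score than a control. This sketch drops patients censored before the horizon and applies no inverse-probability-of-censoring weights, so it can be biased under heavy censoring. Dedicated packages such as timeROC in R implement weighted estimators. The function name is hypothetical.

```python
def auc_at_horizon(times, events, risk, horizon):
    """Simplified cumulative/dynamic AUC at one horizon (no censoring weights)."""
    cases = [r for t, e, r in zip(times, events, risk) if t <= horizon and e]
    controls = [r for t, r in zip(times, risk) if t > horizon]
    if not cases or not controls:
        raise ValueError("horizon leaves no cases or no controls")
    # Mann-Whitney form: P(case risk > control risk), with ties counted as one half.
    wins = sum(1.0 if rc > rk else 0.5 if rc == rk else 0.0
               for rc in cases for rk in controls)
    return wins / (len(cases) * len(controls))
```

Sweeping a threshold over the same case and control scores traces the ROC curve, and this AUC is the area under it. Values near 0.5 indicate no discrimination.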
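The abstract does not say which Shapley-value variant was used. For a Cox model, one transparent choice is to explain the linear predictor (the log relative hazard), setting absent features to a background value such as the sample mean. Because the predictor is linear, the exact Shapley value of feature j then reduces to beta_j * (x_j - mean_j). Enumerating all coalitions, as below, recovers that closed form. The function names, coefficients and background values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    """Exact Shapley values of model f at point x.

    A feature outside the coalition is set to its background (e.g. mean) value.
    All coalitions are enumerated, so this is practical only for a few features.
    """
    p = len(x)
    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                with_j = [x[k] if (k in S or k == j) else background[k] for k in range(p)]
                without_j = [x[k] if k in S else background[k] for k in range(p)]
                phi[j] += weight * (f(with_j) - f(without_j))
    return phi

def linear_predictor(beta):
    """Cox linear predictor eta = beta . z (log relative hazard)."""
    return lambda z: sum(b * v for b, v in zip(beta, z))
```

The values sum to f(x) minus f(background), so each one splits a patient's log relative hazard, relative to an average patient, between sex and performance status.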
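Deep mining of power cable fault information can improve the analysis of the factors that influence cable faults. Using 10 kV power cable fault data from a power supply company, this study applied a statistical model, the Cox proportional hazards model, to quantify the factors influencing cable faults. The aim was to guide cable procurement, construction, operation and maintenance. To ensure accurate analysis, principles for preprocessing cable data were proposed and an appropriate sample size was discussed. The Cox proportional hazards model was used for univariate analysis of the influencing factors. A logistic regression model was then used to determine the categories of influencing factors. The cable fault rate for each factor was calculated and the relative risk of the levels within each factor was determined, which ultimately confirmed the Cox model results. The highest cable fault rates were associated with cable body manufacturer M1 (0.33), accessory manufacturer N1 (0.29) and construction contractor I3 (0.218). Companies should therefore pay particular attention to these three units when procuring, installing and maintaining cables.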
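The abstract does not define the fault rate or the reference for relative risk. This minimal sketch assumes that the fault rate of a factor level (for example one body manufacturer) is the number of faulted cables divided by the number of cables at that level. It also assumes that the relative risk of a level is its rate divided by that of a chosen reference level. The record fields (fault and the factor name), the function names and any example values are hypothetical.

```python
from collections import defaultdict

def fault_rates(records, factor):
    """Fault rate per level of one factor: faulted cables / cables at that level."""
    cables, faults = defaultdict(int), defaultdict(int)
    for rec in records:
        cables[rec[factor]] += 1
        faults[rec[factor]] += int(rec["fault"])
    return {level: faults[level] / cables[level] for level in cables}

def relative_risks(rates, reference):
    """Rate ratio of every level against a chosen reference level."""
    base = rates[reference]
    return {level: rate / base for level, rate in rates.items()}
```

Ranking the levels by rate, for example with max(rates, key=rates.get), identifies the units that warrant closer attention.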