Abstract
Objective To explore the classification rules and characteristics of clinical information in patients with resistant hypertension (RH) who develop major adverse cardiovascular events (MACE), using a random forest rule extraction method. Methods Electronic medical records were collected for 2093 hospitalized RH patients, of whom 807 had RH alone and 1286 had RH with MACE (RH+MACE). The data comprised basic personal information (gender, age, past history, family history, personal history, and history of infectious diseases), traditional Chinese medicine (TCM) four-examination information, and western medicine diagnosis information. The distribution of 140 indicators that a previous study had found to be closely related to the occurrence of MACE in RH patients was summarized, and the random forest rule extraction method was applied to these indicators to analyze the features that influence the occurrence of MACE in RH patients. The data were structured, and a random forest was trained to build multiple decorrelated decision trees, which generate a large pool of decision rules. The rule extraction method was then used to extract, measure, prune, and select rules; the sequential covering method was used to filter a simplified classification rule set; and an interpretability measure for the rules was defined. Results Among the top five most frequent closely related TCM four-examination indicators in the 2093 patients were wiry pulse (1101, 52.60%), greasy coating (947, 45.25%), poor sleep (870, 41.57%), and pale complexion (784, 37.46%). The top five most frequent items of basic information and western medicine diagnosis information were normal heart rate (1765, 84.33%), previous hypertension (1601, 76.49%), age over 60 years (1547, 73.91%), normal diastolic blood pressure (1428, 68.23%), and grade 3 hypertension (1151, 54.99%). The interpretability analysis of the influencing features showed that there were three rules predicting RH+MACE with an error of less than 0.05. Rule 1: cough c('1') & old tongue c('1') & dizziness c('0') & chest tightness c('0'); Rule 2: nausea c('0') & carotid arteriosclerosis c('1') & wiry pulse c('1'); Rule 3: wheezing c('1'). The classification accuracy of each of the three rules exceeded 95% (error less than 0.05), and the rules had a certain degree of clinical interpretability. The evaluation results for the classification rule set were sensitivity 0.85, specificity 0.73, accuracy 0.83, area under the receiver operating characteristic curve (AUC) 0.80, and interpretability value 0.96. Conclusion The presence of cough, old tongue, carotid arteriosclerosis, wiry pulse, and wheezing, together with the absence of dizziness, chest tightness, and nausea, constitutes the core clinical classification elements for MACE in RH patients.
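The rule evaluation and sequential covering steps named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how rules are measured, pruned, or ranked, so the greedy criterion used here (largest coverage among rules with error at most 0.05) is an assumption, as are the function names, the indicator names, and the six toy records, which are invented purely for illustration. Only the three rule definitions follow the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical indicator names, borrowed from the three reported rules.
INDICATORS = ["cough", "old_tongue", "dizziness", "chest_tightness",
              "nausea", "carotid_arteriosclerosis", "wiry_pulse", "wheezing"]

Record = Dict[str, int]  # indicator name -> 0 (absent) or 1 (present)
Rule = Dict[str, int]    # conjunction of "indicator == value" conditions


def make_record(**present: int) -> Record:
    """Build a record in which every indicator defaults to 0 (absent)."""
    record = {name: 0 for name in INDICATORS}
    record.update(present)
    return record


def rule_matches(rule: Rule, record: Record) -> bool:
    """True when every condition of the conjunctive rule holds."""
    return all(record[name] == value for name, value in rule.items())


def rule_stats(rule: Rule, records: List[Record], labels: List[int],
               target: int = 1) -> Tuple[int, float]:
    """Return (coverage, error): the number of records the rule covers and
    the fraction of covered records whose label is not the target class."""
    covered = [y for r, y in zip(records, labels) if rule_matches(rule, r)]
    if not covered:
        return 0, 1.0
    wrong = sum(1 for y in covered if y != target)
    return len(covered), wrong / len(covered)


def sequential_covering(candidates: List[Rule], records: List[Record],
                        labels: List[int], max_error: float = 0.05) -> List[Rule]:
    """Greedy selection (assumed criterion): repeatedly keep the rule that
    covers the most remaining records with error <= max_error, then drop
    the records it covers."""
    remaining = list(range(len(records)))
    pool = list(candidates)
    selected: List[Rule] = []
    while remaining and pool:
        recs = [records[i] for i in remaining]
        labs = [labels[i] for i in remaining]
        best, best_cov = None, 0
        for rule in pool:
            cov, err = rule_stats(rule, recs, labs)
            if err <= max_error and cov > best_cov:
                best, best_cov = rule, cov
        if best is None:
            break  # no remaining rule is accurate enough
        selected.append(best)
        pool.remove(best)
        remaining = [i for i in remaining if not rule_matches(best, records[i])]
    return selected


# The three rules reported in the abstract, plus one deliberately noisy rule.
RULE_1 = {"cough": 1, "old_tongue": 1, "dizziness": 0, "chest_tightness": 0}
RULE_2 = {"nausea": 0, "carotid_arteriosclerosis": 1, "wiry_pulse": 1}
RULE_3 = {"wheezing": 1}
NOISY_RULE = {"wiry_pulse": 1}

# Invented toy data: label 1 = RH+MACE, label 0 = RH only.
RECORDS = [
    make_record(wheezing=1),
    make_record(wheezing=1, wiry_pulse=1),
    make_record(carotid_arteriosclerosis=1, wiry_pulse=1),
    make_record(cough=1, old_tongue=1),
    make_record(wiry_pulse=1),
    make_record(),
]
LABELS = [1, 1, 1, 1, 0, 0]

SELECTED = sequential_covering([RULE_1, RULE_2, RULE_3, NOISY_RULE],
                               RECORDS, LABELS)
print(SELECTED)
```

On this toy data the noisy single-condition rule is rejected because one of the three records it covers is not an RH+MACE case, while the three reported rules are kept. In the study itself the candidate rules would come from the paths of the trained random forest rather than being written by hand.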
Authors
向兴华
彭叶辉
杨伟
刘剑
刘大胜
纪鑫毓
雷舒杨
谢飞彪
杨纪雯
王丽颖
韩学杰
商洪才
XIANG Xinghua; PENG Yehui; YANG Wei; LIU Jian; LIU Dasheng; JI Xinyu; LEI Shuyang; XIE Feibiao; YANG Jiwen; WANG Liying; HAN Xuejie; SHANG Hongcai (School of Mathematics and Computational Science, Hunan University of Science and Technology, Xiangtan, 411201; Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences; Xiyuan Hospital, China Academy of Chinese Medical Sciences; Dongzhimen Hospital, Beijing University of Chinese Medicine)
Source
《中医杂志》
CSCD
Peking University Core Journals
2022, Issue 7, pp. 628-634 (7 pages)
Journal of Traditional Chinese Medicine
Funding
National Key Research and Development Program of China (2017YFC1700400, 2017YFC1700406-2)
National Natural Science Foundation of China (81774454, 61771491).
Keywords
resistant hypertension
major adverse cardiovascular events
random forest rule
interpretability studies