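To address the tendency of chi-squared automatic interaction detection (CHAID) decision trees to overfit, a CHAID random forest method (CHAID Random Forest, CHAID-RF) is proposed. The method combines random sampling, random feature selection, and ensembling, with CHAID decision trees as the base classifiers, to form CHAID-RF. To verify the effectiveness of CHAID-RF, CART, CHAID, SVM, and RF were selected as comparison algorithms. Accuracy, weighted precision, weighted recall, and weighted F-score served as the evaluation metrics for classification models, and root mean square error served as the evaluation metric for regression models. Validation used 10 classification datasets and 7 regression datasets. The experimental results show that CHAID-RF is feasible and effective.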
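The three ingredients named in the abstract (random sampling, random feature selection, and ensembling over chi-square-based trees) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes categorical features and string class labels, and it ranks candidate splits by the raw Pearson chi-square statistic, whereas full CHAID merges categories and compares adjusted p-values. All function names, parameters, and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict
from itertools import product


def chi_square(rows, labels, feature):
    """Pearson chi-square statistic of the feature-by-class contingency table."""
    n = len(rows)
    observed = Counter((row[feature], y) for row, y in zip(rows, labels))
    value_totals = Counter(row[feature] for row in rows)
    class_totals = Counter(labels)
    stat = 0.0
    for value, n_value in value_totals.items():
        for cls, n_cls in class_totals.items():
            expected = n_value * n_cls / n
            stat += (observed[(value, cls)] - expected) ** 2 / expected
    return stat


def build_tree(rows, labels, features, n_try, rng, depth=0, max_depth=3):
    """Simplified CHAID-style tree: multiway split on the best random candidate."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1 or not features:
        return majority
    # Random feature selection: only n_try features compete at each node.
    candidates = rng.sample(features, min(n_try, len(features)))
    best = max(candidates, key=lambda f: chi_square(rows, labels, f))
    if chi_square(rows, labels, best) == 0.0:
        return majority  # no association with the class, so stop splitting
    groups = defaultdict(lambda: ([], []))
    for row, y in zip(rows, labels):
        groups[row[best]][0].append(row)
        groups[row[best]][1].append(y)
    remaining = [f for f in features if f != best]
    children = {
        value: build_tree(r, l, remaining, n_try, rng, depth + 1, max_depth)
        for value, (r, l) in groups.items()
    }
    return (best, children, majority)


def predict_tree(node, row):
    """Follow the splits; fall back to the node majority for unseen values."""
    while isinstance(node, tuple):
        feature, children, majority = node
        node = children.get(row[feature], majority)
    return node


def fit_forest(rows, labels, n_trees=25, n_try=2, seed=0):
    """Random sampling: each tree is grown on its own bootstrap sample."""
    rng = random.Random(seed)
    features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(build_tree(sample_rows, sample_labels, features, n_try, rng))
    return forest


def predict_forest(forest, row):
    """Ensembling: majority vote over the individual trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 0 determines the class, features 1 and 2 are noise.
rows, labels = [], []
for f0, f1, f2 in product("ab", "xy", "pq"):
    for _ in range(10):
        rows.append((f0, f1, f2))
        labels.append("yes" if f0 == "a" else "no")

forest = fit_forest(rows, labels)
print(predict_forest(forest, ("a", "x", "q")), predict_forest(forest, ("b", "y", "p")))
```

Restricting each node to a random subset of features decorrelates the trees, and the vote averages out the variance of any single tree, which is the usual argument for why such an ensemble overfits less than one tree.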
The complex composition of herbal metabolites necessitates powerful analytical techniques for identifying the bioactive components. The seeds of Descurainia sophia (SDS) are used in China as a cough- and asthma-relieving agent. Herein, a dimension-enhanced integral approach combining ultra-high performance liquid chromatography/ion mobility-quadrupole time-of-flight mass spectrometry (UHPLC/IM-QTOF-MS) with intelligent peak annotation was developed to rapidly characterize the multicomponents of SDS. Good chromatographic separation was achieved within 38 min on a UPLC CSH C18 column (2.1 × 100 mm, 1.7 μm) eluted with 0.1% formic acid in water (water phase) and acetonitrile (organic phase). Collision-induced dissociation MS^2 data were acquired by data-independent high-definition MS^E (HDMS^E) in both the negative and positive electrospray ionization modes. A major-components knockout strategy was applied so that the injection volume could be increased, improving the characterization of minor ingredients. Moreover, a self-built chemistry library was established and matched by the UNIFI software, enabling automatic peak annotation of the HDMS^E data. By applying the intelligent peak annotation workflows and a further confirmation process, a total of 53 compounds were identified or tentatively characterized from SDS, including 29 flavonoids, one uridine derivative, four glucosides, one lignin, one phenolic compound, and 17 others. Notably, four-dimensional structure-related information (retention time, collision cross section, and MS^1 and MS^2 data) was obtained for each component by the developed integral approach, and the results would greatly benefit the quality control of SDS.
Funding: This work was financially supported by the National Key Research and Development Program of China (Grant No. 2018YFC1704500), the Tianjin Committee of Science and Technology of China (Grant No. 21ZYJDJC00080), and the National Natural Science Foundation of China (Grant No. 81872996).