Abstract
Under-sampling-based classification of imbalanced data is a stochastic data-optimization approach, but it can neither best reflect the distribution of original traditional Chinese medicine (TCM) clinical data nor resolve feature redundancy in those data. This paper proposes PRFS-FPUSAB, a prediction-risk-based farthest-case imbalanced bagging algorithm. The algorithm first introduces an improved under-sampling scheme that reflects the original data distribution as closely as possible. It then combines ensemble learning with the prediction-risk criterion to improve classification performance on imbalanced data and to perform feature selection. Experiments on meridian resistance data collected in TCM clinical practice show that the algorithm improves the area under the curve (AUC) and that the selected features are consistent with TCM theory.
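The pipeline outlined in the abstract (under-sampling, bagging, prediction-risk feature ranking, AUC evaluation) can be illustrated with a minimal sketch. It is not the PRFS-FPUSAB implementation: the abstract does not specify the improved sampling rule, so plain random under-sampling is substituted here. The nearest-centroid base learner, the synthetic two-feature data, and all function names are hypothetical. Prediction risk is assumed to mean the drop in AUC when a feature is replaced by its mean.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid(rows):
    """Per-feature mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def score(model, x):
    """Higher score means x lies closer to the minority-class centroid."""
    c0, c1 = model
    return sq_dist(x, c0) - sq_dist(x, c1)

def fit_bagging(X, y, n_bags=11, seed=0):
    """Train one nearest-centroid model per balanced bag.

    Assumption: majority cases are drawn by plain random under-sampling,
    not by the improved sampling scheme the abstract refers to.
    """
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == 0]
    models = []
    for _ in range(n_bags):
        sub = rng.sample(majority, len(minority))
        models.append((centroid(sub), centroid(minority)))
    return models

def ensemble_scores(models, X):
    """Average the base-learner scores over all bags."""
    return [sum(score(m, x) for m in models) / len(models) for x in X]

def prediction_risk(models, X, y):
    """Assumed definition: AUC drop when feature j is replaced by its mean."""
    base = auc(ensemble_scores(models, X), y)
    means = centroid(X)
    risks = []
    for j in range(len(X[0])):
        X_j = [x[:j] + [means[j]] + x[j + 1:] for x in X]
        risks.append(base - auc(ensemble_scores(models, X_j), y))
    return base, risks

# Synthetic imbalanced data: feature 0 is informative, feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(3.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(20)]
X += [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(200)]
y = [1] * 20 + [0] * 200

models = fit_bagging(X, y)
base_auc, risks = prediction_risk(models, X, y)
print("AUC:", round(base_auc, 3))
print("prediction risk per feature:", [round(r, 3) for r in risks])
```

Features with a larger risk value matter more to the ensemble, so a selection step would keep those and discard features whose risk is near zero. The sketch scores the training data itself, which is enough to show the mechanics but would overstate performance in a real evaluation.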
Source
CAAI Transactions on Intelligent Systems (《智能系统学报》)
CSCD
PKU Core Journal
2017, No. 6, pp. 848-856 (9 pages)
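Funding
National Natural Science Foundation of China (81503680)
Fundamental Research Funds for Central Public Welfare Research Institutes (ZZ0908032)
TCM Research Project of the National Health Security Informatization Program (215005)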
Keywords
TCM clinical practice
imbalanced data classification
original data distribution
feature selection