Abstract
Traditional partial least squares (PLS) considers only the importance of individual features and is affected by redundancy and multicollinearity among features. To address these problems, this paper introduces the statistical correlation between features into traditional PLS analysis and constructs a PLS feature selection model based on feature correlation. First, feature correlation is used to evaluate the features and pre-select a feature group; the group is then fed into the PLS model for training to assess whether it is acceptable. Combined with a forward greedy search strategy, the candidate features are evaluated one by one, and the candidate that minimizes the objective function is added to the selected features. The method is applied to data on the antitussive and antiasthmatic effects of the monarch drug in Maxingshigan decoction and to UCI data sets. The experimental results show that this feature selection method is effective at finding a better-performing feature group.
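The procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the correlation measure or the objective function, so everything specific here is an assumption: redundancy is screened with absolute Pearson correlation against a hypothetical threshold `max_corr`, the objective is hold-out RMSE of a single-response PLS fit, and the names `pls1_fit`, `holdout_rmse`, `forward_select`, and `min_gain` are made up for illustration. It is not the authors' implementation.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS (NIPALS); returns coefficients and centering means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                     # weight vector: direction of covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                  # no covariance left to extract
            break
        w = w / norm
        t = Xk @ w                        # score vector
        tt = t @ t
        p = Xk.T @ t / tt                 # X loading
        c = yk @ t / tt                   # y loading
        Xk = Xk - np.outer(t, p)          # deflate X and y
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def holdout_rmse(X, y, cols, train, valid, n_components):
    """Assumed objective: RMSE on held-out rows for the feature subset `cols`."""
    k = min(n_components, len(cols))
    coef, x_mean, y_mean = pls1_fit(X[np.ix_(train, cols)], y[train], k)
    pred = (X[np.ix_(valid, cols)] - x_mean) @ coef + y_mean
    return float(np.sqrt(np.mean((y[valid] - pred) ** 2)))

def forward_select(X, y, max_corr=0.95, n_components=2, min_gain=0.01, seed=0):
    """Forward greedy search with an assumed correlation-based pre-screen."""
    n, d = X.shape
    order = np.random.default_rng(seed).permutation(n)
    cut = int(0.7 * n)
    train, valid = order[:cut], order[cut:]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected, best = [], np.inf
    while len(selected) < d:
        # Pre-select: drop candidates highly correlated with any selected feature.
        candidates = [j for j in range(d) if j not in selected
                      and all(corr[j, s] < max_corr for s in selected)]
        if not candidates:
            break
        scores = {j: holdout_rmse(X, y, selected + [j], train, valid, n_components)
                  for j in candidates}
        j_best = min(scores, key=scores.get)   # candidate with the smallest objective
        if scores[j_best] > best * (1 - min_gain):
            break                              # no meaningful improvement: stop
        selected.append(j_best)
        best = scores[j_best]
    return selected, best

# Synthetic demo: feature 1 nearly duplicates feature 0; y depends on features 0 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=200)
selected, rmse = forward_select(X, y)
print(selected, round(rmse, 3))
```

In this synthetic setup the pre-screen keeps only one of the two near-duplicate columns, which is the kind of redundancy the abstract says plain PLS does not account for. A cross-validated objective would be a natural replacement for the single hold-out split used here.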
Authors
曾青霞
杜建强
朱志鹏
聂斌
余日跃
喻芳
Zeng Qingxia; Du Jianqiang; Zhu Zhipeng; Nie Bin; Yu Riyue; Yu Fang (College of Computer Science, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China; School of Pharmacy, Jiangxi University of Traditional Chinese Medicine, Nanchang 330004, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journals (北大核心)
2019, No. 4, pp. 1036-1038, 1054 (4 pages in total)
Application Research of Computers
Funding
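National Natural Science Foundation of China (61363042, 61562045, 61762051)
Major Program of the Natural Science Foundation of Jiangxi Province (20152ACB20007, 20171ACE50021)
Key Research and Development Program of Jiangxi Province (20171ACE50021)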
Keywords
traditional Chinese medicine (TCM) information
partial least squares (PLS)
feature correlation
feature selection