Abstract
In feature selection, approximate Markov blankets are often used to identify redundant features, but the features they flag as redundant are not completely identical. Deleting redundant features directly on the basis of an approximate Markov blanket can therefore discard useful information and reduce model accuracy. To address this, a Lasso-fused approximate Markov blanket feature selection method is proposed for high-dimensional, small-sample metabolomics data from traditional Chinese medicine. The method has two stages. In the first stage, irrelevant features are filtered out by analyzing feature relevance with the maximal information coefficient. In the second stage, approximate Markov blankets are used to build groups of similar features, Lasso is used to evaluate the influence of the features within each group, and redundant features are removed iteratively. Comparative experiments show that the algorithm reduces the loss of useful information to some extent, removes irrelevant and redundant features, and improves model accuracy and stability.
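The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dependence measure is the absolute Pearson correlation, used here as a stand-in for the maximal information coefficient so that the example needs only the standard library; the approximate Markov blanket test follows the common pairwise condition (a more relevant feature covers another when their mutual dependence is at least the other's relevance); and the rule of dropping group members with a zero Lasso coefficient is an assumption. The names `select_features`, `relevance_min`, and `alpha` are hypothetical.

```python
import math


def standardize(col):
    """Center a column and scale it to unit variance (assumes it is not constant)."""
    n = len(col)
    mean = sum(col) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / std for v in col]


def dependence(a, b):
    """Stand-in for MIC (assumption): |Pearson r| of two standardized columns."""
    return abs(sum(x * y for x, y in zip(a, b)) / len(a))


def lasso_cd(cols, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n) * ||y - Xw||^2 + alpha * ||w||_1."""
    n = len(y)
    w = [0.0] * len(cols)
    resid = list(y)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of column j with the residual that excludes its own term.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            z = sum(c * c for c in col) / n
            # Soft-thresholding step.
            new = math.copysign(max(abs(rho) - alpha, 0.0), rho) / z
            resid = [r - (new - w[j]) * c for c, r in zip(col, resid)]
            w[j] = new
    return w


def select_features(X, y, relevance_min=0.1, alpha=0.05):
    """Two-stage sketch: relevance filter, then blanket groups pruned by Lasso."""
    cols = {j: standardize(list(c)) for j, c in enumerate(zip(*X))}
    ys = standardize(y)
    rel = {j: dependence(c, ys) for j, c in cols.items()}
    # Stage 1: drop features whose dependence on the target is too weak.
    remaining = sorted((j for j in rel if rel[j] >= relevance_min),
                       key=lambda j: -rel[j])
    selected = []
    # Stage 2: the most relevant remaining feature leads a group of the
    # features it approximately covers; Lasso decides which members stay.
    while remaining:
        lead = remaining.pop(0)
        group = [j for j in remaining
                 if dependence(cols[lead], cols[j]) >= rel[j]]
        members = [lead] + group
        weights = lasso_cd([cols[j] for j in members], ys, alpha)
        selected.extend(j for j, w in zip(members, weights) if w != 0.0)
        remaining = [j for j in remaining if j not in group]
    return sorted(selected)


# Toy data: feature 1 is a noisy copy of feature 0, feature 2 is unrelated
# to the target, and the target depends on features 0 and 3.
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
c = [1, -1, 1, -1, 1, -1, 1, -1]
d = [1, -1, -1, 1, 1, -1, -1, 1]
noisy_a = [ai + 0.5 * ci for ai, ci in zip(a, c)]
X_demo = [list(row) for row in zip(a, noisy_a, d, b)]
y_demo = [2 * ai + bi for ai, bi in zip(a, b)]
print(select_features(X_demo, y_demo))  # [0, 3]
```

In the toy data, feature 2 is removed by the relevance filter and feature 1 is grouped with feature 0 and then dropped because Lasso assigns it a zero coefficient. Keeping `relevance_min` above `alpha` ensures that every group retains at least one feature in this sketch.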
Authors
LIU Ming
DU Jianqiang
LI Zhiqin
LUO Jigen
NIE Bin
ZHANG Mengting
School of Computer, Jiangxi University of Chinese Medicine, Nanchang 330004, China; Informatization Office, Jiangxi Normal University, Nanchang 330022, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2024, No. 8, pp. 121-130 (10 pages)
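Funding
National Natural Science Foundation of China (62141202, 82160955, 82260988)
National Key Research and Development Program of China (2019YFC1712301)
Natural Science Foundation of Jiangxi Province, General Program (20202BAB202019)
Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ190683)
Jiangxi University of Chinese Medicine Science and Technology Innovation Team Development Program (CXTD22015)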
Keywords
approximate Markov blanket
Lasso
feature selection
high-dimensional small sample
traditional Chinese medicine (TCM) information