Abstract
Production data for complex products are high-dimensional and class-imbalanced. To identify critical-to-quality characteristics effectively during the production stage of complex products and to enable timely quality control, this paper proposes an improved random forest algorithm based on K-Means clustering undersampling (KMUS-RF). The K-Means algorithm clusters the majority-class samples, and repeated undersampling guided by the clustering results produces multiple balanced data sets. Random forests serve as the base classifiers for identification, and the set of critical-to-quality characteristics is finally output according to the feature importance obtained during classification. A numerical example shows that KMUS-RF achieves good overall classification performance compared with several existing classifiers and significantly reduces the Type II error rate in complex product classification, meeting actual production requirements.
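The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration rather than the paper's implementation: it assumes scikit-learn and NumPy, binary labels with 1 as the minority class, sampling from each cluster in proportion to its size, and averaging of probabilities and feature importances across forests. None of these details are given in the abstract, and the names `kmus_rf`, `kmus_predict`, and `top_features` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier


def kmus_rf(X, y, n_clusters=5, n_subsets=10, n_trees=100, seed=0):
    """Cluster-based undersampling plus a random forest ensemble (sketch).

    Assumption: y is binary, 0 = majority class, 1 = minority class.
    Returns the fitted forests and the mean feature importance.
    """
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    maj_idx = np.flatnonzero(y == 0)
    min_idx = np.flatnonzero(y == 1)

    # Step 1: cluster the majority-class samples only.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    cluster_of = km.fit_predict(X[maj_idx])

    forests, importances = [], []
    for b in range(n_subsets):
        # Step 2: sample each cluster in proportion to its size (assumed
        # allocation rule) so the majority draw roughly matches the
        # minority class in size.
        picked = []
        for c in range(n_clusters):
            members = maj_idx[cluster_of == c]
            if len(members) == 0:
                continue
            quota = max(1, round(len(min_idx) * len(members) / len(maj_idx)))
            quota = min(quota, len(members))
            picked.append(rng.choice(members, size=quota, replace=False))
        subset = np.concatenate(picked + [min_idx])

        # Step 3: train one random forest per balanced subset.
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed + b)
        rf.fit(X[subset], y[subset])
        forests.append(rf)
        importances.append(rf.feature_importances_)

    # Step 4: aggregate importances (simple mean, an assumption).
    return forests, np.mean(importances, axis=0)


def kmus_predict(forests, X):
    """Average the minority-class probability over all forests."""
    proba = np.mean([rf.predict_proba(X)[:, 1] for rf in forests], axis=0)
    return (proba >= 0.5).astype(int)


def top_features(importance, k):
    """Indices of the k features with the highest mean importance."""
    return np.argsort(importance)[::-1][:k]
```

Under these assumptions, `top_features(importance, k)` would play the role of the critical-to-quality characteristic set; the cut-off `k` (or an importance threshold) is a choice the abstract does not specify.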
Author
LIU Jia-hao (柳嘉昊), School of Management Science and Engineering, Nanjing University of Finance & Economics, Nanjing 210046, China
Source
Management & Technology of SME (《中小企业管理与科技》)
2021, Issue 30, pp. 134-137 (4 pages)
Funding
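Jiangsu Province Postgraduate Research Innovation Program project "Research on Identifying Critical-to-Quality Characteristics of Complex Aviation Equipment Products Based on Data Mining" (Project No. KYCX20_1354).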