Abstract
Taking 212 small and medium-sized enterprises in the ferrous metal smelting and rolling industry of one region of China as the sample, this paper first labels the samples using an electricity-consumption growth rate method corrected for industry effects. It then uses K-means clustering to extract features from the data set and SMOTE oversampling to balance the classes of the training set. Finally, it applies a gradient boosting decision tree model, modified with the MetaCost cost-sensitive meta-algorithm, to identify and predict each enterprise's life cycle phase. The seven constructed features show a typical long-tail character, in that their contributions to the model's predictive ability are fairly evenly distributed. After tuning the cost matrix, the model's best precision and recall on enterprise samples in the imbalanced class are 83.3% and 88.9%, respectively. A cross-sectional Kappa consistency test against the results of traditional methods, together with a longitudinal empirical analysis, confirms that identifying enterprise life cycle phases with machine learning models built on single-perspective electricity consumption data is credible and effective.
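Two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The first function shows only the relabeling rule at the core of MetaCost (pick the class with the lowest expected misclassification cost, given class probability estimates); the bagging step that MetaCost uses to estimate those probabilities, and the gradient boosting model itself, are omitted. The second function computes Cohen's kappa, assumed here to be the statistic behind the Kappa consistency test. The function names, the two-class cost matrix, and all numbers are hypothetical.

```python
from collections import Counter


def min_expected_cost_label(probs, cost):
    """Return the class index with the lowest expected misclassification cost.

    probs[j]   : estimated probability that the true class is j
    cost[i][j] : cost of predicting class i when the true class is j
    """
    n_classes = len(cost)
    expected = [
        sum(probs[j] * cost[i][j] for j in range(n_classes))
        for i in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement from the marginal label frequencies of each labeling.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both labelings use one identical class throughout
    return (observed - expected) / (1 - expected)


# Hypothetical cost matrix: class 0 is the majority, class 1 the minority.
# Missing a minority sample (predict 0, truth 1) costs five times as much
# as a false alarm (predict 1, truth 0).
weighted_cost = [[0, 5], [1, 0]]
uniform_cost = [[0, 1], [1, 0]]
probs = [0.7, 0.3]  # hypothetical class probability estimates for one sample

relabeled = min_expected_cost_label(probs, weighted_cost)
plain = min_expected_cost_label(probs, uniform_cost)

# Hypothetical phase labels from two methods for the same four enterprises.
kappa = cohen_kappa([0, 0, 1, 1], [0, 0, 1, 0])
```

With the weighted cost matrix the sample is relabeled to the minority class even though its estimated probability is only 0.3, which is how adjusting the cost matrix trades precision against recall on the rarer class.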
Authors
刘同新
杨翠红
房勇
张若兴
Liu Tongxin; Yang Cuihong; Fang Yong; Zhang Ruoxing (School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100049, China; Powersmart (Beijing) Science and Technology Co. Ltd, Beijing 100070, China; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China)
Source
《技术经济》
CSSCI
Peking University Core Journals (北大核心)
2019, No. 4, pp. 107-113 (7 pages)
Journal of Technology Economics
Funding
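National Natural Science Foundation of China project "Research on a Chinese input-output model based on supply and use tables and accounting for enterprise heterogeneity, and its applications" (71673269)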
Keywords
small and medium-sized enterprise
life cycle
machine learning
development phase identification
electricity data