Abstract
Proximate analysis of biomass is more convenient and less costly than ultimate (elemental) analysis or chemical analysis, so using proximate-analysis results to predict the higher heating value (HHV) of biomass has considerable practical value. Fitting such a relationship with machine learning depends heavily on the data themselves, namely the size and quality of the dataset: a small number of poor-quality samples can strongly distort the fit. It is therefore necessary to find a linear fitting method suited to the present situation, in which the available dataset is small and contains outliers. Three methods for handling outliers are proposed: method I identifies outliers using the interquartile range (IQR) and removes them; method II preprocesses the original data using statistics computed from the central (intermediate) portion of the data; method III replaces conventional linear regression with ridge regression to improve its resistance to interference from outliers. The fitting results show that, for the current small dataset containing outliers, ridge regression gives the best linear fit.
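The three strategies named in the abstract can be illustrated with a short plain-Python sketch (standard library only). This is a minimal illustration under stated assumptions, not the authors' implementation: the features are taken to be proximate-analysis quantities and the target the HHV; an outlier is assumed to be any value outside Tukey's fences Q1 - 1.5*IQR and Q3 + 1.5*IQR; method II, which the abstract describes only as preprocessing with statistics of the intermediate data, is read here (hypothetically) as clipping each feature to those fences; and method III is textbook ridge regression with an unpenalised intercept. All function names (`iqr_fences`, `drop_outliers`, `clip_to_fences`, `solve`, `ridge_fit`) are hypothetical, and the demo numbers are synthetic, not data from the paper.

```python
"""Sketch of the three outlier-handling strategies from the abstract.

Assumptions (not from the paper): Tukey's 1.5*IQR fences define an
outlier, method II is read as clipping to those fences, and all names
below are hypothetical.
"""
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return Tukey fences (Q1 - k*IQR, Q3 + k*IQR) for one variable."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(X, y, k=1.5):
    """Method I: drop every sample with any feature or target outside its fences."""
    fences = [iqr_fences(col, k) for col in zip(*X)] + [iqr_fences(y, k)]
    keep = [
        i for i in range(len(y))
        if all(lo <= v <= hi for v, (lo, hi) in zip(list(X[i]) + [y[i]], fences))
    ]
    return [X[i] for i in keep], [y[i] for i in keep]


def clip_to_fences(X, k=1.5):
    """Method II (hypothetical reading): clip each feature to fences derived
    from the central data instead of deleting the whole sample."""
    fences = [iqr_fences(col, k) for col in zip(*X)]
    return [[min(max(v, lo), hi) for v, (lo, hi) in zip(row, fences)] for row in X]


def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][j] * w[j] for j in range(r + 1, n))) / M[r][r]
    return w


def ridge_fit(X, y, alpha=1.0):
    """Method III: minimise ||y - b - X w||^2 + alpha * ||w||^2.

    Features and target are centred so the intercept b is not penalised;
    (Xc^T Xc + alpha I) w = Xc^T yc is then solved and b is recovered from
    the means. alpha = 0 gives ordinary least squares.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]
    yc = [t - y_mean for t in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    w = solve(A, rhs)
    intercept = y_mean - sum(wi * m for wi, m in zip(w, x_mean))
    return w, intercept


if __name__ == "__main__":
    # Synthetic toy data (NOT from the paper): y = 2x + 1, last target corrupted.
    X = [[1], [2], [3], [4], [5], [6], [7], [8]]
    y = [3, 5, 7, 9, 11, 13, 15, 60]
    X_clean, y_clean = drop_outliers(X, y)
    print(len(X_clean))                       # 7
    print(ridge_fit(X_clean, y_clean, 0.0))   # ([2.0], 1.0)
    print(ridge_fit(X_clean, y_clean, 28.0))  # ([1.0], 5.0)
```

With `alpha=0` the ridge solver reduces to ordinary least squares; a positive `alpha` shrinks the coefficients toward zero (here from 2.0 to 1.0), trading some bias for lower sensitivity to individual samples. Which preprocessing or penalty works best on real proximate-analysis data is the empirical question the paper addresses.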
Authors
LIU Sunan (刘苏楠)
ZHOU Jinsong (周劲松)
XIANG Yangyang (项阳阳)
HONG Dongyang (洪东阳)
College of Energy Engineering, Zhejiang University, Hangzhou 310000, China
Source
Thermal Power Generation (《热力发电》)
Indexed in: CAS; Peking University Core Journals (北大核心)
2018, Issue 12, pp. 41-46 (6 pages)
Funding
National Basic Research Program of China (973 Program) (2013CB228100)
Keywords
biomass
proximate analysis
ultimate analysis
higher heating value
machine learning
linear fitting
outlier (abnormal point)