Funding: Supported by the National Natural Science Key Foundation of China (69974021).
Abstract: A new incremental support vector machine (SVM) algorithm based on multiple kernel learning is proposed. By introducing multiple kernel learning into incremental SVM learning, the problem of learning from large-scale data sets can be solved effectively. Furthermore, different penalty parameters are applied to the training subset and to the previously acquired support vectors, which may help improve the performance of the SVM. Simulation results indicate that the proposed algorithm not only solves the model selection problem in incremental SVM learning but also improves classification or prediction accuracy.
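The abstract combines two ideas: a kernel built from several base kernels, and an incremental loop in which retained support vectors and fresh samples receive different penalties. The sketch below illustrates both under stated assumptions. The kernel weights `betas` are fixed by hand rather than learned (true multiple kernel learning optimizes them), the SVM is a bias-free variant trained by dual coordinate descent, and the names `incremental_fit`, `C_new` and `C_sv` and their values are hypothetical, not taken from the paper.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def combined_kernel(A, B, gammas, betas):
    """Fixed convex combination of RBF kernels: K = sum_m beta_m * K_m (weights NOT learned here)."""
    return sum(beta * rbf_kernel(A, B, g) for g, beta in zip(gammas, betas))


def train_svm_dual(K, y, C, n_epochs=100):
    """Bias-free kernel SVM via dual coordinate descent (an assumed simplification).

    Minimises 0.5 * a^T Q a - sum(a) with Q = (y y^T) * K,
    subject to 0 <= a_i <= C_i, where C is a per-sample penalty vector.
    """
    Q = np.outer(y, y) * K
    alpha = np.zeros(len(y))
    for _ in range(n_epochs):
        for i in range(len(y)):
            grad = Q[i] @ alpha - 1.0  # partial derivative of the dual objective
            alpha[i] = np.clip(alpha[i] - grad / Q[i, i], 0.0, C[i])
    return alpha


def incremental_fit(chunks, gammas, betas, C_new=1.0, C_sv=5.0, tol=1e-8):
    """Train chunk by chunk, carrying forward only the support vectors.

    Retained SVs get penalty C_sv and fresh samples get C_new (hypothetical values).
    """
    dim = chunks[0][0].shape[1]
    X_sv, y_sv, alpha_sv = np.empty((0, dim)), np.empty(0), np.empty(0)
    for X_new, y_new in chunks:
        X = np.vstack([X_sv, X_new])
        y = np.concatenate([y_sv, y_new])
        C = np.concatenate([np.full(len(y_sv), C_sv), np.full(len(y_new), C_new)])
        alpha = train_svm_dual(combined_kernel(X, X, gammas, betas), y, C)
        keep = alpha > tol  # support vectors have strictly positive multipliers
        X_sv, y_sv, alpha_sv = X[keep], y[keep], alpha[keep]
    return X_sv, y_sv, alpha_sv


def predict(X_test, X_sv, y_sv, alpha_sv, gammas, betas):
    """Sign of the bias-free decision function f(x) = sum_i alpha_i y_i K(x_i, x)."""
    return np.sign(combined_kernel(X_test, X_sv, gammas, betas) @ (alpha_sv * y_sv))


# Usage on two synthetic Gaussian blobs, streamed in four chunks of 50 samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.concatenate([-np.ones(100), np.ones(100)])
order = rng.permutation(200)
X, y = X[order], y[order]
chunks = [(X[i:i + 50], y[i:i + 50]) for i in range(0, 200, 50)]
gammas, betas = [0.1, 1.0], [0.5, 0.5]
model = incremental_fit(chunks, gammas, betas)
accuracy = np.mean(predict(X, *model, gammas, betas) == y)
```

Only the support vectors from earlier chunks are carried forward, which keeps each retraining problem small; that is the memory saving incremental SVM learning aims for.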
Funding: Supported by the National Natural Science Foundation of China (61074127).
Abstract: Because the solution of the least squares support vector regression machine (LS-SVRM) is not sparse, prediction is slow, which limits its applications. The existing adaptive pruning algorithm for LS-SVRM trains slowly, and its generalization performance is unsatisfactory, especially for large-scale problems. Hence an improved algorithm is proposed. To accelerate training, the pruned data point and a fast leave-one-out error are used to validate the temporary model obtained after decremental learning. To improve generalization performance, a novel objective function in the termination condition, which involves all the constraints generated by all training data points, is employed together with three pruning strategies. The effectiveness of the proposed algorithm is tested on six benchmark datasets. The resulting sparse LS-SVRM model achieves faster training and better generalization performance.
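The "fast leave-one-out error" rests on a closed-form identity: if the full LS-SVR linear system matrix is C = [[K + I/reg, 1], [1^T, 0]] with solution (alpha, b), then the leave-one-out residual of sample i is alpha_i / (C^-1)_ii, so no refitting is needed. The sketch below shows that identity and a greedy pruning loop that drops the points with the smallest |alpha|. The stopping rule (reject a pruning step if the fast LOO MSE grows by more than `tol`) is an assumed simplification; it does not reproduce the paper's termination objective or its three pruning strategies, and `prune`, `frac` and `tol` are hypothetical names.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def lssvr_system(K, reg):
    """Full LS-SVR system matrix [[K + I/reg, 1], [1^T, 0]]."""
    n = K.shape[0]
    C = np.zeros((n + 1, n + 1))
    C[:n, :n] = K + np.eye(n) / reg
    C[:n, n] = 1.0
    C[n, :n] = 1.0
    return C


def fit_lssvr(X, y, reg=10.0, gamma=1.0):
    """Return (alpha, b, C_inv) for LS-SVR on (X, y)."""
    C_inv = np.linalg.inv(lssvr_system(rbf_kernel(X, X, gamma), reg))
    sol = C_inv @ np.append(y, 0.0)
    return sol[:-1], sol[-1], C_inv


def fast_loo_residuals(alpha, C_inv):
    """Exact leave-one-out residuals y_i - f^(-i)(x_i) = alpha_i / (C^-1)_ii."""
    return alpha / np.diag(C_inv)[:-1]


def prune(X, y, reg=10.0, gamma=1.0, frac=0.05, tol=1.05, min_size=10):
    """Greedy pruning: remove the smallest-|alpha| points while fast LOO MSE stays within tol."""
    keep = np.arange(len(y))
    alpha, b, C_inv = fit_lssvr(X, y, reg, gamma)
    best = np.mean(fast_loo_residuals(alpha, C_inv) ** 2)
    while len(keep) > min_size:
        k = max(1, int(frac * len(keep)))
        trial = keep[np.sort(np.argsort(np.abs(alpha))[k:])]  # drop the k smallest |alpha|
        a_t, b_t, Ci_t = fit_lssvr(X[trial], y[trial], reg, gamma)
        loo = np.mean(fast_loo_residuals(a_t, Ci_t) ** 2)
        if loo > tol * best:
            break  # pruning step rejected: keep the previous model
        keep, alpha, b = trial, a_t, b_t
        best = min(best, loo)
    return keep, alpha, b


# Usage on a noisy 1-D sinc curve.
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, (40, 1))
y = np.sinc(X[:, 0]) + 0.05 * rng.normal(size=40)
alpha, b, C_inv = fit_lssvr(X, y)
r_fast = fast_loo_residuals(alpha, C_inv)
kept, alpha_sparse, b_sparse = prune(X, y)
```

The identity follows from comparing the full solution with the solution padded by a zero at position i; it is exact, so the fast residuals match brute-force refitting up to round-off.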
Funding: Supported by the National Natural Science Foundation of China (Nos. U1509207 and 61325019).
Abstract: According to the classic Karush-Kuhn-Tucker (KKT) theorem, at every step of incremental support vector machine (SVM) learning, a newly added sample that violates the KKT conditions becomes a new support vector (SV) and causes old samples to migrate between the SV set and the non-support vector (NSV) set; at the same time, the learning model should be updated based on the SVs. However, it is not clear in advance which of the old samples will move between the SV and NSV sets. Additionally, the learning model will be updated unnecessarily, which does not greatly increase its accuracy but does decrease the training speed. Therefore, how to choose the new SVs from the old sets during the incremental stages, and when to perform incremental steps, greatly influence the accuracy and efficiency of incremental SVM learning. In this work, a new algorithm is proposed that simultaneously selects candidate SVs and uses wrongly predicted samples to trigger incremental processing. Experimental results show that the proposed algorithm achieves good performance with high efficiency, high speed and good accuracy.
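To make the distinction in the abstract concrete, the sketch below separates three checks: the standard soft-margin KKT conditions for a single sample, a trigger that starts an incremental step only for a misclassified sample, and a rule that picks old samples near the margin as candidate SVs. The trigger and the candidate band are hypothetical stand-ins for the paper's rules, whose exact form the abstract does not give; `triggers_update`, `candidate_svs` and `band` are illustrative names.

```python
def kkt_violated(alpha_i, margin_i, C, eps=1e-3):
    """True if one sample breaks the soft-margin SVM KKT conditions.

    margin_i = y_i * f(x_i). The conditions are:
      alpha_i == 0      ->  margin_i >= 1
      0 < alpha_i < C   ->  margin_i == 1
      alpha_i == C      ->  margin_i <= 1
    """
    if alpha_i <= eps:
        return margin_i < 1.0 - eps
    if alpha_i >= C - eps:
        return margin_i > 1.0 + eps
    return abs(margin_i - 1.0) > eps


def triggers_update(y_new, f_new):
    """Hypothetical trigger rule: only a wrongly predicted sample starts an incremental step."""
    return y_new * f_new <= 0.0


def candidate_svs(margins, band=0.5):
    """Hypothetical candidate rule: old samples with y*f <= 1 + band may become SVs."""
    return [i for i, m in enumerate(margins) if m <= 1.0 + band]


# A new sample with alpha = 0 and margin 0.4 violates the KKT conditions but is
# correctly classified, so under the misclassification trigger no update is run.
new_alpha, new_y, new_f = 0.0, 1.0, 0.4
violates = kkt_violated(new_alpha, new_y * new_f, C=1.0)
update = triggers_update(new_y, new_f)
old_margins = [0.8, 1.0, 1.3, 2.5, 4.0]
candidates = candidate_svs(old_margins)  # indices 0, 1, 2 lie inside the band
```

Every misclassified sample also violates the KKT conditions, but not the reverse, so triggering on misclassification skips the model updates that the abstract describes as costly and of little benefit to accuracy.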
Funding: Supported by the National Natural Science Foundation of China (No. 52178062), the Alfred P. Sloan Foundation (No. G-2016-7050), and the Opening Fund of State Key Laboratory of Green Building in Western China (LSKF202311).
Abstract: Indoor air quality has become increasingly important, partly because the COVID-19 pandemic has increased the time people spend indoors. Research into the prediction of indoor volatile organic compounds (VOCs) has traditionally been confined to building materials and furniture. Relatively little research focuses on estimating human-related VOCs, which have been shown to contribute significantly to indoor air quality, especially in densely occupied environments. This study applies a machine learning approach to accurately estimate human-related VOC emissions in a university classroom. The time-resolved concentrations of two typical human-related (ozone-related) VOCs, 6-methyl-5-hepten-2-one (6-MHO) and 4-oxopentanal (4-OPA), were analyzed in the classroom over a five-day period. Comparing the 6-MHO concentrations predicted by five machine learning approaches, namely random forest regression (RFR), adaptive boosting (AdaBoost), gradient boosting regression tree (GBRT), extreme gradient boosting (XGBoost), and least squares support vector machine (LSSVM), we find that LSSVM achieves the best performance when multi-feature parameters (number of occupants, ozone concentration, temperature, and relative humidity) are used as input. The LSSVM approach is then used to predict the 4-OPA concentration, with a mean absolute percentage error (MAPE) below 5%, indicating high accuracy. By combining LSSVM with kernel density estimation (KDE), we further establish an interval prediction model that provides uncertainty information and a viable option for decision-makers. The machine learning approach in this study can easily incorporate the impact of various factors on VOC emission behavior, making it especially suitable for concentration prediction and exposure assessment in realistic indoor settings.
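The abstract reports accuracy as MAPE and builds prediction intervals by pairing LSSVM point forecasts with KDE. A minimal sketch of those two pieces follows, assuming the KDE is fitted to past residuals (observed minus predicted) and its quantiles are added to a new point forecast. The Gaussian kernel, the 1.06 rule-of-thumb bandwidth and the function names are assumptions for illustration; the paper's exact KDE construction is not stated in the abstract, and the LSSVM model itself is not shown.

```python
import math
import statistics


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def kde_cdf(x, samples, h):
    """CDF at x of a Gaussian kernel density estimate with bandwidth h."""
    return sum(0.5 * (1.0 + math.erf((x - s) / (h * math.sqrt(2.0)))) for s in samples) / len(samples)


def kde_quantile(q, samples, h, iters=100):
    """Invert the (monotone) KDE CDF by bisection."""
    lo, hi = min(samples) - 10.0 * h, max(samples) + 10.0 * h
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, h) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kde_interval(point_pred, residuals, level=0.90):
    """Prediction interval = point prediction + KDE quantiles of past residuals."""
    n = len(residuals)
    h = 1.06 * statistics.stdev(residuals) * n ** (-0.2)  # assumed rule-of-thumb bandwidth
    tail = (1.0 - level) / 2.0
    return (point_pred + kde_quantile(tail, residuals, h),
            point_pred + kde_quantile(1.0 - tail, residuals, h))


# Usage: residuals would come from a fitted model on held-out data (values here are illustrative).
error_pct = mape([100.0, 200.0], [110.0, 180.0])
interval = kde_interval(50.0, [-2.0, -1.0, 0.0, 1.0, 2.0])
```

Because the interval comes from the empirical residual distribution rather than an assumed normal error, it can become asymmetric when the model systematically over- or under-predicts.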
Funding: The authors acknowledge the financial support and research grant provided by the Thailand Research Fund (TRF) and the Faculty of Engineering at Kamphaeng Saen, Kasetsart University, Thailand.
Abstract: Dry rubber content (DRC) is an important factor in evaluating the quality of cup lump rubber, but DRC analysis requires prolonged laboratory validation. To develop fast and effective DRC determination methods, this study proposed evaluating the DRC of cup lump rubber with different spectroscopic measurement approaches. This involved a complete fundamental analysis leading to an efficient measurement method based on either point-based measurement with an NIR reflectance spectrometer or area-based measurement with hyperspectral imaging. A dataset of 120 samples was prepared and randomly divided into a calibration set of 90 samples and a validation set of 30 samples. To obtain an average spectrum representing each cup lump rubber sample, spectral data were collected by locating (point-based measurement) and scanning (area-based measurement), respectively. The spectral data were calibrated against the reference values using partial least squares regression (PLSR) and least-squares support vector machine (LS-SVM) methods. The experiments showed that the area-based measurement approach performed outstandingly with both algorithms in predicting the DRC of cup lump rubber and was clearly better than the point-based approach. The best PLSR predictions, expressed as the coefficient of determination (R²), the root mean square error of prediction (RMSEP) and the residual predictive deviation (RPD), were 0.99, 0.72% and 15.17, while the best LS-SVM predictions were 0.99, 0.64% and 16.83, respectively. In summary, area-based measurement with the LS-SVM prediction model provided a highly accurate estimate of the DRC of cup lump rubber.
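The evaluation pipeline described above (random 90/30 split, averaging several spectra per sample, and scoring predictions with R², RMSEP and RPD) can be sketched as follows. RPD is computed here as the sample standard deviation of the reference values divided by RMSEP, which is one common definition; the abstract does not say which variant was used. The helper names are hypothetical, and the PLSR and LS-SVM models themselves are not shown.

```python
import numpy as np


def mean_spectrum(spectra):
    """Average several spectra (rows: scan points or pixels) into one sample spectrum."""
    return np.asarray(spectra, dtype=float).mean(axis=0)


def random_split(n, n_cal, seed=0):
    """Random calibration/validation split of sample indices."""
    idx = np.random.default_rng(seed).permutation(n)
    return idx[:n_cal], idx[n_cal:]


def r2(y_ref, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_ref - y_pred) ** 2)
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmsep(y_ref, y_pred):
    """Root mean square error of prediction (same units as the reference, e.g. % DRC)."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_ref - y_pred) ** 2)))


def rpd(y_ref, y_pred):
    """Residual predictive deviation: SD of reference values / RMSEP (assumed definition)."""
    return float(np.std(np.asarray(y_ref, float), ddof=1) / rmsep(y_ref, y_pred))


# Usage with illustrative numbers (not data from the study).
cal_idx, val_idx = random_split(120, 90)
spectrum = mean_spectrum([[1.0, 2.0], [3.0, 4.0]])
y_ref, y_hat = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9]
scores = (r2(y_ref, y_hat), rmsep(y_ref, y_hat), rpd(y_ref, y_hat))
```

A higher RPD means the prediction error is small relative to the natural spread of DRC values, which is why it is reported alongside RMSEP.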
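Abstract: Directly removing samples with missing data reduces the sample size and can therefore degrade classification performance. To address this problem, and following a strategy of handling missing data and building the pattern classification model at the same time, this paper proposes a privileged least squares support vector machine (P-LSSVM) based on learning using privileged information (LUPI), which both improves classification performance and determines the importance of missing features without bias. The basic idea is to use training on the complete data as privileged information to guide the learning of a least squares support vector machine (LSSVM) on the entire incomplete dataset. The importance of each feature (including missing features) is expressed through an additive kernel, the privileged information from training on the complete data is derived and used to construct the P-LSSVM, and the proposed leave-one-out cross-validation method is applied to identify the importance of missing features without bias. Experimental results show that the proposed method not only outperforms the compared algorithms in average test accuracy but also determines the importance of the missing features.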
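As a rough illustration of how an additive kernel can expose per-feature importance for an LSSVM on incomplete data, the sketch below builds one RBF term per feature, zeroes a term when that feature is missing (NaN) in either sample, and scores each feature by how much the closed-form leave-one-out error rises when its kernel weight is set to zero. The NaN convention, unit weights, and the importance score are assumptions made for this sketch. It does not implement the privileged-information (LUPI) component of P-LSSVM or the paper's unbiased procedure, and all function names are hypothetical.

```python
import numpy as np


def additive_kernel(A, B, weights, gamma=1.0):
    """Additive RBF kernel: K(x, z) = sum_d w_d * exp(-gamma * (x_d - z_d)**2).

    If feature d is missing (NaN) in either sample, its term is set to 0
    (a convention assumed for this sketch; it keeps each term positive semidefinite).
    """
    K = np.zeros((A.shape[0], B.shape[0]))
    for d, w in enumerate(weights):
        k_d = np.exp(-gamma * (A[:, d][:, None] - B[:, d][None, :]) ** 2)
        K += w * np.where(np.isnan(k_d), 0.0, k_d)
    return K


def lssvm_system(K, reg):
    """LS-SVM system matrix [[K + I/reg, 1], [1^T, 0]] for +/-1 targets."""
    n = K.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = K + np.eye(n) / reg
    M[:n, n] = 1.0
    M[n, :n] = 1.0
    return M


def loo_error(K, y, reg=10.0):
    """Leave-one-out misclassification rate via y_i - f^(-i)(x_i) = alpha_i / (M^-1)_ii."""
    n = len(y)
    M_inv = np.linalg.inv(lssvm_system(K, reg))
    alpha = (M_inv @ np.append(y, 0.0))[:n]
    f_loo = y - alpha / np.diag(M_inv)[:n]
    return float(np.mean(np.sign(f_loo) != y))


def feature_importance(X, y, reg=10.0, gamma=1.0):
    """Hypothetical score: rise in LOO error when feature d's kernel weight is set to 0."""
    p = X.shape[1]
    base = loo_error(additive_kernel(X, X, np.ones(p), gamma), y, reg)
    scores = []
    for d in range(p):
        w = np.ones(p)
        w[d] = 0.0
        scores.append(loo_error(additive_kernel(X, X, w, gamma), y, reg) - base)
    return np.array(scores)


# Synthetic data: feature 0 carries the class signal, feature 1 is noise,
# and about 10% of feature-0 entries are missing.
rng = np.random.default_rng(2)
n = 80
y = np.where(rng.random(n) < 0.5, -1.0, 1.0)
X = np.column_stack([2.0 * y + rng.normal(size=n), rng.normal(size=n)])
X[rng.random(n) < 0.1, 0] = np.nan
importance = feature_importance(X, y)
```

Keeping every sample, including those with missing entries, and letting the kernel ignore only the missing terms reflects the abstract's motivation of avoiding the sample loss caused by deleting incomplete rows.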