Funding: Hebei Province Key Research and Development Project (No. 20313701D); Hebei Province Key Research and Development Project (No. 19210404D); Mobile computing and universal equipment for the Beijing Key Laboratory Open Project; The National Social Science Fund of China (17AJL014); Beijing University of Posts and Telecommunications Construction of World-Class Disciplines and Characteristic Development Guidance Special Fund "Cultural Inheritance and Innovation" Project (No. 505019221); National Natural Science Foundation of China (No. U1536112); National Natural Science Foundation of China (No. 81673697); National Natural Science Foundation of China (61872046); The National Social Science Fund Key Project of China (No. 17AJL014); "Blue Fire Project" (Huizhou) University of Technology Joint Innovation Project (CXZJHZ201729); Industry-University Cooperation Cooperative Education Project of the Ministry of Education (No. 201902218004); Industry-University Cooperation Cooperative Education Project of the Ministry of Education (No. 201902024006); Industry-University Cooperation Cooperative Education Project of the Ministry of Education (No. 201901197007); Industry-University Cooperation Collaborative Education Project of the Ministry of Education (No. 201901199005); The Ministry of Education Industry-University Cooperation Collaborative Education Project (No. 201901197001); Shijiazhuang science and technology plan project (236240267A); Hebei Province key research and development plan project (20312701D).
Abstract: The distribution of data has a significant impact on classification results. When one class is poorly represented compared with another, data imbalance occurs. This increases the number of outliers and the amount of noise, so the speed and performance of classification can be greatly affected. To address these problems, this paper starts from the motivation and mathematical representation of classification and puts forward a new classification method based on the relationship between different classification formulations. Combining the vector characteristics of the actual problem with the choice of matrix characteristics, we first analyze orderly regression and introduce slack variables to solve the constraint problem of the lone point. We then introduce fuzzy factors, on the basis of the support vector machine, to solve the problem of the gap between isolated points, and introduce cost control to solve the problem of sample skew. Finally, based on the bi-boundary support vector machine, a two-step weight-setting twin classifier is constructed. It can help identify multitasks with feature-selected patterns without additional optimizers, which addresses the problem that large-scale classification cannot deal effectively with a very low category distribution gap.
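The abstract does not say how its cost control is defined. As a rough illustration of the general idea only, the sketch below assigns each class a misclassification cost inversely proportional to its frequency, a common heuristic for skewed samples. The function name `class_costs` and the weighting formula are assumptions for illustration, not the paper's method.

```python
import numpy as np

def class_costs(labels):
    """Return a per-class cost inversely proportional to class frequency.

    Assumed heuristic (not taken from the paper):
        cost_k = n_samples / (n_classes * count_k)
    so that rare classes receive proportionally larger penalties.
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    costs = labels.size / (classes.size * counts)
    return dict(zip(classes.tolist(), costs.tolist()))

# Illustrative skewed sample: 90 majority (+1) and 10 minority (-1) points.
labels = [1] * 90 + [-1] * 10
print(class_costs(labels))
```

With such costs, an error on a minority sample is penalized more heavily than an error on a majority sample, which is the usual intent of cost-sensitive training on imbalanced data.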
Funding: Supported by the National Basic Research Program (973) of China (No. 2013CB329502), the National Natural Science Foundation of China (No. 61379101), and the Fundamental Research Funds for the Central Universities, China (No. 2012LWB39).
Abstract: The training algorithm of classical twin support vector regression (TSVR) amounts to solving a pair of quadratic programming problems (QPPs) with inequality constraints in the dual space. However, this solution is limited by time and memory constraints when dealing with large datasets. In this paper, we present a least squares version of TSVR in the primal space, termed primal least squares TSVR (PLSTSVR). By introducing the least squares method, the inequality constraints of TSVR are transformed into equality constraints. Furthermore, we solve the two QPPs with equality constraints directly in the primal space instead of the dual space; thus, we need only solve two systems of linear equations instead of two QPPs. Experimental results on artificial and benchmark datasets show that PLSTSVR has accuracy comparable to that of TSVR but requires considerably less computational time. We further investigate its validity in predicting stock opening prices.
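To make concrete what "two systems of linear equations instead of two QPPs" means computationally, here is a minimal linear sketch. It is not the PLSTSVR formulation from the paper: it simply fits a down-bound and an up-bound function by regularized least squares to the shifted targets y - eps1 and y + eps2 and averages them. The function names, the ridge term `reg` and the default parameter values are all assumptions for illustration.

```python
import numpy as np

def fit_ls_twin_regressor(X, y, eps1=0.1, eps2=0.1, reg=1e-6):
    """Fit two bound functions, each with a single linear solve.

    Illustrative stand-in, not the paper's exact model:
      down-bound fits y - eps1, up-bound fits y + eps2,
      both by ridge-regularized least squares (reg is an assumption
      added for numerical stability).
    """
    n = X.shape[0]
    G = np.hstack([X, np.ones((n, 1))])          # augmented matrix [A e]
    H = G.T @ G + reg * np.eye(G.shape[1])       # shared system matrix
    u_down = np.linalg.solve(H, G.T @ (y - eps1))
    u_up = np.linalg.solve(H, G.T @ (y + eps2))
    return u_down, u_up

def predict(X, u_down, u_up):
    """Final regressor: the mean of the down- and up-bound functions."""
    G = np.hstack([X, np.ones((X.shape[0], 1))])
    return 0.5 * (G @ u_down + G @ u_up)

# Illustrative data: a noiseless line y = 2x + 1.
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
u_down, u_up = fit_ls_twin_regressor(X, y)
print(predict(X, u_down, u_up)[:3])
```

Each bound function costs one solve of a (d+1) by (d+1) system, where d is the number of features, which is why a least squares variant can be much cheaper than solving two QPPs over all training samples.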
Funding: This work was supported by the Liaoning Province PhD Start-up Fund (No. 201601291) and the Liaoning Province Ministry of Education Scientific Study Project (No. 2017LNQN11).
Abstract: To improve the end-point hit rate of basic oxygen furnace steelmaking, a novel dynamic control model was proposed based on an improved twin support vector regression algorithm. The controlled objects were the end-point carbon content and temperature. The proposed control model, established using low carbon steel samples collected from a steel plant, consists of two prediction models, a preprocess model, two regulation units, a controller and a basic oxygen furnace. Test results on 100 heats show that the prediction models achieve a double hit rate of 90% within error bounds of 0.005 wt.% C and 15 ℃. The preprocess model was used to predict an initial end-blow oxygen volume. However, the resulting double hit rate for carbon content and temperature reaches only 65%. The oxygen volume and coolant additions were then adjusted by the regulation units to improve the hit rate. Finally, the double hit rate after regulation reaches 90%. The results indicate that the proposed dynamic control model can effectively guide real production of low carbon steel, and that the modeling method is also suitable for other steel grades.
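The "double hit rate" quoted above can be read as the fraction of heats for which both the carbon error and the temperature error fall within their bounds. A small sketch of that computation follows, assuming this reading of the metric; the function name and the sample values are hypothetical and are not data from the paper.

```python
import numpy as np

def double_hit_rate(c_pred, c_true, t_pred, t_true, c_tol=0.005, t_tol=15.0):
    """Fraction of heats where BOTH end-point targets are hit.

    c_tol is the carbon error bound in wt.% and t_tol the temperature
    error bound in degrees Celsius (bounds taken from the abstract).
    """
    c_hit = np.abs(np.asarray(c_pred) - np.asarray(c_true)) <= c_tol
    t_hit = np.abs(np.asarray(t_pred) - np.asarray(t_true)) <= t_tol
    return float(np.mean(c_hit & t_hit))

# Hypothetical values for four heats (illustration only).
c_true = [0.050, 0.050, 0.050, 0.050]
c_pred = [0.052, 0.058, 0.049, 0.050]   # heat 2 misses the carbon bound
t_true = [1650.0, 1650.0, 1650.0, 1650.0]
t_pred = [1660.0, 1655.0, 1670.0, 1640.0]  # heat 3 misses the temperature bound
print(double_hit_rate(c_pred, c_true, t_pred, t_true))  # 0.5
```

Because a heat counts only when both targets are met, the double hit rate can never exceed either single hit rate.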
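Abstract: An excessively high body temperature in a STATCOM thyristor valve group can cause the valve group to fail. Timely and accurate prediction of this temperature is therefore critical to improving the operational reliability of the STATCOM. Using the least squares twin support vector regression (LSTSVR) algorithm, this paper builds a prediction model for the body temperature of the STATCOM thyristor valve group with seven input quantities: the STATCOM inlet water temperature, return water temperature, inlet water flow rate, thermal conductivity of the IGBT module heat-dissipation material, STATCOM output voltage, STATCOM output current, and collector current of the thyristor valve group. Comparison with field measurements shows that the LSTSVR model predicts the valve group body temperature with high accuracy, and that its prediction accuracy is better than that of the least squares support vector regression (LSSVR) model. An application example also verifies the accuracy and effectiveness of the method.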
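Abstract: Twin support vector regression (TSVR or TWSVR) is a regression algorithm based on statistical learning theory. It rests on the principle of structural risk minimization: by appropriately choosing a subset of functions and a discriminant function within that subset, it minimizes the actual risk of the learning machine, so that a low-error learner obtained from a limited training set still has a small test error on an independent test set. By mapping linearly inseparable samples into a high-dimensional feature space in which they become linearly separable, TSVR achieves good generalization performance. Its algorithmic idea is based on the twin support vector machine (TWSVM). Geometrically, it seeks to place all sample points, as far as possible, between the upper and lower insensitive bounds of two regression hyperplanes, and the final regression result is the average of the regression values of the two hyperplanes. TSVR only needs to solve two smaller quadratic programming problems (QPPs) to obtain two regression hyperplanes with small fitting error. It outperforms traditional support vector regression (SVR) in both training time and fitting accuracy, and the dual problems of its QPPs have global optimal solutions, which avoids the tendency to fall into local optima. TSVR has therefore become one of the popular areas of machine learning. As a relatively new theory, however, its mathematical model and algorithmic ideas are not yet mature, and there is still room for improvement in generalization performance, solution speed, matrix sparsity, parameter selection, and the dual problem. This paper first presents the mathematical models and geometric interpretations of two kinds of TSVR, and then summarizes several common improvement strategies as follows.
(1) Weighted TSVR. In TSVR every training sample receives the same penalty, yet samples affect the hyperplanes differently; noise and outliers in particular degrade performance, and it is more reasonable to penalize training samples at different positions differently. A weighting coefficient is therefore introduced into each QPP of TSVR so that training samples at different positions receive different degrees of penalty.
(2) Lagrangian TSVR. The matrix in the dual problem of TSVR is only positive semi-definite, so its inverse may not exist; the dual problem is then not strictly convex and may have multiple solutions. The 2-norm of the slack variables is therefore used in place of the original 1-norm, which makes the dual problem simpler and easier to solve.
(3) Least squares TSVR. TSVR has to be solved in the dual space and the solution obtained is approximate. The least squares method is used to convert the inequality constraints of the primal problem into equality constraints, so that the primal problem can be solved in the primal space, greatly reducing computation time and improving generalization performance without loss of accuracy.
(4) v-TSVR. A pair of parameters v1 and v2 is introduced to adjust the values of ε1 and ε2 automatically, controlling the maximum error that a specified fraction of the training samples can cause for the two regression hyperplanes. The model thus adapts to the structure of the given data, which improves fitting accuracy.
(5) ε-TSVR. A regularization term is introduced into the primal problem of TSVR to achieve structural risk minimization, turning the dual problem into a stable positive definite quadratic programming problem. The dual problem is solved by successive over-relaxation (SOR), which speeds up training.
(6) Twin parametric insensitive support vector regression. This variant overcomes the influence of parameter selection on the construction of the TSVR hyperplanes, making the algorithm well suited to datasets with heteroscedastic noise; training speed and generalization performance also improve.
The paper also systematically analyzes and summarizes the mathematical models, improved algorithms, and applications of the above methods, reports their regression performance and computation time on 9 UCI benchmark datasets, and analyzes, at the level of model structure, the root causes of each algorithm's performance and running time. Other improved TSVR algorithms and applications that are not easily categorized are also summarized one by one. Overall, least squares TSVR performs best in terms of both performance and computation time; Lagrangian TSVR and v-TSVR tie for second-best performance with similar computation times; weighted TSVR, ε-TSVR, and twin parametric insensitive support vector regression perform unsatisfactorily, although their computation times are similar. The paper aims to give readers a deeper understanding of the similarities, differences, strengths, and weaknesses of the various improved TSVR algorithms, so that more good improvement strategies can be applied to TSVR, and ultimately to offer a clearer direction for further improving the performance of TSVR and extending its range of application.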
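For orientation, one commonly cited linear form of the TSVR model, with a down-bound and an up-bound function whose mean gives the final regressor, can be written as below. This form is recalled from the general TSVR literature rather than taken from the abstract, and the notation is an assumption: A is the matrix of training inputs, Y the vector of targets, e a vector of ones, C_1, C_2 > 0 penalty parameters, ε_1, ε_2 ≥ 0 insensitivity parameters, and ξ, η slack vectors.

```latex
\[
\begin{aligned}
&\min_{w_1,\,b_1,\,\xi}\ \tfrac{1}{2}\,\bigl\lVert Y - e\varepsilon_1 - (A w_1 + e b_1) \bigr\rVert^2 + C_1\, e^{\top}\xi
&&\text{s.t.}\quad Y - (A w_1 + e b_1) \ge e\varepsilon_1 - \xi,\quad \xi \ge 0, \\[4pt]
&\min_{w_2,\,b_2,\,\eta}\ \tfrac{1}{2}\,\bigl\lVert Y + e\varepsilon_2 - (A w_2 + e b_2) \bigr\rVert^2 + C_2\, e^{\top}\eta
&&\text{s.t.}\quad (A w_2 + e b_2) - Y \ge e\varepsilon_2 - \eta,\quad \eta \ge 0, \\[4pt]
&f(x) = \tfrac{1}{2}\bigl(f_1(x) + f_2(x)\bigr)
      = \tfrac{1}{2}\,(w_1 + w_2)^{\top} x + \tfrac{1}{2}\,(b_1 + b_2).
\end{aligned}
\]
```

Read against this form, the listed strategies modify different pieces: weighting changes the penalty on the slack terms, the Lagrangian and least squares variants replace the 1-norm slack penalty with a squared one (the latter also turning the inequalities into equalities), and ε-TSVR adds a regularization term on (w, b).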
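Abstract: The Sigmoid smoothing function used in current smooth twin support vector regression (STSVR) has limited approximation accuracy, which leads to unsatisfactory generalization ability. To address this problem, a smoothing function with stronger approximation ability, the Chen-Harker-Kanzow-Smale (CHKS) function, is introduced. The CHKS function is used to approximate the non-differentiable term of twin support vector regression, the resulting model is solved with the Newton-Armijo algorithm, and smooth CHKS twin support vector regression (SCTSVR) is proposed. SCTSVR is proved theoretically to be strictly convex, smooth to arbitrary order, and globally convergent, and experiments on artificial datasets and UCI datasets show that SCTSVR achieves better regression performance than STSVR.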