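Abstract: The central question in semi-supervised classification is how to make effective use of the classification-relevant information carried by large amounts of unlabeled data. This paper proposes a Semi-Supervised Support Vector Machine Based on Manifold-based Discriminant Analysis (MDASSVM). By defining manifold-based within-class scatter and between-class scatter and exploiting the properties of manifold discriminant analysis, the method further improves the semi-supervised support vector machine so that the classification decision simultaneously accounts for the boundary information, the distribution characteristics, and the local manifold structure of the samples. The method inherits the advantages of traditional dimensionality reduction methods and further improves the efficiency of dimensionality reduction. Experimental results on artificial datasets and on several real datasets from UCI show that, compared with existing algorithms, datasets reduced by this algorithm allow the semi-supervised support vector machine to reach higher classification accuracy.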
Funding: The National Natural Science Foundation of China (32371993); The Natural Science Research Key Project of Anhui Provincial University (2022AH040125 & 2023AH040135); The Key Research and Development Plan of Anhui Province (202204c06020022 & 2023n06020057).
Abstract: This study addresses the challenge of accurately and reliably detecting tomatoes in dense planting environments, a critical prerequisite for automated robotic harvesting. However, the heavy reliance on extensive manually annotated datasets for training deep learning models still significantly limits their application in real-world agricultural production environments. To overcome these limitations, we combined a domain-adaptive learning approach with the YOLOv5 model to develop a novel tomato detection model called TDA-YOLO (tomato detection domain adaptation). We designated the normal-illumination scenes in dense planting environments as the source domain and various other illumination scenes as the target domain. To build a bridge between the source and target domains, neural preset for color style transfer is introduced to generate a pseudo-dataset, which serves to reduce the domain discrepancy. Furthermore, this study incorporates a semi-supervised learning method so that the model extracts domain-invariant features more fully, and uses knowledge distillation to improve the model's ability to adapt to the target domain. Additionally, to increase inference speed and lower computational demand, the lightweight FasterNet network was integrated into YOLOv5's C3 module, creating a modified C3_Faster module. The experimental results demonstrated that the proposed TDA-YOLO model significantly outperformed the original YOLOv5s model, achieving a mAP (mean average precision) of 96.80% for tomato detection across diverse scenarios in dense planting environments, an increase of 7.19 percentage points; compared with the latest YOLOv8 and YOLOv9, it is also 2.17 and 1.19 percentage points higher, respectively. The model's average detection time per image was 15 milliseconds, with a FLOPs (floating-point operations) count of 13.8 G. After acceleration processing, the detection accuracy of the TDA-YOLO model on the Jetson Xavier NX development board is 90.95%, the mAP value is 91.35%, and the detection time per image is 21 ms, which still meets the requirements of real-time tomato detection in dense planting environments. The experimental results show that the proposed TDA-YOLO model can detect tomatoes accurately and quickly in dense planting environments while avoiding the need for a large amount of annotated data, which provides technical support for the development of automatic harvesting systems for tomatoes and other fruits.
Funding: Supported by the "12th Five Year Plan" National Science and Technology Major Special Subject: Well Logging Data and Seismic Data Fusion Technology Research (No. 2011ZX05023-005-006).
Abstract: At the early stages of deep-water oil exploration and development, wells are fewer and farther apart than in onshore oilfields. Supervised least squares support vector machine algorithms are used to predict the reservoir parameters, but the prediction accuracy is low. We combined the least squares support vector machine (LSSVM) algorithm with semi-supervised learning and established a semi-supervised regression model, which we call the semi-supervised least squares support vector machine (SLSSVM) model. Iterative matrix inversion is also introduced to improve the training ability and training time of the model. We use UCI data to test the generalization of a semi-supervised and a supervised LSSVM model. The test results suggest that the generalization performance of the LSSVM model improves greatly with semi-supervised learning, and that the gain grows as the number of training samples decreases. Moreover, for small-sample models, the SLSSVM method has higher precision than the semi-supervised K-nearest neighbor (SKNN) method. The new semi-supervised LSSVM algorithm was used to predict the distribution of porosity and sandstone in the Jingzhou study area.
Funding: Projects (61603393, 61973306) supported in part by the National Natural Science Foundation of China; Project (BK20160275) supported by the Natural Science Foundation of Jiangsu Province, China; Projects (2015M581885, 2018T110571) supported by the Postdoctoral Science Foundation of China; Project (PAL-N201706) supported by the Open Project Foundation of State Key Laboratory of Synthetical Automation for Process Industries of Northeastern University, China.
Abstract: Direct online measurement of product quality in industrial processes is difficult to realize, which leads to a large number of unlabeled samples in the modeling data. A semi-supervised learning (SSL) method is therefore needed to establish the soft sensor model of product quality. Because industrial processes vary slowly over time, the model parameters should be updated smoothly. Based on this characteristic, this paper proposes an online adaptive semi-supervised learning algorithm based on the random vector functional link network (RVFLN), denoted as OAS-RVFLN. By introducing an L2-fusion term that can be seen as a weight deviation constraint, the proposed algorithm unifies offline and online learning and achieves smooth model parameter updates. Empirical evaluations on both benchmark testing functions and datasets reveal that the proposed OAS-RVFLN can outperform the conventional methods in learning speed and accuracy. Finally, the OAS-RVFLN is applied to the coal dense medium separation process in the coal industry to estimate the ash content of the coal product, which further verifies its effectiveness and potential for industrial application.
Funding: Under the auspices of National Natural Science Foundation of China (No. 40671133); Fundamental Research Funds for the Central Universities (No. GK200902015).
Abstract: This paper proposes a semi-supervised regression model with a co-training algorithm based on the support vector machine, which was used for retrieving water quality variables from SPOT 5 remote sensing data. The model consists of two support vector regressors (SVRs). The nonlinear relationship between water quality variables and the SPOT 5 spectrum is described by the two SVRs, and a semi-supervised co-training algorithm for the SVRs is established. The model was used for retrieving concentrations of four representative pollution indicators of the Weihe River in Shaanxi Province, China: permanganate index (CODmn), ammonia nitrogen (NH3-N), chemical oxygen demand (COD), and dissolved oxygen (DO). The spatial distribution map for those variables over a part of the Weihe River was also produced. SVR can readily implement any nonlinear mapping, and semi-supervised learning can make use of both labeled and unlabeled samples. By integrating the two SVRs and using semi-supervised learning, we provide an operational method for cases where paired samples are limited. The results show that it is much better than the multiple statistical regression method, can quickly provide the overall water pollution conditions for management, and can be extended to hyperspectral remote sensing applications.
Funding: Financially supported by the National Key R&D Program of China (No. 2018YFA0702504); the National Natural Science Foundation of China (No. 42174152 and No. 41974140); the Science Foundation of China University of Petroleum, Beijing (No. 2462020YXZZ008 and No. 2462020QZDX003); the Strategic Cooperation Technology Projects of CNPC and CUPB (No. ZLZX2020-03).
Abstract: Intelligent seismic facies identification based on deep learning has been widely applied and can alleviate the time-consuming and labor-intensive problem of manual interpretation. Supervised learning can realize facies identification with high efficiency and accuracy; however, it depends on a large amount of well-labeled data. To solve this issue, we propose an incremental semi-supervised method for intelligent facies identification. Our method considers the continuity of the lateral variation of strata and uses cosine similarity to quantify similarity in the seismic data feature domain. The maximum-difference sample in the neighborhood of the currently used training data is then found to reasonably expand the training sets. This process continuously increases the amount of training data and learns its distribution. We integrate old knowledge while absorbing new knowledge to realize incremental semi-supervised learning and to evolve the network models. In this work, accuracy and the confusion matrix are employed to jointly control the predicted results of the model from both overall and partial aspects. The obtained values are then applied to a three-dimensional (3D) real dataset and used to quantitatively evaluate the results. Using unlabeled data, our proposed method obtains more accurate and stable testing results than conventional supervised learning algorithms that only use well-labeled data. A considerable improvement for small-sample categories is also observed. Using less than 1% of the training data, the proposed method achieves an average accuracy of over 95% on the 3D dataset. In contrast, the conventional supervised learning algorithm achieved only approximately 85%.
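The sample-selection step described in this abstract can be illustrated with a minimal sketch. It assumes each sample is a plain feature vector and that "maximum difference" means the lowest cosine similarity to a current training sample; the function names and toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def most_different_neighbor(anchor, neighbors):
    """Index of the neighbor with the lowest cosine similarity to the anchor.

    Assumption: the 'maximum-difference sample' is read as the least
    similar neighbor; the paper may define it differently.
    """
    sims = [cosine_similarity(anchor, n) for n in neighbors]
    return min(range(len(neighbors)), key=sims.__getitem__)


# Toy feature vectors (hypothetical): one training sample and its neighborhood.
anchor = [1.0, 0.0, 0.5]
neighbors = [[0.9, 0.1, 0.5], [0.2, 1.0, 0.1], [1.0, 0.0, 0.4]]
picked = most_different_neighbor(anchor, neighbors)
print(picked)
```

In this reading, the picked neighbor would be added to the training set in the next increment, so that each round exposes the network to the part of the neighborhood it has seen least.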
Abstract: Objective: To explore the semi-supervised learning (SSL) algorithm for long-tail endoscopic image classification with limited annotations. Method: We explored semi-supervised long-tail endoscopic image classification in HyperKvasir, the largest public gastrointestinal dataset, with 23 diverse classes. The semi-supervised learning algorithm FixMatch, which is based on consistency regularization and pseudo-labeling, was applied. After splitting the training dataset and the test dataset at a ratio of 4:1, we sampled 20%, 50%, and 100% of the labeled training data to test classification with limited annotations. Results: The classification performance was evaluated by micro-average and macro-average evaluation metrics, with the Matthews correlation coefficient (MCC) as the overall evaluation. The SSL algorithm improved the classification performance, with MCC increasing from 0.8761 to 0.8850, from 0.8983 to 0.8994, and from 0.9075 to 0.9095 with 20%, 50%, and 100% of the labeled training data, respectively. With 20% of the labeled training data, SSL improved both the micro-average and macro-average classification performance, while for 50% and 100%, SSL improved the micro-average performance but hurt the macro-average performance. By analyzing the confusion matrix and the labeling bias in each class, we found that the pseudo-label-based SSL algorithm exacerbated the classifier's preference for the head class, resulting in improved performance in the head class and degraded performance in the tail class. Conclusion: SSL can improve the classification performance for semi-supervised long-tail endoscopic image classification, especially when the labeled data is extremely limited, which may benefit the building of assisted diagnosis systems for low-volume hospitals. However, the pseudo-labeling strategy may amplify the effect of class imbalance, which hurts the classification performance for the tail class.
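Why a head-class preference can raise the micro-average while lowering the macro-average is easy to see on a toy example. The sketch below uses invented labels and predictions (not HyperKvasir data) and recall as the metric; for single-label classification, micro-averaged recall equals overall accuracy.

```python
from collections import Counter


def micro_macro_recall(y_true, y_pred):
    """Return (micro, macro) recall for single-label classification."""
    classes = sorted(set(y_true))
    support = Counter(y_true)  # number of true samples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    micro = sum(correct.values()) / len(y_true)  # pooled over all samples
    macro = sum(correct[c] / support[c] for c in classes) / len(classes)
    return micro, macro


# Toy long-tail data (hypothetical): 8 head-class and 2 tail-class samples.
y_true = ["head"] * 8 + ["tail"] * 2

# Baseline: 6 of 8 head samples correct, both tail samples correct.
baseline_pred = ["head"] * 6 + ["tail"] * 2 + ["tail"] * 2
# Head-biased classifier: all head samples correct, one tail sample lost.
biased_pred = ["head"] * 8 + ["head", "tail"]

baseline = micro_macro_recall(y_true, baseline_pred)
biased = micro_macro_recall(y_true, biased_pred)
print(baseline, biased)
```

The head-biased predictions score higher on the micro-average because head samples dominate the pool, yet lower on the macro-average because the tail class counts as much as the head class there.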
Funding: Project (60425310) supported by the National Science Fund for Distinguished Young Scholars; Project (10JJ6094) supported by the Hunan Provincial Natural Foundation of China.
Abstract: High-dimensional multi-label data is common and incurs large time and energy overheads when used directly in classification tasks. To solve this problem, a novel algorithm called multi-label dimensionality reduction via semi-supervised discriminant analysis (MSDA) was proposed. It aims to derive an objective discriminant function that is as smooth as possible on the data manifold by combining multi-label learning and semi-supervised learning. Using the latent information provided by the graph weighted matrix of sample attributes and the similarity correlation matrix of partial sample labels, MSDA maximizes the separability between different classes and estimates the intrinsic geometric structure in the lower-dimensional manifold space by employing unlabeled data. Extensive experimental results on several real multi-label datasets show that, after dimensionality reduction using MSDA, the average classification accuracy is about 9.71% higher than that of other algorithms, and several evaluation metrics such as Hamming loss are also superior to those of other dimensionality reduction methods.
Abstract: Experimental data with missing values from the cutting of Perspex glass sheet with a CO2 laser were modelled with semi-supervised artificial neural networks. A factorial design of experiment was selected to verify the orthogonal-array-based model prediction. Based on a novel error assessment on simulations, applying the semi-supervised learning algorithm improves the modelling of edge quality and kerf width. The results are expected to show better prediction on average when systematic randomized techniques are used to initialize the neural network weights and the number of initializations is increased. Handling missing values is difficult with statistical tools and supervised learning techniques; on the other hand, semi-supervised learning generates better results with the smallest datasets even with missing values.
Funding: The National Natural Science Foundation of China (Nos. 61071176, 61171192, and 61272337) and the Doctoral
Abstract: To achieve fine segmentation of complex natural images, people often resort to an interactive segmentation paradigm, since fully automatic methods often fail to obtain a result consistent with the ground truth. However, when the foreground and background share some areas of similar color, the fine segmentation result of conventional interactive methods usually relies on an increase in manual labels. This paper presents a novel interactive image segmentation method via a regression-based ensemble model with semi-supervised learning. The task is formulated as a non-linear problem integrating two complementary spline regressors and strengthening the robustness of each regressor via semi-supervised learning. First, two spline regressors with a complementary nature are constructed based on multivariate adaptive regression splines (MARS) and smooth thin plate spline regression (TPSR). Then, a regressor boosting method based on a clustering hypothesis and semi-supervised learning is proposed to assist the training of MARS and TPSR by using the region segmentation information contained in unlabeled pixels. Next, a support vector regression (SVR) based decision fusion model is adopted to integrate the results of MARS and TPSR. Finally, GraphCut is introduced and combined with the SVR ensemble results to achieve image segmentation. Extensive experimental results on the benchmark datasets BSDS500 and Pascal VOC demonstrate the effectiveness of our method, and the experimental comparison shows that the proposed method is comparable with the state-of-the-art methods for interactive natural image segmentation.
Funding: Supported by National Natural Science Foundation of China (Grant Nos. 11171014 and 11101024); National Basic Research Program of China (973 Project) (Grant No. 2010CB731900).
Abstract: Semi-supervised learning is an emerging computational paradigm for machine learning that aims to make better use of large amounts of inexpensive unlabeled data to improve learning performance. While various methods have been proposed based on different intuitions, the crucial issue of generalization performance is still poorly understood. In this paper, we investigate the convergence property of Laplacian regularized least squares regression, a semi-supervised learning algorithm based on manifold regularization. Moreover, to the best of our knowledge, the improvement of error bounds in terms of the number of labeled and unlabeled data is presented for the first time. The convergence rate depends on the approximation property and on the capacity of the reproducing kernel Hilbert space measured by covering numbers. Some new techniques are exploited for the analysis since an extra regularizer is introduced.
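For orientation, Laplacian regularized least squares is commonly written as follows in the manifold regularization literature. This is the standard formulation, not necessarily the exact normalization analyzed in this paper; the symbols are assumptions introduced here: $l$ labeled samples $(x_i, y_i)$, $u$ unlabeled samples, a reproducing kernel Hilbert space $\mathcal{H}_K$ with kernel $K$, a graph Laplacian $L$ built on all $l+u$ points, and regularization parameters $\gamma_A$, $\gamma_I$.

```latex
% Objective: squared loss on labeled data + RKHS norm + manifold (graph) penalty
f^{*} = \arg\min_{f \in \mathcal{H}_K}
  \frac{1}{l} \sum_{i=1}^{l} \bigl( y_i - f(x_i) \bigr)^{2}
  + \gamma_A \, \lVert f \rVert_K^{2}
  + \frac{\gamma_I}{(l+u)^{2}} \, \mathbf{f}^{\top} L \, \mathbf{f},
\qquad
\mathbf{f} = \bigl( f(x_1), \dots, f(x_{l+u}) \bigr)^{\top}

% Closed-form solution via the representer theorem
f^{*}(x) = \sum_{i=1}^{l+u} \alpha_i^{*} \, K(x_i, x),
\qquad
\alpha^{*} = \Bigl( J K + \gamma_A \, l \, I
  + \frac{\gamma_I \, l}{(l+u)^{2}} \, L K \Bigr)^{-1} Y
```

Here $K$ also denotes the $(l+u) \times (l+u)$ Gram matrix, $J = \mathrm{diag}(1, \dots, 1, 0, \dots, 0)$ has $l$ ones followed by $u$ zeros, and $Y = (y_1, \dots, y_l, 0, \dots, 0)^{\top}$. The third term of the objective is the "extra regularizer" that uses the unlabeled data; setting $\gamma_I = 0$ recovers ordinary regularized least squares on the labeled samples alone.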