Funding: Supported by the National Natural Science Foundation of China (No. 51674032)
The accuracy of quantitative analysis by laser-induced breakdown spectroscopy (LIBS) depends strongly on the number of certified standard samples used for training. In practical applications, however, only a limited number of standard samples with labeled certified concentrations are available. A novel semi-supervised LIBS quantitative analysis method is proposed, based on a co-training regression model with selection of effective unlabeled samples. The main idea is to obtain better regression performance by adding effective unlabeled samples during semi-supervised learning. First, effective unlabeled samples are selected according to their Euclidean distance to the testing samples. Two initial regression models based on least squares support vector machines with different parameters are trained separately on the labeled samples; the effective unlabeled samples, as predicted by the two models, are then used to enlarge the training dataset on the basis of labeling confidence estimation. The final predictions for the testing samples are weighted combinations of the predictions of the two updated regression models. Chromium concentration analysis experiments were carried out on 23 certified standard high-alloy steel samples: 5 samples with labeled concentrations and 11 unlabeled samples were used to train the regression models, and the remaining 7 samples were used for testing. As the number of effective unlabeled samples increased, the root mean square error of the proposed method decreased from 1.80% to 0.84%, and the relative prediction error decreased from 9.15% to 4.04%.
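The co-training regression procedure summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: both regressors are LS-SVMs with a Gaussian (RBF) kernel that differ only in their hypothetical `(gamma, sigma)` pairs, labeling confidence is estimated COREG-style as the drop in squared error on the labeled set after adding a pseudo-labeled sample, and the combination weight `w = 0.5` is a placeholder. All function names are hypothetical.

```python
import math

def rbf(a, b, sigma):
    # Gaussian (RBF) kernel between two spectral feature vectors
    return math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))

def solve(A, rhs):
    # Dense Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, gamma, sigma):
    # LS-SVM regression: solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    X = [list(x) for x in X]
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    b, *alpha = solve(A, [0.0] + list(y))
    return lambda x: b + sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X))

def select_effective(unlabeled, test, k):
    # keep the k unlabeled samples closest (Euclidean) to any testing sample
    return sorted(unlabeled, key=lambda u: min(math.dist(u, t) for t in test))[:k]

def sse(f, X, y):
    return sum((f(x) - t) ** 2 for x, t in zip(X, y))

def cotrain(XL, yL, XU, params, rounds=3):
    # each model starts from the labeled set; params holds two (gamma, sigma) pairs
    sets = [(list(XL), list(yL)), (list(XL), list(yL))]
    pool = list(XU)
    for _ in range(rounds):
        for m in (0, 1):
            X, y = sets[m]
            f = lssvm_fit(X, y, *params[m])
            base = sse(f, XL, yL)
            best, best_gain = None, 0.0
            for u in pool:
                yu = f(u)
                # COREG-style confidence: error drop on the labeled set after adding (u, yu)
                gain = base - sse(lssvm_fit(X + [u], y + [yu], *params[m]), XL, yL)
                if gain > best_gain:
                    best, best_gain = (u, yu), gain
            if best is not None:
                # the most confident pseudo-label is passed to the other model
                sets[1 - m][0].append(best[0])
                sets[1 - m][1].append(best[1])
                pool.remove(best[0])
    return [lssvm_fit(X, y, *p) for (X, y), p in zip(sets, params)]

def predict(models, x, w=0.5):
    # weighted combination of the two updated regressors (w is a placeholder weight)
    return w * models[0](x) + (1 - w) * models[1](x)
```

Restricting the unlabeled pool to samples near the testing set is what makes the added pseudo-labels "effective" here: the models are refined only in the region of feature space where predictions are actually needed.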
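Using unlabeled data for classification, Tri-training can effectively improve the generalization ability of classifiers, but it tends to mislabel unlabeled data and thereby introduce training noise. A Tri-training algorithm based on density peaks clustering (Tri-training with density peaks clustering, DPC-TT) is proposed. Through cluster centers and local density, density peaks clustering can select samples that represent the spatial structure of the data well. DPC-TT uses density peaks clustering to obtain the cluster centers of the training data and the local density of each sample; samples within the cutoff distance of a cluster center are regarded as representing the spatial structure well and are marked as core data. Updating the classifiers with core data reduces training noise during the iterations and thus improves classifier performance. Experimental results show that DPC-TT achieves better classification performance than the standard Tri-training algorithm and its improved variants.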
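The core-data selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the local density ρ uses a cutoff (step) kernel, δ is the distance to the nearest sample of higher density (ties broken by index), and cluster centers are taken as the `n_centers` samples with the largest ρ·δ. The function names and the `keep_core` filter, which restricts pseudo-labeled samples to core data before a classifier update, are hypothetical; the three Tri-training classifiers themselves are omitted.

```python
import math

def density_peaks(X, dc, n_centers):
    # density peaks clustering with a cutoff kernel
    n = len(X)
    D = [[math.dist(a, b) for b in X] for a in X]
    # local density: number of other samples closer than the cutoff distance dc
    rho = [sum(1 for j in range(n) if j != i and D[i][j] < dc) for i in range(n)]
    # delta: distance to the nearest sample of higher density (ties broken by index)
    rank = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for pos, i in enumerate(rank):
        delta[i] = max(D[i]) if pos == 0 else min(D[i][j] for j in rank[:pos])
    # cluster centers: the n_centers samples with the largest rho * delta
    centers = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)[:n_centers]
    return centers, rho, delta

def core_data(X, dc, n_centers):
    # indices of samples within the cutoff distance of some cluster center
    centers, _, _ = density_peaks(X, dc, n_centers)
    return [i for i in range(len(X)) if any(math.dist(X[i], X[c]) <= dc for c in centers)]

def keep_core(candidates, core):
    # hypothetical filter: only pseudo-labeled samples that are core data update a classifier
    core = set(core)
    return [i for i in candidates if i in core]
```

Isolated samples between clusters receive low density and are excluded from the core set, which is the mechanism the abstract relies on to keep likely-mislabeled points out of the classifier updates.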
This study describes the planning process of a major multi-disciplinary research project that aims to enhance the effectiveness of Higher Degree Research (HDR) training in computing-related disciplines by applying threshold concept theory and its framework. Two disciplines, computer science and information systems, were chosen for the study because they closely represent the two ends of the wide spectrum of computing disciplines found across faculties of science, engineering, business and education. The ultimate goal of the project is to improve the productivity of the research training process in computing schools. The foreshadowed problem is that many HDR students take a long time to produce a specific result that is expected within a much shorter period, and yet at some stage they seem to get over this hurdle almost overnight. By adopting a threshold concept framework, this study extends existing work in this area by specifically targeting the HDR process in computing disciplines, and it lays out plans for a wide range of studies that will ideally lead to the identification of threshold concepts for HDR students in computing disciplines. The questions under development in the current study are: how can the process of overcoming these hurdles be facilitated, and how can the productivity of the resources used during these long and frustrating waiting periods be increased by shortening them?
Leveraging the Baidu Qianfan model platform, this paper designs and implements an efficient and accurate scoring system for subjective questions, focusing primarily on questions in the field of computer network technology. The system enhances the foundation model by using Qianfan's training tools together with techniques such as supervised fine-tuning. In the data preparation phase, a comprehensive set of subjective-question data related to computer network technology is collected, cleaned, and labeled. During model training and evaluation, tuned hyperparameters and tuning strategies are applied, yielding a model that scores answers with high accuracy. Evaluation results show that the proposed model performs well across multiple dimensions (content, expression, and development scores), producing results comparable to those of manual scoring.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60702033 and 60772076; the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2007AA01Z171; the Science Fund for Distinguished Young Scholars of Heilongjiang Province of China under Grant No. JC200611; the Natural Science Foundation of Heilongjiang Province of China (Key Program) under Grant No. ZJG0705; and the Foundation of Harbin Institute of Technology of China under Grant No. HIT.2003.53.
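Making full use of the hyperlink relations and textual information of web page data, this paper proposes an inductive semi-supervised learning algorithm for web page classification, the Graph based Co-training algorithm for web page classification (GCo-training), and proves its effectiveness theoretically. Within the Co-training framework, GCo-training iteratively learns a semi-supervised classifier based on a graph constructed from hyperlink information and a Bayes classifier based on text features. Using only a small amount of labeled data, the graph-based semi-supervised classifier reaches relatively high prediction accuracy by mining the abundant relational information among the data, and can thus supply a large amount of label information to the Bayes classifier; conversely, after learning from this label information, the Bayes classifier can provide useful information to the graph-based classifier. During the iterations the two classifiers help each other and steadily improve their performance, after which the Bayes classifier can be used to predict the classes of large amounts of unseen data. Experimental results on the Web→KB dataset show that GCo-training outperforms both the Co-training algorithm using text and anchor-text features and the EM-based Bayes algorithm.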
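The mutual-teaching loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph-based semi-supervised classifier is stood in for by simple iterative label propagation over an undirected hyperlink graph, the Bayes classifier is a multinomial naive Bayes with Laplace smoothing, and the helper names (`propagate`, `train_nb`, `nb_posterior`, `gco_training`) and the per-round exchange size `k` are hypothetical.

```python
import math
from collections import Counter

def propagate(n, edges, seeds, classes, iters=50):
    # stand-in graph classifier: label propagation over hyperlinks, seed pages clamped
    nbrs = [[] for _ in range(n)]
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    F = [{c: (1.0 if seeds.get(i) == c else 0.0) for c in classes} for i in range(n)]
    for _ in range(iters):
        F = [F[i] if i in seeds or not nbrs[i]
             else {c: sum(F[j][c] for j in nbrs[i]) / len(nbrs[i]) for c in classes}
             for i in range(n)]
    return F

def train_nb(docs, ys, classes):
    # multinomial naive Bayes with Laplace smoothing, in log space
    vocab = {w for d in docs for w in d}
    prior, cond = {}, {}
    for c in classes:
        cdocs = [d for d, y in zip(docs, ys) if y == c]
        counts = Counter(w for d in cdocs for w in d)
        total = sum(counts.values())
        prior[c] = math.log((len(cdocs) + 1) / (len(docs) + len(classes)))
        cond[c] = {w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab}
    return prior, cond

def nb_posterior(model, doc):
    # class posteriors; words outside the training vocabulary are ignored
    prior, cond = model
    logp = {c: prior[c] + sum(cond[c].get(w, 0.0) for w in doc) for c in prior}
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    return {c: math.exp(v - m) / z for c, v in logp.items()}

def fit(pages, labeled, classes):
    idx = sorted(labeled)
    return train_nb([pages[i] for i in idx], [labeled[i] for i in idx], classes)

def gco_training(pages, edges, labels, classes, rounds=3, k=1):
    n = len(pages)
    graph_seeds, nb_train = dict(labels), dict(labels)
    for _ in range(rounds):
        # graph classifier teaches the Bayes classifier its k most confident labels
        F = propagate(n, edges, graph_seeds, classes)
        cand = [i for i in range(n) if i not in nb_train and sum(F[i].values()) > 0]
        cand.sort(key=lambda i: max(F[i].values()) / sum(F[i].values()), reverse=True)
        for i in cand[:k]:
            nb_train[i] = max(classes, key=lambda c: F[i][c])
        model = fit(pages, nb_train, classes)
        # Bayes classifier teaches the graph classifier its k most confident labels
        cand = [i for i in range(n) if i not in graph_seeds]
        post = {i: nb_posterior(model, pages[i]) for i in cand}
        cand.sort(key=lambda i: max(post[i].values()), reverse=True)
        for i in cand[:k]:
            graph_seeds[i] = max(classes, key=lambda c: post[i][c])
    # the final Bayes classifier is the one used on unseen pages
    return fit(pages, nb_train, classes)
```

The graph side never needs text features and the Bayes side never needs links, so each view can label pages the other finds ambiguous; only the Bayes model is returned because, as in the abstract, it is the one that generalizes to unseen pages.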
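Tri-training can effectively exploit unlabeled examples to improve generalization. During Tri-training iterations, however, unlabeled examples are often mislabeled, which introduces noise into the training set and makes performance unstable. To address this drawback, this paper proposes a new algorithm, ADE-Tri-training (Tri-training with Adaptive Data Editing). It not only uses the Remove Only editing operation to identify and remove examples that may have been mislabeled in each iteration but, more importantly, adopts an adaptive strategy to determine the appropriate moments to trigger and suppress Remove Only. The paper proves that, under PAC theory, a series of sufficient conditions in the adaptive strategy simultaneously ensures that the new training set grows iteratively and that the classification error rate of the new hypothesis decreases further with each iteration. Experimental results on UCI datasets show that ADE-Tri-training achieves better classification generalization performance and robustness.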
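The interplay between Remove Only editing and the decision of when to apply it can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: Remove Only editing is stood in for by a k-nearest-neighbour disagreement rule (the abstract does not specify which test flags an example as mislabeled), `update_ok` encodes the standard Tri-training update condition e_t·|L_t| < e_{t-1}·|L_{t-1}| with |L_t| > |L_{t-1}|, and the trigger rule in `adaptive_edit` is a hypothetical simplification that treats the new error estimate `e_new` as fixed rather than the paper's set of sufficient conditions.

```python
import math

def remove_only(X, y, k=3):
    # stand-in for Remove Only editing: drop (never relabel) samples whose label
    # is supported by fewer than half of their k nearest neighbours
    keep = []
    for i, xi in enumerate(X):
        nn = sorted((j for j in range(len(X)) if j != i),
                    key=lambda j: math.dist(xi, X[j]))[:k]
        agree = sum(1 for j in nn if y[j] == y[i])
        if 2 * agree >= len(nn):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def update_ok(e_prev, n_prev, e_new, n_new):
    # standard Tri-training condition: the set grows and e * |L| shrinks
    return n_new > n_prev and e_new * n_new < e_prev * n_prev

def adaptive_edit(X, y, e_prev, n_prev, e_new, k=3):
    # hypothetical trigger: use the edited set if it still satisfies the condition,
    # otherwise suppress editing if the unedited set does, otherwise skip the update
    Xe, ye = remove_only(X, y, k)
    if update_ok(e_prev, n_prev, e_new, len(Xe)):
        return Xe, ye
    if update_ok(e_prev, n_prev, e_new, len(X)):
        return X, y
    return None
```

Editing shrinks the candidate set, which helps the error side of the condition but can break the growth side; deciding per iteration whether to edit is what lets both requirements hold together.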