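Abstract: To improve how well intermediate representations of C programs capture program information, an automatic grading method based on Flow Control Graph (FCG) matching is introduced that takes the characteristics of the C language into account. The method computes the similarity between a student's answer and the reference answer and assigns a score accordingly. A nearest-neighbor algorithm, Flow Control-KNN (FC-KNN), is then introduced to make the FCG algorithm less sensitive to templates: building on FCG, it applies the k-nearest-neighbor algorithm to score programs from the extracted features. Experiments show that, run independently, FCG and FC-KNN reach average accuracies of 91.5% and 92.3% respectively, so FC-KNN grades more accurately than FCG on its own. Fusing the two algorithms lets them complement each other and raises the accuracy to 94.0%. The results confirm that the integrated classification model achieves good classification performance and high accuracy throughout the grading process.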
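The k-nearest-neighbor step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which features FC-KNN extracts or how grades are encoded, so the feature vectors (loop, branch, and call counts), the "pass"/"fail" grades, and the function name `knn_grade` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def knn_grade(features, labeled, k=3):
    """Return the majority grade among the k labeled programs nearest to `features`."""
    # Rank the labeled examples by Euclidean distance to the query vector.
    nearest = sorted(labeled, key=lambda item: math.dist(features, item[0]))[:k]
    # Majority vote over the grades of the k nearest neighbors.
    votes = Counter(grade for _, grade in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (loops, branches, calls) -> grade.
labeled = [
    ((2, 1, 3), "pass"),
    ((2, 2, 3), "pass"),
    ((3, 1, 3), "pass"),
    ((0, 0, 1), "fail"),
    ((0, 1, 0), "fail"),
    ((1, 0, 0), "fail"),
]

grade = knn_grade((2, 1, 2), labeled)
```

A real grader would presumably derive the features from the flow control graph itself and choose k by validation; the sketch only shows the voting mechanism.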
Abstract: In the petroleum industry, sensor data are valuable: they can be used to detect, predict, and help understand processes during oil production. Offshore wells require more attention, since workovers, maintenance, and interventions are more costly than for onshore wells. Coupling data-driven methods with well-monitoring applications, two unsupervised classification methods, one statistical and one machine learning-based, are proposed to detect anomalies in well data. The novelty lies in applying a control chart with a 3-standard-deviation window to the Permanent Downhole Gauge pressure sensor (P-PDG), and a fuzzy c-means algorithm to classify data from pressure and temperature sensors in an offshore field. The main goal of structuring a classified data set is to use it to train machine learning models that monitor and manage petroleum production. Modeling applications for early fault detection systems in offshore production, based on real-time data from production sensors, require classified data sets, so labeling two target classes, “normal” and “fault”, is a key step in training the machine learning models. Therefore, this paper applies the two methodologies to a real-time data set to create a training data set divided into “normal” and “fault” classes. This makes it possible to visualize the abnormal events flagged by the methodologies and compare how sensitive each method is. In addition, a random forest application is proposed to test the performance of the classified data sets from both methods. The results show that the control chart method is more sensitive than fuzzy c-means, although the differences between them are insignificant. The random forest achieved sensitivity and specificity of 99.91% and 100% for the data set classified by the control chart method, and 94.01% and 99.98% for the data set classified by the fuzzy c-means algorithm.
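The control-chart labeling described in the abstract above can be sketched roughly as follows. The abstract does not specify how the 3-standard-deviation window is estimated, so this sketch assumes the center line and standard deviation come from a separate in-control baseline; the function name `control_chart_labels` and the pressure values are hypothetical.

```python
import statistics


def control_chart_labels(readings, baseline, n_sigma=3.0):
    """Label each reading "normal" or "fault" using mean +/- n_sigma limits.

    The limits are estimated from `baseline`, which is assumed (not stated
    in the abstract) to be a period of in-control operation.
    """
    center = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation
    lower = center - n_sigma * sigma
    upper = center + n_sigma * sigma
    return ["normal" if lower <= x <= upper else "fault" for x in readings]


# Hypothetical P-PDG pressure readings in arbitrary units.
baseline = [100.0, 101.0, 99.0, 100.5, 99.5, 100.0, 100.2, 99.8]
readings = [100.1, 99.7, 104.0, 100.3, 95.0]
labels = control_chart_labels(readings, baseline)
```

Labels produced this way could then serve as the “normal”/“fault” targets for a supervised model such as the random forest mentioned in the abstract.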
Funding: Grants from the Natural Sciences and Engineering Research Council of Canada and the Research Grants Council of Hong Kong.
Abstract: A unified approach is proposed for making a continuity adjustment on some control charts for attributes, e.g., the np-chart and c-chart, by adding a uniform (0, 1) random observation to the conventional sample statistic (e.g., c_i). The adjusted sample statistic then has a continuous distribution. Consequently, given any Type I risk α (the probability that the sample statistic is on or beyond the control limits), control charts achieving the exact value of α can be readily constructed. Guidelines are given for when to use the continuity-adjusted control chart, the conventional Shewhart control chart (with ±3 standard deviation control limits), and the control chart based on the exact distribution of the sample statistic before adjustment.
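To make the continuity adjustment in the abstract above concrete, here is a minimal sketch for a c-chart. It assumes the count C is Poisson with a known in-control mean `lam` and that α is split equally between the two tails; neither assumption is stated in the abstract, and the function names are illustrative. With U uniform on (0, 1), the adjusted statistic Y = C + U has a piecewise-linear CDF, so any quantile can be hit exactly.

```python
import math


def poisson_pmf(k, lam):
    """P(C = k) for C ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def adjusted_cdf(y, lam):
    """CDF of Y = C + U, with C ~ Poisson(lam) and U ~ Uniform(0, 1)."""
    k = math.floor(y)
    below = sum(poisson_pmf(j, lam) for j in range(k))  # P(C <= k - 1)
    return below + (y - k) * poisson_pmf(k, lam)


def adjusted_quantile(p, lam):
    """Exact p-quantile of Y = C + U, for 0 < p < 1."""
    cum = 0.0  # running value of P(C <= k - 1)
    k = 0
    while True:
        pk = poisson_pmf(k, lam)
        if cum + pk >= p:
            # On [k, k + 1) the CDF rises linearly from cum to cum + pk.
            return k + (p - cum) / pk
        cum += pk
        k += 1


alpha = 0.0027  # Type I risk matching conventional 3-sigma limits
lam = 4.0       # assumed in-control mean count
lcl = adjusted_quantile(alpha / 2, lam)
ucl = adjusted_quantile(1 - alpha / 2, lam)
```

A sample would signal when c_i plus a fresh uniform draw falls on or beyond `lcl` or `ucl`. Because Y is continuous, the probability of a false signal equals α exactly, which discrete limits on C alone generally cannot achieve.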