Abstract: In this paper, we propose a Smooth Quantile Boost Classification (SQBC) algorithm for the binary classification problem. The SQBC algorithm directly uses a smooth function to approximate the “check function” of quantile regression. Compared to other boosting-based classification algorithms, the proposed algorithm is more accurate, flexible, and robust to noisy predictors. Furthermore, the SQBC algorithm can also work well in high-dimensional spaces. Extensive numerical experiments show that our proposed method performs better on both random simulations and real data.
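The key ingredient is replacing the non-differentiable check function rho_tau(u) = u(tau - 1{u < 0}) with a smooth surrogate so that gradient-based boosting can be applied. The sketch below uses a common logistic-type smoothing; the exact smoothing function and the parameter `alpha` are illustrative assumptions, not necessarily the ones used in the paper.

```python
import numpy as np

def check_function(u, tau):
    """Quantile regression check (pinball) loss: u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0).astype(float))

def smooth_check(u, tau, alpha=0.1):
    """Logistic-type smoothing; converges to the check function as alpha -> 0."""
    return tau * u + alpha * np.logaddexp(0.0, -u / alpha)

u = np.linspace(-3.0, 3.0, 7)
print(check_function(u, tau=0.5))
print(smooth_check(u, tau=0.5, alpha=0.05))
```

Because the surrogate is differentiable everywhere, its gradient can be used as the working response in each boosting iteration.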
Funding: This research was supported by the National Natural Science Foundation of China (No. 11771275).
Abstract: In this paper, a new quadratic kernel-free least square twin support vector machine (QLSTSVM) is proposed for binary classification problems. The advantage of QLSTSVM is that there is no need to select a kernel function and the related parameters for nonlinear classification problems. After applying a consensus technique, we adopt the alternating direction method of multipliers to solve the reformulated consensus QLSTSVM directly. To reduce CPU time, the Karush-Kuhn-Tucker (KKT) conditions are also used to solve the QLSTSVM. The performance of QLSTSVM is tested on two artificial datasets and several University of California Irvine (UCI) benchmark datasets. Numerical results indicate that QLSTSVM may outperform several existing methods for solving the twin support vector machine with a Gaussian kernel in terms of classification accuracy and operation time.
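The "kernel-free" idea is that a quadratic decision surface f(x) = x^T W x + b^T x + c can be fitted directly, without choosing a kernel. The toy sketch below illustrates only this point by fitting a regularized linear classifier in the explicit degree-2 monomial space; it is not the paper's twin formulation, consensus ADMM, or KKT-based solver, just a minimal stand-in assuming scikit-learn is available.

```python
from sklearn.datasets import make_circles
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline

# Toy illustration: a quadratic decision surface can be learned without a kernel
# by working in the explicit degree-2 monomial space and fitting a regularized
# linear classifier there.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)

model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      RidgeClassifier(alpha=1.0))
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```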
Abstract: The proposed deep learning algorithm will be integrated as a binary classifier under the umbrella of a multi-class classification tool to facilitate the automated detection of non-healthy deformities, anatomical landmarks, pathological findings, other anomalies and normal cases, by examining medical endoscopic images of the GI tract. Each binary classifier is trained to detect one specific non-healthy condition. The algorithm analyzed in the present work expands the detection ability of this tool by classifying GI tract image snapshots into two classes, depicting the haemorrhage and non-haemorrhage state. The proposed algorithm is the result of a collaboration between interdisciplinary specialists in AI and Data Analysis, Computer Vision, and Gastroenterologists of four University Gastroenterology Departments of Greek Medical Schools. The data used are 195 videos (177 from non-healthy cases and 18 from healthy cases) captured with the PillCam® Medronics device, originating from 195 patients, all diagnosed with different forms of angioectasia, haemorrhages and other diseases at different sites of the gastrointestinal (GI) tract, mainly including difficult diagnostic cases. Our AI algorithm is based on a convolutional neural network (CNN) trained on images annotated at the image level, using a semantic tag indicating whether the image contains angioectasia and haemorrhage traces or not. At least 22 CNN architectures were created and evaluated, some of which were pre-trained by applying transfer learning on ImageNet data. All the CNN variations were trained on a dataset with a prevalence of 50% and evaluated on unseen data. On test data, the best results were obtained from our CNN architectures that do not utilize a transfer-learning backbone. Across a balanced dataset of non-healthy and healthy images drawn from 39 videos from different patients, the best model identified the correct diagnosis with 90% sensitivity, 92% specificity, 91.8% precision, 8% FPR, and 10% FNR. In addition, we compared the performance of our best CNN algorithm against an algorithm with the same goal based on HSV colorimetric lesion features extracted from pixel-level annotations, with both algorithms trained and tested on the same data. The CNN trained on image-level annotated images is 9% less sensitive and achieves 2.6% less precision, 1.2% less FPR, and 7% less FNR than the algorithm based on HSV filters extracted from pixel-level annotated training data.
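The 22 CNN variants evaluated in the study are not specified in the abstract, so the sketch below is only a generic minimal CNN for the binary haemorrhage / non-haemorrhage decision with a single-logit sigmoid head; the layer widths and the 224×224 input size are assumptions.

```python
import torch
import torch.nn as nn

class SmallHaemorrhageCNN(nn.Module):
    """Minimal CNN for binary (haemorrhage vs. non-haemorrhage) frame classification."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 1)  # single logit for the binary label

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)

model = SmallHaemorrhageCNN()
logits = model(torch.randn(4, 3, 224, 224))      # a dummy batch of 4 RGB frames
loss = nn.BCEWithLogitsLoss()(logits.squeeze(1), torch.tensor([1., 0., 1., 0.]))
print(logits.shape, loss.item())
```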
Abstract: Diabetes is a serious health condition that can cause several issues in human body organs such as the heart and kidney, as well as a serious eye disease called diabetic retinopathy (DR). Early detection and treatment are crucial to prevent complete blindness or partial vision loss. Traditional detection methods, which involve ophthalmologists examining retinal fundus images, are subjective, expensive, and time-consuming. Therefore, this study employs artificial intelligence (AI) technology to perform faster and more accurate binary classification and determine the presence of DR. In this regard, we employed three promising machine learning models, namely support vector machine (SVM), k-nearest neighbors (KNN), and Histogram Gradient Boosting (HGB), after carefully selecting features using transfer learning on the fundus images of the Asia Pacific Tele-Ophthalmology Society (APTOS) dataset (a standard benchmark), which includes 3662 images and originally categorized DR into five levels, here simplified to a binary format: No DR and DR (Classes 1-4). The results demonstrate that the SVM model outperformed the other approaches in the literature using the same dataset, achieving an excellent accuracy of 96.9%, compared to 95.6% for both the KNN and HGB models. This approach has been evaluated by medical health professionals, offers a valuable pathway for the early detection of DR, and can be successfully employed as a clinical decision support system.
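A rough sketch of the comparison stage, assuming CNN embeddings have already been extracted from the fundus images via transfer learning; the placeholder arrays, feature dimension and hyperparameters are illustrative, not the study's actual configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# `features` stands in for transfer-learning embeddings of APTOS fundus images,
# `labels` for the binarized target (0 = No DR, 1 = DR, i.e. original classes 1-4).
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 256))          # placeholder embeddings
labels = rng.integers(0, 2, size=500)           # placeholder binary labels

for name, clf in [("SVM", SVC(kernel="rbf", C=1.0)),
                  ("KNN", KNeighborsClassifier(n_neighbors=5)),
                  ("HGB", HistGradientBoostingClassifier())]:
    acc = cross_val_score(clf, features, labels, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")
```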
Funding: This research was supported by grants from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare (HI17C1501), and from the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science & ICT (NRF-2020R1C1C1012230). S. H. Cho was supported by the semester internship program between Daegu Catholic University and Daegu-Gyeongbuk Medical Innovation Foundation.
Abstract: Recently, research has been conducted to assist in the processing and analysis of histopathological images using machine learning algorithms. In this study, we established machine learning-based algorithms to detect photothrombotic lesions in histological images of photothrombosis-induced rabbit brains. Six machine learning-based algorithms for binary classification were applied, and their accuracies were compared in classifying normal tissues and photothrombotic lesions. The lesion classification model consisting of a 3-layered neural network with a rectified linear unit (ReLU) activation function, Xavier initialization, and Adam optimization, using datasets with a unit size of 128×128 pixels, yielded the highest accuracy (0.975). In validation on the tested histological images, it was confirmed that the model could identify regions where brain damage occurred due to photochemical ischemic stroke. Through the development of machine learning-based photothrombotic lesion classification models and performance comparisons, we confirmed that machine learning algorithms have the potential to be utilized in histopathology and various medical diagnostic techniques.
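A minimal sketch of the reported best configuration: a 3-layer fully connected network with ReLU activations, Xavier initialization and Adam optimization on flattened 128×128 patches. The hidden-layer widths, learning rate and batch size are assumptions not fixed by the abstract.

```python
import torch
import torch.nn as nn

def xavier_init(m):
    # Xavier (Glorot) initialization for the linear layers, as reported.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model = nn.Sequential(
    nn.Linear(128 * 128, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 2),            # normal tissue vs. photothrombotic lesion
)
model.apply(xavier_init)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

patches = torch.randn(8, 128 * 128)            # a dummy mini-batch of flattened patches
targets = torch.randint(0, 2, (8,))
loss = criterion(model(patches), targets)
loss.backward()
optimizer.step()
print(loss.item())
```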
Abstract: Currently, breast cancer is a major cause of death in women worldwide, and the World Health Organization (WHO) has confirmed this. The severity of this disease can be minimized to a large extent if it is diagnosed properly at an early stage. Therefore, the treatment of a patient with cancer can proceed in a better way if the disease is diagnosed properly, and as early as possible, using better algorithms. Moreover, it has been observed that deep neural networks deliver remarkable performance for detecting cancer in histopathological images of breast tissues. To address these issues, this paper presents a hybrid model using transfer learning to study histopathological images, which helps in detection and rectification of the disease at a low cost. Extensive dataset experiments were carried out to validate the suggested hybrid model. The experimental results show that the proposed model outperformed the baseline methods on the dataset of histopathological images, with F-scores of 0.81 for the DenseNet + Logistic Regression hybrid model, 0.73 for the Visual Geometry Group (VGG) + Logistic Regression hybrid model, 0.74 for VGG + Random Forest, 0.79 for DenseNet + Random Forest, and 0.79 for the VGG + DenseNet + Logistic Regression hybrid model.
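A sketch of the DenseNet + Logistic Regression hybrid idea under stated assumptions: a frozen ImageNet-pretrained DenseNet-121 backbone supplies features and a scikit-learn logistic regression performs the final classification. The dummy tensors stand in for actual histopathological tiles, and the specific backbone variant is an assumption.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.linear_model import LogisticRegression

# Frozen DenseNet backbone as a feature extractor (downloads ImageNet weights).
backbone = models.densenet121(weights="DEFAULT")
backbone.classifier = nn.Identity()              # drop the ImageNet head
backbone.eval()

with torch.no_grad():
    images = torch.randn(16, 3, 224, 224)        # dummy histopathology tiles
    feats = backbone(images).numpy()             # (16, 1024) feature vectors

labels = [0, 1] * 8                              # dummy benign/malignant labels
clf = LogisticRegression(max_iter=1000).fit(feats, labels)
print(clf.score(feats, labels))
```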
Abstract: This paper proposes a methodology for an alternative history matching process, enhanced by incorporating a simplified binary interpretation of reservoir saturation logs (RST) into the objective function. Incorporating fluid saturation logs during the history matching phase unlocks the possibility to adjust or select models that better represent the near-wellbore waterfront movement, which is particularly important for uncertainty mitigation during future well interference assessments in water-driven reservoirs. For the purposes of this study, a semi-synthetic open-source reservoir model was used as the base case to evaluate the proposed methodology. The reservoir model represents a water-driven, highly heterogeneous sandstone reservoir from the Namorado field in Brazil. To effectively compare the proposed methodology against conventional methods, a commercial reservoir simulator was used in combination with a state-of-the-art benchmarking workflow based on the Big Loop™ approach. A well-known group of binary metrics was evaluated for use as the objective function, and the Matthews correlation coefficient (MCC) proved to offer the best results when using binary data from water saturation logs. History matching results obtained with the proposed methodology allowed the selection of a more reliable group of reservoir models, especially for cases with high heterogeneity. The methodology also offers additional information and understanding of sweep behaviour behind the well casing at specific production zones, thus revealing the full model potential to define new wells and reservoir development opportunities.
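A minimal sketch of how a binary RST interpretation can be scored with the Matthews correlation coefficient as a history-matching objective; the toy arrays stand in for actual observed and simulated saturation flags.

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

# Binary water-saturation flags interpreted from RST logs (1 = swept / water
# behind casing, 0 = not swept), compared against the simulated model response.
observed  = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 0])
simulated = np.array([1, 0, 0, 0, 1, 0, 1, 1, 1, 0])

# MCC ranges from -1 to 1; maximizing it (or minimizing 1 - MCC) can serve as
# the mismatch term on the binarized log data.
mcc = matthews_corrcoef(observed, simulated)
print(f"MCC = {mcc:.3f}, objective = {1.0 - mcc:.3f}")
```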
Abstract: Glass is precious material evidence of trade on the early Silk Road. Ancient glass was easily affected by the environment and weathering, and the resulting change in composition ratios affected the correct judgment of its category. In this paper, mathematical models and methods such as the Chi-square test, the weighted average method, principal component analysis, cluster analysis, a binary classification model and grey correlation analysis were used comprehensively to analyze the data of the sample glass products together with their categories. The results showed that the weathered high-potassium glass could be divided into 12, 9, 10 and 27, 7, 22 and so on.
Abstract: COVID-19 is a pandemic that has affected nearly every country in the world. At present, sustainable development in the area of public health is considered vital to securing a promising and prosperous future for humans. However, widespread diseases such as COVID-19 create numerous challenges to this goal, and some of those challenges are not yet defined. In this study, a Shallow Single-Layer Perceptron Neural Network (SSLPNN) and a Gaussian Process Regression (GPR) model were used for the classification and prediction of confirmed COVID-19 cases in five geographically distributed regions of Asia with diverse settings and environmental conditions: namely, China, South Korea, Japan, Saudi Arabia, and Pakistan. Significant environmental and non-environmental features were taken as the input dataset, and confirmed COVID-19 cases were taken as the output dataset. A correlation analysis was done to identify patterns in the cases related to fluctuations in the associated variables. The results of this study established that the population and air quality index of a region had a statistically significant influence on the cases, whereas age and the human development index had a negative influence. The proposed SSLPNN-based classification model performed well when predicting the classes of confirmed cases. During training, the binary classification model was highly accurate, with a Root Mean Square Error (RMSE) of 0.91. Likewise, the results of the regression analysis using the GPR technique with a Matern 5/2 kernel were highly accurate (RMSE = 0.95239) when predicting the number of confirmed COVID-19 cases in an area. Dynamic management occupies a core place in studies on the sustainable development of public health, but it depends on proactive strategies based on statistically verified approaches, such as Artificial Intelligence (AI). In this study, an SSLPNN model has been trained to fit public-health-associated data into an appropriate class, allowing GPR to predict the number of confirmed COVID-19 cases in an area based on the given values of selected parameters. Therefore, this tool can help authorities in different ecological settings effectively manage COVID-19.
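A sketch of the regression stage using scikit-learn's Gaussian process regressor with a Matern kernel at nu = 2.5 (the "Matern 5/2" reported above); the synthetic features, noise level and kernel hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Placeholder feature matrix (e.g. environmental and non-environmental indicators)
# and a synthetic response standing in for confirmed case counts.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=120)

gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-2, normalize_y=True)
gpr.fit(X, y)
pred, std = gpr.predict(X[:5], return_std=True)
print(pred.round(2), std.round(3))
```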
Abstract: Credit card fraud data is highly imbalanced, presenting an overwhelmingly large portion of non-fraudulent transactions and a small portion of fraudulent transactions. The measures used to judge the veracity of detection algorithms become critical to the deployment of a model that accurately scores fraudulent transactions, taking into account case imbalance and the cost of identifying a case as genuine when, in fact, the case is a fraudulent transaction. In this paper, a new criterion for judging classification algorithms, which considers the cost of misclassification, is proposed, and several undersampling techniques are compared by this new criterion. At the same time, a weighted support vector machine (SVM) algorithm considering the financial cost of misclassification is introduced, proving to be more practical for credit card fraud detection than traditional methodologies. This weighted SVM uses transaction balances as weights for fraudulent transactions, and a uniform weight for non-fraudulent transactions. The results show that this strategy greatly improves the performance of credit card fraud detection.
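A minimal sketch of the weighting scheme described above: fraudulent transactions receive their balance as the sample weight and non-fraudulent transactions a uniform weight of one. The synthetic data and the SVM hyperparameters are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))                     # dummy transaction features
y = (rng.random(n) < 0.02).astype(int)          # ~2% fraudulent transactions
balance = rng.uniform(10, 5000, size=n)         # transaction amounts

# Weight fraudulent cases by their transaction balance (the financial cost of
# missing them) and give non-fraudulent cases a uniform weight, as described.
sample_weight = np.where(y == 1, balance, 1.0)

clf = SVC(kernel="rbf")
clf.fit(X, y, sample_weight=sample_weight)
print(clf.score(X, y))
```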
Funding: Supported by the Natural Science Foundation of Zhejiang Province of China (No. Y104540) and the Foundation of the Key Laboratory of Advanced Information Science and Network Technology of Beijing, China (No. TDXX0509).
Abstract: This paper presents a novel bootstrap-based method for Receiver Operating Characteristic (ROC) analysis of the Fisher classifier. By defining the Fisher classifier's output as a statistic, the bootstrap technique is used to obtain the sampling distributions of the outputs for the positive class and the negative class, respectively. As a result, the ROC curve is a plot of all the (False Positive Rate (FPR), True Positive Rate (TPR)) pairs obtained by varying the decision threshold over the whole range of the bootstrap sampling distributions. The advantage of this method is that the bootstrap-based ROC curves are much more stable than those of holdout or cross-validation, indicating a more stable ROC analysis of the Fisher classifier. Experiments on five publicly available data sets demonstrate the effectiveness of the proposed method.
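A sketch of the procedure under stated assumptions, using scikit-learn's linear discriminant analysis as the Fisher classifier: bootstrap the class-conditional output scores, then sweep a threshold to trace (FPR, TPR) pairs. The number of bootstrap replicates and the threshold grid are illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(1.5, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

scores = LinearDiscriminantAnalysis().fit(X, y).decision_function(X)

# Bootstrap the classifier outputs of each class to approximate their sampling
# distributions, then sweep a threshold over the pooled range for (FPR, TPR).
B = 200
pos = rng.choice(scores[y == 1], size=(B, (y == 1).sum()), replace=True).ravel()
neg = rng.choice(scores[y == 0], size=(B, (y == 0).sum()), replace=True).ravel()

thresholds = np.linspace(scores.min(), scores.max(), 50)
tpr = [(pos >= t).mean() for t in thresholds]
fpr = [(neg >= t).mean() for t in thresholds]
print(list(zip(np.round(fpr[:5], 3), np.round(tpr[:5], 3))))
```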
Abstract: We continue our study of classification learning algorithms generated by Tikhonov regularization schemes associated with Gaussian kernels and general convex loss functions. The main purpose of this paper is to improve error bounds by presenting a new comparison theorem associated with general convex loss functions and Tsybakov noise conditions. Some concrete examples are provided to illustrate the improved learning rates, which demonstrate the effect of various loss functions on learning algorithms. In our analysis, the convexity of the loss functions plays a central role.
Abstract: Imaging logging has become a popular means of well logging because it can visually represent the lithologic and structural characteristics of strata. The manual interpretation of imaging logs is affected by the limitations of the naked eye and by experiential factors; as a result, manual interpretation accuracy is low. Therefore, it is highly useful to develop effective automatic imaging logging interpretation by machine learning. Resistivity imaging is the most widely used imaging logging technology. In this paper, we propose an automatic extraction procedure for the geological features in resistivity imaging logging images. This procedure is based on machine learning and achieves good results in practical applications. Acknowledging that the existence of valueless data significantly affects the recognition effect, we propose three strategies for the identification of valueless data based on binary classification. We compare the effect of the three strategies both on an experimental dataset and in a production environment, and find that the merging method performs best of the three. It effectively identifies the valueless data in well logging images, thus significantly improving the automatic recognition of geological features in resistivity logging images.
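The abstract does not detail the three identification strategies or the merging method, so the sketch below only illustrates the general pre-filtering idea: a binary classifier on simple patch statistics flags valueless patches, which are then excluded before geological feature recognition. The features, labels and the choice of a random forest are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative pre-filter (not the paper's strategies): flag valueless
# resistivity-image patches from simple statistics, then drop them.
rng = np.random.default_rng(0)
patches = rng.uniform(size=(300, 64, 64))                 # dummy image patches
stats = np.column_stack([patches.mean((1, 2)),            # brightness
                         patches.std((1, 2)),             # contrast
                         np.abs(np.diff(patches, axis=2)).mean((1, 2))])  # texture
valueless = rng.integers(0, 2, size=300)                  # dummy labels

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(stats, valueless)
keep = patches[clf.predict(stats) == 0]                   # pass only useful patches on
print(keep.shape)
```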
Abstract: In estimation and prediction theory, considerable attention is paid to the question of having unbiased estimators at a global population level. Recent developments in neural network modelling have mainly focused on accuracy at a granular sample level, and the question of unbiasedness at the population level has almost completely been neglected by that community. We discuss this question within neural network regression models, and we provide methods for obtaining unbiased estimators for these models at the global population level.
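As a minimal illustration of the population-level notion of unbiasedness (not the paper's specific construction), one can additively recalibrate a fitted regression model so that its average prediction matches the average observed response over the training portfolio:

```python
import numpy as np

# Minimal illustration only: after fitting any regressor, an additive shift
# forces the average prediction to equal the average observed response.
rng = np.random.default_rng(0)
y_true = rng.gamma(shape=2.0, scale=50.0, size=1000)
y_pred = 0.9 * y_true + rng.normal(scale=10.0, size=1000)   # a biased "model"

bias = y_true.mean() - y_pred.mean()
y_adjusted = y_pred + bias

print(f"raw population bias: {y_true.mean() - y_pred.mean():.2f}")
print(f"after recalibration: {y_true.mean() - y_adjusted.mean():.2f}")
```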
Funding: Supported by the National Natural Science Foundation of China (No. 61375053).
Abstract: With the rapid development of E-commerce, a large amount of data, including reviews of different types of products, can be accessed within a short time. On top of this, opinion mining is becoming increasingly effective for extracting valuable information for product design, improvement and brand marketing, especially fine-grained opinion mining. However, limited by the unstructured and casual expression of opinions, one cannot extract valuable information conveniently. In this paper, we propose an integrated strategy to automatically extract feature-based information, with which one can easily acquire detailed opinions about certain products. To adapt to the reviews' characteristics, our strategy is made up of a multi-label classification (MLC) for reviews, a binary classification (BC) for sentences, and sentence-level sequence labelling with a deep learning method. In experiments, our approach achieves 82% accuracy in the final sequence labelling task under a 20-fold cross-validation setting. In addition, the strategy can be readily employed on other reviews as long as there is a corresponding amount of labelled data for startup.
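A toy sketch of the first two stages of such a pipeline, assuming scikit-learn: a multi-label classifier tags each review with product aspects, and a binary classifier decides whether an individual sentence carries an opinion. The example texts, aspect set and TF-IDF/logistic-regression choices are illustrative stand-ins, not the paper's deep learning components.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Stage 1 (MLC): multi-label tagging of whole reviews with product aspects.
reviews = ["battery dies fast but screen is great", "fast shipping, nice camera",
           "screen cracked, battery ok", "camera blurry and battery poor"]
aspect_labels = [[1, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]   # battery/screen/camera

mlc = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LogisticRegression(max_iter=1000)))
mlc.fit(reviews, aspect_labels)

# Stage 2 (BC): binary filter deciding whether a sentence carries an opinion.
sentences = ["battery dies fast", "I bought it last week", "screen is great"]
opinion = [1, 0, 1]
bc = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)).fit(sentences, opinion)

print(mlc.predict(["battery poor but camera is amazing"]), bc.predict(["the screen is great"]))
```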
Funding: The author Feng Yin was funded by the Shenzhen Science and Technology Innovation Council (No. JCYJ20170307155957688) and by the National Natural Science Foundation of China Key Project (No. 61731018). The authors Feng Yin and Shuguang (Robert) Cui were funded by Shenzhen Fundamental Research Funds under Grant (Key Lab) No. ZDSYS201707251409055 and Grant (Peacock) No. KQTD2015033114415450, and by the Guangdong Province "Pearl River Talent Recruitment Program, Innovative and Entrepreneurial Teams in 2017" - Data Driven Evolution of Future Intelligent Network Team. The associate editor coordinating the review of this paper and approving it for publication was X. Cheng.
Abstract: Driven by the needs of a plethora of machine learning applications, several attempts have been made at improving the performance of classifiers applied to imbalanced datasets. In this paper, we present a fast maximum entropy machine (MEM) combined with a synthetic minority over-sampling technique for handling binary classification problems with high imbalance ratios, large numbers of data samples, and medium/large numbers of features. A random Fourier feature representation of kernel functions and the primal estimated sub-gradient solver for support vector machine (PEGASOS) are applied to speed up the classic MEM. Experiments have been conducted using various real datasets (including two China Mobile datasets and several other standard test datasets) with various configurations. The obtained results demonstrate that the proposed algorithm has extremely low complexity but excellent overall classification performance (in terms of several widely used evaluation metrics) compared to the classic MEM and some other state-of-the-art methods. The proposed algorithm is particularly valuable in big data applications owing to its significantly low computational complexity.
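A rough sketch of the same ingredients with off-the-shelf tools, assuming scikit-learn and imbalanced-learn are available: SMOTE oversampling, a random Fourier feature map, and a stochastic sub-gradient linear solver (SGDClassifier with hinge loss as a PEGASOS-style stand-in, not the actual MEM objective).

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from imblearn.over_sampling import SMOTE

# Highly imbalanced toy problem (about 5% positives).
X, y = make_classification(n_samples=20000, n_features=40, weights=[0.95, 0.05],
                           random_state=0)

# Oversample the minority class, then train a linear model in a random Fourier
# feature space with a stochastic sub-gradient solver.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
clf = make_pipeline(RBFSampler(gamma=0.1, n_components=500, random_state=0),
                    SGDClassifier(loss="hinge", alpha=1e-4, max_iter=20))
clf.fit(X_res, y_res)
print(clf.score(X, y))
```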