Abstract: Fraud involving credit and debit cards, loan defaults, and similar cases can be detected with machine learning (ML) algorithms, which learn from previous fraud trends or historical data and spot the same patterns in current or future transactions. In almost all datasets, fraudulent cases are scarce compared with non-fraudulent observations, which makes fraudulent transactions difficult to detect. The most effective way to prevent loan default is to identify non-performing loans as early as possible. Given enough computing power, machine learning algorithms are proving adept at handling such data. In this paper, the performance of several machine learning algorithms, namely Decision Tree, Random Forest, linear regression, and Gradient Boosting, is compared for the detection and prediction of fraud cases using instances of loan fraud. Model performance is evaluated with the confusion matrix and the derived accuracy, precision, recall, and F1 score, along with Receiver Operating Characteristic (ROC) curves.
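The evaluation metrics named in the abstract above can be derived directly from the four confusion-matrix counts. The following is a minimal sketch, not the paper's code: the function names and the ten toy labels are hypothetical and chosen only to mimic a class imbalance.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from the confusion-matrix counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    # Guard against empty denominators (e.g. a model that never predicts fraud).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical, imbalanced toy labels: 2 fraud cases (1) among 10 loans.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
# A baseline that never flags fraud, for comparison.
baseline = classification_metrics(y_true, [0] * len(y_true))
```

On this toy data both the model and the never-flag baseline reach an accuracy of 0.8, but the baseline's recall and F1 are 0.0. This is why precision, recall, and F1 are reported alongside accuracy when fraudulent cases are rare.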
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61863010), the Key Research and Development Program of Shandong Province of China (Grant No. 2019GGX101001), and the Natural Science Foundation of Shandong Province of China (Grant No. ZR2018MC007).
Abstract: Protein-protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the growth of PPI data and the development of machine learning technologies, the prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the one-core PPI network for CD9, and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
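To illustrate how L1 regularization can act as a feature selector, the sketch below fits a logistic model by proximal gradient descent and keeps only the features whose weights are non-zero. It covers the selection step alone, not the feature extraction or the gradient tree boosting classifier. The solver, the function names, the hyperparameters, and the two-feature toy dataset are all assumptions made for illustration and are not taken from the GTB-PPI code.

```python
import math


def soft_threshold(value, amount):
    """Proximal operator of the L1 norm: shrink `value` towards zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def l1_logistic_weights(X, y, lam=0.1, step=0.5, n_iter=500):
    """Proximal gradient descent for L1-regularized logistic regression.

    X: list of feature rows; y: labels in {-1, +1}; lam: L1 strength.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for row, label in zip(X, y):
            margin = label * sum(wj * xj for wj, xj in zip(w, row))
            # Derivative of log(1 + exp(-margin)) with respect to w is coeff * row.
            coeff = -label / (1.0 + math.exp(margin))
            for j in range(d):
                grad[j] += coeff * row[j] / n
        # Gradient step on the smooth loss, then L1 shrinkage.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: feature 0 follows the label, feature 1 is
# uncorrelated with it by construction.
X = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0],
     [-1.0, 1.0], [-1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

weights = l1_logistic_weights(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
X_selected = [[row[j] for j in selected] for row in X]
```

Here the uninformative feature keeps a weight of exactly zero and is dropped, so `X_selected` retains only feature 0. In a pipeline of the kind described above, a reduced matrix like this would then be passed to the boosted-tree classifier.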
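Abstract: To reveal the spatial factors influencing changes in bus ridership during the COVID-19 pandemic, this study takes the stop-level change in ridership before and after the outbreak as the dependent variable and indicators of the built environment, virus infection, and virus transmission routes as independent variables, and builds an ordinary least squares (OLS) linear regression model and a gradient boosting regression trees (GBRT) model of the joint effect of the pandemic and the built environment on bus ridership. Taking Guangzhou as the empirical case, the models are estimated on multi-source heterogeneous data, including bus IC card data, point of interest (POI) data, and road network data. The results show that the GBRT model, which accounts for nonlinear effects, fits better than the OLS model. The number of bus routes at a conventional bus stop (22.02%) and the distance to the city center (13.56%) are the most important factors affecting the change in bus ridership during the pandemic, whereas district-level virus infection and transmission have only a limited effect on ridership during the period of regular epidemic prevention and control, and residents' daily bus travel has gradually recovered from the impact of the pandemic.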
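The fit comparison between OLS and GBRT reported above can be reproduced in miniature. The sketch below fits a one-variable least-squares line and a small gradient-boosted ensemble of one-split regression trees (stumps) to the same data and compares their training error. Everything here is a hypothetical illustration: the U-shaped toy relationship, the function names, and the settings (100 rounds, learning rate 0.1) are assumptions, not the study's data or model configuration.

```python
def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns a predict function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_stump(xs, residuals):
    """Best single-split regression tree under squared error."""
    best = None
    # Dropping the largest value keeps the right-hand side non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gbrt(xs, ys, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(
            left if x <= t else right for t, left, right in stumps)

    return predict


def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: a U-shaped response to a single predictor.
xs = [float(d) for d in range(0, 21)]
ys = [0.5 * (x - 10.0) ** 2 - 20.0 for x in xs]

ols_mse = mse(fit_ols(xs, ys), xs, ys)
gbrt_mse = mse(fit_gbrt(xs, ys), xs, ys)
```

On this deliberately nonlinear toy relationship, `gbrt_mse` comes out lower than `ols_mse`, because a straight line cannot follow the curve while the sum of stumps can approximate it piecewise. This compares training error only; judging a real model would call for held-out data.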