Predicting the Underlying Structure for Phylogenetic Trees Using Neural Networks and Logistic Regression

Abstract: Understanding the underlying structure of phylogenetic trees is important because it informs the choice of methods for phylogenetic inference: the methods appropriate for a structured population differ from those needed when the population is not structured. In this paper, we compare two supervised machine learning techniques, artificial neural networks (ANN) and logistic regression, for predicting the underlying structure of phylogenetic trees. We tuned the parameters of both models to identify the optimal configurations and then performed 10-fold cross-validation on the optimal logistic regression and ANN models. We also applied an unsupervised technique, clustering, to determine the number of clusters that could be identified in simulated phylogenetic trees drawn from both structured and non-structured populations. Both clustering and classification used tree statistics such as the Colless, Sackin and cophenetic indices, among others. The 10-fold cross-validation showed that logistic regression and ANN performed comparably, with both models achieving average accuracy above 0.75. Most of the clustering indices used gave 2 or 3 as the optimal number of clusters.
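The tree statistics named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of the Sackin index (sum of leaf depths), the Colless index (sum over internal nodes of the absolute difference in leaf counts between the two subtrees) and the total cophenetic index (sum over leaf pairs of the depth of their lowest common ancestor). The nested-tuple tree encoding and all function names are assumptions made for this illustration, not the authors' implementation, and the paper may use normalized variants of these indices.

```python
# Illustrative sketch only: trees are rooted binary trees encoded as
# nested 2-tuples with string leaves (a hypothetical encoding).

def count_leaves(tree):
    """Number of leaves below (and including) this node."""
    if isinstance(tree, str):
        return 1
    left, right = tree
    return count_leaves(left) + count_leaves(right)


def sackin(tree, depth=0):
    """Sackin index: sum of the depths of all leaves (root at depth 0)."""
    if isinstance(tree, str):
        return depth
    left, right = tree
    return sackin(left, depth + 1) + sackin(right, depth + 1)


def colless(tree):
    """Colless index: sum over internal nodes of |n_left - n_right|."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    imbalance = abs(count_leaves(left) - count_leaves(right))
    return imbalance + colless(left) + colless(right)


def total_cophenetic(tree, depth=0):
    """Total cophenetic index: sum over leaf pairs of their LCA depth."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    # Every pair with one leaf on each side has this node as its LCA.
    cross_pairs = count_leaves(left) * count_leaves(right)
    return (cross_pairs * depth
            + total_cophenetic(left, depth + 1)
            + total_cophenetic(right, depth + 1))


caterpillar = ((("a", "b"), "c"), "d")   # maximally unbalanced, 4 leaves
balanced = (("a", "b"), ("c", "d"))      # fully balanced, 4 leaves

for name, t in [("caterpillar", caterpillar), ("balanced", balanced)]:
    print(name, sackin(t), colless(t), total_cophenetic(t))
```

All three indices are larger for the unbalanced caterpillar tree than for the balanced tree, which is the kind of signal a classifier or clustering method can use as a feature.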

Authors: Hassan W. Kayondo, Samuel Mwalili

Source: Open Journal of Statistics, 2020, Issue 2, pp. 239-251 (13 pages)

Keywords: Artificial Neural Networks; Logistic Regression; Phylogenetic Tree; Tree Statistics; Classification; Clustering

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]
