Abstract: Fertility is the most crucial step in the development process and is controlled by many fertility-related proteins, including spermatogenesis-, oogenesis-, and embryogenesis-related proteins. The identification of fertility-related proteins can provide important clues for studying the role of these proteins in development. Therefore, in this study, we constructed a two-layer classifier to identify fertility-related proteins. In this classifier, we first used amino acid (AA) composition and the physical and chemical properties of amino acids to encode the three classes of fertility-related proteins. The feature set was then optimized by analysis of variance (ANOVA) and incremental feature selection (IFS) to obtain the optimal feature subset. The performance of models built with different machine learning (ML) methods was evaluated and compared through five-fold cross-validation (CV) and independent data tests. Finally, based on the support vector machine (SVM), we obtained a two-layer model to classify the three classes of fertility-related proteins. On the independent test data set, the accuracy (ACC) and the area under the receiver operating characteristic curve (AUC) of the first-layer classifier are 81.95% and 0.89, respectively, and those of the second-layer classifier are 84.74% and 0.90, respectively. These results show that the proposed model has stable performance and satisfactory prediction accuracy and can become a powerful tool for identifying more fertility-related proteins.
Funding: This work was funded by the Sichuan Major Science and Technology Project (2021ZDZX0009) and the National Natural Science Foundation of China (Grant No. 035Z2060).
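The encoding and feature-selection steps named in the abstract (AA composition, ANOVA ranking, IFS) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names are hypothetical, the physicochemical-property features are omitted, and the `evaluate` callback stands in for whatever scoring the study used (for example, five-fold CV accuracy of an SVM), which the caller would have to supply.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature index.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2
                     for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2
                    for y, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(X, y):
    """Return feature indices sorted by descending ANOVA F statistic."""
    n_feat = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: scores[j], reverse=True)


def incremental_feature_selection(X, y, ranking, evaluate):
    """Add ranked features one at a time and keep the best-scoring prefix.

    `evaluate(X_subset, y)` is a caller-supplied (hypothetical) scoring
    function; higher is better.
    """
    best_score, best_subset = float("-inf"), []
    for m in range(1, len(ranking) + 1):
        subset = ranking[:m]
        score = evaluate([[row[j] for j in subset] for row in X], y)
        if score > best_score:
            best_score, best_subset = score, list(subset)
    return best_subset, best_score
```

In use, each protein sequence would be passed through `aa_composition`, the resulting matrix ranked with `rank_features`, and the ranking handed to `incremental_feature_selection` together with a cross-validated classifier score. Keeping the scorer as a callback leaves the choice of ML method open, which matches the abstract's comparison of several methods before settling on SVM.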