Abstract: Nanopores use the ionic current produced by single-molecule blockage to identify the structure, conformation, chemical groups, and charge of a single molecule. Despite tremendous progress in designing sensitive pore-forming materials, analytes that differ by a single chemical group can, to some extent, still exhibit similar residual currents or durations. The severe overlap in the statistical distributions of residual current and duration makes it difficult for a nanopore to discriminate individual molecules in a mixture. In this paper, we present an AdaBoost-based machine learning model to identify multiple analytes that differ by a single group within mixed blockages. A set of feature vectors obtained from a Hidden Markov Model (HMM) is used to train the AdaBoost model. Using aerolysin sensing of 5ʹ-AAAA-3ʹ (AA3) and 5ʹ-GAAA-3ʹ (GA3) as the model system, our results show that the AdaBoost model increases the identification accuracy from ~0.293 to above 0.991. Furthermore, five sets of mixed AA3 and GA3 blockages validate the model, with average training and validation accuracies of 0.997 and 0.989, respectively. The proposed method improves the ability of a wild-type biological nanopore to efficiently identify single-nucleotide differences without protein engineering or optimization of experimental conditions. The AdaBoost-based machine learning approach could therefore promote practical nanopore applications such as genetic and epigenetic detection.
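The abstract does not specify the weak learner, the exact HMM-derived features, or the implementation, so the following is only a rough, hypothetical sketch of the classification step: textbook discrete AdaBoost with decision stumps applied to per-event feature vectors. The helper names (`stump_predict`, `best_stump`, `adaboost_fit`, `adaboost_predict`), the two demo features (a relative residual current and a log10 dwell time), and all numbers are invented placeholders, not the authors' method or data from this study.

```python
import math
from typing import List, Tuple

# Hypothetical sketch: textbook discrete AdaBoost with decision stumps.
# Each row of X is one blockage event's feature vector; the features used
# here are illustrative assumptions, not the study's HMM-derived features.
Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: List[float]) -> int:
    """Return +1 if polarity * (x[feature] - threshold) > 0, else -1."""
    feature, threshold, polarity = stump
    return 1 if polarity * (x[feature] - threshold) > 0 else -1


def best_stump(X: List[List[float]], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Exhaustively search for the stump with the lowest weighted error."""
    best, best_err = (0, 0.0, 1), float("inf")
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        # Candidate thresholds: one below the minimum, then midpoints
        # between consecutive distinct feature values.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                stump = (f, t, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_fit(X: List[List[float]], y: List[int], rounds: int = 20) -> List[Tuple[float, Stump]]:
    """Train discrete AdaBoost; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform event weights
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Up-weight misclassified events, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if err <= 1e-10:
            break  # a perfect stump was found; more rounds add nothing
    return ensemble


def adaboost_predict(ensemble: List[Tuple[float, Stump]], x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Invented toy events: [relative residual current, log10 dwell time].
    # Feature 0 overlaps between classes; feature 1 separates them.
    X = [[0.52, 1.1], [0.53, 0.9], [0.55, 1.3], [0.56, 1.0],   # label +1
         [0.54, 1.6], [0.55, 1.8], [0.57, 1.5], [0.58, 1.7]]   # label -1
    y = [1, 1, 1, 1, -1, -1, -1, -1]
    model = adaboost_fit(X, y)
    acc = sum(adaboost_predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)
    print(acc)  # → 1.0
```

In the toy data, the first feature overlaps between the two classes, loosely mirroring the residual-current overlap described above, and the booster picks the stump on the informative feature instead. Because labels are ±1, this sketch covers only a two-analyte case such as AA3 versus GA3; distinguishing more analytes would need a multi-class variant (for example SAMME) or a one-vs-rest scheme.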
Funding: This research was supported by the National Natural Science Foundation of China (6187118, 2183400 and 21711530216) and the “Chen Guang” project supported by the Shanghai Municipal Education Commission and the Shanghai Education Development Foundation (17CG27).