Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61302186 and 61271378) and by funding from the State Key Laboratory of Bioelectronics of Southeast University.
Abstract: Agrobacterium tumefaciens strain C58 is a pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of the annotation of its protein-coding genes has been questioned repeatedly, because the annotation varies greatly among databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as "hypothetical" were identified as non-coding sequences. Testing the re-prediction program 10 times on data sets composed of genes with known function gave a mean accuracy of 99.99% and a mean Matthews correlation coefficient of 0.9999. Further sequence analysis and COG analysis supported the reliability of the re-annotation results. This work provides an efficient tool and data resources for future studies of A. tumefaciens strain C58.
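The abstract reports two evaluation metrics, accuracy and the Matthews correlation coefficient (MCC). As a minimal sketch of how these are conventionally computed from the counts of a binary coding/non-coding classifier, the snippet below uses the standard textbook definitions. The function names and the example counts are hypothetical illustrations, not taken from the paper, and this is not the authors' evaluation code.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all sequences that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention
    when one row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration:
    # tp = coding sequences predicted as coding,
    # tn = non-coding sequences predicted as non-coding.
    tp, tn, fp, fn = 95, 90, 10, 5
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the coding and non-coding classes are unbalanced.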