Proteomics is the study of proteins and their interactions in a cell. With the successful completion of the Human Genome Project, the post-genome era has arrived and proteomics technology is emerging. This paper studies protein molecules from an algebraic point of view. The algebraic system (∑, +, *) is introduced, where ∑ is the set of 64 codons. Based on the characteristics of (∑, +, *), a novel quasi-amino-acid code classification method is introduced, and the corresponding algebraic operation table over the set ZU of the 16 kinds of quasi-amino acids is established. The internal relations among the quasi-amino acids are revealed. The results show that there are very close correlations between the properties of the quasi-amino acids and the codons. These correlations may play an important part in establishing the logical relationship between codons and quasi-amino acids during the origin of life. According to Ma F et al (2003 J. Anhui Agricultural University 30 439), the corresponding relation and the excellent properties of the amino acid code are very difficult to observe. The present paper shows that (ZU, +, ×) is a field. Furthermore, the operational results show that the codon tga has different properties from the other stop codons. In fact, in the human and ox mitochondrial genetic codes, tga codes for tryptophan rather than acting as a stop codon as in other genetic codes, as reported by Chen W C et al (2002 Acta Biophysica Sinica 18(1) 87). The present theory avoids some inexplicable features of the 20-amino-acid code; in other words, it solves the problem that 'the 64 codon assignments of mRNA to amino acids is probably completely wrong' proposed by Yang (2006 Progress in Modern Biomedicine 6 3).
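As a small illustration of the codon set and of the special status of tga, the sketch below enumerates the 64 codons and compares stop-codon membership under the standard code and the vertebrate mitochondrial code. It does not reproduce the paper's algebraic system (∑, +, *) or its operation table; the names `CODONS`, `STOP_CODONS`, and `is_stop` are hypothetical and exist only for this example.

```python
from itertools import product

# The four bases, written in lowercase as in the abstract ("tga").
BASES = "tcag"

# All 4^3 = 64 codons.
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Stop codons per genetic code. In the vertebrate mitochondrial code,
# tga encodes tryptophan, while aga and agg act as stops.
STOP_CODONS = {
    "standard": {"taa", "tag", "tga"},
    "vertebrate_mito": {"taa", "tag", "aga", "agg"},
}


def is_stop(codon: str, table: str = "standard") -> bool:
    """Return True if `codon` is a stop codon in the named code table."""
    return codon.lower() in STOP_CODONS[table]


if __name__ == "__main__":
    print(len(CODONS))
    print(is_stop("tga"), is_stop("tga", "vertebrate_mito"))
```

The comparison shows why tga stands apart from taa and tag: it is the only one of the three standard stop codons that is reassigned to an amino acid in vertebrate mitochondria.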
Aligning many protein or DNA sequences is a very complex operation in bioinformatics applications. Existing Multiple Sequence Alignment (MSA) techniques face a trade-off between objectives: techniques that focus on speed sacrifice accuracy, whereas techniques that focus on accuracy sacrifice speed. Alignment here means identifying the similarity between different sequences with high accuracy. The growing number of sequences makes the problem increasingly complex. Because of the emergence and rapid development of gene sequencing, and the growing dependence on it, sequence alignment has become important in every analysis of biological relationships. Counting the number of similar amino acids is the primary method for showing that two sequences are related. Time is a main issue in any alignment technique. In this paper, a more effective MSA method is proposed for aligning massive sets of protein sequences, maintaining the highest accuracy with less time consumption. The proposed method is based on the Artificial Fish Swarm (AFS) algorithm, which can overcome most of the challenges of the MSA problem. AFS is exploited to obtain high accuracy in adequate time. AFS has become increasingly popular in various applications such as artificial intelligence, computer vision, machine learning, and data-intensive applications. It mimics the behavior of fish searching for food in nature. The AFS mechanisms of preying, swarming, following, moving, and leaping help to increase accuracy and to improve speed by decreasing execution time. The sense organs that allow the artificial fish to collect information and vision from the environment help to improve accuracy. These features make the alignment operation more efficient and especially suitable for large-scale data. The implementation and experimental results place the proposed AFS method as a first choice for alignment compared with well-known multiple sequence alignment algorithms.
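The remark that relatedness is judged by counting matching amino acids can be made concrete with a minimal sketch. It counts identical, non-gap columns between two already-aligned sequences and sums that count over all pairs of an alignment. This is only a simple scoring illustration, not the AFS method of the paper; the function names and the `-` gap symbol are assumptions.

```python
from itertools import combinations


def identity_count(a: str, b: str, gap: str = "-") -> int:
    """Count columns where two aligned sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y and x != gap)


def sum_of_pairs_identity(alignment: list[str]) -> int:
    """Sum the pairwise identity counts over every pair of rows."""
    return sum(identity_count(a, b) for a, b in combinations(alignment, 2))


if __name__ == "__main__":
    rows = ["ACD-E", "ACDFE", "AC-FE"]
    print(identity_count(rows[0], rows[1]))
    print(sum_of_pairs_identity(rows))
```

A search method such as AFS would need some objective of this kind to compare candidate alignments, although the objective actually used in the paper is not stated in the abstract.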
A new chaos game representation (CGR) of protein sequences based on the detailed hydrophobic-hydrophilic (HP) model has been proposed by Yu et al (Physica A 337 (2004) 171). In the present paper, a CGR-walk model is proposed based on the new CGR coordinates for the protein sequences from complete genomes. The new CGR coordinates based on the detailed HP model are converted into a time series, and a long-memory ARFIMA(p, d, q) model is introduced into protein sequence analysis. This model is applied to simulating real CGR-walk sequence data of twelve protein sequences. Remarkably long-range correlations are uncovered in the data, and the results obtained from these models are reasonably consistent with those available from the ARFIMA(p, d, q) model.
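A chaos game representation can be sketched in a few lines: each residue is mapped to one of four classes, each class owns a corner of the unit square, and the current point moves halfway toward the corner of the next residue. The four-class grouping below follows the usual description of the detailed HP model (non-polar, negative polar, uncharged polar, positive polar), but the corner assignment and all names are assumptions made for illustration, and the conversion to a CGR-walk time series used in the paper is not reproduced.

```python
# Assumed residue grouping for the detailed HP model (one-letter codes).
GROUPS = {
    "nonpolar": "AILMFPWV",
    "negative": "DE",
    "uncharged": "NCQGSTY",
    "positive": "RHK",
}
CLASS_OF = {res: name for name, residues in GROUPS.items() for res in residues}

# Hypothetical corner assignment on the unit square.
CORNERS = {
    "nonpolar": (0.0, 0.0),
    "negative": (0.0, 1.0),
    "uncharged": (1.0, 1.0),
    "positive": (1.0, 0.0),
}


def cgr_points(sequence: str) -> list[tuple[float, float]]:
    """Return the CGR trajectory of a protein sequence, starting at the centre."""
    x, y = 0.5, 0.5
    points = []
    for residue in sequence.upper():
        cx, cy = CORNERS[CLASS_OF[residue]]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway to the class corner
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ADKG"):
        print(point)
```

Reading off one coordinate of these points in sequence order gives a numeric series, which is the kind of input a long-memory model such as ARFIMA operates on.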
Protein sequences, as special heterogeneous sequences, are rare in the amino acid sequence space. The specific sequential order of the amino acids of a protein is essential to its 3D structure. On the whole, however, the correlation between the sequence and the structure of a protein is not very strong. How well does a protein sequence encode its structural information? How does a sequence determine its native structure? Keeping globular proteins in mind, we discuss several problems on the path from sequence to structure.
Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry (MS) plays an indispensable role in analyzing protein covalent structures. The reliable identification of amino acid residues and modifications relies on mass accuracy, which is highly dependent on calibration. However, the accuracy provided by currently available calibrants still needs improvement in terms of compatibility with multiple tandem MS modes or ion polarity modes, calibratable range, and minimizing suppression of and interference with analyte signals. Aiming to develop a versatile calibrant that solves these problems, we designed a synthetic peptide calibrant format R_x(GDP_n)_m (referred to as "Gly-Asp-Pro, GDP") according to the chemical nature of amino acids and the rules of polypeptide fragmentation in tandem MS. With four types of amino acid residues selected and arranged through rational design, a GDP peptide produces highly regulated fragments that give rise to evenly spaced signals in each tandem MS mode, and it is compatible with both positive and negative ion modes. In internal calibration, its regulated fragmentation pattern minimizes interference with analyte signals, and using a single peptide as the input minimizes suppression of the analyte signals. As demonstrated by analyses of proteins including a monoclonal antibody and Aβ-42, these features allowed a significant increase in mass accuracy and precision, which improved sequence coverage and sequence resolution in sequence analyses (including de novo sequencing). This rational design strategy may also inspire further development of synthetic calibrants that benefit the structural analysis of biomolecules.
In accordance with previous reports, sequences related to phosphorylated protein segments occur in conserved variable domains of immunoglobulins, above all in certain N-terminally located segments. Consequently, we look here for sequences that 1) compose human and mouse proteins other than antigen receptors, 2) are identical with or highly similar to nucleotide sequence representatives of conserved variable immunoglobulin segments and 3) are identical with or closely related to phosphorylation sites. More precisely, we searched for the corresponding actual pairs of DNA and protein sequence segments using a five-step bilingual approach employing, among others, a) different types of BLAST searches, b) two in-principle-different machine-learning methods predicting phosphorylated sites and c) two large databases recording existing phosphorylation sites. The approach identified seven existing phosphorylation sites and thirty-seven related human and mouse segments reaching the limits for several predictions or phylogenic parameters. Mostly serines phosphorylated by ataxia-telangiectasia-related kinase (involved in regulation of DNA double-strand-break repair) were indicated or predicted in this study. Hypermutation motifs, located in effective positions of the selected sequence segments, occurred significantly less frequently in transcribed than in non-transcribed DNA strands, thus suggesting the incidence of mutation events. In addition, marked differences between the numbers and proportions of human and mouse cancer-related sequence items were found in different steps of the selection process. The possible role of hypermutation changes within the selected segments and the observed structural relationships are discussed here with respect to DNA damage, carcinogenesis, cancer vaccination, ageing and evolution. Taken together, our data represent additional and sometimes perhaps complementary information to the existing databases of empirically proven phosphorylation sites or pathogenically important spots.
BACKGROUND Special AT-rich sequence binding protein 2 (SATB2)-associated syndrome (SAS; OMIM 612313) is an autosomal dominant disorder. Alterations in the SATB2 gene have been identified as causative. CASE SUMMARY We report the case of a 13-year-old Chinese boy with lifelong global developmental delay, speech and language delay, and intellectual disabilities. He had short stature and irregular dentition, but no other abnormal clinical findings. A de novo heterozygous nonsense point mutation was detected by genetic analysis in exon 6 of SATB2, c.687C>A (p.Y229X) (NCBI reference sequence: NM_001172509.2), and neither of his parents had the mutation. This mutation is reported here for the first time and was evaluated as pathogenic according to the guidelines of the American College of Medical Genetics and Genomics. SAS was diagnosed, and special education was provided. Our report of a SAS case in China caused by a SATB2 mutation expands the genotype spectrum of the disease. According to the reviewed literature, the heterogeneous manifestations may be explained by the complicated pathogenic involvement and functions of SATB2: (1) SATB2 haploinsufficiency; (2) interference of truncated SATB2 protein with wild-type SATB2; and (3) the numerous different genes regulated by SATB2 in brain and skeletal development at different developmental stages. CONCLUSION Global developmental delay is usually the initial presentation, and diagnosis is challenging before other presentations occur. Regular follow-up and genetic analysis can help to diagnose SAS early. Verifying which genes are affected by SATB2 mutations in the heterogeneous manifestations may help to clarify the possible pathogenesis of SAS in the future.
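The variant notation c.687C>A (p.Y229X) can be checked with simple arithmetic: coding position 687 is the third base of codon 229, and a C-to-A change at the third base of a tyrosine codon TAC gives the stop codon TAA. The sketch below works through that reasoning. The actual codon in NM_001172509.2 was not looked up; TAC is inferred from the notation (tyrosine codons are TAT and TAC, and the reference base is C), and the function names are hypothetical.

```python
TYR_CODONS = {"TAT", "TAC"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_position(cds_pos: int) -> tuple[int, int]:
    """Map a 1-based coding position to (codon number, position in codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def apply_snv(codon: str, pos_in_codon: int, ref: str, alt: str) -> str:
    """Substitute one base of a codon after checking the reference base."""
    if codon[pos_in_codon - 1] != ref:
        raise ValueError("reference base does not match the codon")
    return codon[: pos_in_codon - 1] + alt + codon[pos_in_codon:]


if __name__ == "__main__":
    codon_number, offset = codon_position(687)
    mutated = apply_snv("TAC", offset, "C", "A")  # TAC inferred, not looked up
    print(codon_number, offset, mutated, mutated in STOP_CODONS)
```

The result is consistent with the reported protein change: tyrosine 229 is replaced by a premature stop, which is what makes the mutation a nonsense mutation.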
Biological raw data are growing exponentially, providing a large amount of information on what life is. It is believed that potential functions and the rules governing protein behavior can be revealed by analyzing known native structures of proteins. Many knowledge-based potentials for proteins have been proposed. In contrast to most existing review articles, which mainly describe technical details and applications of various potential models, the discussion here focuses on the ideas and concepts involved in the construction of potentials, including the relation between free energy and energy, the additivity of potentials of mean force, and some key issues in potential construction. Sequence analysis is briefly reviewed from an energetic viewpoint.
Machine intelligence is the intelligence exhibited by artificial systems, and it is usually realized on ordinary computers. Rough sets and information granules are widely used in uncertainty management, soft computing, and granular computing, with applications in many fields such as protein sequence analysis and biobasis determination, TSM, and Web service classification.
A novel method based on the discrete wavelet transform (DWT) and cross-covariance is presented for revealing the evolution of species at different spatial resolutions. The trypsin proteins of different species are chosen as an example, and their evolutionary relationships are described by the evolution vectors obtained with this method. The results indicate that the method is a promising approach to revealing species evolution at different spatial resolutions.
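The two building blocks named in this abstract can be sketched independently of the paper's specific procedure. The example below performs one level of a Haar DWT, which splits a numeric signal into a coarse approximation and a detail part, and computes a biased cross-covariance between two equal-length signals. The choice of the Haar wavelet, the way residues would be mapped to numbers, and the function names are all assumptions; the abstract does not state how the evolution vectors are built.

```python
import math


def haar_step(signal: list[float]) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def cross_covariance(x: list[float], y: list[float], lag: int = 0) -> float:
    """Biased cross-covariance of x[i] and y[i + lag] for a lag >= 0."""
    if len(x) != len(y):
        raise ValueError("signals must have equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((x[i] - mean_x) * (y[i + lag] - mean_y) for i in range(n - lag)) / n


if __name__ == "__main__":
    approx, detail = haar_step([1.0, 1.0, 3.0, 3.0])
    print(approx, detail)
    print(cross_covariance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Applying `haar_step` repeatedly to the approximation yields progressively coarser views of a sequence, which is one way to compare signals "at different spatial resolutions".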
Mitotic catastrophe (MC) is a form of programmed cell death induced by disorders of the mitotic process, and it is very important in tumor prevention, development, and drug resistance. Because the rapidly increasing data on MC are vigorously promoting tumor-related biomedical and clinical studies, a professional and comprehensive database for curating MC-related data is urgently needed. The Mitotic Catastrophe Database (MCDB) consists of 1214 genes/proteins and 5014 compounds collected and organized from more than 8000 research articles. MCDB also defines confidence levels, classification criteria, and uniform naming rules for MC-related data, which greatly improves data reliability and retrieval convenience. Moreover, MCDB provides protein sequence alignment and target prediction functions. The former can be used to predict new potential MC-related genes and proteins, and the latter can facilitate the identification of potential target proteins of unknown MC-related compounds. In short, MCDB is a specialized, standardized, and comprehensive database of MC-related data that will facilitate the exploration of MC by researchers from chemists to biologists in the fields of medicinal chemistry, molecular biology, bioinformatics, oncology and so on. MCDB is available at http://www.combio-lezhang.online/MCDB/indexhtml/.
Background Farnesoid X receptor (FXR) regulates tumorigenesis, but its clinical significance in gallbladder cancer (GBC) remains unclear. This study investigated its clinical and prognostic significance in GBC patients, as well as its association with the anti-apoptotic protein myeloid cell leukemia sequence 1 (MCL1). Methods FXR and MCL1 expression in 42 primary GBC and 15 normal gallbladder tissues was analyzed by immunohistochemistry. The patients and samples were collected at Ren Ji Hospital from January 2005 to December 2010. The association of FXR and MCL1 expression with clinicopathologic factors and prognosis, as well as the correlation between FXR and MCL1 protein expression, was evaluated by statistical analyses. Results Compared with normal gallbladder tissues, FXR expression was decreased and MCL1 expression was increased in GBC as the tumor node metastasis (TNM) stage progressed. Kaplan-Meier survival analysis showed that FXR low expression and MCL1 overexpression were significantly associated with poor overall survival. Furthermore, multivariate analysis showed that FXR and MCL1 are both prognostic factors for GBC patients. FXR low expression was significantly correlated with MCL1 overexpression. Conclusion FXR might be a new molecular marker for predicting the prognosis of patients with GBC and a novel therapeutic target. Chin Med J 2014;127 (14): 2637-2642
Single-molecule protein sequencing would have a tremendous impact on proteomics and human biology, and it would promote the development of novel diagnostic and therapeutic approaches. However, its technological realization can so far only be envisioned, and huge challenges need to be overcome. Major difficulties are inherent in the structure of proteins, which are composed of many different amino acids. Despite long-standing efforts, only a few complex techniques, such as Edman degradation, liquid chromatography and mass spectrometry, make protein sequencing possible. Unfortunately, these techniques have significant limitations in terms of the amount of sample required and the dynamic range of measurement. It is known that proteins can distinguish closely similar molecules. Moreover, several proteins can work as biological nanopores to perform single-molecule detection and sequencing. Unfortunately, while DNA sequencing by means of nanopores has been demonstrated, very few examples of nanopores able to perform reliable protein sequencing have been reported so far. Here we investigate, by means of molecular dynamics simulations, how a re-engineered protein acting as a biological nanopore can be used to recognize the sequence of a translocating peptide by sensing the "shape" of individual amino acids. In our simulations we demonstrate that it is possible to discriminate, with high fidelity, 9 different amino acids in a short peptide translocating through the engineered construct. The method, shown here for fluorescence-based sequencing, does not require any labelling of the peptidic analyte. These results can pave the way for a new and highly sensitive method of sequencing.
Objective To obtain the nucleotide sequence and deduced amino acid sequence of cholesteryl ester transfer protein (CETP) cDNA from the tree shrew (Tupaia glis). Methods The cDNA sequence of the tree shrew CETP was obtained from the first strand of the cDNA using the switching mechanism at 5' end of RNA transcript (SMART) technique and rapid amplification of cDNA ends (RACE). The amino acid sequence of CETP was deduced from the cDNA sequence, and its primary and secondary structures were predicted. Results The sequence of CETP cDNA from the tree shrew (GenBank accession number AF334033) covers 1636 bp, including 178 bp of untranslated region at the 3' end and a 1458 bp fragment of coding region, which provides the complete sequence of mature tree shrew CETP, although not the initiator methionine. The first 24 bp encode a partial signal peptide. The mature protein consists of 477 amino acids and is longer than the human version by one amino acid (Gly318). Comparison of this amino acid sequence with the CETPs of other animals shows that the identity between tree shrew CETP and human and rabbit CETP is 88% and 82%, respectively. The protein is extremely hydrophobic, as it contains many hydrophobic residues, especially at the C-terminus, consistent with its function in the transfer of neutral lipids. The amino acid residues involved in binding and transferring neutral lipids are highly conserved. There is a deletion of an N-linked glycosylation site at Asn342 in the tree shrew CETP protein that may participate in the removal of peripheral cholesterol and cholesteryl ester by increasing its cholesteryl ester transfer activity. Conclusion The possible glycosylation in the tree shrew CETP may be involved in the molecular mechanism of its insusceptibility to atherosclerosis.
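The lengths quoted in the CETP abstract are mutually consistent, which a short arithmetic check makes explicit. The sketch below assumes that the 1458 bp coding fragment includes the terminal stop codon; the abstract does not say so, but under that reading the numbers add up exactly. The constant names are hypothetical.

```python
TOTAL_BP = 1636     # full cDNA length reported
UTR3_BP = 178       # 3' untranslated region
CODING_BP = 1458    # coding fragment
SIGNAL_BP = 24      # partial signal peptide
MATURE_AA = 477     # residues in the mature protein

# The two parts account for the whole cDNA.
assert UTR3_BP + CODING_BP == TOTAL_BP

coding_codons = CODING_BP // 3    # 486 codons
signal_codons = SIGNAL_BP // 3    # 8 codons of partial signal peptide
remaining = coding_codons - signal_codons

# Assumption: one of the remaining codons is the stop codon.
mature_from_lengths = remaining - 1

if __name__ == "__main__":
    print(coding_codons, signal_codons, remaining, mature_from_lengths)
```

Under this assumption the 478 remaining codons correspond to 477 residues plus one stop codon, matching the stated size of the mature protein.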
Abstract Objective To develop a vaccine against schistosomiasis japonica in China, we attempted to prepare the 23 kDa membrane protein of the Schistosoma japonicum Chinese strain using gene cloning techniques. Methods A pair of primers, P1 and P2, was synthesized according to the DNA sequence of the 23 kDa membrane protein of Schistosoma mansoni; a BamHI site was added at the 5' end of primer P1 and a SalI site at the 5' end of primer P2. The DNA fragment of the gene for the 23 kDa membrane protein (SjC23) of Schistosoma japonicum was amplified from a Schistosoma japonicum cDNA library by PCR. The purified target DNA fragment was inserted into the vector pUC18/19 to form the recombinant, which was sequenced at Liverpool University, UK and Fudan University, China, respectively. The DNA sequence was analyzed with DNASIS software, and the amino acid sequence was deduced with SWISS-PROT software. Results The DNA of the 23 kDa membrane protein of the Schistosoma japonicum Chinese strain (SjC23) was 657 bp in size, the same as that of Sm23 and of Sj23 from the Philippine strain. The DNA sequence of SjC23 showed 100% homology with that of the Sj23 Philippine strain and 79.5% homology with Sm23. The deduced amino acid sequence of SjC23 showed 84% homology with Sm23 and 100% homology with the Philippine strain. There were two hydrophilic domains in SjC23, one located at the N-terminus (amino acids 36-56) and another at the C-terminus (amino acids 108-183). Conclusions The gene for the 23 kDa membrane protein of the Schistosoma japonicum Chinese strain has been cloned, and this work has laid the foundation for the development of a vaccine against the Schistosoma japonicum Chinese strain.
Funding (codon algebra study): Project supported in part by the International Technology Collaboration Research Program of China (Grant No 2007DFA706700).
Funding (AFS multiple sequence alignment study): The authors extend their appreciation to the Deanship of Scientific Research at Jouf University for funding this work through research Grant No (DSR2020-01-414).
Funding (CGR-walk study): Project supported by the National Natural Science Foundation of China (Grant No 60575038), the Natural Science Foundation of Jiangnan University, China (Grant No 20070365), and the Program for Innovative Research Team of Jiangnan University, China.
Funding (sequence-to-structure study): Supported by the National Natural Science Foundation of China (Grant Nos. 11175224 and 11121403).
Funding (MALDI calibrant study): Supported by grants from the National Natural Science Foundation of China (No. 21974069) and the Open Fund Programs of Shenzhen Bay Laboratory (No. SZBL2020090501001).
文摘Matrix-assisted laser desorption/ionization(MALDI)mass spectrometry(MS)plays an indispensable role in analyzing protein covalent structures.The reliable identification of amino acid residues and modifications relies on the mass accuracy,which is highly dependent on calibration.However,the accuracy provided by the currently available calibrants still needs further improvement in terms of compatibility with multiple tandem MS modes or ion polarity modes,calibratable range,and minimizing suppression of and interference with analyte signals.Here aiming at developing a versatile calibrant to solve these problem,we designed a synthetic peptide format of calibrant R_x(GDP_n)_m(referred to as“Gly-Asp-Pro,GDP”)according to the chemical natures of amino acids and polypeptide fragmentation rules in tandem MS.With four types of amino acid residues selected and arranged through rational designs,a GDP peptide produces highly regulated fragments that give rise to evenly spaced signals in each tandem MS mode and is compatible with both positive and negative ion modes.In internal calibration,its regulated fragmentation pattern minimizes interference with analyte signals,and using a single peptide as the input minimizes suppression of the analyte signals.As demonstrated by analyses of proteins including monoclonal antibody and Aβ-42,these features allowed significant increase of the mass accuracy and precision,which improved sequence coverage and sequence resolution in sequence analyses(including de novo sequencing).This rational design strategy may also inspire further development of synthetic calibrants that benefit structural analysis of biomolecules.
文摘In accordance with previous reports, the sequences related to phosporylated protein segments occur in conserved variable domains of immunoglobulins including first of all certain N-terminally located segments. Consequently, we look here for the sequences 1) composing human and mouse proteins different from antigen receptors, 2) identical with or highly similar to nucleotide sequence representatives of conserved variable immunoglobulin segments and 3) identical with or closely related to phosphorylation sites. More precisely, we searched for the corresponding actual pairs of DNA and protein sequence segments using five-step bilingual approach employing among others a) different types of BLAST searches, b) two in-principle-different machine-learning methods predicting phosphorylated sites and c) two large databases recording existing phosphorylation sites. The approach identified seven existing phosphorylation sites and thirty-seven related human and mouse segments achieving limits for several predictions or phylogenic parameters. Mostly serines phosporylated with ataxia-telangiectasia-related kinase (involved in regulation of DNA-double-strand-break repair) were indicated or predicted in this study. Hypermutation motifs, located in effective positions of the selected sequence segments, occurred significantly less frequently in transcribed than non-transcribed DNA strands suggesting thus the incidence of mutation events. In addition, marked differences between the numbers and proportions of human and mouse cancer-related sequence items were found in different steps of selection process. The possible role of hypermutation changes within the selected segments and the observed structural relationships are discussed here with respect to DNA damage, carcinogenesis, cancer vaccination, ageing and evolution. 
Taken together, our data represent additional and sometimes perhaps complementary information to the existing databases of empirically proven phosphorylation sites or pathogenically important spots.
Abstract: BACKGROUND Special AT-rich sequence binding protein 2 (SATB2)-associated syndrome (SAS; OMIM 612313) is an autosomal dominant disorder. Alterations in the SATB2 gene have been identified as causative. CASE SUMMARY We report a case of a 13-year-old Chinese boy with lifelong global developmental delay, speech and language delay, and intellectual disabilities. He had short stature and irregular dentition, but no other abnormal clinical findings. A de novo heterozygous nonsense point mutation was detected by genetic analysis in exon 6 of SATB2, c.687C>A (p.Y229X) (NCBI reference sequence: NM_001172509.2), and neither of his parents had the mutation. This mutation is reported here for the first time and was evaluated as pathogenic according to the guidelines of the American College of Medical Genetics and Genomics. SAS was diagnosed, and special education was provided. Our report of a SAS case in China caused by a SATB2 mutation expands the known genotypes for the disease. According to the reviewed literature, the heterogeneous manifestations can be induced by the complicated pathogenic involvement and functions of SATB2: (1) SATB2 haploinsufficiency; (2) interference of truncated SATB2 protein with wild-type SATB2; and (3) the numerous different genes regulated by SATB2 in brain and skeletal development at different developmental stages. CONCLUSION Global developmental delays are usually the initial presentation, and diagnosis is challenging before other presentations occur. Regular follow-up and genetic analysis can help to diagnose SAS early. Verification of the genes affected by SATB2 mutations in heterogeneous manifestations may help to clarify the possible pathogenesis of SAS in the future.
Funding: Project supported in part by the National Natural Science Foundation of China (Grant Nos. 11175224 and 11121403).
Abstract: Biological raw data are growing exponentially, providing a large amount of information on what life is. It is believed that potential functions and the rules governing protein behaviors can be revealed from analysis of known native structures of proteins. Many knowledge-based potentials for proteins have been proposed. Contrary to most existing review articles, which mainly describe technical details and applications of various potential models, the main foci of the discussion here are ideas and concepts involved in the construction of potentials, including the relation between free energy and energy, the additivity of potentials of mean force and some key issues in potential construction. Sequence analysis is briefly viewed from an energetic viewpoint.
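As a rough illustration of how a knowledge-based potential can be constructed, the sketch below applies the standard inverse-Boltzmann relation, u_i = -kT ln(P_obs(i) / P_ref(i)), to binned counts. The abstract does not state which formula or reference state the review uses, so the relation, the function name `inverse_boltzmann`, and the toy counts are assumptions for illustration only.

```python
import math


def inverse_boltzmann(observed_counts, reference_counts, kT=1.0):
    """Return u_i = -kT * ln(P_obs(i) / P_ref(i)) for each bin i.

    observed_counts: counts per bin taken from known native structures.
    reference_counts: counts per bin in a chosen reference state (an
    assumption; the choice of reference state is model dependent).
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("count lists must have the same length")
    n_obs = sum(observed_counts)
    n_ref = sum(reference_counts)
    if n_obs <= 0 or n_ref <= 0:
        raise ValueError("total counts must be positive")

    potentials = []
    for obs, ref in zip(observed_counts, reference_counts):
        if ref <= 0:
            raise ValueError("reference count must be positive in every bin")
        if obs == 0:
            # Never observed: the potential is infinitely unfavourable.
            potentials.append(math.inf)
            continue
        ratio = (obs / n_obs) / (ref / n_ref)
        potentials.append(-kT * math.log(ratio))
    return potentials


# Toy example: three distance bins, hypothetical counts.
u = inverse_boltzmann([10, 20, 10], [10, 10, 20])
```

Bins observed more often than in the reference state receive a negative (favourable) value, and bins observed less often receive a positive one.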
Abstract: Machine intelligence is the intelligence exhibited by artificial systems. It is usually realized by ordinary computers. Rough sets and information granules are widely used in uncertainty management, soft computing and granular computing, in many fields such as protein sequence analysis and biobasis determination, TSM and Web service classification.
Funding: We thank the National Natural Science Foundation of China (Project No. 29975033) and the Education Office Program of Jiangxi Province ([2005]242) for financial support.
Abstract: A novel method based on the discrete wavelet transform (DWT) and cross-covariance for revealing the evolution of species at different spatial resolutions is presented. The trypsin proteins of different species are chosen as an example to describe the evolutionary relationship according to the evolution vectors obtained with this method. The results indicate that this method is a promising approach to revealing species evolution at different spatial resolutions.
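The combination of a DWT with cross-covariance can be sketched as follows. The abstract does not say which wavelet, numeric encoding, or lag the authors use, so the Haar wavelet, the Kyte-Doolittle hydropathy encoding, the function names, and the two short peptides below are all assumptions chosen only to make the idea concrete.

```python
import math

# Kyte-Doolittle hydropathy values, used here as an assumed numeric encoding.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Map a protein sequence to a list of hydropathy values."""
    return [HYDROPATHY[residue] for residue in sequence]


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])  # pad odd-length input by repeating the end
    root2 = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / root2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / root2 for i in range(0, len(values), 2)]
    return approx, detail


def approximations(signal, levels):
    """Approximation coefficients at successively coarser resolutions."""
    result = []
    current = list(signal)
    for _ in range(levels):
        current, _detail = haar_dwt(current)
        result.append(current)
    return result


def cross_covariance(x, y, lag=0):
    """Cross-covariance of x and y at a non-negative lag."""
    n = min(len(x), len(y)) - lag
    if n <= 0:
        raise ValueError("lag too large for the given sequences")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((x[i] - mean_x) * (y[i + lag] - mean_y) for i in range(n)) / n


# Hypothetical peptides compared at two resolutions.
seq_a = encode("IVGGYTCG")
seq_b = encode("IVGGYNCE")
scores = [
    cross_covariance(a, b)
    for a, b in zip(approximations(seq_a, 2), approximations(seq_b, 2))
]
```

Each entry of `scores` compares the two sequences at one resolution level, which is one plausible reading of an "evolution vector"; the paper's actual definition may differ.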
Funding: Supported by grants from the National Natural Science Foundation of China (Grant Nos. 81803755 and 81922064), the National Science and Technology Major Project (Grant No. 2018ZX10201002, China), the China Postdoctoral Science Foundation (2018M640926 and 2020M673221), and the Sichuan University Postdoctoral Research and Development Foundation (2020SCU12062 and 2020SCU12056, China).
Abstract: Mitotic catastrophe (MC) is a form of programmed cell death induced by mitotic process disorders, and it is very important in tumor prevention, development, and drug resistance. Because rapidly growing MC data are vigorously promoting tumor-related biomedical and clinical studies, a professional and comprehensive database for curating MC-related data is urgently needed. The Mitotic Catastrophe Database (MCDB) consists of 1214 genes/proteins and 5014 compounds collected and organized from more than 8000 research articles. MCDB also defines the confidence level, classification criteria, and uniform naming rules for MC-related data, which greatly improves data reliability and retrieval convenience. Moreover, MCDB provides protein sequence alignment and target prediction functions. The former can be used to predict new potential MC-related genes and proteins, and the latter can facilitate the identification of potential target proteins of unknown MC-related compounds. In short, MCDB is a proprietary, standard, and comprehensive database for MC-related data that will facilitate the exploration of MC by researchers ranging from chemists to biologists in the fields of medicinal chemistry, molecular biology, bioinformatics, oncology and so on. The MCDB is distributed on http://www.combio-lezhang.online/MCDB/indexhtml/.
Abstract: Background Farnesoid X receptor (FXR) regulates tumorigenesis, but its clinical significance in gallbladder cancer (GBC) remains unclear. This study investigated its clinical and prognostic significance in GBC patients, as well as its association with the anti-apoptotic protein myeloid cell leukemia sequence 1 (MCL1). Methods FXR and MCL1 expression in 42 primary GBC and 15 normal gallbladder tissues was analyzed by immunohistochemistry. The patients and samples were collected from Ren Ji Hospital from January 2005 to December 2010. The association of FXR and MCL1 expression with clinicopathologic factors and prognosis, as well as the correlation between FXR and MCL1 protein expression, were evaluated by statistical analyses. Results Compared with normal gallbladder tissues, FXR expression was decreased and MCL1 expression was increased in GBC during progression of tumor node metastasis (TNM) stage. Kaplan-Meier survival analysis showed that FXR low-expression and MCL1 over-expression were significantly associated with poor overall survival. Furthermore, multivariate analysis showed that FXR and MCL1 are both prognostic factors for GBC patients. FXR low-expression was significantly correlated with MCL1 over-expression. Conclusion FXR might be a new molecular marker to predict the prognosis of patients with GBC and a novel therapeutic target. Chin Med J 2014;127 (14): 2637-2642
Funding: The Horizon 2020 Program, FET-Open: PROSEQO, Grant Agreement no. [687089]. We acknowledge PRACE for awarding us access to Marconi at CINECA, Italy.
Abstract: Single molecule protein sequencing would have a tremendous impact on proteomics and human biology, and it would promote the development of novel diagnostic and therapeutic approaches. However, its technological realization can only be envisioned, and huge challenges need to be overcome. Major difficulties are inherent to the structure of proteins, which are composed of several different amino acids. Despite long-standing efforts, only a few complex techniques, such as Edman degradation, liquid chromatography and mass spectroscopy, make protein sequencing possible. Unfortunately, these techniques present significant limitations in terms of the amount of sample required and the dynamic range of measurement. It is known that proteins can distinguish closely similar molecules. Moreover, several proteins can work as biological nanopores to perform single molecule detection and sequencing. Unfortunately, while DNA sequencing by means of nanopores has been demonstrated, very few examples of nanopores able to perform reliable protein sequencing have been reported so far. Here, we investigate, by means of molecular dynamics simulations, how a re-engineered protein, acting as a biological nanopore, can be used to recognize the sequence of a translocating peptide by sensing the "shape" of individual amino acids. In our simulations we demonstrate that it is possible to discriminate with high fidelity 9 different amino acids in a short peptide translocating through the engineered construct. The method, here shown for fluorescence-based sequencing, does not require any labelling of the peptidic analyte. These results can pave the way for a new and highly sensitive method of sequencing.
Funding: This work was supported by grants from the National Sciences Foundation of China (No. 39770168) and the National Program for Key Basic Research Projects-973 (No. G2000056902).
Abstract: Objective To obtain the nucleotide sequence and deduced amino acid sequence of cholesteryl ester transfer protein (CETP) cDNA from the tree shrew (Tupaia glis). Methods The cDNA sequence of the tree shrew CETP was obtained by utilizing the technique of switching mechanism at 5' end of RNA transcript (SMART) and rapid amplification of cDNA end (RACE) from the first strand of the cDNA. The amino acid sequence of CETP was deduced from the cDNA sequence and its primary and secondary structures were predicted. Results The sequence of CETP cDNA from tree shrew (GenBank accession number AF334033) covers 1636 bp, including 178 bp of 3' untranslated region and a 1458 bp coding fragment, which provides the complete sequence of mature tree shrew CETP, although not the initiator methionine. The first 24 bp encode a partial signal peptide. The mature protein consists of 477 amino acids and is longer than the human version by one amino acid (Gly318). Comparing this amino acid sequence with those of other animals' CETPs, the identity between tree shrew and human and rabbit CETP is 88% and 82%, respectively. The protein is extremely hydrophobic as it contains many hydrophobic residues, especially at the C-terminus, consistent with its function in the transfer of neutral lipids. The amino acid residues involved in binding and transferring neutral lipids are highly conserved. There is a deletion of an N-linked glycosylation site at Asn342 in the tree shrew CETP protein that may participate in the removal of peripheral cholesterol and cholesteryl ester by increasing its activity of transferring cholesteryl ester. Conclusion The possible glycosylation in the tree shrew CETP may be involved in the molecular mechanism of its insusceptibility to atherosclerosis.
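The lengths quoted in this abstract can be checked with simple arithmetic, and the identity figures rest on a pairwise comparison of aligned sequences. The sketch below is a minimal illustration of both. It assumes that the 1458 bp coding fragment ends with a stop codon (the abstract does not say so explicitly) and that identity is counted over aligned columns without gaps (one of several common conventions). The function names and the toy alignment are hypothetical.

```python
def mature_protein_length(coding_bp, signal_bp, includes_stop=True):
    """Number of residues encoded after the signal-peptide portion.

    includes_stop: assume the last codon of the fragment is a stop codon.
    """
    codons, remainder = divmod(coding_bp - signal_bp, 3)
    if remainder:
        raise ValueError("coding length is not a whole number of codons")
    return codons - 1 if includes_stop else codons


def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns that contain no gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Figures taken from the abstract: 178 bp 3' UTR + 1458 bp coding = 1636 bp.
total_bp = 178 + 1458
mature_residues = mature_protein_length(coding_bp=1458, signal_bp=24)

# Toy alignment (hypothetical sequences, not CETP).
identity = percent_identity("ACDE", "ACDF")
```

Under the stop-codon assumption, 1458 - 24 = 1434 bp gives 478 codons, of which 477 encode residues, matching the mature protein length stated in the abstract.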
Abstract: Objective To develop a vaccine against Chinese Schistosoma japonicum, we attempted to prepare the 23 kDa membrane protein of the Schistosoma japonicum Chinese strain with gene cloning techniques. Methods A pair of primers, P1 and P2, was synthesized according to the DNA sequence of the 23 kDa membrane protein of Schistosoma mansoni; a BamHI site was added at the 5' end of primer P1 and a SalI site was added at the 5' end of primer P2. The gene DNA fragment of the 23 kDa membrane protein (SjC23) of Schistosoma japonicum was amplified from the cDNA library of Schistosoma japonicum by PCR. The purified target DNA fragment was inserted into the vector pUC18/19 to form the recombinant, and sequenced at Liverpool University, UK and Fudan University, China, respectively. The DNA sequence was analyzed with DNASIS software, and the amino acid sequence was deduced with the SWISS-PROT software. Results The size of the DNA of the 23 kDa membrane protein of the Schistosoma japonicum Chinese strain (SjC23) was 657 bp, the same size as that of Sm23 and of Sj23 from the Philippine strain. The DNA sequence of SjC23 shared 100% homology with that of Sj23 from the Philippine strain, and 79.5% homology with that of Sm23. The deduced amino acid sequence of SjC23 shared 84% homology with Sm23, and 100% homology with the Philippine strain. There were two hydrophilic domains in SjC23: one located at the N terminal (amino acids 36-56), and another at the C terminal (amino acids 108-183). Conclusions The gene of the 23 kDa membrane protein of the Schistosoma japonicum Chinese strain has been cloned, and this work has laid the foundation for the development of a vaccine against the Schistosoma japonicum Chinese strain.