Funding: This work was supported by grants from the National Natural Science Foundation of China (NSFC Nos. 11421202 and 11827803 to YBF), the Youth Thousand Scholar Program of China (J.Z.), and the Beijing Advanced Innovation Center for Biomedical Engineering, BUAA (J.Z.).
Abstract: Background: With the spread of SARS-CoV-2 around the world, human health is being threatened. As there is no effective vaccine yet, vaccine development is urgently in progress. Materials and methods: Immunoinformatics methods were applied to predict epitopes from the Spike protein by mining literature on B- and T-cell epitope prediction published or preprinted from the outbreak of the virus until June 1, 2020. The 3D structure of the Spike protein was obtained (PDB ID: 6VSB) for prediction of discontinuous B-cell epitopes and localization of epitopes in the hotspot regions. Results: Methods provided by the Immune Epitope Database (IEDB) server were the most frequently used to predict epitopes. Sequence alignment of the epitopes extracted from the literature with the Spike protein demonstrated that the epitopes in different studies converged on multiple short hotspot regions. Three hotspot regions were found in the RBD of the Spike protein harboring B-cell linear epitopes (‘RQIAPGQTGKIADYNYKLPD’, ‘SYGFQPTNGVGYQ’, and ‘YAWNRKRISNCVA’) predicted to have high antigenicity scores. Two T-cell epitopes (‘KPFERDISTEIYQ’ and ‘NYNYLYRLFR’) predicted to be highly antigenic in the original studies were found in the hotspot regions. Toxicity and allergenicity analysis confirmed that all five epitopes are non-toxic and that four of them are non-allergenic. The five epitopes identified in hotspot regions of the RBD were found to be fully exposed based on the 3D structure of the Spike protein. Conclusion: The five epitopes we identified through literature mining may be potential candidates for diagnostics and vaccine development against SARS-CoV-2.
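The hotspot idea in the abstract above, where epitopes from different studies converge on short regions, can be sketched as a per-residue coverage count. This is a minimal illustration and not the authors' pipeline: the toy `protein` string, the made-up epitopes, the `find_hotspots` helper, and the `min_support` threshold are all hypothetical, and the toy sequence is not the real Spike sequence.

```python
from typing import List, Tuple


def find_hotspots(protein: str, epitopes: List[str],
                  min_support: int = 2) -> List[Tuple[int, int, str]]:
    """Return (start, end, subsequence) for maximal runs of residues covered
    by at least `min_support` epitopes. Positions are 0-based, end exclusive."""
    coverage = [0] * len(protein)
    for ep in epitopes:
        start = protein.find(ep)  # exact match, first occurrence only
        if start == -1:
            continue  # epitope does not occur in this sequence
        for i in range(start, start + len(ep)):
            coverage[i] += 1

    hotspots = []
    run_start = None
    for i, c in enumerate(coverage + [0]):  # trailing 0 closes an open run
        if c >= min_support and run_start is None:
            run_start = i
        elif c < min_support and run_start is not None:
            hotspots.append((run_start, i, protein[run_start:i]))
            run_start = None
    return hotspots


# Toy sequence (hypothetical, NOT the Spike protein)
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
# Made-up "epitopes", as if reported by different studies
epitopes = ["AKQRQISFVK", "QRQISFVKSH", "RQISFV", "LEERLGL"]

print(find_hotspots(protein, epitopes, min_support=2))
```

A real analysis would replace the exact `str.find` match with a proper sequence alignment, since published epitopes may differ from the reference sequence by a residue or two.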
Funding: Supported by the National Library of Medicine of the National Institutes of Health (NIH), award number 5R01LM013392.
Abstract: The biomedical literature is a vast and invaluable resource for biomedical research. Integrating knowledge from the literature with biomedical data can help biological studies and the clinical decision-making process. Efforts have been made to gather information from the biomedical literature and create biomedical knowledge bases, such as KEGG and Reactome. However, manual curation remains the primary method to retrieve accurate biomedical entities and relationships. Manual curation becomes increasingly challenging and costly as the volume of biomedical publications grows quickly. Fortunately, recent advancements in Artificial Intelligence (AI) technologies offer the potential to automate the process of curating, updating, and integrating knowledge from the literature. Herein, we highlight the AI capabilities to aid in mining knowledge and building the knowledge base from the biomedical literature.
Funding: Supported by Nanchong City School’s Science and Technology Strategic Cooperation, China, No. 20SXQT0304; and the Research and Development Project Plan of Affiliated Hospital of North Sichuan Medical College, China, No. 2020ZD003.
Abstract: BACKGROUND Ribonucleotide reductase (RR) is a key enzyme in tumor proliferation, especially its subunit RRM2. Although there are multiple therapeutics for tumors, they all have certain limitations. Given their advantages, traditional Chinese medicine (TCM) monomers have become an important source of anti-tumor drugs. Therefore, screening and analysis of TCM monomers with RRM2 inhibition can provide a reference for further anti-tumor drug development. AIM To screen and analyze potential anti-tumor TCM monomers with a good binding capacity to RRM2. METHODS The Gene Expression Profiling Interactive Analysis database was used to analyze the level of RRM2 gene expression in normal and tumor tissues as well as RRM2's effect on the overall survival rate of tumor patients. TCM monomers that potentially act on RRM2 were screened via literature mining. Using AutoDock software, the screened monomers were docked with the RRM2 protein. RESULTS The expression of RRM2 mRNA in multiple tumor tissues was significantly higher than that in normal tissues, and it was negatively correlated with the overall survival rate of patients in the majority of tumor types. Through literature mining, we found that berberine, ursolic acid, gambogic acid, cinobufagin, quercetin, daphnetin, and osalmide have inhibitory effects on RRM2. Molecular docking showed that these TCM monomers have a strong binding capacity with the RRM2 protein, interacting mainly through hydrogen bonds and hydrophobic interactions. The main binding sites were Arg330, Tyr323, Ser263, and Met350. CONCLUSION RRM2 is an important tumor therapeutic target. The TCM monomers screened have a good binding capacity with the RRM2 protein.
Abstract: Objective: This study aimed to collect, sort, and mine the literature on the diagnosis and treatment of insomnia in the Chinese Medical Code, and to explore the patterns of internal administration of traditional Chinese medicine in the treatment of insomnia. Methods: We used the "Chinese Medical Code" as the data retrieval source and "insomnia", "not sleepy", "no sleeping", and "eye not sleeping" as the keywords. We excluded temporary insomnia, physiological insomnia, insomnia caused by other diseases, other treatment methods such as acupuncture and massage, and medical cases. Prescriptions for insomnia treated with traditional Chinese medicine were screened out. The database was established using Microsoft Excel 2010. Descriptive statistics were used to analyze the frequency, nature, flavor, channel tropism, and efficacy of the traditional Chinese medicines. We analyzed the association rules between drugs and mined the prescription rules with the Apriori algorithm in SPSS Modeler 14.1. Results: A total of 147 prescriptions, 138 traditional Chinese medicines, 20 pairs of core drugs, 57 common drug combinations, and 3 core pharmaceuticals were included. The basic prescription is: Suan Zao Ren, Bai Zi Ren, Fu Shen, Fu Ling, Yuan Zhi, Dang Gui, Gan Cao (Zhi). Conclusion: According to the drug analysis above, insomnia is located mainly in the heart and also involves the liver, spleen, and kidney. Because it involves different viscera and syndromes, the methods of treatment also differ. However, there is a basic pathogenesis, namely the loss of mental nourishment. By mining the patterns of traditional Chinese medicine use for insomnia in the Chinese Medical Code, we expect that the results can guide the treatment of insomnia.
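Association-rule mining of the kind described above rests on two quantities, support and confidence. The sketch below computes them for herb pairs in plain Python rather than SPSS Modeler. It covers only the pair-level step, not the full Apriori algorithm, and the four prescriptions are invented for illustration (only the herb names come from the abstract); `pair_rules` and its thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Invented toy prescriptions; only the herb names are taken from the abstract.
prescriptions = [
    {"Suan Zao Ren", "Fu Shen", "Yuan Zhi"},
    {"Suan Zao Ren", "Fu Shen", "Dang Gui"},
    {"Suan Zao Ren", "Bai Zi Ren", "Fu Shen"},
    {"Fu Ling", "Yuan Zhi", "Dang Gui"},
]


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return sorted rules (antecedent, consequent, support, confidence)
    for herb pairs that meet both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of prescriptions containing both herbs
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # P(y in prescription | x in it)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


for antecedent, consequent, support, confidence in pair_rules(prescriptions):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Full Apriori extends this by growing itemsets one herb at a time and pruning any candidate that contains an infrequent subset.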
Funding: This study was supported by the Tianjin Health and Family Planning Commission Integrated Traditional Chinese and Western Medicine research project (2017134).
Abstract: Background: This study aimed to collect and analyze the literature on yellowish leucorrhea disease in the Chinese Medical Canon, and to explore its treatment and medication patterns with data mining. Methods: The Chinese Medical Canon was used as the data retrieval source, and “Huangdai” (yellowish leucorrhea disease) and other keywords were used for the literature search. After excluding “gestational yellowish leucorrhea”, “jaundice and yellowish leucorrhea”, and other diseases, literature retrieval, data sorting, and analysis were carried out, and the database was established using Microsoft Excel 2010. Descriptive statistics were used to analyze the nature, flavour, channel tropism, and efficacy of the medicines. Using the method in “the Research Platform for Inheritance of Traditional Chinese Medicine Academic Thoughts of Famous Gynecologists of Traditional Chinese Medicine”, the association rules between drugs were analyzed and mined. Results: Eighty-seven prescriptions, 152 traditional Chinese medicines, 20 core drug pairs, 39 common drug combinations, and 5 core drug groups were collected and included. Conclusion: “Damp-Heat” is the fundamental pathological product of yellowish leucorrhea disease, and the basic treatments are “Releasing Heat” and “Eliminating Dampness”, “Regulating Liver Qi” and “Promoting Blood Circulation”, and “Strengthening Spleen” and stopping leucorrhea.
Funding: This work was supported by the National Science and Technology Innovation 2030 Grant (No. 2021ZD0201002), the National Natural Science Foundation of China (Nos. T2122015 and 61890954), the CAMS Innovation Fund for Medical Sciences (No. 2019-I2M-5-014), and the Suzhou Prospective Application Research Project (No. SYG201915).
Abstract: Background: Images of anatomical regions and neuron type distribution, as well as their related literature, are valuable assets for neuroscience research. They are vital evidence and vehicles for discovering new phenomena and refining knowledge through image and text big data. The knowledge acquired from image data generally echoes the literature accumulated over the years. The knowledge within the literature can provide a comprehensive context for a deeper understanding of the image data. However, it is quite a challenge to manually identify the related literature and summarize the neuroscience knowledge in a large-scale corpus. Thus, neuroscientists are in dire need of an automated method to extract neuroscience knowledge from large-scale literature. Methods: A proposed deep learning model named BioBERT-CRF extracts brain region entities from the WhiteText dataset. This model takes advantage of BioBERT and CRF to predict entity labels while training. Results: The proposed deep learning model performed comparably to or even outperformed the previous models on the WhiteText dataset. The BioBERT-CRF model achieved the best average precision, recall, and F1 score of 81.3%, 84.0%, and 82.6%, respectively. We used the BioBERT-CRF model to predict brain region entities in a large-scale PubMed abstract dataset and used a rule-based method to normalize all brain region entities to three neuroscience dictionaries. Conclusions: Our work shows that the BioBERT-CRF model can be well suited for brain region entity extraction. The rankings of different brain region entities by their appearance in the large-scale corpus indicate the anatomical regions that researchers are most concerned about.
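Precision, recall, and F1 for entity extraction are commonly computed over whole entity spans. The sketch below shows that computation for BIO-tagged tokens. The tag scheme, the `BR` label, the example sentence, and the helper names are assumptions for illustration; none of them is taken from the WhiteText dataset or the authors' code. The last lines check that the harmonic mean of the reported precision and recall is consistent with the reported F1, although scores averaged over several runs need not satisfy that identity exactly.

```python
def bio_spans(tags):
    """Extract (start, end, label) spans from a BIO tag sequence; end is exclusive."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside and start is not None:
            spans.add((start, i, label))  # current span ends before token i
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]  # a new span begins at token i
    return spans


def prf1(gold_tags, pred_tags):
    """Entity-level precision, recall, and F1 with exact span matching."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical tokens:
# "The dorsal raphe nucleus projects to the hippocampus and amygdala"
gold = ["O", "B-BR", "I-BR", "I-BR", "O", "O", "O", "B-BR", "O", "B-BR"]
pred = ["O", "B-BR", "I-BR", "I-BR", "O", "O", "O", "B-BR", "O", "O"]
print(prf1(gold, pred))

# Consistency check on the scores reported in the abstract
p, r = 0.813, 0.840
print(round(2 * p * r / (p + r), 3))
```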