Funding: This research is supported by the Natural Science Foundation of Hunan Province (No. 2019JJ40145), the Scientific Research Key Project of Hunan Education Department (No. 19A273), and the Open Fund of Key Laboratory of Hunan Province (2017TP1026).
Abstract: Text mining has emerged as an effective method of handling and extracting useful information from the exponentially growing biomedical literature and biomedical databases. We developed a novel biomedical text mining model implemented by a multi-agent system and a distributed computing mechanism. Our distributed system, TextMed, comprises several software agents, where each agent uses a reinforcement learning method to update the sentiment of relevant text from a particular set of research articles related to specific keywords. TextMed can also operate on different physical machines to expedite its knowledge extraction by utilizing a clustering technique. We collected biomedical textual data from PubMed and assigned it to a multi-agent biomedical text mining system, in which the agents communicate directly and collaboratively with one another to determine the relevant information inside the textual data. Our experimental results indicate that TextMed parallelizes and distributes the learning process across individual agents, appropriately learns the sentiment scores of specific keywords, and efficiently finds connections in biomedical information through the text mining paradigm.
Funding: Supported by the National Natural Science Foundation of China (61202193, 61202304), the Major Projects of Chinese National Social Science Foundation (11&ZD189), and the Chinese Postdoctoral Science Foundation (2013M540593, 2014T70722).
Abstract: In this paper we propose a novel model, the "recursive directed graph", based on feature structure, and apply it to represent the semantic relations of postpositive attributive structures in biomedical texts. The usages of postpositive attributives are complex and variable, especially in three categories: present participle phrases, past participle phrases, and prepositional phrases used as postpositive attributives, which often make automatic parsing difficult. We summarize these categories and annotate their semantic information. Compared with dependency structure, feature structure, being a recursive directed graph, enhances semantic information extraction in the biomedical field. The annotation results show that the recursive directed graph is better suited to extracting complex semantic relations for biomedical text mining.
Funding: Supported by the National Key Research and Development Program of China (2018YFB1003404), the National Natural Science Foundation of China (Grant Nos. 61672142, 61402213), the Fundamental Research Funds for the Central Universities (N150408001-3, N150404013), and the Natural Science Foundation of Liaoning Province (20170540471).
Abstract: Biomedical entity alignment, composed of two subtasks, entity identification and entity-concept mapping, is of great research value in biomedical text mining, and its techniques are widely used for named entity standardization, information retrieval, knowledge acquisition, and ontology construction. Previous work put considerable effort into feature engineering in order to employ feature-based models for entity identification and alignment. However, models that depend on subjective feature selection may suffer from error propagation and are unable to utilize hidden information. With the rapid development of health-related research, researchers need an effective method to explore the large amount of available biomedical literature. Therefore, we propose a two-stage entity alignment process, the biomedical entity exploring model, to identify biomedical entities and align them to the knowledge base interactively. The model aims to automatically obtain semantic information for extracting biomedical entities and mining semantic relations through the standard biomedical knowledge base. The experiments show that the proposed method achieves better performance on entity alignment. The proposed model dramatically improves the F1 scores of the task, by about 4.5% in entity identification and 2.5% in entity-concept mapping.
Funding: National Institute of General Medical Sciences, Grant/Award Number: R35-GM126985; National Institute of Diabetes and Digestive and Kidney Diseases, Grant/Award Number: P30DK092950; U.S. National Library of Medicine, Grant/Award Number: LM013392.
Abstract: Understanding complex biological pathways, including gene–gene interactions and gene regulatory networks, is critical for exploring disease mechanisms and drug development. Manual literature curation of biological pathways cannot keep up with the exponential growth of new discoveries in the literature. Large-scale language models (LLMs) trained on extensive text corpora contain rich biological information, and they can be mined as a biological knowledge graph. This study assesses 21 LLMs, including both application programming interface (API)-based models and open-source models, in their capacity to retrieve biological knowledge. The evaluation focuses on predicting gene regulatory relations (activation, inhibition, and phosphorylation) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway components. Results indicated a significant disparity in model performance. The API-based models GPT-4 and Claude-Pro showed superior performance, with F1 scores of 0.4448 and 0.4386 for gene regulatory relation prediction, and Jaccard similarity indices of 0.2778 and 0.2657 for KEGG pathway prediction, respectively. Open-source models lagged behind their API-based counterparts; among them, Falcon-180b and llama2-7b had the highest F1 scores in gene regulatory relations, 0.2787 and 0.1923, respectively. For KEGG pathway recognition, the Jaccard similarity index was 0.2237 for Falcon-180b and 0.2207 for llama2-7b. Our study suggests that LLMs are informative in gene network analysis and pathway mapping, but their effectiveness varies, necessitating careful model selection. This work also provides a case study and insight into using LLMs as knowledge graphs. Our code is publicly available on GitHub (Muh-aza).
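For readers unfamiliar with the two metrics reported in this abstract, the following is a minimal sketch of their standard set-based definitions. It is not the authors' evaluation code, and the abstract does not describe their exact protocol. The function names `f1_score` and `jaccard_index` and the gene symbols in the example are illustrative assumptions, not data from the study.

```python
def f1_score(predicted, gold):
    """Set-based F1: harmonic mean of precision and recall."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0  # also covers empty predictions or an empty gold set
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def jaccard_index(predicted, gold):
    """Jaccard similarity: size of intersection over size of union."""
    predicted, gold = set(predicted), set(gold)
    union = predicted | gold
    if not union:
        return 0.0  # assumption: two empty sets score 0 by convention here
    return len(predicted & gold) / len(union)


# Hypothetical example: pathway members named by a model vs. a reference set.
predicted_genes = {"TP53", "MDM2", "CDKN1A", "BAX"}
reference_genes = {"TP53", "MDM2", "ATM"}

jaccard = jaccard_index(predicted_genes, reference_genes)  # 2 shared / 5 total
f1 = f1_score(predicted_genes, reference_genes)            # P = 2/4, R = 2/3
print(f"Jaccard = {jaccard:.4f}, F1 = {f1:.4f}")
```

The same `f1_score` function applies to regulatory relations if each relation is represented as a hashable tuple such as `(source_gene, relation_type, target_gene)`. Both metrics penalize missing and spurious items, which is one reason scores in the 0.2 to 0.45 range can still reflect partially correct predictions.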