Funding: The Open Fund of Hunan University of Traditional Chinese Medicine for the First-Class Discipline of Traditional Chinese Medicine (2018ZYX66); the Science Research Project of Hunan Provincial Department of Education (20C1391); the Natural Science Foundation of Hunan Province (2020JJ4461).
Abstract: Objective: To establish a "disease-syndrome-symptom-method-formula" knowledge graph of the Treatise on Febrile Diseases (Shang Han Lun, 《伤寒论》) that reduces the fuzziness and uncertainty of its data and lays a foundation for subsequent knowledge reasoning and application. Methods: Experts in the classical formulas of traditional Chinese medicine (TCM) guided the work. A primarily top-down approach, supplemented by bottom-up construction, was used to perform knowledge extraction, knowledge fusion, and knowledge storage on the original text of the Treatise on Febrile Diseases. This covered five aspects: disease, syndrome, symptom, method, and formula. The knowledge graph was constructed from the results. On this basis, knowledge structure queries and knowledge relevance queries were implemented with visualization. Results: A "disease-syndrome-symptom-method-formula" knowledge graph of the Treatise on Febrile Diseases was constructed, containing 6469 entities and 10911 relational triples. Entities and their relationships can be queried, and the query results can be visualized. Conclusion: The knowledge graph systematically digitizes the knowledge system of the Treatise on Febrile Diseases and improves the completeness and accuracy of its knowledge representation. It also allows the connections among disease, syndrome, symptom, treatment method, and formula to be retrieved clearly and efficiently, which facilitates the sharing and reuse of knowledge.
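The abstract does not specify the graph's schema or storage backend. The following is a minimal sketch only. It assumes a simple in-memory triple list and hypothetical relation names (`has_syndrome`, `has_symptom`, `treated_by`, `uses_formula`), and shows how the two query types might work. A structure query returns every triple that touches an entity. A relevance query finds a connecting path between two entities by breadth-first search. The sample triples paraphrase the Taiyang wind-strike (Guizhi decoction) clauses for illustration and are not drawn from the paper's 10911 triples.

```python
from collections import defaultdict, deque

# Illustrative triples paraphrasing Taiyang wind-strike clauses; relation names are hypothetical.
TRIPLES = [
    ("Taiyang disease", "has_syndrome", "Taiyang wind-strike syndrome"),
    ("Taiyang wind-strike syndrome", "has_symptom", "fever"),
    ("Taiyang wind-strike syndrome", "has_symptom", "sweating"),
    ("Taiyang wind-strike syndrome", "has_symptom", "aversion to wind"),
    ("Taiyang wind-strike syndrome", "has_symptom", "floating and moderate pulse"),
    ("Taiyang wind-strike syndrome", "treated_by", "release the flesh"),
    ("release the flesh", "uses_formula", "Guizhi decoction"),
]


class TripleStore:
    """Minimal in-memory store of (head, relation, tail) triples."""

    def __init__(self, triples):
        self.out_edges = defaultdict(list)  # head -> [(relation, tail)]
        self.in_edges = defaultdict(list)   # tail -> [(relation, head)]
        for head, rel, tail in triples:
            self.out_edges[head].append((rel, tail))
            self.in_edges[tail].append((rel, head))

    def structure_query(self, entity):
        """All triples in which the entity is the head or the tail."""
        outgoing = [(entity, rel, tail) for rel, tail in self.out_edges.get(entity, [])]
        incoming = [(head, rel, entity) for rel, head in self.in_edges.get(entity, [])]
        return outgoing + incoming

    def relevance_query(self, source, target):
        """Shortest entity path between two entities, following edges in both directions; None if unconnected."""
        previous = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if node == target:
                path = []
                while node is not None:
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            neighbours = ([t for _, t in self.out_edges.get(node, [])]
                          + [h for _, h in self.in_edges.get(node, [])])
            for nxt in neighbours:
                if nxt not in previous:
                    previous[nxt] = node
                    queue.append(nxt)
        return None


store = TripleStore(TRIPLES)
print(len(store.structure_query("Taiyang wind-strike syndrome")))  # 6
print(store.relevance_query("sweating", "Guizhi decoction"))
# ['sweating', 'Taiyang wind-strike syndrome', 'release the flesh', 'Guizhi decoction']
```

A graph with thousands of entities would more plausibly live in a dedicated graph database. The adjacency indexes here keep the structure query proportional to an entity's degree, so it does not have to scan every triple.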
Funding: New-Generation Artificial Intelligence Major Program in the Sci-Tech Innovation 2030 Agenda from the Ministry of Science and Technology of China (2018AAA0102100); Key Project of Hunan Provincial Department of Education (21A0250); The First-Class Discipline Open Fund of Hunan University of Traditional Chinese Medicine (2022ZYX08).
Abstract: Objective: To construct a Treatise on Febrile Diseases (Shang Han Lun, 《伤寒论》) dataset structured as symptom-formula-herb heterogeneous graphs, and to explore an optimal method for learning node attribute representations based on a graph convolutional network (GCN). Methods: Clauses containing symptoms, formulas, and herbs were extracted from the Treatise on Febrile Diseases to construct symptom-formula-herb heterogeneous graphs. These graphs were used to propose a GCN-based node representation learning method, the Traditional Chinese Medicine Graph Convolution Network (TCM-GCN). TCM-GCN processed the symptom-formula, symptom-herb, and formula-herb heterogeneous graphs, performing high-order message passing and neighbor aggregation to obtain new node representations. Sum-aggregations of the symptom, formula, and herb nodes were then computed as a foundation for the downstream prediction tasks. Results: Node representations using multi-hot encoding, non-fusion encoding, and fusion encoding were compared in the model's prediction experiments. The Precision@10, Recall@10, and F1-score@10 of fusion encoding were 9.77%, 6.65%, and 8.30% higher, respectively, than those of non-fusion encoding. Conclusion: Node representations based on fusion encoding achieved comparatively good results. This indicates that TCM-GCN is effective in learning node-level representations of the heterogeneous-graph-structured Treatise on Febrile Diseases dataset and can improve the performance of downstream diagnosis models.
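The abstract does not describe the TCM-GCN architecture in detail. For orientation, the sketch below shows the standard GCN propagation rule of Kipf and Welling, H' = ReLU(Â H W) with Â = D^-1/2 (A + I) D^-1/2. It stacks two layers so that information travels two hops, then sum-aggregates the representations of a symptom set. The following parts are hypothetical:

- the toy graph (three symptoms, two formulas, symptom-formula edges only)
- the one-hot inputs
- the untrained weights `W1`/`W2`

The paper's symptom-herb and formula-herb graphs and its fusion encoding are not reproduced.

```python
import math


def normalized_adjacency(n, edges):
    """Symmetric GCN normalisation: D^-1/2 (A + I) D^-1/2."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop keeps each node's own features
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # heterogeneous edges treated as undirected
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One message-passing step: aggregate normalised neighbours, project with w, apply ReLU."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def sum_aggregate(h, indices):
    """Sum the representations of a set of nodes, e.g. the symptoms of one clause."""
    return [sum(h[i][d] for i in indices) for d in range(len(h[0]))]


# Hypothetical toy graph: nodes 0-2 are symptoms, 3-4 are formulas.
NODES = ["fever", "sweating", "aversion to wind", "Guizhi decoction", "Mahuang decoction"]
EDGES = [(0, 3), (1, 3), (2, 3), (0, 4), (2, 4)]  # symptom-formula links only

a_hat = normalized_adjacency(len(NODES), EDGES)
h0 = [[1.0 if i == j else 0.0 for j in range(len(NODES))] for i in range(len(NODES))]  # one-hot input
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2], [0.2, 0.1], [0.4, -0.1]]  # untrained, illustrative
W2 = [[1.0, 0.5], [-0.5, 1.0]]
h2 = gcn_layer(a_hat, gcn_layer(a_hat, h0, W1), W2)  # two layers = two-hop propagation
query = sum_aggregate(h2, [0, 1])  # representation of the symptom set {fever, sweating}
print(len(h2), len(h2[0]))  # 5 2
```

In a prediction model, a vector like `query` would be the input from which formulas or herbs are scored. That downstream model is not sketched here.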
Abstract: Objective: This paper applies natural language processing (NLP) techniques to analyze and process the text of the Treatise on Febrile Diseases (TFDs) and extract important information. It thereby attempts to bring NLP into the text mining of traditional Chinese medicine (TCM) literature. Materials and Methods: The experiment was written in Python and called NLP libraries including Jieba, nltk, gensim, and sklearn, together with Excel and Word. The text of TFDs was cleaned, segmented, and filtered for stop words. Word frequency statistics and analysis, keyword extraction, named entity recognition (NER), and other operations were then performed, and text similarity was calculated last. Results: Jieba accurately identified the herb names in TFDs. Word frequency statistics based on the segmentation showed that "warm therapy" is an important treatment method in TFDs and that Guizhi decoction is the principal prescription. Five core decoctions were also identified. Keyword extraction based on the term frequency-inverse document frequency (TF-IDF) algorithm performed well. The accuracy of NER on TFDs was about 86%. Similarity calculated with a latent semantic indexing (LSI) model showed that "Understanding of Synopsis of Golden Chamber" is much more similar to Synopsis of Golden Chamber (SGC) than to TFDs. The results met expectations. Conclusions: This work lays a research foundation for applying NLP to the text mining of unstructured TCM literature. Combined with deep learning, NLP, as an important branch of artificial intelligence, will have broader application prospects in mining TCM literature, constructing TCM knowledge graphs, and providing TCM knowledge services.
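Two of the pipeline steps above, TF-IDF keyword extraction and vector-space similarity, can be illustrated with a small pure-Python sketch. It rests on several assumptions:

- The tokens are a hypothetical, already-segmented toy corpus; Jieba segmentation is not reproduced.
- The stop-word list is made up.
- idf is the unsmoothed log(N/df). sklearn's `TfidfVectorizer` smooths idf by default.
- Plain TF-IDF cosine similarity stands in for the LSI model. LSI would additionally project the TF-IDF vectors through a truncated SVD before comparing them.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "of", "and"}  # hypothetical stop-word list

# Hypothetical toy documents, already segmented (the study used Jieba for this step).
DOCS = {
    "guizhi_clause": ["the", "taiyang", "disease", "fever", "sweating", "guizhi", "decoction"],
    "mahuang_clause": ["taiyang", "disease", "fever", "wheezing", "mahuang", "decoction"],
    "sini_clause": ["shaoyin", "disease", "cold", "limbs", "sini", "decoction"],
}


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Plain TF-IDF: tf = count / document length, idf = log(N / df), no smoothing."""
    cleaned = {name: remove_stop_words(toks) for name, toks in docs.items()}
    n = len(cleaned)
    df = Counter(term for toks in cleaned.values() for term in set(toks))
    vectors = {}
    for name, toks in cleaned.items():
        counts = Counter(toks)
        vectors[name] = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in counts.items()}
    return vectors


def top_keywords(vector, k):
    """Highest-weighted terms first; ties broken alphabetically so the output is deterministic."""
    return [t for t, _ in sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


vecs = tfidf(DOCS)
print(top_keywords(vecs["guizhi_clause"], 2))  # ['guizhi', 'sweating']
print(cosine(vecs["guizhi_clause"], vecs["mahuang_clause"])
      > cosine(vecs["guizhi_clause"], vecs["sini_clause"]))  # True
```

Terms that occur in every document (here "disease" and "decoction") get zero idf weight. That is why they neither appear among the keywords nor contribute to similarity.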