本文介绍了在生物学英文文本纷繁芜杂的当今,面对中英文本的文化差异,生物医学自然语言处理(Natural Language processing for Biology,BioNLP)的基本概念和方法。归纳总结了BioNLP在挖掘生物医学文献信息中的重要方面。通过研究实例分...本文介绍了在生物学英文文本纷繁芜杂的当今,面对中英文本的文化差异,生物医学自然语言处理(Natural Language processing for Biology,BioNLP)的基本概念和方法。归纳总结了BioNLP在挖掘生物医学文献信息中的重要方面。通过研究实例分析了常见的以"词"、"句"、"篇"为语言单位的分析方法并指出这些方法的局限性,最后展望了生物医学计算语言学研究趋势。展开更多
Objective Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods The first step was to build a text mining and analysis envir...Objective Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods The first step was to build a text mining and analysis environment based on Python language, and built a corpus based on the core chapters of SESD. The second step was to digitalize the corpus. The main steps included word segmentation, information cleaning and merging, document-entry matrix, dictionary compilation and information conversion. The third step was to mine and display the internal information of SESD corpus by means of word cloud, keyword extraction and visualization. Results NLP played a positive role in computer recognition and comprehension of SESD. Different chapters had different keywords and weights. Deficiency syndrome elements were an important component of SESD, such as "Qi deficiency""Yang deficiency" and "Yin deficiency". The important syndrome elements of substantiality included "Blood stasis""Qi stagnation", etc. Core syndrome elements were closely related. Conclusions Syndrome differentiation and treatment was the core of SESD. Using NLP to excavate syndromes differentiation could help reveal the internal relationship between syndromes differentiation and provide basis for artificial intelligence to learn syndromes differentiation.展开更多
文摘本文介绍了在生物学英文文本纷繁芜杂的当今,面对中英文本的文化差异,生物医学自然语言处理(Natural Language processing for Biology,BioNLP)的基本概念和方法。归纳总结了BioNLP在挖掘生物医学文献信息中的重要方面。通过研究实例分析了常见的以"词"、"句"、"篇"为语言单位的分析方法并指出这些方法的局限性,最后展望了生物医学计算语言学研究趋势。
基金the funding support from the National Natural Science Foundation of China (No. 81874429)Digital and Applied Research Platform for Diagnosis of Traditional Chinese Medicine (No. 49021003005)+1 种基金2018 Hunan Provincial Postgraduate Research Innovation Project (No. CX2018B465)Excellent Youth Project of Hunan Education Department in 2018 (No. 18B241)
文摘Objective Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods The first step was to build a text mining and analysis environment based on Python language, and built a corpus based on the core chapters of SESD. The second step was to digitalize the corpus. The main steps included word segmentation, information cleaning and merging, document-entry matrix, dictionary compilation and information conversion. The third step was to mine and display the internal information of SESD corpus by means of word cloud, keyword extraction and visualization. Results NLP played a positive role in computer recognition and comprehension of SESD. Different chapters had different keywords and weights. Deficiency syndrome elements were an important component of SESD, such as "Qi deficiency""Yang deficiency" and "Yin deficiency". The important syndrome elements of substantiality included "Blood stasis""Qi stagnation", etc. Core syndrome elements were closely related. Conclusions Syndrome differentiation and treatment was the core of SESD. Using NLP to excavate syndromes differentiation could help reveal the internal relationship between syndromes differentiation and provide basis for artificial intelligence to learn syndromes differentiation.