Abstract
The symptom descriptions in traditional Chinese medicine (TCM) clinical records are an important basis for diagnosis by TCM physicians. Because Chinese expressions are diverse and complex, extracting standardized four-diagnosis information from these descriptions has important research value for TCM syndrome analysis. After analyzing a range of Chinese word segmentation algorithms, this paper applies the forward maximum matching segmentation algorithm to the semantic understanding of four-diagnosis information in TCM clinical symptom descriptions. The resulting TCM four-diagnosis semantic model is used to extract four-diagnosis information from 100 actual cases. The maximum segmentation length is then varied as a controlled variable, and accuracy and recall are highest when it is set to 5.
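As a rough illustration of forward maximum matching, the sketch below scans a symptom string from left to right and, at each position, takes the longest dictionary entry that fits within a window of `max_len` characters. It is a minimal sketch and not the authors' implementation. The lexicon, the sample record, and the function name are hypothetical. Reading the abstract's "maximum segmentation length" as the maximum candidate word length in characters is also an assumption.

```python
def forward_max_match(text, lexicon, max_len=5):
    """Greedy left-to-right segmentation that prefers the longest lexicon entry."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan can advance.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical lexicon of standardized four-diagnosis terms (illustration only).
SYMPTOM_LEXICON = {"头痛", "发热", "恶寒", "无汗", "舌苔薄白", "脉浮紧"}

record = "头痛发热恶寒无汗舌苔薄白脉浮紧"  # hypothetical clinical note

tokens_5 = forward_max_match(record, SYMPTOM_LEXICON, max_len=5)
symptoms_5 = [t for t in tokens_5 if t in SYMPTOM_LEXICON]

# With a shorter window the four-character term "舌苔薄白" can no longer match.
tokens_3 = forward_max_match(record, SYMPTOM_LEXICON, max_len=3)
symptoms_3 = [t for t in tokens_3 if t in SYMPTOM_LEXICON]
```

In this toy example, a window of 5 recovers all six lexicon terms, while a window of 3 misses the four-character term. This suggests one way the window size can affect recall, but it says nothing about the paper's actual measurements.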
Authors
许林涛
叶欣欣
裴成飞
吴荣士
XU Lintao; YE Xinxin; PEI Chengfei; WU Rongshi (Anhui University of Science & Technology, Huainan 232000, China)
Source
《软件工程》
2020, Issue 4, pp. 15-18 (4 pages)
Software Engineering
Keywords
Chinese word segmentation
syndrome analysis
four-diagnosis information