Abstract
Objective: To explore the extraction and standardization of traditional Chinese medicine (TCM) symptom information by comparing two symptom extraction programs based on the maximum probability method. Methods: All data analysis and processing were carried out in R 3.3.2. Diagnostics, Diagnostics of Traditional Chinese Medicine, and 1,000 annotated inpatient pneumonia records were used to build a symptom standardization database, a symptom description lexicon, and a keyword-adjective lexicon. Based on the maximum probability method, three programs were designed: a Chinese word segmentation program (CSP), a direct extraction program (DEP), and a combination extraction program (CEP). The three programs were applied to extract and standardize symptom information from 2,311 pneumonia medical records, and were compared in terms of the dimensions generated, the manual processing required, and symptom extraction performance. Results: Compared with CSP, both DEP and CEP effectively reduced dimensionality. CEP had a lower percentage of manual processing and better symptom extraction performance. Conclusion: The CEP based on the maximum probability method can effectively extract TCM symptom information.
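The abstract names the maximum probability method without showing how it works, so a minimal sketch may help. It assumes a simple unigram model in which the best segmentation is the one that maximizes the product of word probabilities, found by dynamic programming. The lexicon entries, their probabilities, the unknown-character penalty, and the function name `segment` are all hypothetical illustrations. They are not taken from the paper, whose implementation was in R and is not reproduced here.

```python
import math

# Hypothetical toy lexicon: word -> relative frequency.
# Values are illustrative only and are not taken from the paper.
LEXICON = {
    "咳嗽": 0.30, "咳": 0.05, "嗽": 0.01,
    "发热": 0.25, "发": 0.04, "热": 0.05,
    "无": 0.10, "咳痰": 0.20, "痰": 0.05,
}

UNKNOWN_LOGP = math.log(1e-8)  # assumed penalty for one out-of-lexicon character
MAX_WORD_LEN = 4               # assumed upper bound on word length


def segment(text, lexicon=LEXICON):
    """Return the segmentation of `text` with the highest unigram probability."""
    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last word in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            if word in lexicon:
                logp = math.log(lexicon[word])
            elif end - start == 1:
                logp = UNKNOWN_LOGP   # fall back to a single unknown character
            else:
                continue              # multi-character strings must be in the lexicon
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]


print(segment("咳嗽发热"))
print(segment("无咳痰"))
```

In this sketch, a symptom lexicon built from textbooks and annotated records would take the place of `LEXICON`. How the direct and combination extraction programs use their lexicons is not described in the abstract and is not modeled here.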
Source
《中华中医药杂志》
CAS
CSCD
Peking University Core Journals
2017, No. 5, pp. 2159-2162 (4 pages)
China Journal of Traditional Chinese Medicine and Pharmacy
Funding
Doctoral Program Foundation of the Ministry of Education (No. 20114425110009)
Keywords
Symptom
Text mining
Text data structuring
Chinese word segmentation
Maximum probability method
Standardization