Abstract
Traditional Chinese medicine (TCM) case records contain a wealth of knowledge, so automatically annotating large volumes of case records to support knowledge mining is particularly important. To recognize and tag symptoms and pathogenesis automatically in ancient medical case records of the Ming and Qing dynasties, this paper adopts a method based on conditional random fields (CRF). Taking the characteristics of Chinese text into account, it proposes a data-cleansing step and reduces and merges part-of-speech tags to shrink the feature space. Finally, to verify the advantage of CRF for TCM named entity recognition, simulation experiments compare the method with two other statistical approaches, maximum entropy and support vector machines (SVM). The results show that the proposed method has a clear advantage in recognizing TCM named entities such as symptoms and pathogenesis in Ming and Qing case records.
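The abstract names the model but not its feature templates or label scheme. As a rough illustration only, the sketch below shows how merged part-of-speech tags might enter a linear-chain CRF's features, and how a trained CRF assigns labels by Viterbi decoding. The POS mapping in `POS_MERGE`, the feature template in `token_features`, the BIO-style labels (`B-SYM`, `I-SYM`, `O`) and all weights are hypothetical; training (estimating the weights, typically by maximizing conditional log-likelihood) is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical coarse POS map: several fine-grained tags collapse into one class,
# shrinking the number of distinct POS features. The paper's actual mapping is
# not given in the abstract; this table is illustrative only.
POS_MERGE: Dict[str, str] = {
    "n": "N", "nr": "N", "ns": "N",  # nouns
    "v": "V", "vd": "V", "vn": "V",  # verbs
    "a": "A", "ad": "A",             # adjectives
}

Weights = Dict[Tuple[str, str], float]


def merge_pos(tag: str) -> str:
    """Map a fine-grained POS tag to its coarse class ("X" if unmapped)."""
    return POS_MERGE.get(tag, "X")


def token_features(tokens: List[str], pos: List[str], i: int) -> List[str]:
    """Hypothetical template: current word, merged POS, and neighbouring words."""
    feats = [f"w0={tokens[i]}", f"p0={merge_pos(pos[i])}"]
    feats.append(f"w-1={tokens[i - 1]}" if i > 0 else "w-1=<BOS>")
    feats.append(f"w+1={tokens[i + 1]}" if i + 1 < len(tokens) else "w+1=<EOS>")
    return feats


def viterbi(tokens: List[str], pos: List[str], labels: List[str],
            emit_w: Weights, trans_w: Weights) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    A sequence's score is the sum of (feature, label) and (prev, label)
    weights; the partition function is not needed to find the argmax.
    """
    n = len(tokens)
    if n == 0:
        return []

    def emit(i: int, y: str) -> float:
        return sum(emit_w.get((f, y), 0.0) for f in token_features(tokens, pos, i))

    score: List[Dict[str, float]] = [{y: emit(0, y) for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, n):
        score.append({})
        back.append({})
        for y in labels:
            # Best previous label for ending at label y in position i.
            prev, best = max(
                ((yp, score[i - 1][yp] + trans_w.get((yp, y), 0.0)) for yp in labels),
                key=lambda t: t[1],
            )
            score[i][y] = best + emit(i, y)
            back[i][y] = prev
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: score[n - 1][y])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    tokens = ["患者", "头痛", "发热"]  # "patient", "headache", "fever"
    pos = ["n", "v", "v"]
    labels = ["O", "B-SYM", "I-SYM"]
    emit_w = {("w0=头痛", "B-SYM"): 2.0, ("w0=发热", "B-SYM"): 2.0, ("p0=N", "O"): 1.0}
    trans_w = {("O", "I-SYM"): -5.0}  # discourage I-SYM directly after O
    print(viterbi(tokens, pos, labels, emit_w, trans_w))  # ['O', 'B-SYM', 'B-SYM']
```

Because the score is a sum of weights over (feature, label) pairs, collapsing tags such as `n`, `nr` and `ns` into one class `N` replaces several `p0=…` features with one; this is one plausible way to realize the feature-space reduction the abstract describes.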
Source
Journal of Xiamen University: Natural Science (《厦门大学学报(自然科学版)》)
CAS
CSCD
Peking University Core Journal (北大核心)
2009, No. 3, pp. 359-364 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (60803078)
Keywords
conditional random field (CRF)
traditional Chinese medicine named entity
data cleansing
cross validation