Abstract
Automatically extracting the temporal attributes of medical problems from clinical narrative text can support a range of medical informatics applications, such as clinical decision support and digital clinical pathways. In medical language processing, automatic extraction of temporal information from narrative medical records has therefore been studied internationally for several years, but research on Chinese-language records has been essentially absent. This study proposes an algorithm based on conditional random fields (CRF) that automatically extracts the temporal attributes of medical problems from Chinese narrative medical records. The method is trained on records that have been semantically annotated with medical-problem and temporal-information tags. Temporal relations are labeled in a medical-problem-oriented mode: only the temporal attributes of the medical problems of interest are tagged and extracted. Within this framework, the experiments focus on how different CRF feature templates affect temporal relation extraction. Using a corpus of 63 real narrative medical records, multiple rounds of cross-validation were run to obtain the average extraction accuracy for each template. From these results, general principles for CRF template design are summarized. With the best template, temporal relation extraction accuracy reached 86.94%, which provides a basis for further research.
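The abstract compares CRF feature templates but does not name a toolkit or list the templates used. To show what a "template" controls, here is a minimal sketch that assumes the widely used CRF++ template syntax, in which `%x[row,col]` refers to the token at a relative row offset and a given input column. The token rows, the tag set (`PROB`, `TIME`, `O`), the templates, and the helper names `lookup` and `token_features` are hypothetical illustrations, not the paper's annotation scheme or code. The boundary padding symbols are loosely modeled on CRF++.

```python
import re
from typing import List, Sequence

# Matches CRF++-style macros such as %x[-1,0] (relative row offset, column index).
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")

# Hypothetical token rows: column 0 = token, column 1 = semantic tag
# ("PROB" = medical problem, "TIME" = temporal expression, "O" = other).
# The segmentation and tag set are illustrative assumptions only.
ROWS: List[Sequence[str]] = [
    ("患者", "O"),      # patient
    ("3天前", "TIME"),  # 3 days ago
    ("出现", "O"),      # developed
    ("发热", "PROB"),   # fever
]

# Hypothetical unigram templates: a +/-1 token window plus semantic-tag features.
TEMPLATES = [
    "U00:%x[-1,0]",
    "U01:%x[0,0]",
    "U02:%x[1,0]",
    "U03:%x[0,1]",
    "U04:%x[-1,1]/%x[0,1]",  # combined feature: previous tag + current tag
]


def lookup(rows: List[Sequence[str]], i: int, offset: int, col: int) -> str:
    """Return the value of %x[offset,col] for token i, padded past sentence edges."""
    j = i + offset
    if j < 0:
        return f"_B-{-j}"                 # position before the first token
    if j >= len(rows):
        return f"_B+{j - len(rows) + 1}"  # position after the last token
    return rows[j][col]


def token_features(templates: List[str], rows: List[Sequence[str]], i: int) -> List[str]:
    """Expand every template into a concrete feature string for token i."""
    return [
        MACRO.sub(lambda m: lookup(rows, i, int(m.group(1)), int(m.group(2))), t)
        for t in templates
    ]


if __name__ == "__main__":
    print(token_features(TEMPLATES, ROWS, 3))
    # ['U00:出现', 'U01:发热', 'U02:_B+1', 'U03:PROB', 'U04:O/PROB']
```

Each template type is a design choice: the choice of offsets sets the context window, and combining columns yields conjunction features. Varying these is the kind of change a template comparison would evaluate, with each candidate template set scored by cross-validation. In CRF++ itself, each expanded unigram string is further paired with every output label. This sketch shows only the observation side.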
Source
Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》)
CAS
CSCD
Peking University Core Journal (北大核心)
2010, Issue 5, pp. 710-716 (7 pages)
Funding
National Natural Science Foundation of China (Grant No. 30900329)
China Postdoctoral Science Foundation (Grant No. 20090451467)
Keywords
information extraction
temporal relationship
conditional random fields
medical language processing