Abstract
Objective: To find a method, based on the Skip-gram word embedding algorithm from representation learning, that overcomes the high dimensionality of structured features in electronic medical records (EMR) and represents those features at a semantic level. Methods: Data were drawn from the EMR system of a Grade-A tertiary hospital in Beijing, China. Three categories of structured patient features were extracted: two discrete categories (diseases and medications) and one continuous category (laboratory tests), with laboratory tests discretized according to their normal reference ranges. The Skip-gram algorithm was then used to embed the discrete features and the discretized laboratory features into a single low-dimensional real-valued vector space, yielding one concept vector per feature. To evaluate the effectiveness of the representation and to reveal latent relationships between features, t-SNE was used to reduce and visualize the concept space, and cosine distances between concept vectors were calculated to corroborate the visualization quantitatively. Results: The low-dimensional concept vectors both reduced the dimensionality of the traditional feature representation and captured latent relationships between features; clinically related features lay close together in the concept vector space. Conclusions: Representing structured patient features as low-dimensional real-valued vectors with the Skip-gram algorithm gave good results, offering one approach to the high dimensionality of EMR data representation and to the analysis of latent relationships between structured features.
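The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the reference ranges, the token naming scheme (`glucose_high` and so on), and the choice to treat every other feature of the same patient as Skip-gram context are all assumptions made for illustration, since the abstract does not specify them. The sketch shows how a continuous laboratory value could be turned into a discrete token, how (target, context) training pairs could be formed, and how cosine distance between two learned vectors is computed.

```python
import math
from itertools import permutations

# Hypothetical reference ranges (low, high); illustrative values only,
# not taken from the paper.
REFERENCE_RANGES = {
    "glucose": (3.9, 6.1),
    "potassium": (3.5, 5.3),
}


def discretize_lab(name, value, ranges=REFERENCE_RANGES):
    """Map a continuous lab value to a token such as 'glucose_high'."""
    low, high = ranges[name]
    if value < low:
        level = "low"
    elif value > high:
        level = "high"
    else:
        level = "normal"
    return f"{name}_{level}"


def patient_tokens(diseases, drugs, labs):
    """Combine discrete features and discretized lab results into one token list."""
    lab_tokens = [discretize_lab(name, value) for name, value in labs.items()]
    return list(diseases) + list(drugs) + lab_tokens


def skipgram_pairs(tokens):
    """Build (target, context) pairs.

    Assumption: every other feature of the same patient counts as context,
    i.e. the window spans the whole record.
    """
    return list(permutations(tokens, 2))


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Cosine distance, defined here as 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# Example record with hypothetical feature names.
tokens = patient_tokens(["type_2_diabetes"], ["metformin"], {"glucose": 9.2})
pairs = skipgram_pairs(tokens)
```

The pairs produced this way would be the training input for a Skip-gram model; training itself, and the t-SNE projection of the resulting vectors, are left out of the sketch.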
Authors
黄艳群
王妮
刘红蕾
费晓璐
巍岚
陈卉
HUANG Yanqun; WANG Ni; LIU Honglei; FEI Xiaolu; WEI Lan; CHEN Hui (School of Biomedical Engineering, Capital Medical University, Beijing 100069; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing 100069; Information Center, Xuanwu Hospital, Capital Medical University, Beijing 100053)
Source
《北京生物医学工程》
2019, No. 6, pp. 568-574, 604 (8 pages in total)
Beijing Biomedical Engineering
Funding
Supported by the National Natural Science Foundation of China (81671786, 81971707)
Keywords
electronic medical record
Skip-gram algorithm
feature representation
natural language processing
word embedding