
Word Segmentation of Chinese Electronic Medical Records Based on Unsupervised Learning (cited by: 6)

An Unsupervised Approach to Word Segmentation in Chinese EMRs
Abstract: Electronic medical records (EMRs) contain a large amount of useful medical knowledge, and extracting this knowledge is important for building clinical decision support systems and personalized healthcare information services. Automatic word segmentation is a key prerequisite for analyzing and mining Chinese EMRs. To overcome the difficulty of obtaining annotated corpora, the paper proposes an unsupervised approach to word segmentation in Chinese EMRs. First, a general-domain lexicon is used to produce an initial segmentation; to better resolve ambiguity, a probabilistic model is introduced, and word probabilities are estimated from raw (unannotated) text with the EM algorithm. Then, the left and right branching entropy of character strings is used to build a goodness measure, which turns unknown-word recognition into an optimization problem that is solved by dynamic programming. Finally, experiments on 3,000 Chinese EMRs from a department of neurology demonstrate the effectiveness of the approach.
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2014, No. 2, pp. 68-71 (4 pages)
Keywords: Chinese EMRs; Unsupervised Segmentation; EM Algorithm; Branching Entropy; Dynamic Programming
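The abstract names two building blocks, branching entropy and dynamic programming, without giving formulas. The following is a minimal sketch of how they might fit together, not the paper's implementation. The function names `branching_entropy` and `segment`, the `max_len` cap on candidate word length, the boundary symbols `<s>` and `</s>`, and the idea of passing the goodness measure in as a caller-supplied function are all assumptions made for illustration; the paper's actual goodness definition is not stated in this record.

```python
import math
from collections import Counter, defaultdict


def branching_entropy(corpus, max_len=4):
    """Return (left, right) dicts mapping each substring (up to max_len
    characters) to the entropy of the characters seen immediately
    before / after it in the corpus."""
    left_ctx = defaultdict(Counter)
    right_ctx = defaultdict(Counter)
    for sent in corpus:
        n = len(sent)
        for i in range(n):
            for j in range(i + 1, min(i + max_len, n) + 1):
                s = sent[i:j]
                # Assumption: sentence boundaries count as a context symbol.
                left_ctx[s][sent[i - 1] if i > 0 else "<s>"] += 1
                right_ctx[s][sent[j] if j < n else "</s>"] += 1

    def entropy(counter):
        total = sum(counter.values())
        return -sum(c / total * math.log2(c / total) for c in counter.values())

    left = {s: entropy(c) for s, c in left_ctx.items()}
    right = {s: entropy(c) for s, c in right_ctx.items()}
    return left, right


def segment(sent, goodness, max_len=4):
    """Dynamic programming over split points: pick the segmentation whose
    words have the highest total goodness. `goodness` is any function from
    a candidate word to a score (hypothetical interface)."""
    n = len(sent)
    best = [0.0] + [-math.inf] * n   # best[j]: best score for sent[:j]
    back = [0] * (n + 1)             # back[j]: start index of the last word
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            score = best[i] + goodness(sent[i:j])
            if score > best[j]:
                best[j], back[j] = score, i
    words, j = [], n
    while j > 0:
        words.append(sent[back[j]:j])
        j = back[j]
    return words[::-1]
```

One plausible way to combine the two, again an assumption rather than the paper's definition, is to score a candidate word by the smaller of its left and right branching entropies, for example `segment(text, lambda w: min(left.get(w, 0.0), right.get(w, 0.0)))`, so that a string counts as a good word only when its context is varied on both sides.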

