Abstract
This paper proposes an algorithm that automatically assigns ICD-10 codes to clinical diagnoses written in Chinese. It computes the semantic similarity between texts using a distributed semantic similarity method. To account for the linguistic characteristics of Chinese, vectors are built not only from words but also from individual Chinese characters, and the effect of each choice on precision and recall is tested. The results show that the algorithm achieves relatively high accuracy on the test set.
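The contrast between word-level and character-level matching can be illustrated with a minimal sketch. It is not the paper's implementation: plain token-count vectors stand in for learned distributed vectors, and the three-entry code table, the toy lexicon, the greedy segmenter, and all function names are assumptions made for this example only.

```python
import math
from collections import Counter

# Toy ICD-10 table with simplified descriptions (illustrative, not a full code set).
ICD_TABLE = {
    "I10": "原发性高血压",   # essential (primary) hypertension
    "E11": "2型糖尿病",      # type 2 diabetes mellitus
    "J18.9": "肺炎",         # pneumonia, unspecified
}

# Hypothetical word lexicon used by the toy segmenter below.
LEXICON = {"原发性", "高血压", "糖尿病", "肺炎"}


def char_tokens(text):
    """Character-level tokens: every non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def word_tokens(text, lexicon=LEXICON):
    """Word-level tokens via greedy forward maximum matching on a lexicon."""
    tokens, i = [], 0
    max_len = max((len(w) for w in lexicon), default=1)
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_code(diagnosis, tokenize, table=ICD_TABLE):
    """Return the code whose description is most similar to the diagnosis."""
    query = Counter(tokenize(diagnosis))
    return max(
        table,
        key=lambda code: cosine(query, Counter(tokenize(table[code]))),
    )


if __name__ == "__main__":
    diagnosis = "高血压病"
    print(best_code(diagnosis, char_tokens))  # character-level match
    print(best_code(diagnosis, word_tokens))  # word-level match
```

Swapping `tokenize` is the only difference between the two runs, which mirrors the word-versus-character comparison described in the abstract. A real system would replace the count vectors with trained embeddings and a proper Chinese word segmenter.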
Source
Journal of Medical Informatics (《医学信息学杂志》)
CAS
2016, Issue 2, pp. 52-56 (5 pages)
Keywords
Automated code assignment
Semantic similarity
Distributional semantics
ICD-10