Abstract
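[Objective] To compare the accuracy of different machine learning algorithms on the intelligent triage task, analyze the category-setting problems of online consultation platforms, and try to extract new features from the data to improve classifier performance. [Methods] Based on 33,073 real consultation records from 13 departments on "Chunyu Doctor" (春雨医生), the study compares the triage accuracy of two text vectorization methods across six classifiers: support vector machine, multinomial naive Bayes, logistic regression, random forest, k-nearest neighbors, and an ensemble classification model. Misclassified data from different departments are further analyzed through high-frequency word analysis and word co-occurrence. [Results] The classifier that uses TF-IDF for text vectorization and a support vector machine for classification performs best overall in intelligent triage; after adding age and gender features, classification accuracy reaches 76.3%. This classifier's triage accuracy on surgery department data is only 40.9%, which is attributed to confusion in the platform's category settings. [Limitations] The department selected by the patient in the existing data is assumed to be correct. [Conclusions] Machine learning can be used for intelligent triage on online consultation platforms, and adding input features that reflect the characteristics of medical data is one direction for improving classifier accuracy. The cross-department nature of some diseases and symptoms affects classifier performance; online consultation platforms could improve the patient consultation experience by recommending multiple departments.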
[Objective] This paper compares the performance of various machine learning algorithms for automatic triage, aiming to improve their effectiveness by analyzing misclassified data. [Methods] First, we retrieved 33,073 real patient questions from the website "Chunyu Doctor". Then, we compared the accuracy of two text vectorization methods and six classification models. Finally, we analyzed the misclassified data and extracted new features to improve model performance. [Results] The best automatic triage model used TF-IDF as the text vectorization method and a support vector machine as the classification algorithm. After adding age and gender features, the classification accuracy reached 76.3%. The classifier had its lowest accuracy on the surgery department because of how the platform's categories are set up. [Limitations] We assumed that each patient's department selection was correct. [Conclusions] Machine learning techniques could improve the performance of automatic triage services on online health consulting platforms.
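The feature construction described in the abstract (TF-IDF text vectors extended with age and gender) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy queries, the function names `tfidf_vectors` and `add_patient_features`, the TF-IDF variant (`tf = count / length`, `idf = ln(N / df) + 1`), and the age scaling are not taken from the paper, whose exact weighting and preprocessing are not stated in the abstract. The resulting feature vectors would then be passed to a classifier such as a support vector machine.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Assumed variant: tf = raw count / document length,
    idf = ln(N / df) + 1. Other variants exist.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


def add_patient_features(vector, age, is_female):
    """Append a scaled age and a binary gender flag (hypothetical encoding)."""
    return vector + [age / 100.0, 1.0 if is_female else 0.0]


# Hypothetical pre-tokenized queries (toy data, not from the paper).
queries = [
    ["cough", "fever", "child"],
    ["knee", "pain", "fracture"],
    ["fever", "rash", "child"],
]
patients = [(5, False), (40, True), (3, True)]  # (age, is_female), also toy data

vocab, vectors = tfidf_vectors(queries)
features = [
    add_patient_features(vec, age, is_female)
    for vec, (age, is_female) in zip(vectors, patients)
]
```

In this toy example a term that appears in only one query ("cough") receives a higher weight than one shared by two queries ("fever"), which is the property that makes TF-IDF useful for separating departments by their characteristic vocabulary.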
Authors
Wang Ruojia (王若佳)
Zhang Lu (张璐)
Wang Jimin (王继民)
Wang Ruojia; Zhang Lu; Wang Jimin (Department of Information Management, Peking University, Beijing 100871, China; Institute of Ocean Research, Peking University, Beijing 100871, China)
Source
《数据分析与知识发现》
CSSCI
CSCD
PKU Core Journals (北大核心)
2019, No. 9, pp. 88-97 (10 pages)
Data Analysis and Knowledge Discovery
Keywords
Online medical consultation
Intelligent triage
Machine learning
Support vector machine
Ask the Doctor Service; Automatic Triage; Machine Learning; Support Vector Machine