Abstract
[Purpose/Significance] This study explores automatic extraction of depression symptoms from online medical consultation text, with the aim of advancing deeper applications of health big data. [Method/Process] Using patient consultation records from the online medical platform "HaoDaiFu" (好大夫在线) as the corpus, the study adopts an unsupervised machine learning approach that combines phrase recognition with deep-learning semantic modeling to extract depression symptoms quickly. The algorithm was evaluated on a test corpus and further examined in two tasks: analysis of typical depression symptoms and identification of people with depression. [Results/Conclusion] The algorithm identifies depression symptom phrases with an accuracy of 73.85%, indicating good performance. When the method is used to analyze the typical manifestations of depressed patients, the conclusions are consistent with the results of clinical psychological tests, and accuracy in the task of identifying people with depression reaches 78.81%, which supports the effectiveness of the algorithm. Among the three distributed representation models compared for symptom phrases, Sentence-BERT, which has deep semantic representation capability, performs best. This suggests that strengthening the semantic representation of phrases allows unsupervised machine learning to extract disease symptoms quickly and improves the capacity to process large-scale text.
Authors
Nie Hui (聂卉)
Wu Xiaoyan (吴晓燕)
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Source
Journal of Modern Information (《现代情报》)
2023, No. 9, pp. 63-73 (11 pages)
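Funding
2022 Guangzhou Social Science Fund project "Research on Efficient Coordination of Social Security in the Guangdong-Hong Kong-Macao Greater Bay Area under the New Dual-Circulation Development Pattern" (Project No. 10000-42220402)
2023 Guangzhou Philosophy and Social Science Development "14th Five-Year" Plan project "Research on Depression Risk Prediction Based on Internet Medical Big Data in the Context of Healthy China" (Project No. 2023GZGJ259)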
Keywords
online medical consultation text
depression
semantic modeling
phrase recognition