Abstract
Depression is traditionally diagnosed through face-to-face assessment and conversation. However, many people with depression are reluctant to seek medical attention at an early stage, which allows their condition to worsen. To identify depression early, this work proposes a detection model that combines time series features of social media text with multi-instance learning. Because depressive symptoms do not appear immediately, time-ordered samples are important; an unsupervised LSTM is therefore used to extract time series features, a classifier is trained for binary classification, and a multi-instance learning model is used to address the problem of imbalanced samples. Naive Bayes, random forest, multivariate social network learning, and multimodal depression dictionary learning serve as baselines, and multi-instance learning with unsupervised LSTM time series features is then applied to detect depression more accurately. From the MDDL dataset, 200 subjects with a total of 7,946 tweets were selected and split into training and test sets at a ratio of 8:2. The model reached an accuracy of 75.0%, a precision of 76.0%, a recall of 73.0%, and an F1 score of 74.5%. These results indicate that early depression detection from social media text using machine learning is feasible. Extensive ablation experiments further show that the method using time series features outperforms the traditional baseline methods.
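The multi-instance idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each user is a "bag" of per-window scores (for example, outputs of a classifier over LSTM features of consecutive tweets) and that a bag is labeled by pooling those scores. The function names, the 0.5 threshold, the pooling choices, and the toy scores are all hypothetical; the abstract does not specify how the paper aggregates instances.

```python
from statistics import mean


def bag_score(instance_scores, pooling="max"):
    """Collapse the per-instance scores of one bag (one user) into a single score."""
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    if pooling == "max":
        # Standard MIL assumption: a bag is positive if any instance is positive.
        return max(instance_scores)
    if pooling == "mean":
        # Softer alternative: average evidence across all instances.
        return mean(instance_scores)
    raise ValueError(f"unknown pooling: {pooling}")


def classify_bags(bags, threshold=0.5, pooling="max"):
    """Return a 0/1 label per user from bag-level pooled scores."""
    return {
        user: int(bag_score(scores, pooling) >= threshold)
        for user, scores in bags.items()
    }


# Toy data (hypothetical): per-window depression scores for two users.
bags = {
    "user_a": [0.10, 0.20, 0.85],
    "user_b": [0.05, 0.30, 0.25],
}

print(classify_bags(bags))                  # {'user_a': 1, 'user_b': 0}
print(classify_bags(bags, pooling="mean"))  # {'user_a': 0, 'user_b': 0}
```

Max pooling flags a user when any single time window looks strongly positive, while mean pooling requires sustained evidence; which behavior suits imbalanced data better is an empirical question that this sketch does not settle.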
Authors
Zhang Mengna; Wang Junyan; Long Yang; Zhang Haofeng; Hu Yong (Department of Preventive Health, Hospital of Nanjing University of Science and Technology, Nanjing 210094, China; School of Computer Science and Engineering, University of New South Wales, Sydney 2052, Australia; Department of Computer Science, Durham University, Durham DH1 3LE, UK; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China)
Source
Chinese Journal of Biomedical Engineering
CAS
CSCD
Peking University Core Journals (北大核心)
2022, Issue 1, pp. 21-30 (10 pages)
Funding
National Natural Science Foundation of China (61872187, 62072246)
UK Medical Research Council Innovation Fund (No. MR/S003916/1)
Keywords
depression detection
long short-term memory (LSTM)
time series feature
social media
multi-instance learning