Depression has become one of the most common mental illnesses in the world.For better prediction and diagnosis,methods of automatic depression recognition based on speech signal are constantly proposed and updated,wit...Depression has become one of the most common mental illnesses in the world.For better prediction and diagnosis,methods of automatic depression recognition based on speech signal are constantly proposed and updated,with a transition from the early traditional methods based on hand‐crafted features to the application of architectures of deep learning.This paper systematically and precisely outlines the most prominent and up‐to‐date research of automatic depression recognition by intelligent speech signal processing so far.Furthermore,methods for acoustic feature extraction,algorithms for classification and regression,as well as end to end deep models are investigated and analysed.Finally,general trends are summarised and key unresolved issues are identified to be considered in future studies of automatic speech depression recognition.展开更多
Depression is a common mental health disorder.With current depression detection methods,specialized physicians often engage in conversations and physiological examinations based on standardized scales as auxiliary mea...Depression is a common mental health disorder.With current depression detection methods,specialized physicians often engage in conversations and physiological examinations based on standardized scales as auxiliary measures for depression assessment.Non-biological markers-typically classified as verbal or non-verbal and deemed crucial evaluation criteria for depression-have not been effectively utilized.Specialized physicians usually require extensive training and experience to capture changes in these features.Advancements in deep learning technology have provided technical support for capturing non-biological markers.Several researchers have proposed automatic depression estimation(ADE)systems based on sounds and videos to assist physicians in capturing these features and conducting depression screening.This article summarizes commonly used public datasets and recent research on audio-and video-based ADE based on three perspectives:Datasets,deficiencies in existing research,and future development directions.展开更多
With the rapid growth of information transmission via the Internet,efforts have been made to reduce network load to promote efficiency.One such application is semantic computing,which can extract and process semantic ...With the rapid growth of information transmission via the Internet,efforts have been made to reduce network load to promote efficiency.One such application is semantic computing,which can extract and process semantic communication.Social media has enabled users to share their current emotions,opinions,and life events through their mobile devices.Notably,people suffering from mental health problems are more willing to share their feelings on social networks.Therefore,it is necessary to extract semantic information from social media(vlog data)to identify abnormal emotional states to facilitate early identification and intervention.Most studies do not consider spatio-temporal information when fusing multimodal information to identify abnormal emotional states such as depression.To solve this problem,this paper proposes a spatio-temporal squeeze transformer method for the extraction of semantic features of depression.First,a module with spatio-temporal data is embedded into the transformer encoder,which is utilized to obtain a representation of spatio-temporal features.Second,a classifier with a voting mechanism is designed to encourage the model to classify depression and non-depression effec-tively.Experiments are conducted on the D-Vlog dataset.The results show that the method is effective,and the accuracy rate can reach 70.70%.This work provides scaffolding for future work in the detection of affect recognition in semantic communication based on social media vlog data.展开更多
基金supported by the National Natural Science Foundation of China(NSFC,no.61701243,71771125)the Major Project of Natural Science Foundation of Jiangsu Education Department(no.19KJA180002).
文摘Depression has become one of the most common mental illnesses in the world.For better prediction and diagnosis,methods of automatic depression recognition based on speech signal are constantly proposed and updated,with a transition from the early traditional methods based on hand‐crafted features to the application of architectures of deep learning.This paper systematically and precisely outlines the most prominent and up‐to‐date research of automatic depression recognition by intelligent speech signal processing so far.Furthermore,methods for acoustic feature extraction,algorithms for classification and regression,as well as end to end deep models are investigated and analysed.Finally,general trends are summarised and key unresolved issues are identified to be considered in future studies of automatic speech depression recognition.
基金Supported by Shandong Province Key R and D Program,No.2021SFGC0504Shandong Provincial Natural Science Foundation,No.ZR2021MF079Science and Technology Development Plan of Jinan(Clinical Medicine Science and Technology Innovation Plan),No.202225054.
文摘Depression is a common mental health disorder.With current depression detection methods,specialized physicians often engage in conversations and physiological examinations based on standardized scales as auxiliary measures for depression assessment.Non-biological markers-typically classified as verbal or non-verbal and deemed crucial evaluation criteria for depression-have not been effectively utilized.Specialized physicians usually require extensive training and experience to capture changes in these features.Advancements in deep learning technology have provided technical support for capturing non-biological markers.Several researchers have proposed automatic depression estimation(ADE)systems based on sounds and videos to assist physicians in capturing these features and conducting depression screening.This article summarizes commonly used public datasets and recent research on audio-and video-based ADE based on three perspectives:Datasets,deficiencies in existing research,and future development directions.
基金supported in part by the STI 2030-Major Projects(2021ZD0202002)in part by the National Natural Science Foundation of China(Grant No.62227807)+2 种基金in part by the Natural Science Foundation of Gansu Province,China(Grant No.22JR5RA488)in part by the Fundamental Research Funds for the Central Universities(Grant No.lzujbky-2023-16)Supported by Supercomputing Center of Lanzhou University.
文摘With the rapid growth of information transmission via the Internet,efforts have been made to reduce network load to promote efficiency.One such application is semantic computing,which can extract and process semantic communication.Social media has enabled users to share their current emotions,opinions,and life events through their mobile devices.Notably,people suffering from mental health problems are more willing to share their feelings on social networks.Therefore,it is necessary to extract semantic information from social media(vlog data)to identify abnormal emotional states to facilitate early identification and intervention.Most studies do not consider spatio-temporal information when fusing multimodal information to identify abnormal emotional states such as depression.To solve this problem,this paper proposes a spatio-temporal squeeze transformer method for the extraction of semantic features of depression.First,a module with spatio-temporal data is embedded into the transformer encoder,which is utilized to obtain a representation of spatio-temporal features.Second,a classifier with a voting mechanism is designed to encourage the model to classify depression and non-depression effec-tively.Experiments are conducted on the D-Vlog dataset.The results show that the method is effective,and the accuracy rate can reach 70.70%.This work provides scaffolding for future work in the detection of affect recognition in semantic communication based on social media vlog data.