Funding: National Key Research and Development Plan of China, Grant/Award Number: 2021YFB3600503; National Natural Science Foundation of China, Grant/Award Numbers: 62276065, U21A20472.
Abstract: Attention mechanisms have been a successful method for multimodal affective analysis in recent years. Despite these advances, several significant challenges remain in fusing language with its nonverbal context. One is generating sparse attention coefficients over the acoustic and visual modalities, which helps locate critical emotional semantics. The other is fusing complementary cross-modal representations to construct optimal combinations of salient features across modalities. A Conditional Transformer Fusion Network is proposed to handle these problems. First, the authors equip the transformer module with CNN layers to improve the detection of subtle signal patterns in nonverbal sequences. Second, sentiment words are used as context conditions to guide the computation of cross-modal attention. As a result, the located nonverbal features are not only salient but also directly complementary to the sentiment words. Experimental results show that the authors' method achieves state-of-the-art performance on several multimodal affective analysis datasets.
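The cross-modal attention step described above can be illustrated with a minimal sketch, assuming standard scaled dot-product attention in which a sentiment-word embedding acts as the query and an acoustic (or visual) sequence supplies the keys and values. The function names and toy vectors are hypothetical; the paper's CNN layers, learned projections and exact conditioning scheme are not reproduced here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(
    query: Vector, keys: List[Vector], values: List[Vector]
) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention from one text-side query (e.g. a
    sentiment-word embedding) over a nonverbal sequence.
    Returns (attention weights over frames, attended context vector)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))
    ]
    return weights, context


# Hypothetical toy data: a 2-d "sentiment word" query and three acoustic frames.
sentiment_query = [1.0, 0.0]
acoustic_frames = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
weights, context = cross_modal_attention(sentiment_query, acoustic_frames, acoustic_frames)
print([round(w, 3) for w in weights])  # the frame most aligned with the query gets the largest weight
```

Plain softmax always yields dense weights; producing the sparse coefficients the abstract aims for would need a different normalizer (for example sparsemax), which this sketch does not implement.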
Abstract: Speech signals play an essential role in communication and provide an efficient way to exchange information between humans and machines. Speech Emotion Recognition (SER) is one of the critical sources for human evaluation and has many real-world applications, such as healthcare, call centers, robotics, safety, and virtual reality. This work develops a novel TCN-based emotion recognition system that uses a spatial-temporal convolutional network to recognize the speaker's emotional state from speech signals. The authors designed a Temporal Convolutional Network (TCN) core block to capture long-term dependencies in speech signals and then feed these temporal cues to a dense network that fuses the spatial features and captures global information for final classification. The proposed network automatically extracts valid sequential cues from speech signals and outperforms state-of-the-art (SOTA) methods and traditional machine learning algorithms, achieving a high recognition rate. The final unweighted accuracies of 80.84% on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset and 92.31% on the Berlin Emotional Dataset (EMO-DB) indicate the robustness and efficiency of the designed model.
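In the usual TCN formulation, a core block like the one described above is built from dilated causal 1-D convolutions. The sketch below shows that operation for a single channel with zero left-padding, plus the receptive-field arithmetic for a stack of such layers. The kernel values, the dilation schedule and the helper names are hypothetical, not the authors' configuration.

```python
from typing import List


def dilated_causal_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """y[t] = sum_i kernel[i] * x[t - i*dilation], with x treated as 0 before t=0,
    so each output depends only on the current and past samples."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of a stack with one dilated causal conv per dilation level."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# An impulse at t=0 shows which later outputs it reaches with dilation 2.
impulse = [1.0] + [0.0] * 7
print(dilated_causal_conv1d(impulse, [0.5, 0.5], dilation=2))
# -> [0.5, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
print(receptive_field(3, [1, 2, 4, 8]))  # -> 31
```

Doubling the dilation at each level is what lets a TCN cover long speech contexts with few layers; the abstract does not say which dilation schedule the authors actually use.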
Funding: Supported by the National Research Foundation of Korea, funded by the Korean Government through the Ministry of Science and ICT under Grant NRF-2020R1F1A1060659, and in part by the 2020 Faculty Research Fund of Sejong University.
Abstract: Emotion recognition from speech data is an active and emerging research area that plays an important role in numerous applications, such as robotics, virtual reality, behavior assessment, and emergency call centers. Recently, researchers have developed many techniques in this field that use deep learning approaches to improve accuracy, but the recognition rate is still not convincing. Our main aim is to develop a new technique that increases the recognition rate at a reasonable computational cost. In this paper, we propose a one-dimensional dilated convolutional neural network (1D-DCNN) for speech emotion recognition (SER) that combines hierarchical feature learning blocks (HFLBs) with a bidirectional gated recurrent unit (BiGRU). We designed a one-dimensional CNN that enhances the speech signals using spectral analysis and extracts hidden patterns from them; these patterns are fed into a stack of one-dimensional dilated blocks called HFLBs. Each HFLB contains one dilated convolution layer (DCL), one batch normalization (BN) layer, and one leaky ReLU layer, and extracts emotional features using a hierarchical correlation strategy. The learned emotional features are then fed into a BiGRU to adjust the global weights and recognize the temporal cues. The final state of the deep BiGRU is passed through a softmax classifier to produce the emotion probabilities. The proposed model was evaluated on three benchmark datasets, IEMOCAP, EMO-DB, and RAVDESS, and achieved accuracies of 72.75%, 91.14%, and 78.01%, respectively.
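Each HFLB, as described above, chains a dilated convolution, batch normalization and a leaky ReLU. The single-channel sketch below reproduces only that ordering. It assumes "same"-length zero padding, batch statistics pooled over the batch and time axes, no learnable scale or shift, and a negative slope of 0.01. These settings, the kernel and the helper names are hypothetical; the paper's own values may differ.

```python
import math
from typing import List

Seq = List[float]


def dilated_conv1d_same(x: Seq, kernel: Seq, dilation: int) -> Seq:
    """Centered dilated convolution (cross-correlation, as in deep learning
    frameworks) with zero padding; output length equals input length."""
    half = (len(kernel) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + (i - half) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def batch_norm(batch: List[Seq], eps: float = 1e-5) -> List[Seq]:
    """Normalize with mean/variance pooled over the batch and time axes
    (learnable scale and shift omitted)."""
    values = [v for seq in batch for v in seq]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)
    return [[(v - mean) / std for v in seq] for seq in batch]


def leaky_relu(x: Seq, slope: float = 0.01) -> Seq:
    return [v if v >= 0 else slope * v for v in x]


def hflb(batch: List[Seq], kernel: Seq, dilation: int) -> List[Seq]:
    """Hypothetical single-channel HFLB: dilated conv -> batch norm -> leaky ReLU."""
    conv = [dilated_conv1d_same(seq, kernel, dilation) for seq in batch]
    return [leaky_relu(seq) for seq in batch_norm(conv)]


# Two toy single-channel sequences (invented values).
batch = [[0.1, 0.4, -0.2, 0.3, 0.0, -0.5], [0.2, -0.1, 0.6, 0.1, -0.3, 0.2]]
features = hflb(batch, kernel=[0.25, 0.5, 0.25], dilation=2)
print([[round(v, 3) for v in seq] for seq in features])
```

Stacking several such blocks with different dilations widens the temporal context before the BiGRU stage; a real implementation would use multi-channel layers from a deep learning framework.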
Funding: Supported by the Fundamental Research Funds for the Central Universities (ZY1347 and YS1404).
Abstract: Many multi-attribute decision-making (MADM) problems in chemical processes, such as controller tuning, are affected by the subjective preferences of the human decision maker during human–computer interaction. In response, a novel affective computing and preferential evolutionary solution is proposed to adapt the human–computer interaction mechanism. Based on the stimulus-response mechanism, an improved affective computing model is introduced to quantify the decision maker's preferences in the selections made during interactive evolutionary computing. In addition, the mathematical relationship between the affective space and the decision maker's preferences is constructed. Subsequently, a human–computer interactive preferential evolutionary algorithm for MADM problems is proposed, which handles attribute weights and optimal solutions based on preferential evolution metrics. To exemplify applications of the proposed methods, several test functions and, in particular, controller tuning problems associated with a chemical process are investigated, with satisfactory results.
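To make the role of attribute weights in MADM concrete, here is a sketch of simple additive weighting (SAW), a common baseline in which each attribute is min-max normalized and the weighted sum ranks the alternatives. This is a generic illustration, not the paper's preferential evolutionary algorithm. The attributes (overshoot, settling time, integral absolute error), the weights and the candidate controller tunings are all hypothetical, and the weights are set by hand rather than learned from the operator.

```python
from typing import Dict, List


def saw_rank(
    alternatives: Dict[str, List[float]], weights: List[float], benefit: List[bool]
) -> List[str]:
    """Rank alternatives by simple additive weighting with min-max normalization.
    benefit[j] is True if larger values of attribute j are better."""
    columns = list(zip(*alternatives.values()))
    scores = {}
    for name, row in alternatives.items():
        score = 0.0
        for j, value in enumerate(row):
            lo, hi = min(columns[j]), max(columns[j])
            span = (hi - lo) or 1.0  # avoid division by zero for constant attributes
            normalized = (value - lo) / span if benefit[j] else (hi - value) / span
            score += weights[j] * normalized
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical controller tunings: [overshoot %, settling time s, IAE]; lower is better for all.
tunings = {
    "tuning_A": [5.0, 12.0, 0.8],
    "tuning_B": [15.0, 6.0, 0.6],
    "tuning_C": [2.0, 20.0, 1.1],
}
lower_is_better = [False, False, False]
print(saw_rank(tunings, [0.6, 0.25, 0.15], lower_is_better))  # overshoot-averse operator
print(saw_rank(tunings, [0.1, 0.6, 0.3], lower_is_better))    # speed-focused operator
```

The two calls show why capturing the decision maker's preferences matters: the same three tunings rank differently once the weights shift.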
Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2021R1I1A3058103).
Abstract: This paper proposes a methodology for detecting outlier behavior in gameplay using multimodal data. The proposed methodology collects, synchronizes, and quantifies time-series data from webcams, mice, and keyboards. Facial expressions are quantified along a one-dimensional pleasure axis, and expression changes in the mouth and eye areas are detected separately. Furthermore, keyboard and mouse input frequencies are tracked to determine users' interaction intensity. A dynamic time warping (DTW) algorithm is then applied to detect outlier behavior. The detected outlier patterns were play patterns that the game designer did not intend, or play patterns that differed greatly from those of other users. These outlier patterns can give game designers feedback on users' actual play experiences. Our results can be applied in the game industry to game user-experience analysis, enabling a quantitative evaluation of how exciting a game is.
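Dynamic time warping, the outlier-detection step named above, aligns two time series of possibly different lengths before comparing them, so a player who does the same thing slightly later is not penalized. Below is a minimal sketch of the classic O(nm) recurrence with an absolute-difference cost. The rule of flagging the user with the largest mean DTW distance to everyone else is an illustrative assumption, not the paper's stated criterion, and the session traces are invented.

```python
from typing import Dict, List


def dtw_distance(a: List[float], b: List[float]) -> float:
    """Classic dynamic time warping with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def most_atypical(series: Dict[str, List[float]]) -> str:
    """Hypothetical outlier rule: the series with the largest mean DTW distance to the rest."""
    def mean_distance(name: str) -> float:
        dists = [dtw_distance(series[name], s) for k, s in series.items() if k != name]
        return sum(dists) / len(dists)
    return max(series, key=mean_distance)


# Invented per-second interaction-intensity traces for three players.
sessions = {
    "user_a": [0, 1, 2, 3, 2, 1],
    "user_b": [0, 0, 1, 2, 3, 2, 1],  # same shape as user_a, shifted by one step
    "user_c": [3, 3, 0, 0, 3, 3],
}
print(dtw_distance(sessions["user_a"], sessions["user_b"]))  # -> 0.0
print(most_atypical(sessions))  # -> user_c
```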
Funding: National Natural Science Foundation of China (Grant No. 72293583).
Abstract: Learning modality-fused representations and processing unaligned multimodal sequences are meaningful yet challenging problems in multimodal emotion recognition. Existing approaches use directional pairwise attention or a message hub to fuse the language, visual, and audio modalities. However, these fusion methods are often quadratic in complexity with respect to the modal sequence length, introduce redundant information, and are inefficient. In this paper, we propose an efficient neural network that learns modality-fused representations with a CB-Transformer (LMR-CBT) for multimodal emotion recognition from unaligned multimodal sequences. Specifically, we first extract features from each of the three modalities separately to obtain the local structure of the sequences. Then, we design an innovative asymmetric transformer with cross-modal blocks (CB-Transformer) that enables complementary learning across modalities and consists mainly of local temporal learning, cross-modal feature fusion, and global self-attention representations. In addition, we concatenate the fused features with the original features to classify the emotions of the sequences. Finally, we conduct word-aligned and unaligned experiments on three challenging datasets: IEMOCAP, CMU-MOSI, and CMU-MOSEI. The experimental results show the superiority and efficiency of our proposed method in both settings. Compared with mainstream methods, our approach achieves state-of-the-art performance with a minimal number of parameters.
Funding: Supported by the National Natural Science Foundation of China (60703049), the "Chenguang" Foundation for Young Scientists (200850731353), and the National Postdoctoral Foundation of China (20060400847).
Abstract: A personalized emotion space is proposed to bridge the "affective gap" in video affective content understanding. To unify the discrete and dimensional emotion models, the fuzzy C-means (FCM) clustering algorithm is adopted to partition the emotion space. A Gaussian mixture model (GMM) is used to determine the membership functions of typical affective subspaces. At every step of modeling the space, the inputs rely entirely on the affective experiences recorded by the audience. The advantages of the improved V-A (Valence-Arousal) emotion model are its personalization, its ability to define typical affective-state regions in the V-A emotion space, and the convenience with which it explicitly expresses the intensity of each affective state. The experimental results validate the model and show that it can be used as a personalized emotion space for video affective content representation.
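Fuzzy C-means, used above to partition the valence-arousal (V-A) space, gives each point a graded membership in every cluster instead of a hard label, which is what makes intensity-like readings possible. The sketch runs the standard FCM iteration with fuzzifier m = 2 on invented V-A ratings in [-1, 1]. The cluster count, the random initialization and the data are assumptions, and the paper's subsequent GMM step is not shown.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def fuzzy_c_means(
    points: List[Point], c: int, m: float = 2.0, iters: int = 100, seed: int = 0
) -> Tuple[List[Point], List[List[float]]]:
    """Standard FCM: alternate weighted-centroid and membership updates.
    Returns (cluster centers, membership matrix u[k][i] for point k, cluster i)."""
    rng = random.Random(seed)
    u = []
    for _ in points:
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])  # each point's memberships sum to 1
    centers: List[Point] = []
    for _ in range(iters):
        # Centers: means of the points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(len(points))]
            sw = sum(w)
            centers.append((
                sum(wk * p[0] for wk, p in zip(w, points)) / sw,
                sum(wk * p[1] for wk, p in zip(w, points)) / sw,
            ))
        # Memberships: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        for k, p in enumerate(points):
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            for i in range(c):
                u[k][i] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
    return centers, u


# Invented (valence, arousal) ratings in [-1, 1] from one viewer.
ratings = [(0.8, 0.7), (0.9, 0.6), (0.7, 0.8), (-0.7, -0.6), (-0.8, -0.5), (-0.6, -0.7)]
centers, memberships = fuzzy_c_means(ratings, c=2)
for point, row in zip(ratings, memberships):
    print(point, [round(x, 3) for x in row])
```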
Funding: National Natural Science Foundation of China (No. 61603023).
Abstract: Traditional industrial process control activities involving multi-objective optimization problems, such as proportional-integral-derivative (PID) parameter tuning and operational optimization, always demand process knowledge and human operators' experience during human-computer interaction. However, the impact of human operators' preferences on human-computer interaction has rarely been highlighted. In response to this problem, this paper establishes a novel multilayer cognitive affective computing model based on human personalities and pleasure-arousal-dominance (PAD) emotional space states. Therein, affective preferences are employed to update the affective computing model during human-machine interaction. Accordingly, we propose affective parameter mining strategies based on genetic algorithms (GAs), which gradually learn human operators' operational preferences in process control activities. Two routine process control tasks, PID controller tuning for coupled loops and operational optimization of batch beer fermenter processes, are carried out to illustrate the effectiveness of the contributions, with satisfactory results.
Abstract: The COVID-19 pandemic has had a profound impact on public mental health, leading to a surge in loneliness, depression, and anxiety, and these public psychological issues are increasingly becoming a factor that affects social order. As researchers explore ways to address these issues, artificial intelligence (AI) has emerged as a powerful tool for understanding and supporting mental health. In this paper, we provide a thorough literature review of the emotions (EMO) of loneliness, depression, and anxiety (EMO-LDA) before and during the COVID-19 pandemic. Additionally, we use Latent Dirichlet Allocation (LDA) topic modeling to evaluate the application of AI in EMO-LDA research from 2018 to 2023 (AI-LDA). Our analysis reveals a significant increase in the proportion of literature on EMO-LDA and AI-LDA from before to during the COVID-19 pandemic. We also observe changes in research hotspots and trends in both fields. Moreover, our results suggest that collaborative research combining EMO-LDA and AI-LDA is a promising direction for future research. In conclusion, our review highlights the urgent need for effective interventions to address the mental health challenges posed by the COVID-19 pandemic. Our findings suggest that integrating AI into EMO-LDA research has the potential to provide new insights and solutions to support individuals facing loneliness, depression, and anxiety. We hope that our study will inspire further research in this vital and relevant domain.
Abstract: Remote photoplethysmography (rPPG) allows remote measurement of the heart rate using low-cost RGB imaging equipment. In this study, we review the development of the rPPG field since its emergence in 2008. We also classify existing rPPG approaches and derive a framework that provides an overview of their modular steps. Based on this framework, practitioners can use our classification to design an rPPG algorithm that suits their specific needs. Researchers can use the reviewed and classified algorithms as a starting point to improve particular features of an rPPG algorithm.
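As an illustration of the kind of modular pipeline the review classifies (signal extraction, then frequency-domain heart-rate estimation), here is a sketch of the simple green-channel approach. It averages the green values of a skin region in each frame, removes the mean, and picks the frequency with the most spectral power in a plausible heart-rate band. The 0.7 to 4 Hz band, the 0.01 Hz search step, the frame rate and the synthetic clip are assumptions. Real pipelines add face tracking, detrending and filtering, which are omitted here.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def mean_green(frame: Sequence[Pixel]) -> float:
    """Spatial average of the green channel over a skin region of interest."""
    return sum(px[1] for px in frame) / len(frame)


def estimate_heart_rate_bpm(
    trace: List[float], fps: float, low_hz: float = 0.7, high_hz: float = 4.0, step_hz: float = 0.01
) -> float:
    """Return 60 * the frequency with the largest spectral power in [low_hz, high_hz]."""
    mean = sum(trace) / len(trace)
    x = [v - mean for v in trace]  # remove the DC component
    best_f, best_power = low_hz, -1.0
    for n in range(int(round((high_hz - low_hz) / step_hz)) + 1):
        f = low_hz + n * step_hz
        re = sum(v * math.cos(2 * math.pi * f * t / fps) for t, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * f * t / fps) for t, v in enumerate(x))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return 60.0 * best_f


# Synthetic 10 s clip at 30 fps: a 4-pixel region whose green value pulses at 1.2 Hz (72 bpm).
fps = 30.0
frames = [
    [(90.0, 120.0 + 2.0 * math.sin(2 * math.pi * 1.2 * t / fps), 80.0)] * 4
    for t in range(300)
]
trace = [mean_green(frame) for frame in frames]
print(round(estimate_heart_rate_bpm(trace, fps)))  # -> 72
```

Evaluating the spectrum on a fine frequency grid, rather than only at FFT bins, trades speed for resolution; with a 10 s window the FFT bin spacing would be 0.1 Hz (6 bpm).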
Funding: This work was financially supported by the NSF of Guangdong Province (No. 2019A1515010833), the Fundamental Research Funds for the Central Universities (No. 2020ZYGXZR089), and the Social Science Research Base of Guangdong Province, Research Center of Network Civilization in New Era of SCUT.
Abstract: Background: Physiological signal-based research has been a hot topic in affective computing. Previous work has mainly focused on strong, short-lived emotions (e.g., joy, anger), while attention, which is a weak and long-lasting emotion, has received less interest. In this paper, we present a study of attention recognition based on electrocardiogram (ECG) signals, which contain a wealth of emotion-related information. Methods: The ECG dataset was collected from 10 subjects specifically for attention detection. To reduce the impact of baseline wander and power-line interference, we apply wavelet threshold denoising as preprocessing and extract rich features using the Pan-Tompkins and wavelet decomposition algorithms. To improve generalization, we tested the performance of various combinations of feature selection algorithms and classifiers. Results: Experiments show that the combination of a genetic algorithm and a random forest achieves the highest correct classification rate (CCR), 86.3%. Conclusion: This study indicates the feasibility and promising future of ECG-based attention research.
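Wavelet threshold denoising, the preprocessing step named above, transforms the signal, shrinks the small detail coefficients that are likely noise, and inverts the transform. Below is a one-level Haar sketch with soft thresholding and the universal threshold sigma * sqrt(2 ln N), with sigma estimated from the median absolute detail coefficient. The wavelet, the single decomposition level, the threshold rule and the synthetic test signal are assumptions, not the study's reported settings. The sketch only targets high-frequency noise; baseline wander would need separate handling (for example at coarser decomposition levels), which is not shown.

```python
import math
import statistics
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(x: List[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar transform (input length must be even)."""
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return approx, detail


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def soft_threshold(v: float, thr: float) -> float:
    """Shrink toward zero by thr; values with |v| <= thr become 0."""
    return 0.0 if abs(v) <= thr else math.copysign(abs(v) - thr, v)


def wavelet_denoise(x: List[float]) -> List[float]:
    approx, detail = haar_forward(x)
    # Noise level from the median absolute detail coefficient (MAD / 0.6745),
    # then the universal threshold sigma * sqrt(2 ln N).
    sigma = statistics.median([abs(d) for d in detail]) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(len(x)))
    return haar_inverse(approx, [soft_threshold(d, thr) for d in detail])


def mse(a: List[float], b: List[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Synthetic slow waveform plus sample-to-sample alternating interference (invented, not ECG).
clean = [math.sin(2 * math.pi * k / 64) for k in range(128)]
noisy = [c + 0.1 * (-1) ** k for k, c in enumerate(clean)]
denoised = wavelet_denoise(noisy)
print(round(mse(noisy, clean), 4), round(mse(denoised, clean), 4))
```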
Funding: Supported by the National Natural Science Foundation of China (Nos. 61272211 and 61170126) and the Six Talent Peaks Foundation of Jiangsu Province, China (No. DZXX027).
Abstract: Emotion-based features are critical for achieving high performance in a speech emotion recognition (SER) system. In general, such features are difficult to develop because the ground truth is ambiguous. In this paper, we apply several unsupervised feature learning algorithms (K-means clustering, the sparse auto-encoder, and sparse restricted Boltzmann machines), which are promising for learning task-related features from unlabeled data, to speech emotion recognition. We then evaluate the performance of the proposed approach and analyze in detail the effect of two important factors in the model setup: the content window size and the number of hidden-layer nodes. Experimental results show that larger content windows and more hidden nodes yield higher performance. We also show that a two-layer network does not clearly improve performance compared with a single-layer network.
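Of the unsupervised learners listed above, K-means is the simplest to sketch. One common recipe, assumed here and not necessarily the paper's exact setup, works in four steps. First, slice unlabeled frame-level feature sequences into windows. Second, cluster the windows with K-means. Third, encode each window by its distances to the learned centroids. Fourth, feed those features to a downstream classifier (not shown). In this sketch the window size stands in for the content window and k loosely stands in for the number of hidden nodes. The frame values are invented, and the farthest-point seeding is a deterministic substitute for k-means++.

```python
import math
from typing import List

Vec = List[float]


def make_windows(frames: List[Vec], size: int) -> List[Vec]:
    """Concatenate `size` consecutive frames into one vector (stride 1)."""
    return [sum(frames[t:t + size], []) for t in range(len(frames) - size + 1)]


def farthest_point_init(data: List[Vec], k: int) -> List[Vec]:
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all chosen centroids."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        far = max(data, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(data: List[Vec], k: int, iters: int = 50) -> List[Vec]:
    centroids = farthest_point_init(data, k)
    for _ in range(iters):
        clusters: List[List[Vec]] = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def encode(window: Vec, centroids: List[Vec]) -> Vec:
    """Feature vector = distances from the window to every learned centroid."""
    return [math.dist(window, c) for c in centroids]


# Invented unlabeled 2-d frame features: a quiet stretch followed by an energetic one.
frames = [[0.1, 0.2] for _ in range(5)] + [[0.9, 0.8] for _ in range(5)]
windows = make_windows(frames, size=3)   # window of 3 frames
centroids = kmeans(windows, k=2)         # k learned "feature detectors"
features = [encode(w, centroids) for w in windows]
print(len(windows), len(features[0]))
```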
Funding: Supported by the Special Funds of the National Natural Science Foundation of China (72042004), the Research Project of Shanghai Science and Technology Commission (20dz2260300), and the Fundamental Research Funds for the Central Universities. We would like to thank the anonymous reviewers for their valuable suggestions.
Abstract: Since the rapid worldwide spread of COVID-19, the pandemic has had a huge impact on global sporting events. As a major international event, the 2022 Beijing Winter Olympics shares an international public-opinion context and epidemiological background with the 2008 Beijing Olympics, the 2014 Sochi Winter Olympics, and the 2020 Tokyo Olympics. In this study, over 1 million pieces of user-generated content (UGC) in Chinese and English were collected from social media platforms such as Twitter and YouTube, as well as from traditional mass media in various countries, to compare international public opinion in the two languages. Using sentiment analysis, this study explores the evolution of international public-opinion topics and the sentiment differences among these four Olympic Games. The results show that: 1) in both traditional mass media and online social media, topics about the 2008 Beijing Olympics and the 2022 Beijing Winter Olympics show a more obvious tendency toward general politicization, and extreme emotional remarks are more frequent for the 2022 Beijing Winter Olympics; 2) on political topics involving China, international Chinese-language public opinion is more negative than English-language opinion; 3) on topics involving COVID-19, the relative negativity of Chinese- and English-language opinion is reversed between the 2020 Tokyo Olympics and the 2022 Beijing Winter Olympics; 4) international public opinion on sports-event topics is significantly more positive in Chinese than in English; 5) the Chinese-language opinion environment on YouTube is better than the English-language one.