Funding: The National Natural Science Foundation of China (No. 61231002, 61273266, 61571106); the Foundation of the Department of Science and Technology of Guizhou Province (No. [2015]7637).
Abstract: To recognize emotion effectively from spontaneous, non-prototypical and unsegmented speech, and thereby create a more natural human-machine interaction, a novel speech emotion recognition algorithm based on the combination of the emotional data field (EDF) and the ant colony search (ACS) strategy, called the EDF-ACS algorithm, is proposed. More specifically, the interrelationship among the turn-based acoustic feature vectors of different labels is established by using the potential function in the EDF. To perform spontaneous speech emotion recognition, the artificial colony is used to mimic the turn-based acoustic feature vectors. Then, the canonical ACS strategy is used to investigate the movement direction of each artificial ant in the EDF, which is regarded as the emotional label of the corresponding turn-based acoustic feature vector. The proposed EDF-ACS algorithm is evaluated on the continuous audio/visual emotion challenge (AVEC) 2012 dataset, which contains spontaneous, non-prototypical and unsegmented speech emotion data. The experimental results show that the proposed EDF-ACS algorithm outperforms the existing state-of-the-art algorithm in turn-based speech emotion recognition.
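The abstract does not state the potential function or the search rule, so the following is only a rough sketch of the data-field idea. It assumes the Gaussian-like potential commonly used in data field theory, and it swaps the ant colony search for a much simpler rule that assigns the label whose vectors exert the highest potential. The names `potential`, `label_by_potential`, `sigma` and the toy vectors are all hypothetical.

```python
import math

def potential(x, sources, sigma=1.0):
    """Sum of Gaussian-like potentials exerted on point x by the source vectors.

    Assumed form: exp(-(distance / sigma) ** 2) per source; not taken from the paper.
    """
    return sum(math.exp(-(math.dist(x, s) / sigma) ** 2) for s in sources)

def label_by_potential(x, labelled, sigma=1.0):
    """Return the label whose vectors exert the highest potential on x.

    This stands in for the ant colony search, which the abstract does not detail.
    """
    return max(labelled, key=lambda lab: potential(x, labelled[lab], sigma))

# Hypothetical 2-D feature vectors grouped by emotion label.
train = {
    "positive": [(0.0, 0.0), (0.2, 0.1)],
    "negative": [(3.0, 3.0), (2.8, 3.1)],
}
print(label_by_potential((0.1, 0.0), train))
```

In this simplified picture each labelled vector acts as a source in the field, and an unlabelled vector is drawn toward the label with the strongest combined influence at its position.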
Abstract: This study compared the level of social presence generated in a voice-based chat room and a text-based forum when learners tried to build personal relationships and form an online learning community on an online language course in China. A mixed-method approach was taken, drawing on questionnaires to find out about students’ perception of social presence, and on postings of text and audio messages produced during the learning process to identify students’ projected social presence in terms of affective, interactive and cohesive features. Interviews were also conducted to provide additional information, with the hope of forming a complete picture of social presence in a real online learning environment. The text-based forum and the voice-based chat room were found to have different impacts on student social presence. In terms of perception, most students were more likely to get to know peers in the text-based forum and thus developed a sense of community in the course. Yet they believed that the voice-based chat room had the advantage of helping them with course learning. In the actual interaction, the voice-based chat room was more interactive, although the text-based forum was more affective and cohesive. For the affective category, however, the existing framework in the literature included no prosodic features. Therefore, more research is needed to probe the relationship between prosodic sound features and social presence, and the present theoretical framework must be extended. In interviews, students explained that in the voice-based chat room prosodic features led to higher peer awareness, which further reinforced this need.
Funding: Funded by the State Administration of Foreign Expert Affairs (projects GDT20173200030 and G20190214022).
Abstract: It is widely accepted nowadays that intelligibility is the essential goal for most learners of English, and it is not necessary for them to mimic all aspects of native-speaker English in order to achieve a high level of intelligibility. However, the features that are needed in order to make oneself easily understood by listeners from elsewhere remain controversial. The current research focuses on thirteen five-minute recordings of conversations between young speakers of English in central China and an interviewer from Britain, in order to determine which features of their speech gave rise to misunderstandings. It was found that, in the 18 tokens of misunderstanding identified, 4 resulted from lexical semantics (22%), 3 from Chinese place names (17%), 3 from grammar (17%), and 11 from pronunciation (61%) (with some tokens cross-classified). The most common phonological factors giving rise to loss of intelligibility were omission of syllables and simplification of word-initial consonant clusters.
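The reported percentages can be checked with a short calculation. The sketch below uses only the counts given in the abstract; the variable names are illustrative. It shows that each percentage is taken over the 18 tokens, and that the category counts sum to more than 18 because of the cross-classification.

```python
# Counts of misunderstanding tokens per cause, as reported in the abstract.
counts = {
    "lexical semantics": 4,
    "Chinese place names": 3,
    "grammar": 3,
    "pronunciation": 11,
}
total_tokens = 18

# Each percentage is a share of the 18 tokens, rounded to the nearest integer.
percentages = {cause: round(100 * n / total_tokens) for cause, n in counts.items()}
for cause, pct in percentages.items():
    print(f"{cause}: {counts[cause]}/{total_tokens} = {pct}%")

# Category assignments beyond the number of tokens, due to cross-classification.
extra_assignments = sum(counts.values()) - total_tokens
print("extra assignments:", extra_assignments)
```

The four counts add up to 21, three more category assignments than there are tokens, which is why the percentages sum to more than 100%.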