Abstract: Speech recognition rate will deteriorate greatly in human-machine interaction when the speaker's speech mixes with a bystander's voice. This paper proposes a time-frequency approach to Blind Source Separation (BSS) for intelligent Human-Machine Interaction (HMI). The main idea of the algorithm is to simultaneously diagonalize the correlation matrices of the pre-whitened signals at different time delays for every frequency bin in the time-frequency domain. The proposed method has two merits: (1) fast convergence; (2) a high signal-to-interference ratio in the separated signals. Numerical evaluations compare the performance of the proposed algorithm with two other deconvolution algorithms. An efficient algorithm for resolving the permutation ambiguity is also proposed. With properly selected parameters, the proposed algorithm saves more than 10% of the computational time and achieves good performance on both simulated convolutive mixtures and real room-recorded speech.
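As a rough illustration of the second-order statistics at work here, the sketch below (a minimal, hypothetical example, not the authors' implementation) pre-whitens the STFT coefficients of a single frequency bin and diagonalizes one lagged correlation matrix of the whitened signals; the paper instead jointly diagonalizes several lags per bin and additionally resolves the permutation ambiguity across bins.

```python
# Minimal per-bin sketch of second-order BSS (single-lag simplification
# of the paper's multi-lag joint diagonalization). All names are ours.
# X: complex STFT coefficients of one frequency bin, shape (n_mics, n_frames).
import numpy as np

def separate_bin(X, lag=1, eps=1e-12):
    X = X - X.mean(axis=1, keepdims=True)
    # 1) Pre-whitening from the zero-lag correlation matrix.
    R0 = X @ X.conj().T / X.shape[1]
    d, E = np.linalg.eigh(R0)
    W = E @ np.diag(1.0 / np.sqrt(np.maximum(d, eps))) @ E.conj().T
    Z = W @ X
    # 2) Lagged correlation matrix of the whitened signals (symmetrised).
    R_tau = Z[:, lag:] @ Z[:, :-lag].conj().T / (Z.shape[1] - lag)
    R_tau = 0.5 * (R_tau + R_tau.conj().T)
    # 3) Diagonalising R_tau yields the unitary un-mixing rotation;
    #    the paper jointly diagonalises several lags per bin instead.
    _, U = np.linalg.eigh(R_tau)
    return U.conj().T @ Z          # separated sources for this bin
```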
Funding: This research was supported by the Jilin Province Vocational Education and Adult Education Teaching Reform Research Project (2021ZCY338).
Abstract: Children's perspective is grounded in children's own cognitive level of understanding objective things. Studying the children's perspective is a bottom-up research process premised on full respect for the child's view. With the change in views about children in recent years, the "children's perspective" has become a new research direction. At the same time, teacher-child interaction, as an important means of evaluating the quality of kindergarten education, calls for a bottom-up perspective from children. By exploring children's perspectives on teacher-child interaction, this study seeks to understand children's emotional experience during that interaction, as well as their understanding and evaluation of their own experience, so as to better improve the quality of teacher-child interaction in kindergarten.
Abstract: In this work, a hybrid method is proposed to overcome the limitations of traditional protein-protein interaction (PPI) extraction methods such as pattern learning and machine learning. Whether a sentence from the biomedical literature containing a protein pair describes a PPI is predicted by first learning syntax patterns typical of PPIs from a training corpus and then using their presence as features, along with bag-of-words features, in a maximum entropy model. Tested on the BioCreAtIvE corpus, the PPI extraction method achieved a precision of 64% and a recall of 60%, improving the F1 value by 11% compared with the component pure pattern-based and bag-of-words methods. The results on this test set were also compared with three other extraction methods and showed a remarkable improvement.
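The feature combination described above can be sketched as follows, assuming hand-written placeholder patterns where the paper learns them from a training corpus; a multinomial logistic regression stands in for the maximum entropy model, and all sentences and labels are toy data.

```python
# Sketch of the hybrid feature idea: bag-of-words features plus binary
# "pattern present" features fed to a maximum-entropy-style classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from scipy.sparse import hstack, csr_matrix
import re

sentences = ["PROT1 binds directly to PROT2 in vitro.",
             "PROT1 and PROT2 were measured in separate assays."]
labels = [1, 0]                                   # 1 = describes an interaction
patterns = [r"\bbinds?( directly)? to\b",         # placeholders; learned in the paper
            r"\binteracts? with\b"]

vec = CountVectorizer()
X_bow = vec.fit_transform(sentences)              # bag-of-words features
X_pat = csr_matrix([[1 if re.search(p, s) else 0 for p in patterns]
                    for s in sentences])          # pattern-presence features
X = hstack([X_bow, X_pat])

clf = LogisticRegression(max_iter=1000)           # stands in for the maxent model
clf.fit(X, labels)
print(clf.predict(X))
```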
Funding: The National Natural Science Foundation of China (No. 61231002, 61273266, 61571106) and the Foundation of the Department of Science and Technology of Guizhou Province (No. [2015]7637).
Abstract: In order to effectively perform emotion recognition from spontaneous, non-prototypical and unsegmented speech, and thereby create more natural human-machine interaction, a novel speech emotion recognition algorithm based on the combination of the emotional data field (EDF) and the ant colony search (ACS) strategy, called the EDF-ACS algorithm, is proposed. More specifically, the inter-relationships among the turn-based acoustic feature vectors of different labels are established by using the potential function in the EDF. To perform spontaneous speech emotion recognition, an artificial ant colony is used to mimic the turn-based acoustic feature vectors. Then, the canonical ACS strategy is used to investigate the movement direction of each artificial ant in the EDF, which is regarded as the emotional label of the corresponding turn-based acoustic feature vector. The proposed EDF-ACS algorithm is evaluated on the continuous audio/visual emotion challenge (AVEC) 2012 dataset, which contains spontaneous, non-prototypical and unsegmented speech emotion data. The experimental results show that the proposed EDF-ACS algorithm outperforms the existing state-of-the-art algorithm in turn-based speech emotion recognition.
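The data-field idea can be illustrated with a toy potential function of the Gaussian form commonly used in data fields; the values, names, and the simple strongest-field decision below are ours, whereas the paper steers artificial ants through this field with the ACS strategy.

```python
# Hedged sketch of a data-field potential: each labelled feature vector
# contributes a Gaussian-like potential, so an unlabelled turn is drawn
# toward the label whose vectors generate the strongest field around it.
import numpy as np

def field_potential(x, vectors, masses, sigma=1.0):
    """Potential created at point x by the given feature vectors."""
    d = np.linalg.norm(vectors - x, axis=1)
    return np.sum(masses * np.exp(-(d / sigma) ** 2))

rng = np.random.default_rng(0)
happy = rng.normal(0.0, 0.3, size=(20, 4))   # toy turn-based acoustic features
angry = rng.normal(2.0, 0.3, size=(20, 4))
query = rng.normal(1.8, 0.3, size=4)         # unlabelled turn

scores = {"happy": field_potential(query, happy, np.ones(20)),
          "angry": field_potential(query, angry, np.ones(20))}
print(max(scores, key=scores.get))           # label with the strongest field
```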
Funding: The National Natural Science Foundation of China (U2003207, 61902064) and the Jiangsu Frontier Technology Basic Research Project (BK20192004).
Abstract: Background: One of the most critical issues in human-computer interaction applications is recognizing human emotions from speech. In recent years, the challenging problem of cross-corpus speech emotion recognition (SER) has generated extensive research. Nevertheless, the domain discrepancy between training data and testing data remains a major obstacle to improved system performance. Methods: This paper introduces a novel multi-scale discrepancy adversarial (MSDA) network that performs domain adaptation over multiple timescales for cross-corpus SER, i.e., it integrates domain discriminators at hierarchical levels into the emotion recognition framework to mitigate the gap between the source and target domains. Specifically, we extract two kinds of speech features, handcrafted features and deep features, at three timescales: global, local, and hybrid. At each timescale, the domain discriminator and the feature extractor compete against each other, so that the extractor learns features that minimize the discrepancy between the two domains by fooling the discriminator. Results: Extensive cross-corpus and cross-language SER experiments were conducted on a combined dataset comprising one Chinese dataset and two English datasets commonly used in SER. The MSDA benefits from the strong discriminative power provided by the adversarial process, in which three discriminators work in tandem with an emotion classifier. Accordingly, the MSDA achieves the best performance among all baseline methods. Conclusions: The proposed architecture was tested on a combination of one Chinese and two English datasets. The experimental results demonstrate the superiority of our discriminative model for cross-corpus SER.
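The adversarial mechanism described in Methods is commonly realized with a gradient-reversal layer; the sketch below shows one domain discriminator of this kind (the MSDA places such discriminators at the global, local, and hybrid timescales). It is a generic illustration under assumed layer sizes, not the authors' code.

```python
# Gradient-reversal layer: forward pass is the identity, backward pass
# flips (and scales) the gradient, so minimizing the discriminator loss
# trains the feature extractor to fool the discriminator.
import torch
from torch import nn
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainDiscriminator(nn.Module):
    """Predicts source vs. target domain from (reversed-gradient) features."""
    def __init__(self, dim, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, features):
        return self.net(GradReverse.apply(features, self.lambd))
```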
Abstract: Objective: To design a small-size dual-microphone front-end system that addresses the degradation of the voice interaction performance of portable field medical equipment under battlefield noise. Methods: The system performs small-aperture dual-microphone beamforming based on the least-squares criterion to achieve front-end speech enhancement. The hardware mainly consists of the dual microphones, a signal pre-processing module, an embedded processor, an analog-to-digital converter (ADC), a digital-to-analog converter (DAC), and a power supply module. The dual microphones are two surface-mount micro-electro-mechanical system (MEMS) microphones; the signal pre-processing module, ADC, and DAC are built into the general-purpose audio codec WM8978; the embedded processor is an STM32F405-series processor; and the power supply module uses an LM1117 voltage regulator chip. The system software was compiled and tested with Keil μVision4. Directivity and speech enhancement experiments were conducted to verify the system's performance. Results: The directivity experiment showed that, over the 0.5-2.0 kHz frequency range, the system's directivity was consistent across frequency points. The speech enhancement experiment showed that, under three types of non-stationary noise (gunshots, monitor alarms, and collisions of medical vessels), the system effectively improved speech quality and recognition rate. Conclusion: The system achieves speech enhancement and can provide effective support for voice interaction with portable field medical equipment.
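A minimal sketch of least-squares beamformer design for a two-microphone array at one frequency is given below; the geometry, look direction, and desired pattern are illustrative assumptions, not the parameters used in the system.

```python
# Least-squares beamformer weights for a two-mic array at one frequency:
# choose weights whose response over a grid of directions best matches a
# desired pattern (1 toward the target direction, 0 elsewhere).
import numpy as np

c, d, f = 343.0, 0.02, 1000.0               # speed of sound (m/s), mic spacing (m), frequency (Hz)
theta = np.deg2rad(np.arange(0, 181, 5))    # candidate directions
tau = d * np.cos(theta) / c                 # inter-mic delay per direction
A = np.stack([np.ones_like(theta),
              np.exp(-2j * np.pi * f * tau)], axis=1)    # steering matrix (n_dirs x 2)

desired = np.where(np.abs(theta - np.deg2rad(90)) < np.deg2rad(10), 1.0, 0.0)
w, *_ = np.linalg.lstsq(A, desired, rcond=None)           # least-squares weights
response = np.abs(A @ w)                                   # resulting beam pattern
print(response.round(2))
```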
Funding: Supported by the National Basic Research Development Program of China (2009CB320901, 2011CB707805, 2013CB329304), the National Natural Science Foundation of China (31170985, 91120001, 61121002), and "985" project grants from Peking University.
Abstract: According to the Motor Theory of speech perception, the interaction between the auditory and motor systems plays an essential role in speech perception. Since the Motor Theory was proposed, it has received remarkable attention in the field. However, each of the theory's three hypotheses still needs further verification. In this review, we focus on how auditory-motor anatomical and functional associations contribute to speech perception, and discuss why previous studies could not reach agreement, in particular whether the motor system's involvement in speech perception is task-load dependent. Finally, we suggest that the auditory-motor link is particularly useful for speech perception under adverse listening conditions, and that a further revised Motor Theory is a potential solution to the "cocktail party" problem.
Funding: Supported by the National Basic Research Program of China (2013CB329504).
Abstract: This paper presents the first report of a system for human speech interaction with rats via the integration of brain-machine interfaces and automatic speech recognition technologies. We propose a novel human-rat speech interaction paradigm built around a speech translator module, which translates human speech commands into suitable electrical brain stimulation that steers the rat and induces the expected locomotor behaviors. The preliminary results show that a rat's movement can be guided by speech commands. We further look into future application scenarios and the challenges facing this newly evolved cyborg intelligent system. This work paves the way for natural interaction with animal robots.
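The speech translator module can be pictured as a lookup from recognized command words to stimulation recipes, as in the hypothetical sketch below; all command names, stimulation sites, and parameter values are placeholders, not the paper's protocol.

```python
# Hypothetical mapping from a recognized speech command to a stimulation
# recipe that a stimulator driver would then deliver to the rat.
STIM_TABLE = {
    "left":    {"site": "left_cue_site",  "pulses": 10, "freq_hz": 100, "amp_ua": 60},
    "right":   {"site": "right_cue_site", "pulses": 10, "freq_hz": 100, "amp_ua": 60},
    "forward": {"site": "reward_site",    "pulses": 20, "freq_hz": 100, "amp_ua": 60},
}

def translate(recognized_text):
    """Return the stimulation recipe for a recognized speech command."""
    word = recognized_text.strip().lower()
    return STIM_TABLE.get(word)          # None if the command is not supported

print(translate("Forward"))
```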
Abstract: Conventional semantic error-correction methods for cross-language interactive translation robots are typically unidirectional and inefficient, which leads to a high recognition error rate. This paper therefore proposes a semantic error-correction method for cross-language interactive translation robots based on speech signals. On top of basic speech recognition, the positions of semantic errors are corrected through interactive calibration and feature extraction, and a semantic error-correction model for the speech-signal translation robot is designed, with a backpropagation-through-time (BPTT) cyclic training and verification scheme adopted to ensure the accuracy of the correction. The test results show that, over three test stages, the recognition error rate after correction for the five selected speech passages was successfully kept below 10%, indicating that the proposed speech-signal-based semantic error-correction method for cross-language interactive translation robots is efficient and has practical application value.
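The BPTT-based training step can be sketched as a small recurrent tagger that marks which tokens are semantically erroneous; the model, data, and sizes below are illustrative assumptions, not the paper's design.

```python
# Minimal BPTT training loop: a GRU reads a token sequence and predicts,
# per position, whether the token is semantically erroneous; calling
# backward() unrolls the gradients back through the whole sequence.
import torch
from torch import nn

vocab, hidden, seq_len = 200, 32, 12
embed = nn.Embedding(vocab, 16)
rnn = nn.GRU(16, hidden, batch_first=True)
head = nn.Linear(hidden, 2)                      # 2 classes: correct / erroneous token
params = list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab, (8, seq_len))   # toy batch of token ids
tags = torch.randint(0, 2, (8, seq_len))         # toy per-token error labels

for _ in range(3):
    out, _ = rnn(embed(tokens))                  # (batch, seq, hidden)
    logits = head(out)                           # (batch, seq, 2)
    loss = loss_fn(logits.reshape(-1, 2), tags.reshape(-1))
    opt.zero_grad()
    loss.backward()                              # backpropagation through time
    opt.step()
```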