Funding: the National Natural Science Foundation of China, No. 30670704
Abstract: Fear extinction is an important form of emotional learning that engages neural plasticity. Cued fear extinction is a classical form of inhibitory learning that can serve as an exposure-based treatment for phobia, because the long-term extinction memory it produces can limit the over-expression of fear. The expression of this inhibitory memory depends partly on the context in which the extinction learning occurs. Studies using techniques such as transient inactivation, electrophysiology, and brain imaging have shown that the hippocampus, an important structure of the limbic system, facilitates memory retrieval through contextual cues. Mediation by the hippocampus-medial prefrontal cortex circuit may be the neurobiological basis of this process. This article reviews the role of the hippocampus in the learning and retrieval of fear extinction. Contextual modulation of fear extinction may rely on a neural network comprising the hippocampus, the medial prefrontal cortex, and the amygdala.
Funding: This work was supported by the High-grade, Precision and Advanced Discipline Construction Project of Beijing Universities, the Major Projects of the National Social Science Fund of China (No. 21ZD19), and the National Culture and Tourism Technological Innovation Engineering Project of China.
Abstract: Audio mixing is a crucial part of music production. For analyzing or recreating a mix, it is important to estimate the mixing parameters used to create a mixdown from the raw recordings, i.e., to perform audio mixing inversion. However, approaches to audio mixing inversion have rarely been explored. This paper presents a method for estimating mixing parameters from raw tracks and a stereo mixdown via embodied self-supervised learning. Several commonly used audio effects, including gain, pan, equalization, reverb, and compression, are taken into consideration. The method learns an inference neural network that takes a stereo mixdown and the raw audio sources as input and estimates the mixing parameters used to create the mixdown, by iterating between sampling and training. In the sampling step, the inference network predicts a set of mixing parameters, which is sampled and fed to an audio-processing framework to generate audio data for the training step. In the training step, the same network is optimized on the sampled data generated in the sampling step. The method explicitly models the mixing process in an interpretable way rather than using a black-box neural network. A set of objective measures is used for evaluation. Experimental results show that the method outperforms current state-of-the-art methods.
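As a rough illustration of what "mixing inversion" means, the sketch below implements only the gain and pan stages of such a mixing chain and inverts them for a single mono track by least squares. This is a toy stand-in, not the paper's framework: the function names, the constant-power pan law, and the closed-form inversion (in place of the learned inference network) are all assumptions made for the example.

```python
import numpy as np

def apply_gain_pan(track, gain_db, pan):
    """Apply gain (in dB) and constant-power pan to a mono track -> stereo.
    pan is in [-1, 1]: -1 = hard left, +1 = hard right."""
    g = 10.0 ** (gain_db / 20.0)
    theta = (pan + 1.0) * np.pi / 4.0        # map pan to [0, pi/2]
    left = g * np.cos(theta) * track
    right = g * np.sin(theta) * track
    return np.stack([left, right])

def estimate_gain_pan(track, stereo):
    """Invert gain and pan for one track by projecting each
    stereo channel onto the dry track (least squares)."""
    l = stereo[0] @ track / (track @ track)  # g * cos(theta)
    r = stereo[1] @ track / (track @ track)  # g * sin(theta)
    g = np.hypot(l, r)
    pan = 4.0 * np.arctan2(r, l) / np.pi - 1.0
    return 20.0 * np.log10(g), pan
```

For a single track the inversion is exact; with many overlapping tracks and nonlinear effects such as compression, a closed form no longer exists, which is the gap the paper's sampling-and-training loop addresses.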
Funding: This work was supported in part by the National Natural Science Foundation of China (Grant No. 90920302), the National Key Basic Research Program of China (No. 2009CB825404), the HGJ Grant (No. 2011ZX01042-001-001), a research program from Microsoft China, and a GRF grant from the Research Grants Council of Hong Kong SAR (CUHK 4180/10E). Lei Xu is also supported by the Chang Jiang Scholars Program of the Chinese Ministry of Education for a Chang Jiang Chair Professorship at Peking University.
Abstract: This paper presents a new discriminative approach for training the Gaussian mixture models (GMMs) of a hidden Markov model (HMM) based acoustic model in a large vocabulary continuous speech recognition (LVCSR) system. The approach embeds a rival penalized competitive learning (RPCL) mechanism at the level of hidden Markov states. For every input, the correct identity state (the winner, obtained by Viterbi forced alignment) is enhanced to describe the input, while its most competitive rival is penalized by de-learning, making the GMM-based states more discriminative. Because it avoids the extensive computation that typical discriminative learning methods require for one-pass recognition of the training set, the new approach saves computing costs considerably. Experiments show that the proposed method converges well and outperforms the classical maximum likelihood estimation (MLE) based method. Compared with two conventional discriminative methods, the proposed method demonstrates improved generalization ability, especially when the test set is not well matched with the training set.
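The winner/rival mechanic can be illustrated with the generic clustering form of RPCL: the nearest center learns toward the input, while the second-nearest is "de-learned" away from it at a much smaller rate. This is a minimal sketch of the generic rule only; the paper applies the idea at the level of HMM states and GMM parameters, which is not reproduced here, and the learning rates are arbitrary example values.

```python
import numpy as np

def rpcl_step(centers, x, lr_win=0.05, lr_rival=0.005):
    """One rival penalized competitive learning update.
    The winner (closest center) moves toward x; the rival
    (second closest) is pushed away by de-learning."""
    d = np.linalg.norm(centers - x, axis=1)
    order = np.argsort(d)
    win, rival = order[0], order[1]
    centers[win] += lr_win * (x - centers[win])        # learn
    centers[rival] -= lr_rival * (x - centers[rival])  # de-learn
    return centers
```

The asymmetry (lr_rival much smaller than lr_win) is what drives redundant or competing units away from data they should not model, which is the discriminative effect the paper exploits between a state and its most competitive rival.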
Abstract: Existing auditory computational models for evaluating speech intelligibility account only for energetic masking; the effect of informational masking is rarely described in these models. This study aimed to build a computational model that accounts for the mechanism of informational masking. Several psychoacoustic experiments were conducted to test the effect of informational masking on speech intelligibility by manipulating the number of masking talkers, the speech rate, and the similarity of the F0 contours of target and masker. The results showed that the speech reception threshold for the target increased as the F0 contours of the masker became more similar to those of the target, suggesting that difficulty in segregating the target harmonics from the masker harmonics may underlie the informational masking effect. Based on these studies, a new auditory computational model was built by introducing the auditory function of harmonic extraction into the traditional speech intelligibility index (SII) model, named the harmonic extraction (HF) model. The predictions of the HF model are highly consistent with the experimental results.
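For readers unfamiliar with the SII baseline the paper extends, the core of an SII-style computation is a band-importance-weighted audibility sum: each frequency band's signal-to-noise ratio is mapped to an audibility value in [0, 1] and weighted by that band's importance. The sketch below shows only this energetic-masking core under the common linear audibility mapping from -15 dB to +15 dB; the band weights are made-up example values, and the paper's harmonic-extraction front end is not reproduced.

```python
def sii_like(snr_db, importance):
    """Band-importance-weighted audibility in the spirit of the SII.
    snr_db: per-band SNR in dB; importance: weights summing to 1."""
    assert abs(sum(importance) - 1.0) < 1e-9
    total = 0.0
    for snr, w in zip(snr_db, importance):
        # -15 dB SNR -> inaudible (0); +15 dB SNR -> fully audible (1)
        aud = min(max((snr + 15.0) / 30.0, 0.0), 1.0)
        total += w * aud
    return total
```

A model of this shape is blind to informational masking by construction: two maskers with identical band SNRs but different F0 similarity to the target yield the same score, which motivates adding a harmonic-extraction stage before the band SNRs are computed.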
Funding: This work was supported in part by the National Natural Science Foundation of China (Grant No. 60605016), the National Key Basic Research Program of China (Nos. 2004CB318005 and 2004CB318105), and the National High Technology Research and Development Program of China (No. 2006AA010103).
Abstract: Multiple-size-unit acoustic modeling has been proposed for large vocabulary speech recognition systems to improve recognition accuracy with limited training data. By introducing a limited number of long-size units into the unit set, this modeling scheme achieves better acoustic model precision than modeling with short-size units alone, without losing model trainability. However, such a multiple-size-unit paradigm does not always bring reliable improvements in recognition performance: when a large number of long-size units are added, the amount of training data available for short-size units decreases, resulting in insufficiently trained models. This paper proposes a modified Baum-Welch training method that uses product hidden Markov models (PHMMs) to couple units of different sizes and enables them to share the same portions of training data. The validity of the proposed method is confirmed by experimental results.
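For context, the sketch below shows one standard Baum-Welch (EM) re-estimation step for a single discrete-observation HMM: the forward-backward pass produces state posteriors, from which emission probabilities are re-estimated as expected counts. This is the unmodified textbook step only; the paper's contribution, coupling units of different sizes through product HMMs so they share training data, is not reproduced here, and all parameter values in the test are arbitrary examples.

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Baum-Welch re-estimation step for a discrete HMM.
    A: state transition matrix (N x N), B: emission matrix (N x M),
    pi: initial state distribution, obs: observation index sequence."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    # forward pass
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    # backward pass
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    # state posteriors gamma[t, i] = P(state_t = i | obs)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: re-estimate emissions from expected counts
    B_new = np.zeros_like(B)
    for t, o in enumerate(obs):
        B_new[:, o] += gamma[t]
    B_new /= B_new.sum(axis=1, keepdims=True)
    return B_new, gamma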