Abstract: This paper proposes a new speech recognition method that integrates a VQ distortion measure with a discrete HMM. The VQ HMM uses a VQ distortion measure at each state in place of the discrete output probability used by a discrete HMM. The VQ HMM is described, and its recognition performance is compared with that of conventional HMMs in experiments on speaker-independent Chinese spoken digit recognition. The comparisons confirm that the new method outperforms traditional HMMs.
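As a rough illustration of scoring states by VQ distortion rather than by discrete output probabilities, the sketch below runs a Viterbi-style minimization over accumulated distortion. It is not the authors' formulation: the per-state codebooks, the squared-error distortion, the additive transition costs, and the names `vq_distortion`, `vq_hmm_score`, and `recognize` are all assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def vq_distortion(frame: Vector, codebook: Sequence[Vector]) -> float:
    """Squared-error distortion between a frame and its nearest codeword."""
    return min(sum((x - c) ** 2 for x, c in zip(frame, code)) for code in codebook)


def vq_hmm_score(
    frames: Sequence[Vector],
    state_codebooks: Sequence[Sequence[Vector]],
    trans_cost: Sequence[Sequence[float]],
) -> float:
    """Lowest accumulated cost of a path that starts in state 0 and ends in the last state.

    trans_cost[i][j] is an assumed additive penalty for moving from state i
    to state j (math.inf marks a forbidden transition).
    """
    n_states = len(state_codebooks)
    cost = [math.inf] * n_states
    cost[0] = vq_distortion(frames[0], state_codebooks[0])
    for frame in frames[1:]:
        new_cost = []
        for j in range(n_states):
            best_prev = min(cost[i] + trans_cost[i][j] for i in range(n_states))
            new_cost.append(best_prev + vq_distortion(frame, state_codebooks[j]))
        cost = new_cost
    return cost[-1]


def recognize(
    frames: Sequence[Vector],
    models: Dict[str, List[List[Vector]]],
    trans_cost: Sequence[Sequence[float]],
) -> str:
    """Return the label of the word model with the lowest accumulated cost."""
    return min(models, key=lambda label: vq_hmm_score(frames, models[label], trans_cost))
```

Here each word model is just a list of per-state codebooks, and a left-to-right topology can be expressed by setting backward transitions in `trans_cost` to `math.inf`.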
Abstract: The enhanced variable rate codec (EVRC) is a standard for the 'Speech Service Option 3 for Wideband Spread Spectrum Digital System,' which has been employed in both IS-95 cellular systems and ANSI J-STD-008 PCS (personal communications systems). This paper concentrates on channel decoders that exploit the residual redundancy inherent in the EVRC bitstream. This residual redundancy is quantified by modeling the parameters as first-order Markov chains and computing the entropy rate from the relative frequencies of transitions. Moreover, this residual redundancy can be exploited by an appropriately 'tuned' channel decoder to provide substantial coding gain compared with decoders that do not exploit it. The channel coding schemes considered include convolutional codes and iteratively decoded parallel concatenated convolutional 'turbo' codes.
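The redundancy measurement described in this abstract can be sketched as follows: estimate the entropy rate of a first-order Markov chain from relative transition frequencies, then subtract it from the number of bits allocated to the parameter. This is a minimal sketch, not the paper's code; the function names and the idea of passing one quantized parameter track as a plain list of symbols are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy_rate(symbols: Sequence[Hashable]) -> float:
    """Entropy rate in bits per symbol, H = -sum_ij p(i, j) * log2 p(j | i),

    with p(i, j) and p(j | i) estimated from relative transition frequencies.
    """
    pair_counts = Counter(zip(symbols, symbols[1:]))
    total = sum(pair_counts.values())
    if total == 0:
        return 0.0
    # Number of observed transitions leaving each state.
    prev_counts = Counter()
    for (prev, _), n in pair_counts.items():
        prev_counts[prev] += n
    h = 0.0
    for (prev, _), n in pair_counts.items():
        p_joint = n / total
        p_cond = n / prev_counts[prev]
        h -= p_joint * math.log2(p_cond)
    return h


def residual_redundancy(symbols: Sequence[Hashable], bits_per_symbol: float) -> float:
    """Bits per symbol spent by the encoder beyond the estimated entropy rate."""
    return bits_per_symbol - entropy_rate(symbols)
```

A parameter whose successive values are strongly correlated yields an entropy rate well below its bit allocation, and that gap is the redundancy a tuned decoder could use as prior information.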
Funding: Project (2011CB302305) supported by the National Basic Research Program (973 Program) of China; Projects (61232004, 61302094) supported by the National Natural Science Foundation of China; Project (ZQN-PY115) supported by the Promotion Program for Young and Middle-aged Teachers in Science and Technology Research of Huaqiao University, China; Project (JA13012) supported by the Education Science Research Program for Young and Middle-aged Teachers of Fujian Province of China; Project (2014J01238) supported by the Natural Science Foundation of Fujian Province of China.
Abstract: Steganography based on bit modification of speech frames is a commonly used method that targets RTP payloads and offers covert communication over voice over IP (VoIP). However, direct modification of frames is often independent of the inherent speech features, which may severely degrade speech quality. This work proposes a novel steganography method based on frame bitrate changes, which opens a new covert channel for VoIP and introduces less distortion. The method exploits a feature of multi-rate speech codecs: the actual bitrate of a speech frame is identified only by the speech decoder at the receiving end. Based on this characteristic, two steganography strategies are provided, called bitrate downgrading (BD) and bitrate switching (BS). The first strategy substitutes lower-bitrate frames for high-bitrate speech frames to embed the secret message, which introduces very low distortion in practice, much less than other bit-modification methods with the same embedding capacity. The second strategy encodes secret message bits into different types of speech frames and serves as a supplementary alternative. The two strategies are implemented and tested on our covert communication system StegVoIP. The experimental results show that the proposed method is effective and meets the real-time requirement of VoIP communication.
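The idea behind bitrate switching, carrying message bits in the choice of frame type, can be illustrated with a toy mapping. This is only a sketch under stated assumptions and not the StegVoIP scheme: the frame-type labels are hypothetical, the number of types is assumed to be a power of two, and a real system would also have to encode the speech at each chosen rate.

```python
from typing import List, Sequence


def bits_per_frame(frame_types: Sequence[str]) -> int:
    """Number of message bits one frame can carry via its type."""
    n = len(frame_types)
    if n < 2 or n & (n - 1):
        raise ValueError("number of frame types must be a power of two >= 2")
    return n.bit_length() - 1


def embed_bits(bits: Sequence[int], frame_types: Sequence[str]) -> List[str]:
    """Choose one frame type per group of k message bits."""
    k = bits_per_frame(frame_types)
    if len(bits) % k:
        raise ValueError("pad the message to a multiple of k bits")
    chosen = []
    for i in range(0, len(bits), k):
        index = int("".join(str(b) for b in bits[i:i + k]), 2)
        chosen.append(frame_types[index])
    return chosen


def extract_bits(observed: Sequence[str], frame_types: Sequence[str]) -> List[int]:
    """Recover the message bits from the sequence of observed frame types."""
    k = bits_per_frame(frame_types)
    bits: List[int] = []
    for frame_type in observed:
        index = frame_types.index(frame_type)
        bits.extend(int(ch) for ch in format(index, f"0{k}b"))
    return bits
```

With four hypothetical types, each frame carries two bits, so the capacity of such a channel scales with the frame rate and with the logarithm of the number of distinguishable frame types.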
Abstract: The principles of the G.729 algorithm are analyzed, and an optimized approach to adaptive codebook search is proposed. Implemented on the fixed-point DSP TMS320VC5410, the optimized algorithm significantly reduces the search time, and the results show that speech quality is not degraded.
Abstract: This letter presents two improvements to the 2.4 kb/s Mixed-Excitation Linear Prediction (MELP) vocoder. The first is a new parameter, Redzc (energy to differential zero-crossing rate), which is used to adapt the V/UV decision for transitional segments and low-energy speech segments. The second is a multi-path search method for Multi-Stage Vector Quantization (MSVQ) of the line spectral frequencies. Subjective tests show that the intelligibility and naturalness of the improved MELP vocoder are preferable to those of the original.
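A multi-path search over a multi-stage vector quantizer is commonly realized as an M-best search: keep the best few partial paths at each stage instead of only the single best one. The sketch below shows that general technique under assumed conditions (squared-error distortion, tiny illustrative codebooks); it is not taken from the letter, and `msvq_search` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def msvq_search(
    target: Vector,
    codebooks: Sequence[Sequence[Vector]],
    n_paths: int,
) -> Tuple[Tuple[int, ...], float]:
    """Return (codeword indices per stage, squared error) of the best path found.

    n_paths = 1 reduces to the usual greedy stage-by-stage search.
    """
    # Each surviving path is (indices chosen so far, remaining residual).
    paths: List[Tuple[Tuple[int, ...], Vector]] = [((), tuple(target))]
    best_err = sum(v * v for v in target)
    for codebook in codebooks:
        candidates = []
        for indices, residual in paths:
            for j, code in enumerate(codebook):
                new_residual = tuple(r - c for r, c in zip(residual, code))
                err = sum(v * v for v in new_residual)
                candidates.append((err, indices + (j,), new_residual))
        # Keep only the n_paths candidates with the lowest residual energy.
        candidates.sort(key=lambda cand: cand[0])
        paths = [(idx, res) for _, idx, res in candidates[:n_paths]]
        best_err = candidates[0][0]
    return paths[0][0], best_err
```

The greedy search commits to the locally best first-stage codeword, whereas keeping several paths lets a slightly worse first-stage choice win when a later stage corrects its residual more precisely, at the cost of evaluating more candidates per stage.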
Funding: Supported by the Brain Korea 21 Project in 2010, and by the MKE (Ministry of Knowledge Economy, Korea) under the ITRC (Information Technology Research Center) support program (NIPA-2010-(C1090-1021-0010)).
Abstract: Speech coding techniques have been studied not only to reduce complexity and bit rate but also to improve sound quality. The CELP-type vocoder, used as a standard, delivers good, steady quality even at low bit rates. In this paper, the input speech is preprocessed to reduce the bit rate, which differs from the conventional vocoder. Different kinds of parameters are used in the preprocessing and compared with one another to find the most appropriate parameter for the vocoder. The parameters are used to synthesize the speech, not to encode or decode it, so we propose a simple algorithm that does not affect the processing or computation time. The parameters in the preprocessing step are speaking rate, duration, and the PSOLA technique.
Funding: This study was supported by the National Social Science Foundation (grant number 14BYY060), the China Postdoctoral Science Foundation (grant number 2012M520057), and a Startup Fund for Advanced Talents of Nanjing Forestry University (grant number GXL022).
Abstract: Bilingual children's word awareness can reflect the impact of bilingualism on language cognition from a psycholinguistic perspective. Current studies on bilingual children's word awareness, both at home and abroad, show two opposing views: bilingual disadvantage and bilingual advantage. The interpretation mechanisms of interference effect, word frequency, and mutual exclusivity constraint are used to support the bilingual disadvantage, while the interpretation mechanisms of bilingual advantage include sound coding, short-term memory, and inhibitory control. In effect, bilingualism has no negative impact on children's word awareness, and the so-called negative effects exist only at the theoretical level of research. The development of children's word awareness is influenced by many factors, including age of acquisition, learning environment, and bilingual proficiency.
Funding: Supported by the National Natural Science Foundation of China (No. 60572081).
Abstract: Real-time speech communications require highly efficient compression algorithms to encode speech signals. As the compressed speech parameters are highly sensitive to transmission errors, robust source and channel decoding and demodulation schemes are both important and of practical use. In this paper, an iterative joint source-channel decoding and demodulation algorithm is proposed for the mixed-excitation linear prediction (MELP) vocoder. It exploits the residual redundancy and passes soft information throughout the receiver, while introducing a systematic global iteration process to further enhance performance. Being fully compatible with the existing transmitter structure, the proposed algorithm introduces no additional bandwidth expansion or transmission delay. Simulations show that the joint source-channel decoding and demodulation (JSCCM) algorithm substantially improves error-correcting performance and synthesized speech quality over conventional separately designed systems in delay- and bandwidth-constrained channels.
Funding: The National Natural Science Foundation of China (No. 61871213, 61673108, 61571106); Six Talent Peaks Project in Jiangsu Province (No. 2016-DZXX-023).
Abstract: To improve the efficiency of speech emotion recognition across corpora, a speech emotion transfer learning method based on the deep sparse auto-encoder is proposed. The algorithm first reconstructs a small amount of data in the target domain by training the deep sparse auto-encoder, so that the encoder can learn the low-dimensional structural representation of the target domain data. Then, the source domain data and the target domain data are coded by the trained deep sparse auto-encoder to obtain reconstructed data whose low-dimensional structural representation is close to that of the target domain. Finally, a part of the reconstructed tagged target domain data is mixed with the reconstructed source domain data to jointly train the classifier. This part of the target domain data is used to guide the source domain data. Experiments on the CASIA and SoutheastLab corpora show that the recognition rates after transferring a small amount of data reached 89.2% and 72.4% on the DNN. Compared with training on the complete original corpus, the rate decreased by only 2% on the CASIA corpus and by only 3.4% on the SoutheastLab corpus. The experiments show that, in the extreme case where only a small amount of the data set is tagged, the algorithm can approach the effect of labeling all the data.
Abstract: A Chinese intelligent input technology, its applications, and a customer service call center system are developed. The technology can be used on both the standard English telephone keypad and the Chinese telephone keypad. The authors also develop sophisticated technologies including "Pinyin" (the Chinese phonetic alphabet) encoding of the phonetic symbol code and the formal symbol code of Chinese character structure, phrase encoding, whole-sentence intelligent encoding input, and Chinese telephone number encoding input.