Abstract: A variable-bit-rate characteristic waveform interpolation (VBR-CWI) speech codec with an average bit rate of about 1.8 kbit/s is proposed; it integrates phonetic classification into characteristic waveform (CW) decomposition. Each input frame is classified into one of four phonetic classes. Non-speech frames are represented with a Bark-band noise model. The extracted CWs are treated as rapidly evolving waveforms (REWs) for unvoiced frames and as slowly evolving waveforms (SEWs) for stationary voiced frames, while mixed voiced frames use the same CW decomposition as conventional CWI. Experimental results show that the proposed codec eliminates most of the buzzy and noisy artifacts of the fixed-bit-rate characteristic waveform interpolation (FBR-CWI) speech codec at a much lower average bit rate, and that its reconstructed speech quality is much better than FS-1016 CELP at 4.8 kbit/s and similar to G.723.1 ACELP at 5.3 kbit/s.
Funding: Supported by the National Natural Science Foundation of China and Microsoft Research Asia (No. 60776800) and in part by the National High-Tech Research and Development Program (863) of China (Nos. 2006AA010101, 2007AA04Z223, 2008AA02Z414, and 2008AA040201)
Abstract: Digital mobile telecommunication systems, such as the Global System for Mobile Communications (GSM), aim to further improve speech communication quality without changing the channel encoders and decoders. Speech quality is most affected by residual bit errors in received speech frames. Conventional methods use binary decision strategies for error detection and concealment in frames. This paper presents a multi-level error detection and concealment algorithm for GSM full-rate speech codec systems. The algorithm uses multi-source knowledge to detect and conceal speech frame errors at the frame, parameter, and even bit levels. Tests show that most corrupted frames can be appropriately concealed by this algorithm, resulting in MOS gains of more than 50% in real-world data tests.
Abstract: This paper presents a new speech codec that uses an artificial neural network (ANN) to carry out nonlinear prediction. The codec synthesizes speech with better quality than conventional waveform or hybrid codecs at the same bit rate. Moreover, its most important characteristic is its low coding delay, which helps improve speech communication QoS when speech signals are transmitted over IP or ATM networks.
Funding: Supported by the National Natural Science Foundation of China (No. 60572081)
Abstract: Real-time speech communications require highly efficient compression algorithms to encode speech signals. Because the compressed speech parameters are highly sensitive to transmission errors, robust source and channel decoding and demodulation schemes are both important and of practical use. In this paper, an iterative joint source-channel decoding and demodulation algorithm is proposed for the mixed excitation linear prediction (MELP) vocoder. It exploits the residual redundancy, passes soft information throughout the receiver, and introduces a systematic global iteration process to further enhance performance. Being fully compatible with the existing transmitter structure, the proposed algorithm introduces no additional bandwidth expansion or transmission delay. Simulations show that the joint source-channel decoding and demodulation (JSCCM) algorithm substantially improves error-correcting performance and synthesized speech quality over conventional separately designed systems in delay- and bandwidth-constrained channels.
Abstract: Noise feedback coding (NFC) has attracted renewed interest with the recent standardization of backward-compatible enhancements for ITU-T G.711 and G.722. It has also been revisited with the emergence of proprietary speech codecs, such as BV16, BV32, and SILK, that have structures different from CELP coding. In this article, we review NFC and describe a novel coding technique that optimally shapes coding noise in embedded pulse-code modulation (PCM) and embedded adaptive differential PCM (ADPCM). We describe how this new technique was incorporated into the recent ITU-T G.711.1, G.711 App. III, and G.722 Annex B (G.722B) speech-coding standards.
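The basic noise-feedback idea can be illustrated with a minimal sketch. It is not the technique standardized in G.711.1 or G.722B: it assumes a uniform scalar quantizer and a single hypothetical feedback coefficient `f1`, and it only shows that feeding the previous quantization error back into the quantizer input makes the reconstruction noise equal to the quantization error filtered by 1 + f1·z⁻¹.

```python
import math

def nfc_quantize(x, step, f1):
    """First-order noise feedback around a uniform quantizer (illustrative only).

    x    : input samples
    step : quantizer step size (assumed parameter)
    f1   : hypothetical noise-feedback coefficient
    Returns (quantized output, per-sample quantization errors).
    """
    y, q = [], []
    prev_q = 0.0
    for s in x:
        v = s + f1 * prev_q                      # add fed-back error to the input
        out = step * math.floor(v / step + 0.5)  # uniform quantization of v
        prev_q = out - v                         # quantization error of this sample
        y.append(out)
        q.append(prev_q)
    return y, q

# Example: a slow sine wave, coarse step, and an arbitrary feedback coefficient.
signal = [math.sin(0.05 * n) for n in range(100)]
coded, q_err = nfc_quantize(signal, step=0.25, f1=-0.9)
```

The overall coding noise `coded[n] - signal[n]` equals `q_err[n] + f1 * q_err[n-1]`, so the choice of feedback filter controls the spectral shape of the noise without changing the quantizer itself.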
Abstract: Lattice vector quantization (LVQ) has been used for real-time speech and audio coding systems. Compared with conventional vector quantization, LVQ has two main advantages: it has a simple and fast encoding process, and it significantly reduces the amount of memory required. Therefore, LVQ is suitable for use in low-complexity speech and audio coding. In this paper, we describe the basic concepts of LVQ and its advantages over conventional vector quantization. We also describe some LVQ techniques that have been used in speech and audio coding standards of international standards developing organizations (SDOs).
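To make the "simple and fast encoding, little memory" claim concrete, here is a small sketch of nearest-point search in the D_n lattice (integer vectors whose coordinates sum to an even number), following the well-known Conway and Sloane rounding rule. The choice of D_n is an assumption made for illustration; the abstract does not say which lattices the paper covers. No codebook is stored: encoding is just rounding plus one correction.

```python
import math

def nearest_zn(x):
    """Nearest point of the integer lattice Z^n: round every coordinate."""
    return [math.floor(v + 0.5) for v in x]

def nearest_dn(x):
    """Nearest point of D_n (integer vectors with an even coordinate sum).

    Round every coordinate; if the sum is odd, re-round the coordinate with
    the largest rounding error in the opposite direction.
    """
    f = nearest_zn(x)
    if sum(f) % 2 == 0:
        return f
    # Index of the coordinate that was rounded with the largest error.
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    g = list(f)
    g[k] += 1 if x[k] > f[k] else -1
    return g

# Example: quantize one 4-dimensional vector to D_4.
point = nearest_dn([0.6, 0.2, -0.1, 0.1])
```

A conventional vector quantizer would instead compare the input against every stored codevector, which is where the extra memory and search cost come from.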
Abstract: To make a multiple-description codec adapt to the packet loss rate and thereby minimize the final distortion, a novel adaptive multiple descriptions sinusoidal coder (AMDSC) is proposed, based on a sinusoidal model and a noise model. First, the sinusoidal parameters are extracted in the sinusoidal model and sorted in descending order; the odd-indexed and even-indexed parameters are then assigned to two separate descriptions. Second, the output vector from the noise model is split-vector quantized, and the two sub-vectors are likewise placed into the two descriptions. Finally, the number of extracted parameters and the redundancy between the two descriptions are adjusted according to the packet loss rate of the network. Analytical and experimental results show that the proposed AMDSC outperforms existing MD speech coders by taking network loss characteristics into account. It is therefore well suited to unreliable channels.
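The odd/even split described above can be sketched as follows. This is only an illustration under stated assumptions: each sinusoid is a hypothetical `(amplitude, frequency_hz, phase)` tuple, the ordering key is assumed to be amplitude (the abstract does not name it), and the adaptive redundancy and noise-model parts of AMDSC are left out.

```python
def split_descriptions(sinusoids):
    """Sort sinusoids by descending amplitude and interleave them into two descriptions.

    With 1-based indexing, description 1 holds the odd-indexed components
    and description 2 the even-indexed ones.
    """
    ordered = sorted(sinusoids, key=lambda s: s[0], reverse=True)
    return ordered[0::2], ordered[1::2]

def merge_descriptions(d1, d2):
    """Rebuild the parameter list from whichever descriptions arrived (None = lost)."""
    received = list(d1 or []) + list(d2 or [])
    return sorted(received, key=lambda s: s[0], reverse=True)

# Example frame with four hypothetical components: (amplitude, frequency_hz, phase).
frame = [(0.2, 440.0, 0.0), (0.9, 220.0, 0.1), (0.5, 660.0, 0.2), (0.1, 880.0, 0.3)]
desc1, desc2 = split_descriptions(frame)
both = merge_descriptions(desc1, desc2)     # both packets received
only_one = merge_descriptions(desc1, None)  # second packet lost
```

Because the components are interleaved after sorting, each description on its own still carries a mix of strong and weak components, so losing one packet degrades the frame rather than erasing it.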
Abstract: Speech is modeled as the output of an LPC filter excited by the LPC residual. Consequently, speech can be reproduced if the LPC filter is excited by a signal that preserves the main characteristics of the LPC residual. Based on this hypothesis, a new speech coding algorithm is proposed. The synthesizer excitation is a fractal interpolation of the down-sampled LPC residual with the same fractal dimension as the LPC residual. Computer simulation shows that this speech coding algorithm can provide high-quality coded speech at a bit rate of 6.4 kb/s. Some essential issues of the algorithm, such as the calculation of the fractal dimension and the implementation of fractal interpolation, are also presented.
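The source-filter hypothesis above can be checked numerically with a short sketch: estimate LPC coefficients with the autocorrelation method (Levinson-Durbin recursion), inverse-filter to get the residual, and re-excite the all-pole filter with that residual. The test signal, the predictor order of 4, and all function names are assumptions for illustration; the fractal interpolation of the down-sampled residual is not implemented here.

```python
import math
import random

def autocorrelation(x, order):
    """Autocorrelation values r[0..order] of the (unwindowed) frame x."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[0..order-1] with x[n] ~ sum_k a[k] * x[n-1-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def lpc_residual(x, a):
    """Inverse (analysis) filter: e[n] = x[n] - sum_k a[k] * x[n-1-k]."""
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(min(len(a), n)))
            for n in range(len(x))]

def lpc_synthesize(excitation, a):
    """All-pole synthesis filter: y[n] = e[n] + sum_k a[k] * y[n-1-k]."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e + sum(a[k] * y[n - 1 - k] for k in range(min(len(a), n))))
    return y

# Synthetic stand-in for a speech frame (two sinusoids plus a little noise).
rng = random.Random(0)
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) + 0.05 * rng.gauss(0.0, 1.0)
         for n in range(200)]
coeffs = levinson_durbin(autocorrelation(frame, 4), 4)
residual = lpc_residual(frame, coeffs)
rebuilt = lpc_synthesize(residual, coeffs)
```

Exciting the filter with the exact residual reproduces the frame up to rounding error; a coder of the kind described replaces `residual` with a cheaper approximation that keeps its main characteristics.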
Abstract: This paper presents a real-time implementation of 4.2 kb/s CELP speech coding on a single DSP chip. An algorithm that reduces the search complexity of the adaptive codebook is suggested, and a method for converting the parameters into LSP parameters is discussed. The real-time implementation of this coder on a commercial development board with a single TMS320C30 is described.
Abstract: This paper presents the design of a full-duplex multi-rate vocoder that implements the LPC-10, CELPC, and VSELPC algorithms in real time. A single commercially available digital signal processor IC, the TMS320C25, performs the digital processing. The channel interfaces, including the timing and control logic circuits, are implemented in an ASIC.
Abstract: Since pulse code modulation emerged in 1937, digitized speech has developed rapidly owing to its outstanding voice quality, reliability, robustness, and security in communication. How to reduce channel bandwidth without loss of speech quality, however, remains a crucial problem in speech coding theory. A new full-duplex digital speech communication system based on the AMBE-1000(TM) vocoder and the ATMEL 89C51 microcontroller is introduced. It delivers higher voice quality than the current mobile phone system while needing only a quarter of that system's channel bandwidth. Prospective application areas include satellite communication, IP phone, virtual meetings, and, most importantly, the defence industry.
Funding: Supported by the National High-Technology Research and Development Program (2005AA122210) and the National Outstanding Youth Foundation (60325104)
Abstract: A Web voice browser based on improved synchronous linear predictive coding (ISLPC), a Text-to-Speech (TTS) algorithm, and Internet applications is proposed. The paper analyzes the features of a TTS system with ISLPC speech synthesis and discusses the design and implementation of an ISLPC TTS-based Web voice browser. The browser integrates Web technology, Chinese information processing, artificial intelligence, and the key technology of Chinese ISLPC speech synthesis. It is a visual and audible Web browser that can improve information precision for network users. The evaluation results show that the ISLPC-based TTS model performs better than other browsers in voice quality and in its ability to identify Chinese characters.