Abstract: Linear Predictive Coding (LPC) is a widely used technique in voice coders. In this paper we present different implementations of the LPC algorithm, which is used in the majority of voice decoding standards. The windowing/autocorrelation block is implemented in three different versions on a Spartan 3 FPGA. Because the FPGA can integrate a MicroBlaze processor core, the first solution is a pure software implementation of LPC running on this RISC core. The second solution is a pure hardware architecture implemented with a VHDL-based methodology, from description through integration. Finally, the autocorrelation core is implemented as a hardware/software (HW/SW) architecture that uses the existing processor. The performance of each architecture is compared for different data lengths.
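As an illustration of the windowing/autocorrelation stage named in the abstract above, the following is a minimal software sketch in Python. It is not the authors' MicroBlaze, VHDL, or HW/SW code: the Hamming window, the function names, and the Levinson-Durbin step that turns autocorrelation lags into predictor coefficients are assumptions chosen because they are the textbook form of LPC analysis.

```python
import math


def hamming(n):
    """Hamming window of length n (assumed window type, not from the paper)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(frame, order):
    """Autocorrelation lags r[0..order] of an (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation lags.

    Returns (a, err) with a[0] == 1, using the convention
    x[n] ~ -sum(a[k] * x[n-k] for k in 1..order); err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def lpc(frame, order):
    """Window a frame, compute its autocorrelation, and return LPC coefficients."""
    w = hamming(len(frame))
    windowed = [x * wi for x, wi in zip(frame, w)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

The autocorrelation loop costs on the order of N x (p + 1) multiply-accumulates for a frame of N samples and order p, which is why it is a natural candidate for the hardware offloading that the abstract compares across data lengths.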
Abstract: This paper presents a real-time implementation of 4.2 kb/s CELP speech coding on a single DSP chip. An algorithm that reduces the search complexity of the adaptive codebook is proposed, and the method for converting the parameters into LSP parameters is discussed. The real-time implementation of this coder on a commercial development board with a single TMS320C30 is described.
Abstract: A very low bit rate algorithm for encoding speech signals at 825 bps, based on mixed harmonic and stochastic modeling of the excitation signal, is presented. The algorithm provides a more robust voiced/unvoiced (V/UV) decision, reliable pitch estimation, and excitation signal synthesis. The bit allocation schemes for each case and the analysis-by-synthesis procedures for the parameters are also described. Diagnostic Rhyme Test (DRT) results show that the performance of the proposed algorithm is comparable to that of the MELP algorithm at 2.4 kbps, with a speech intelligibility score of 90.25%.
Abstract: In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Although SR systems work reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or over distorted channels. More robust feature extraction methods are needed to achieve better performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms, namely Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Coding Coefficients (LPCC), perceptual linear prediction (PLP), and RASTA-PLP, in noisy conditions using a multivariate Hidden Markov Model (HMM) classifier. The proposed system is evaluated on the TIDIGIT human voice corpus, recorded from 208 different adult speakers, for both training and testing. The theoretical basis of the speech processing and classification procedures is presented, and the recognition results are reported as word recognition rates.
Funding: Supported by the National High-Technology Research and Development Program (2005AA122210) and the National Outstanding Youth Foundation (60325104).
Abstract: A Web voice browser based on improved synchronous linear predictive coding (ISLPC), a Text-to-Speech (TTS) algorithm, and Internet applications is proposed. The paper analyzes the features of a TTS system with ISLPC speech synthesis and discusses the design and implementation of the ISLPC TTS-based Web voice browser. The browser integrates Web technology, Chinese information processing, artificial intelligence, and the key technology of Chinese ISLPC speech synthesis. It is a visual and audible Web browser that can improve information precision for network users. The evaluation results show that the ISLPC-based TTS model performs better than other browsers in voice quality and in its ability to identify Chinese characters.
Abstract: This paper presents an approach to hiding secret speech information in a code excited linear prediction (CELP)-based speech coding scheme, adopting an analysis-by-synthesis (ABS)-based algorithm for speech information hiding and extraction for the purpose of secure speech communication. The secret speech is coded with 2.4 kb/s mixed excitation linear prediction (MELP) and embedded in CELP-type public speech. The ABS algorithm uses the speech synthesizer in the speech coder. Speech embedding and coding are synchronous, i.e., the public and secret speech data are fused. An experiment embedding 2.4 kb/s MELP secret speech in G.728-coded public speech transmitted over the public switched telephone network (PSTN) shows that the proposed approach satisfies the requirements of information hiding, meets the speech quality constraints of secure communication, and achieves a high average hiding capacity of 3.2 kb/s with excellent speech quality while complicating speaker recognition.
Abstract: A novel cochlear implant coding strategy based on neural excitability has been developed and implemented using Matlab/Simulink. Unlike present-day coding strategies, the Excitability Controlled Coding (ECC) strategy uses a model of the excitability state of the target neural population to determine its stimulus selection, with the aim of more efficient stimulation as well as reduced channel interaction. Central to the ECC algorithm is an excitability state model, which takes into account the supposed refractory behaviour of the stimulated neural populations. The excitability state, used to weight the input signal when selecting the stimuli, is estimated and updated after the presentation of each stimulus and used iteratively in selecting the next stimulus. Additionally, ECC regulates the frequency of stimulation on a given channel as a function of the corresponding input stimulus intensity. Details of the model, its implementation, and the results of benchtop and subjective tests are presented and discussed. Compared to the Advanced Combination Encoder (ACE) strategy, ECC produces a better spectral representation of an input signal and can potentially reduce channel interactions. Pilot test results from 4 CI recipients suggest that ECC may have some advantage over ACE in complex situations such as speech in noise, possibly due to ECC's ability to present more of the input spectral content than ACE, which is restricted to a fixed number of maxima. The ECC strategy represents a neuro-physiological approach that could potentially improve the perception of more complex sound patterns with cochlear implants.
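The iterative selection loop described in the abstract above (weight the input by an excitability state, pick a stimulus, update the state, repeat) can be sketched as a toy Python model. This is not the authors' Matlab/Simulink implementation: the exponential recovery law, the `recovery_tau` parameter, the reset-to-zero after stimulation, and the function name are all hypothetical choices made only to show the control flow.

```python
import math


def select_stimuli(envelopes, n_pulses, recovery_tau=2.0):
    """Toy excitability-weighted channel selection for one analysis frame.

    envelopes    : per-channel input magnitudes (assumed non-negative)
    n_pulses     : number of stimuli to place in this frame
    recovery_tau : hypothetical recovery time constant, in pulse slots
    Returns the channel indices in the order they are stimulated.
    """
    excitability = [1.0] * len(envelopes)  # 1.0 means fully recovered
    decay = math.exp(-1.0 / recovery_tau)
    chosen = []
    for _ in range(n_pulses):
        # Weight the input signal by the current excitability state.
        weighted = [e * x for e, x in zip(excitability, envelopes)]
        channel = max(range(len(weighted)), key=weighted.__getitem__)
        chosen.append(channel)
        # Every channel recovers toward 1.0 over one pulse slot...
        excitability = [1.0 - (1.0 - e) * decay for e in excitability]
        # ...except the one just stimulated, modelled here as fully refractory.
        excitability[channel] = 0.0
    return chosen
```

With `envelopes = [1.0, 0.9, 0.1]` and three pulses, this toy model alternates between the two strongest channels instead of repeatedly stimulating the single largest one, which is the qualitative behaviour the refractory weighting is meant to produce.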
Funding: Supported by the Visvesvaraya Ph.D. Scheme for Electronics and IT students, launched by the Ministry of Electronics and Information Technology (MeiTY), Government of India, under Grant No. PhD-MLA/4(95)/2015-2016.
Abstract: In this paper, we present a comparison of Khasi speech representations using four different spectral features, together with a novel extension toward the development of Khasi speech corpora. The four features are linear predictive coding (LPC), linear prediction cepstrum coefficient (LPCC), perceptual linear prediction (PLP), and Mel frequency cepstral coefficient (MFCC). Ten hours of speech data were used for training and three hours for testing. For each spectral feature, different hidden Markov model (HMM)-based recognizers were built, varying the number of HMM states and the Gaussian mixture models (GMMs). Performance was evaluated using the word error rate (WER). The experimental results show that MFCC provides a better representation of Khasi speech than the other three spectral features.
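For readers unfamiliar with the word error rate used as the metric in the abstract above, here is a minimal Python sketch of the usual definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and whitespace tokenization are assumptions; the paper's own scoring tool is not specified in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.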