6 articles found
Improving the Syllable-Synchronous Network Search Algorithm for Word Decoding in Continuous Chinese Speech Recognition (cited by: 2)
1
Authors: 郑方, 武健, 宋战江. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2000, No. 5, pp. 461-471 (11 pages)
The previously proposed syllable-synchronous network search (SSNS) algorithm plays a very important role in word decoding for continuous Chinese speech recognition and achieves satisfactory performance. This paper carefully studies several related key factors that may affect overall word decoding performance, including refinement of the vocabulary, big-discount Turing re-estimation of the N-gram probabilities, and management of the search path buffers. Based on these discussions, corresponding approaches to improving the SSNS algorithm are proposed. Compared with the previous version of the SSNS algorithm, the new version decreases the Chinese character error rate (CCER) in word decoding by 42.1% across a database consisting of a large number of test sentences (syllable strings).
Keywords: large-vocabulary continuous Chinese speech recognition; word decoding; syllable-synchronous network search; word segmentation
A study on continuous Chinese speech recognition based on stochastic trajectory models
2
Authors: MA Xiaohui (Department of Radio Engineering, Southeast University, Nanjing 210096), GONG Yifan (CRIN/CNRS, France), FU Yuqing, LU Jiren (Department of Radio Engineering, Southeast University, Nanjing 210096). Chinese Journal of Acoustics, 1997, No. 4, pp. 350-355 (6 pages)
After pointing out the unreasonableness of the three basic assumptions contained in the HMM, we introduce the theory and advantages of Stochastic Trajectory Models (STMs), which may resolve the problems caused by the HMM assumptions. In an STM, the acoustic observations of an acoustic unit are represented as clusters of trajectories in a parameter space. The trajectories are modelled by a mixture of probability density functions of a random sequence of states. After analyzing the characteristics of Chinese speech, the acoustic units for continuous Chinese speech recognition based on STM are discussed and phone-like units are suggested. The performance of continuous Chinese speech recognition based on STM is studied on the VINICS system. The experimental results demonstrate the efficiency of STM and the consistency of phone-like units.
Keywords: IEEE ACTA A study on continuous Chinese speech recognition based on stochastic trajectory models
HarkMan: A Vocabulary-Independent Keyword Spotter for Spontaneous Chinese Speech
3
Authors: 郑方, 徐明星, 牟晓隆, 武健, 吴文虎, 方棣棠. Journal of Computer Science & Technology (SCIE, EI, CSCD), 1999, No. 1, pp. 18-26 (9 pages)
In this paper, a novel technique adopted in HarkMan is introduced. HarkMan is a keyword spotter designed to automatically spot the given words of a vocabulary-independent task in unconstrained Chinese telephone speech. The speaking manner and the number of keywords are not limited. This paper focuses on the novel technique, which addresses acoustic modeling, the keyword spotting network, search strategies, robustness, and rejection. The underlying technologies used in HarkMan given in this paper are useful not only for keyword spotting but also for continuous speech recognition. The system has achieved a figure-of-merit value of over 90%.
Keywords: keyword spotting; keyword spotter; vocabulary independent; acoustic modeling; continuous speech recognition
Stream Weight Training Based on MCE for Audio-Visual LVCSR (cited by: 1)
4
Authors: 刘鹏, 王作英. Tsinghua Science and Technology (SCIE, EI, CAS), 2005, No. 2, pp. 141-144 (4 pages)
In this paper we address the problem of audio-visual speech recognition in the framework of the multi-stream hidden Markov model. Stream weight training based on the minimum classification error criterion is discussed for use in large vocabulary continuous speech recognition (LVCSR). We present the lattice rescoring and Viterbi approaches for calculating the loss function of continuous speech. The experimental results show that in the case of clean audio, the system performance can be improved by 36.1% in relative word error rate reduction when using state-based stream weights trained by a Viterbi approach, compared to an audio-only speech recognition system. Further experimental results demonstrate that our audio-visual LVCSR system provides significant enhancement of robustness in noisy environments.
Keywords: audio-visual speech recognition (AVSR); large vocabulary continuous speech recognition (LVCSR); discriminative training; minimum classification error (MCE)
Discriminative training of GMM-HMM acoustic model by RPCL learning (cited by: 1)
5
Authors: Zaihu PANG, Shikui TU, Dan SU, Xihong WU, Lei XU. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2011, No. 2, pp. 283-290 (8 pages)
This paper presents a new discriminative approach for training the Gaussian mixture models (GMMs) of a hidden Markov model (HMM) based acoustic model in a large vocabulary continuous speech recognition (LVCSR) system. The approach embeds a rival penalized competitive learning (RPCL) mechanism at the level of hidden Markov states. For every input, the correct identity state, called the winner and obtained by Viterbi forced alignment, is enhanced to describe this input while its most competitive rival is penalized by de-learning, which makes the GMM-based states more discriminative. Without the extensive computing burden required by typical discriminative learning methods for one-pass recognition of the training set, the new approach saves computing costs considerably. Experiments show that the proposed method converges well and performs better than the classical maximum likelihood estimation (MLE) based method. Compared with two conventional discriminative methods, the proposed method demonstrates improved generalization ability, especially when the test set is not well matched with the training set.
Keywords: discriminative training; hidden Markov model; rival penalized competitive learning; Bayesian Ying-Yang harmony learning; large vocabulary continuous speech recognition
Speaker adapted dynamic lexicons containing phonetic deviations of words
6
Authors: Bahram VAZIRNEZHAD, Farshad ALMASGANJ, Seyed Mohammad AHADI, Ari CHANEN. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2009, No. 10, pp. 1461-1475 (15 pages)
Speaker variability is an important source of speech variation, which makes continuous speech recognition a difficult task. Adapting automatic speech recognition (ASR) models to speaker variations is a well-known strategy for coping with this challenge. Almost all such techniques focus on developing adaptation solutions within the acoustic models of ASR systems. Although variations of the acoustic features constitute an important portion of inter-speaker variations, they do not cover variations at the phonetic level. Phonetic variations are known to form an important part of variations which are influenced by both micro-segmental and suprasegmental factors. Inter-speaker phonetic variations are influenced by the structure and anatomy of a speaker's articulatory system and also by his/her speaking style, which is driven by many speaker background characteristics such as accent, gender, age, and socioeconomic and educational class. The effect of inter-speaker variations in the feature space may cause explicit phone recognition errors. These errors can be compensated for later by having appropriate pronunciation variants for the lexicon entries, which consider likely phone misclassifications besides pronunciation. In this paper, we introduce speaker adaptive dynamic pronunciation models, which generate different lexicons for various speaker clusters and different ranges of speech rate. The models are hybrids of speaker adapted contextual rules and dynamic generalized decision trees, which take into account word phonological structures, rate of speech, unigram probabilities and stress to generate pronunciation variants of words. Employing the set of speaker adapted dynamic lexicons in a Farsi (Persian) continuous speech recognition task results in word error rate reductions of as much as 10.1% in a speaker-dependent scenario and 7.4% in a speaker-independent scenario.
Keywords: pronunciation models; continuous speech recognition; lexicon adaptation