Journal Articles
9 articles found
1. Multi-scale context-aware network for continuous sign language recognition
Authors: Senhua XUE, Liqing GAO, Liang WAN, Wei FENG. Virtual Reality & Intelligent Hardware (EI), 2024, No. 4, pp. 323-337.
Abstract: The hands and face are the most important parts for expressing sign language morphemes in sign language videos. However, existing Continuous Sign Language Recognition (CSLR) methods either lack the mining of hand and face information in their visual backbones or use expensive and time-consuming external extractors to explore this information. In addition, signs have different lengths, whereas previous CSLR methods typically use a fixed-length window to segment the video to capture sequential features and then perform global temporal modeling, which disturbs the perception of complete signs. In this study, we propose a Multi-Scale Context-Aware network (MSCA-Net) to solve the aforementioned problems. MSCA-Net contains two main modules: (1) Multi-Scale Motion Attention (MSMA), which uses the differences among frames to perceive information of the hands and face at multiple spatial scales, replacing the heavy feature extractors; and (2) Multi-Scale Temporal Modeling (MSTM), which explores crucial temporal information in the sign language video at different temporal scales. We conduct extensive experiments on three widely used sign language datasets, i.e., RWTH-PHOENIX-Weather-2014, RWTH-PHOENIX-Weather-2014T, and CSL-Daily. The proposed MSCA-Net achieves state-of-the-art performance, demonstrating the effectiveness of our approach.
Keywords: continuous sign language recognition; multi-scale motion attention; multi-scale temporal modeling
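As a rough illustration of the MSMA idea (frame differencing as a cheap substitute for external hand/face extractors), the sketch below gates clip features with an attention map computed from adjacent-frame differences. The module name, tensor shapes, and 1x1-conv gate are our own assumptions, not the authors' code.

```python
# Hedged sketch of frame-difference motion attention (not the MSCA-Net source):
# adjacent-frame differences highlight fast-moving hand/face regions, and a
# learned sigmoid gate re-weights the spatial features accordingly.
import torch
import torch.nn as nn

class MotionAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv turns the raw frame difference into a per-pixel attention map
        self.gate = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width) clip features
        diff = x[:, 1:] - x[:, :-1]                    # frame differences
        diff = torch.cat([diff, diff[:, -1:]], dim=1)  # pad back to full length
        b, t, c, h, w = diff.shape
        attn = torch.sigmoid(self.gate(diff.reshape(b * t, c, h, w)))
        return x * attn.reshape(b, t, c, h, w)         # motion-gated features

feats = torch.randn(2, 16, 64, 28, 28)                 # toy clip features
out = MotionAttention(64)(feats)                       # same shape as the input
```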
2. Continuous Sign Language Recognition Based on Spatial-Temporal Graph Attention Network (cited by 2)
Authors: Qi Guo, Shujun Zhang, Hui Li. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 3, pp. 1653-1670.
Abstract: Continuous sign language recognition (CSLR) is challenging due to the complexity of video backgrounds, hand gesture variability, and temporal modeling difficulties. This work proposes a CSLR method based on a spatial-temporal graph attention network that focuses on the essential features of a video series. The method considers local details of sign language movements by taking information on joints and bones as inputs and constructing a spatial-temporal graph to reflect inter-frame relevance and physical connections between nodes. A graph-based multi-head attention mechanism is utilized with adjacency matrix calculation for better local-feature exploration, and short-term motion correlation modeling is completed via a temporal convolutional network. We adopt a BLSTM to learn the long-term dependence and connectionist temporal classification to align the word-level sequences. The proposed method achieves competitive results in terms of word error rate (1.59%) on the Chinese Sign Language dataset and mean Jaccard Index (65.78%) on the ChaLearn LAP Continuous Gesture Dataset.
Keywords: continuous sign language recognition; graph attention network; bidirectional long short-term memory; connectionist temporal classification
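The connectionist temporal classification step the abstract mentions can be reproduced generically in a few lines; this is a standard torch.nn.CTCLoss usage sketch with invented shapes and vocabulary size, not the paper's implementation.

```python
# Generic CTC alignment sketch: per-frame gloss scores are aligned with the
# word-level label sequence without any frame-level annotation.
import torch
import torch.nn as nn

T, N, C = 100, 4, 30                   # frames, batch size, gloss vocab (+blank)
logits = torch.randn(T, N, C, requires_grad=True)
log_probs = logits.log_softmax(dim=2)
targets = torch.randint(1, C, (N, 12), dtype=torch.long)   # gloss label ids
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)              # index 0 reserved for the CTC blank
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                        # gradients flow to the per-frame scores
```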
3. Learning long-term temporal contexts using skip RNN for continuous emotion recognition
Authors: Jian HUANG, Bin LIU, Jianhua TAO. Virtual Reality & Intelligent Hardware, 2021, No. 1, pp. 55-64.
Abstract: Background: Continuous emotion recognition as a function of time assigns emotional values to every frame in a sequence. Incorporating long-term temporal context information is essential for continuous emotion recognition tasks. Methods: For this purpose, we employ a window of feature frames in place of a single frame as input to strengthen temporal modeling at the feature level. The ideas of frame skipping and temporal pooling are utilized to alleviate the resulting redundancy. At the model level, we leverage the skip recurrent neural network to model long-term temporal variability by skipping trivial information for continuous emotion recognition. Results: The experimental results on the AVEC 2017 database demonstrate that our proposed methods improve performance. Further, the skip long short-term memory (LSTM) model can focus on the critical emotional states during training, thereby achieving better performance than the LSTM model and other methods.
Keywords: continuous emotion recognition; skip RNN; temporal contexts; redundancy
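A minimal sketch of the feature-level idea described above: feed a window of frames rather than a single frame, skip frames to thin the sequence, and pool within each window to remove the redundancy the window introduces. The window size, skip stride, and choice of max pooling are illustrative assumptions, not the paper's settings.

```python
# Windowed feature extraction with frame skipping and temporal pooling.
import numpy as np

def windowed_features(frames: np.ndarray, win: int = 8, skip: int = 2) -> np.ndarray:
    """frames: (T, D) per-frame features -> (T', D) pooled window features."""
    pooled = []
    for start in range(0, len(frames) - win + 1, skip):  # frame skipping
        window = frames[start:start + win]
        pooled.append(window.max(axis=0))                # temporal max pooling
    return np.stack(pooled)

feats = np.random.randn(300, 88)    # e.g., 300 frames of acoustic features
ctx = windowed_features(feats)      # shorter, context-rich sequence
print(ctx.shape)                    # (147, 88)
```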
4. Improving the Syllable-Synchronous Network Search Algorithm for Word Decoding in Continuous Chinese Speech Recognition (cited by 2)
Authors: 郑方, 武健, 宋战江. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2000, No. 5, pp. 461-471.
Abstract: The previously proposed syllable-synchronous network search (SSNS) algorithm plays a very important role in word decoding for continuous Chinese speech recognition and achieves satisfactory performance. Several related key factors that may affect the overall word decoding effect are carefully studied in this paper, including refinement of the vocabulary, big-discount Turing re-estimation of the N-gram probabilities, and management of the search path buffers. Based on these discussions, corresponding approaches to improving the SSNS algorithm are proposed. Compared with the previous version of the SSNS algorithm, the new version decreases the Chinese character error rate (CCER) in word decoding by 42.1% on a database consisting of a large number of test sentences (syllable strings).
Keywords: large-vocabulary continuous Chinese speech recognition; word decoding; syllable-synchronous network search; word segmentation
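The "big-discount Turing re-estimation" mentioned in the abstract builds on classic Good-Turing count re-estimation, in which a count r is discounted to r* = (r + 1) N_{r+1} / N_r, freeing probability mass for unseen N-grams. Below is the textbook formula in Python; the paper's big-discount variant itself is not reproduced, and the toy bigram counts are invented.

```python
# Good-Turing re-estimation of N-gram counts (textbook form).
from collections import Counter

def good_turing_counts(ngram_counts: dict) -> dict:
    freq_of_freq = Counter(ngram_counts.values())  # N_r: n-grams occurring r times
    adjusted = {}
    for ngram, r in ngram_counts.items():
        if freq_of_freq.get(r + 1):
            adjusted[ngram] = (r + 1) * freq_of_freq[r + 1] / freq_of_freq[r]
        else:
            adjusted[ngram] = float(r)             # fall back to the raw count
    return adjusted

counts = {("你", "好"): 3, ("好", "的"): 1, ("的", "话"): 1, ("话", "说"): 2}
print(good_turing_counts(counts))
```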
5. A study on continuous Chinese speech recognition based on stochastic trajectory models
Authors: MA Xiaohui, FU Yuqing, LU Jiren (Department of Radio Engineering, Southeast University, Nanjing 210096), GONG Yifan (CRIN/CNRS, France). Chinese Journal of Acoustics, 1997, No. 4, pp. 350-355.
Abstract: After pointing out the unreasonableness of the three basic assumptions contained in HMMs, we introduce the theory and the advantages of Stochastic Trajectory Models (STMs), which may resolve the problems caused by the HMM assumptions. In STM, the acoustic observations of an acoustic unit are represented as clusters of trajectories in a parameter space. The trajectories are modelled by mixtures of probability density functions of random sequences of states. After analyzing the characteristics of Chinese speech, the acoustic units for continuous Chinese speech recognition based on STM are discussed and phone-like units are suggested. The performance of continuous Chinese speech recognition based on STM is studied on the VINICS system. The experimental results prove the efficiency of STM and the consistency of phone-like units.
Keywords: continuous Chinese speech recognition; stochastic trajectory models; phone-like units
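To make the trajectory-mixture idea concrete, here is a toy scoring function under two simplifying assumptions of our own (equal-length prototype trajectories and spherical Gaussians); it is not the VINICS implementation. The key contrast with an HMM is that the sequence is scored against each trajectory as a whole rather than frame by frame.

```python
# Toy STM-style likelihood: a unit is a mixture of prototype trajectories.
import numpy as np

def stm_log_likelihood(obs, trajectories, weights, var=1.0):
    """obs: (T, D); trajectories: (K, T, D) prototypes resampled to length T;
    weights: (K,) mixture weights summing to 1."""
    logs = []
    for k, traj in enumerate(trajectories):
        sq = ((obs - traj) ** 2).sum()                 # whole-trajectory distance
        logs.append(np.log(weights[k]) - sq / (2 * var))
    return np.logaddexp.reduce(logs)                   # log of the mixture sum

obs = np.random.randn(20, 12)
protos = np.random.randn(3, 20, 12)
print(stm_log_likelihood(obs, protos, np.array([0.5, 0.3, 0.2])))
```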
6. Stream Weight Training Based on MCE for Audio-Visual LVCSR (cited by 1)
Authors: 刘鹏, 王作英. Tsinghua Science and Technology (SCIE, EI, CAS), 2005, No. 2, pp. 141-144.
Abstract: In this paper, we address the problem of audio-visual speech recognition in the framework of the multi-stream hidden Markov model. Stream weight training based on the minimum classification error criterion is discussed for use in large vocabulary continuous speech recognition (LVCSR). We present lattice rescoring and Viterbi approaches for calculating the loss function of continuous speech. The experimental results show that in the case of clean audio, system performance can be improved by a 36.1% relative word error rate reduction when using state-based stream weights trained by the Viterbi approach, compared to an audio-only speech recognition system. Further experimental results demonstrate that our audio-visual LVCSR system provides a significant enhancement of robustness in noisy environments.
Keywords: audio-visual speech recognition (AVSR); large vocabulary continuous speech recognition (LVCSR); discriminative training; minimum classification error (MCE)
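The MCE criterion referenced above has a standard smooth form: a misclassification measure d, comparing the correct hypothesis score against a soft maximum over rivals, is passed through a sigmoid to give a differentiable surrogate for the error count. A small numeric sketch in textbook form follows; the paper's lattice-rescoring and Viterbi scoring details are not reproduced, and the scores are invented.

```python
# Minimum classification error (MCE) loss in its standard form.
import numpy as np

def mce_loss(g_correct: float, g_rivals: np.ndarray, eta: float = 2.0,
             gamma: float = 1.0) -> float:
    # d > 0 when some rival hypothesis scores close to or above the truth
    rival_term = np.log(np.mean(np.exp(eta * g_rivals))) / eta
    d = -g_correct + rival_term
    return 1.0 / (1.0 + np.exp(-gamma * d))  # sigmoid smoothing of the error count

print(mce_loss(g_correct=-42.0, g_rivals=np.array([-45.0, -44.1, -50.3])))
```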
7. Discriminative training of GMM-HMM acoustic model by RPCL learning (cited by 1)
Authors: Zaihu PANG, Shikui TU, Dan SU, Xihong WU, Lei XU. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2011, No. 2, pp. 283-290.
Abstract: This paper presents a new discriminative approach for training the Gaussian mixture models (GMMs) of a hidden Markov model (HMM) based acoustic model in a large vocabulary continuous speech recognition (LVCSR) system. The approach is featured by embedding a rival penalized competitive learning (RPCL) mechanism at the level of hidden Markov states. For every input, the correct identity state, called the winner and obtained by Viterbi forced alignment, is enhanced to describe this input, while its most competitive rival is penalized by de-learning, which makes the GMM-based states more discriminative. Without the extensive computing burden required by typical discriminative learning methods for one-pass recognition of the training set, the new approach saves computing costs considerably. Experiments show that the proposed method converges well and performs better than the classical maximum likelihood estimation (MLE) based method. Compared with two conventional discriminative methods, the proposed method demonstrates improved generalization ability, especially when the test set is not well matched with the training set.
Keywords: discriminative training; hidden Markov model; rival penalized competitive learning; Bayesian Ying-Yang harmony learning; large vocabulary continuous speech recognition
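The winner/rival mechanic of RPCL can be sketched on plain Gaussian means: the winner is pulled toward the sample while the runner-up is pushed away by a much smaller de-learning rate, sharpening the decision boundary. The learning rates and data below are illustrative, not the paper's settings, and the paper applies the mechanism to HMM states rather than raw means.

```python
# Rival penalized competitive learning (RPCL) update on cluster means.
import numpy as np

def rpcl_step(x, means, lr=0.05, de_lr=0.005):
    dists = np.linalg.norm(means - x, axis=1)
    winner, rival = np.argsort(dists)[:2]        # closest and second-closest mean
    means[winner] += lr * (x - means[winner])    # enhance the winner
    means[rival] -= de_lr * (x - means[rival])   # penalize the rival by de-learning
    return means

means = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
for x in np.random.randn(100, 2) * 0.3 + [0.2, 0.1]:
    means = rpcl_step(x, means)
print(means)   # the winning mean converges on the data; its rival is pushed off
```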
8. HarkMan: A Vocabulary-Independent Keyword Spotter for Spontaneous Chinese Speech
Authors: 郑方, 徐明星, 牟晓隆, 武健, 吴文虎, 方棣棠. Journal of Computer Science & Technology (SCIE, EI, CSCD), 1999, No. 1, pp. 18-26.
Abstract: In this paper, a novel technique adopted in HarkMan is introduced. HarkMan is a keyword spotter designed to automatically spot the given words of a vocabulary-independent task in unconstrained Chinese telephone speech. The speaking manner and the number of keywords are not limited. This paper focuses on the novel techniques, which address acoustic modeling, the keyword-spotting network, search strategies, robustness, and rejection. The underlying technologies used in HarkMan are useful not only for keyword spotting but also for continuous speech recognition. The system has achieved a figure-of-merit value over 90%.
Keywords: keyword spotting; keyword spotter; vocabulary independent; acoustic modeling; continuous speech recognition
9. Speaker adapted dynamic lexicons containing phonetic deviations of words
Authors: Bahram VAZIRNEZHAD, Farshad ALMASGANJ, Seyed Mohammad AHADI, Ari CHANEN. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2009, No. 10, pp. 1461-1475.
Abstract: Speaker variability is an important source of speech variation that makes continuous speech recognition a difficult task. Adapting automatic speech recognition (ASR) models to speaker variations is a well-known strategy to cope with this challenge. Almost all such techniques focus on developing adaptation solutions within the acoustic models of ASR systems. Although variations of the acoustic features constitute an important portion of inter-speaker variations, they do not cover variations at the phonetic level. Phonetic variations are known to form an important part of the variations influenced by both micro-segmental and suprasegmental factors. Inter-speaker phonetic variations are influenced by the structure and anatomy of a speaker's articulatory system and also by his or her speaking style, which is driven by many speaker background characteristics such as accent, gender, age, and socioeconomic and educational class. The effect of inter-speaker variations in the feature space may cause explicit phone recognition errors. These errors can be compensated for later by having appropriate pronunciation variants for the lexicon entries, which consider likely phone misclassifications besides pronunciation. In this paper, we introduce speaker-adaptive dynamic pronunciation models, which generate different lexicons for various speaker clusters and different ranges of speech rate. The models are hybrids of speaker-adapted contextual rules and dynamic generalized decision trees, which take into account word phonological structures, rate of speech, unigram probabilities, and stress to generate pronunciation variants of words. Employing the set of speaker-adapted dynamic lexicons in a Farsi (Persian) continuous speech recognition task results in word error rate reductions of as much as 10.1% in a speaker-dependent scenario and 7.4% in a speaker-independent scenario.
Keywords: pronunciation models; continuous speech recognition; lexicon adaptation
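As a hypothetical illustration of how a recognizer might consume such dynamic lexicons, the sketch below selects pronunciation variants keyed by speaker cluster and speech-rate band. Every entry, name, and threshold here is invented for illustration; the paper's rule-and-decision-tree generation of the variants is not reproduced.

```python
# Selecting a pronunciation lexicon by speaker cluster and speech-rate band.
from typing import Dict, List, Tuple

Lexicon = Dict[str, List[str]]   # word -> list of phone-string variants

def select_lexicon(lexicons: Dict[Tuple[str, str], Lexicon],
                   speaker_cluster: str, rate: float) -> Lexicon:
    rate_band = "fast" if rate > 5.0 else "normal"   # syllables/sec, assumed cutoff
    return lexicons[(speaker_cluster, rate_band)]

lexicons = {
    ("cluster_a", "normal"): {"salam": ["s a l a m"]},
    ("cluster_a", "fast"):   {"salam": ["s a l a m", "s l a m"]},  # reduced variant
}
print(select_lexicon(lexicons, "cluster_a", rate=6.2)["salam"])
```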