Journal Articles
199 articles found
1. Investigation of Automatic Speech Recognition Systems via the Multilingual Deep Neural Network Modeling Methods for a Very Low-Resource Language, Chaha (cited 1 time)
Authors: Tessfu Geteye Fantaye, Junqing Yu, Tulu Tilahun Hailu. Journal of Signal and Information Processing, 2020, No. 1, pp. 1-21.
Automatic speech recognition (ASR) is vital for very low-resource languages to mitigate the risk of extinction. Chaha is one such low-resource language; it suffers from resource insufficiency, and some of its phonological, morphological, and orthographic features challenge ASR development. Considering these challenges, this study is the first endeavor to analyze the characteristics of the language, prepare a speech corpus, and develop ASR systems for it. A small 3-hour read-speech corpus was prepared and transcribed. Different basic and rounded phone unit-based speech recognizers were explored using multilingual deep neural network (DNN) modeling methods. The experimental results demonstrated that all the basic phone and rounded phone unit-based multilingual models outperformed the corresponding unilingual models, with relative performance improvements of 5.47% to 19.87% and 5.74% to 16.77%, respectively. The rounded phone unit-based multilingual models outperformed the equivalent basic phone unit-based models, with relative performance improvements of 0.95% to 4.98%. Overall, multilingual DNN modeling methods proved highly effective for developing Chaha speech recognizers. Both the basic and rounded phone acoustic units are convenient for building a Chaha ASR system, but the rounded phone unit-based models are superior in performance and faster in recognition speed; hence, rounded phone units are the most suitable acoustic units for Chaha ASR systems.
Keywords: automatic speech recognition; multilingual DNN modeling methods; basic phone acoustic units; rounded phone acoustic units; Chaha
2. Joint On-Demand Pruning and Online Distillation in Automatic Speech Recognition Language Model Optimization
Authors: Soonshin Seo, Ji-Hwan Kim. Computers, Materials & Continua (SCIE, EI), 2023, No. 12, pp. 2833-2856.
Automatic speech recognition (ASR) systems have emerged as indispensable tools across a wide spectrum of applications, ranging from transcription services to voice-activated assistants. To enhance the performance of these systems, it is important to deploy efficient models capable of adapting to diverse deployment conditions. In recent years, on-demand pruning methods have received significant attention within the ASR domain due to their adaptability to various deployment scenarios. However, these methods often confront substantial trade-offs, particularly unstable accuracy when reducing model size. To address these challenges, this study introduces two crucial empirical findings. First, it proposes incorporating an online distillation mechanism during on-demand pruning training, which holds the promise of maintaining more consistent accuracy levels. Second, it proposes the Mogrifier long short-term memory (LSTM) language model (LM), an advanced iteration of the conventional LSTM LM, as an effective alternative pruning target within the ASR framework. Through rigorous experimentation on an ASR system, employing the Mogrifier LSTM LM and training it with the suggested joint on-demand pruning and online distillation method, this study provides compelling evidence. The results show that the proposed methods significantly outperform a benchmark model trained solely with on-demand pruning. Impressively, the proposed configuration reduces the parameter count by approximately 39% while minimizing trade-offs.
Keywords: automatic speech recognition; neural language model; Mogrifier long short-term memory; pruning; distillation; efficient deployment; optimization; joint training
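As background to the on-demand pruning this entry optimizes, a minimal magnitude-pruning sketch in plain Python may help; the function name and the toy weight vector are illustrative, not from the paper:

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (on-demand pruning sketch).

    Assumes a flat list of weights; ties at the threshold are all pruned.
    """
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Example: prune 50% of a toy weight vector
w = [0.9, -0.1, 0.4, 0.05, -0.7, 0.2]
print(prune_by_magnitude(w, 0.5))  # → [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

Real on-demand pruning additionally retrains (here, with online distillation) between pruning steps to recover accuracy.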
3. Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications
Authors: Shuting Ge, Jin Ren, Yihua Shi, Yujun Zhang, Shunzhi Yang, Jinfeng Yang. Computers, Materials & Continua (SCIE, EI), 2024, No. 3, pp. 3215-3245.
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances because speech sequences are longer than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and to strengthen its capabilities in modeling auditory long-distance context dependencies. In addition, a two-stage training strategy is devised to derive semantics-aware acoustic representations effectively. The first stage pre-trains the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and to boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinctive semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
Keywords: speech-text multimodal; automatic speech recognition; semantic alignment; air traffic control communications; dual-tower architecture
4. Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems
Authors: Sneha Basak, Himanshi Agrawal, Shreya Jena, Shilpa Gite, Mrinal Bachute, Biswajeet Pradhan, Mazen Assiri. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 5, pp. 1053-1089.
Speech recognition systems have become a unique part of the human-computer interaction (HCI) family. Speech is one of the most naturally developed human abilities, and speech signal processing opens up a transparent, hands-free computing experience. This paper presents a retrospective yet modern view of the world of speech recognition systems. The development of ASR (Automatic Speech Recognition) has seen quite a few milestones and breakthrough technologies, which are highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems is presented, along with a brief discussion of various modern-day developments and applications in this domain. This review aims to summarize the field and provide a starting point for those entering the vast area of speech signal processing. Since speech recognition has vast potential in industries such as telecommunication, emotion recognition, and healthcare, this review should be helpful to researchers exploring applications that society can adopt in the coming years.
Keywords: speech recognition; automatic speech recognition (ASR); Mel-frequency cepstral coefficients (MFCC); hidden Markov model (HMM); artificial neural network (ANN)
5. Speech Recognition via CTC-CNN Model
Authors: Wen-Tsai Sung, Hao-Wei Kang, Sung-Jung Hsiao. Computers, Materials & Continua (SCIE, EI), 2023, No. 9, pp. 3833-3858.
In a speech recognition system, the acoustic model is an important underlying model, and its accuracy directly affects the performance of the entire system. This paper introduces the construction and training process of the acoustic model in detail, studies the connectionist temporal classification (CTC) algorithm, which plays an important role in the end-to-end framework, and establishes a convolutional neural network (CNN) combined with a CTC acoustic model to improve the accuracy of speech recognition. The study uses a sound sensor, a ReSpeaker Mic Array v2.0.1, to convert the collected speech signals into text or corresponding speech signals, improving communication and reducing noise and hardware interference. The baseline acoustic model faces challenges such as long training time, high error rate, and a degree of overfitting. The model is trained through continuous design and improvement of its parameters, and an excellent model is finally selected according to the evaluation index, reducing the error rate to about 18% and thus improving the accuracy rate. Comparative verification over the choice of acoustic feature parameters, the choice of modeling units, and the speaker's speech rate further confirmed the strong performance of the CTCCNN_5+BN+Residual model structure. To train and verify the CTC-CNN baseline acoustic model, the study uses the THCHS-30 and ST-CMDS speech datasets as training data; after 54 epochs of training, the word error rate on the training set is 31%, and on the test set it stabilizes at about 43%. The experiments also consider surrounding environmental noise: at a noise level of 80-90 dB, the accuracy rate is 88.18%, the worst among all levels, whereas at 40-60 dB the accuracy reaches 97.33% due to less noise pollution.
Keywords: artificial intelligence; speech recognition; speech-to-text; convolutional neural network; automatic speech recognition
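The CTC algorithm this entry builds on decodes by merging repeated emissions and then dropping blank symbols; a greedy-decoding sketch (the blank marker and example path are illustrative):

```python
BLANK = "-"  # CTC blank symbol (illustrative choice)

def ctc_collapse(path):
    """Greedy CTC decoding: merge consecutive repeats, then remove blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev:       # merge repeated emissions
            if symbol != BLANK:  # drop blank symbols
                out.append(symbol)
        prev = symbol
    return "".join(out)

print(ctc_collapse("hh-e-ll-lo-"))  # → "hello"
```

Note how the blank between the two "l" runs is what lets CTC emit a genuine double letter.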
6. A Robust Conformer-Based Speech Recognition Model for Mandarin Air Traffic Control
Authors: Peiyuan Jiang, Weijun Pan, Jian Zhang, Teng Wang, Junxiang Huang. Computers, Materials & Continua (SCIE, EI), 2023, No. 10, pp. 911-940.
This study aims to address the deviation in downstream tasks caused by inaccurate recognition results when applying Automatic Speech Recognition (ASR) technology in the Air Traffic Control (ATC) field. The paper presents a novel cascaded model architecture, Conformer-CTC/Attention-T5 (CCAT), to build a highly accurate and robust ATC speech recognition model. To tackle the challenges posed by noise and fast speech rates in ATC, the Conformer model is employed to extract robust and discriminative speech representations from raw waveforms. On the decoding side, the attention mechanism is integrated to facilitate precise alignment between input features and output characters. The Text-To-Text Transfer Transformer (T5) language model is also introduced to handle particular pronunciations and code-mixing issues, providing more accurate and concise textual output for downstream tasks. To enhance the model's robustness, transfer learning and data augmentation techniques are used in the training strategy. The model's performance is optimized through hyperparameter tuning, such as adjusting the number of attention heads, the number of encoder layers, and the weights of the loss function. The experimental results demonstrate the significant contributions of data augmentation, hyperparameter tuning, and error correction models to overall performance. On the Our ATC Corpus dataset, the proposed model achieves a Character Error Rate (CER) of 3.44%, a 3.64% improvement over the baseline model. The effectiveness of the proposed model is also validated on two publicly available datasets: on AISHELL-1, the CCAT model achieves a CER of 3.42%, a 1.23% improvement over the baseline; on LibriSpeech, it achieves a Word Error Rate (WER) of 5.27%, a 7.67% improvement over the baseline. Additionally, the paper proposes an evaluation criterion for assessing the robustness of ATC speech recognition systems; in robustness evaluation experiments based on this criterion, the proposed model demonstrates a 22% performance improvement over the baseline model.
Keywords: air traffic control; automatic speech recognition; Conformer; robustness evaluation; T5 error correction model
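The Character Error Rate reported above is the edit (Levenshtein) distance between hypothesis and reference divided by the reference length; a self-contained sketch:

```python
def cer(reference, hypothesis):
    """Character Error Rate: Levenshtein distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n] / m

print(cer("kitten", "sitting"))  # 3 edits over 6 reference characters → 0.5
```

The WER figures in the abstract use the same formula at the word level (sequences of words instead of characters).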
7. Automatic evaluation of speech impairment caused by wearing a dental appliance
Authors: Mariko Hattori, Yuka I. Sumita, Hisashi Taniguchi. Open Journal of Stomatology, 2013, No. 7, pp. 365-369.
In dentistry, speech evaluation is important for appropriate orofacial dysfunction rehabilitation. The speech intelligibility test is often used to assess patients' speech, and it involves evaluation by human listeners. However, the test has certain shortcomings, and an alternative method without a listening procedure is needed. The purpose of this study was to test the applicability of an automatic speech intelligibility test system using a computerized speech recognition technique. The speech of 10 normal subjects wearing a dental appliance was evaluated using an automatic speech intelligibility test system developed with computerized speech recognition software; the results of the automatic test were referred to as speech recognition scores. The Wilcoxon signed-rank test was used to analyze differences in the results between two conditions: with the palatal plate in place and with the palatal plate removed. Spearman correlation coefficients were used to evaluate whether the speech recognition score correlated with the result of the conventional intelligibility test. The speech recognition score was significantly decreased when wearing the plate (z = -2.807, P = 0.0050). The automatic evaluation results positively correlated with those of the conventional evaluation when wearing the appliance (r = 0.729, P = 0.017). The automatic speech testing system may be useful for evaluating speech intelligibility in denture wearers.
Keywords: prosthodontics; maxillofacial prosthodontics; speech; automatic speech recognition
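The Spearman correlation this study uses can be computed with the no-ties rank formula rho = 1 - 6*sum(d^2)/(n(n^2-1)); a sketch on toy data (not the study's measurements):

```python
def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties formula (assumes no tied values)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))  # → 1.0 (perfectly monotone)
```

Tied scores, common in intelligibility testing, would need average ranks; statistical packages handle that case.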
8. Development of Application Specific Continuous Speech Recognition System in Hindi
Authors: Gaurav Gaurav, Devanesamoni Shakina Deiv, Gopal Krishna Sharma, Mahua Bhattacharya. Journal of Signal and Information Processing, 2012, No. 3, pp. 394-401.
Application-specific voice interfaces in local languages will go a long way toward bringing the benefits of technology to rural India. The goal of this work is a continuous speech recognition system in Hindi tailored to aid the teaching of geometry in primary schools. This paper presents the preliminary work done toward that end. We used Mel frequency cepstral coefficients as speech feature parameters and hidden Markov modeling to model the acoustic features. The Hidden Markov Model Toolkit (HTK) 3.4 was used both for feature extraction and model generation, and the language-independent Julius recognizer was used for decoding. A speaker-independent system is implemented and results are presented.
Keywords: automatic speech recognition; Mel frequency cepstral coefficients; hidden Markov modeling
9. Phoneme Sequence Modeling in the Context of Speech Signal Recognition in Language "Baoule"
Authors: Hyacinthe Konan, Etienne Soro, Olivier Asseu, Bi Tra Goore, Raymond Gbegbe. Engineering, 2016, No. 9, pp. 597-617.
This paper presents the recognition of spoken "Baoule" sentences, a language of Côte d'Ivoire. Several formalisms allow the modeling of an automatic speech recognition system; the one we used to realize our system is based on discrete Hidden Markov Models (HMMs). Our goal in this article is to present a system for the recognition of Baoule words. We present the three classical HMM problems and develop algorithms able to solve them, then execute these algorithms on concrete examples.
Keywords: HMM; MATLAB; language model; acoustic model; automatic speech recognition
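The first of the three classical HMM problems the entry mentions, scoring an observation sequence, is solved by the forward algorithm; a sketch with a toy two-state model (all probabilities are illustrative):

```python
def hmm_forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: total probability of an observation sequence under an HMM."""
    # Initialization: alpha_1(s) = pi(s) * b_s(o_1)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    # Induction: alpha_t(s) = b_s(o_t) * sum_p alpha_{t-1}(p) * a_{p,s}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[p] * trans_p[p][s] for p in states)
                 for s in states}
    return sum(alpha.values())

# Toy two-state model (all numbers illustrative)
states = ("A", "B")
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}
print(hmm_forward(("x", "y"), states, start_p, trans_p, emit_p))  # → 0.2156
```

The other two classical problems (best state path, parameter estimation) are solved by the Viterbi and Baum-Welch algorithms, which share this same trellis structure.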
10. WTASR: Wavelet Transformer for Automatic Speech Recognition of Indian Languages
Authors: Tripti Choudhary, Vishal Goyal, Atul Bansal. Big Data Mining and Analytics (EI, CSCD), 2023, No. 1, pp. 85-91.
Automatic speech recognition systems translate speech signals into the corresponding text representation. This translation is used in a variety of applications such as voice-enabled commands, assistive devices, and bots. There is a significant lack of efficient technology for Indian languages. In this paper, a wavelet transformer for automatic speech recognition (WTASR) of Indian languages is proposed. Speech signals exhibit varying high- and low-frequency content over time due to variation in the speaker's speech, so wavelets enable the network to analyze the signal at multiple scales. The wavelet decomposition of the signal is fed into the network for generating the text. The transformer network comprises an encoder-decoder system for speech translation. The model is trained on an Indian language dataset for translating speech into the corresponding text, and the proposed method is compared with other state-of-the-art methods. The results show that the proposed WTASR has a low word error rate and can be used for effective speech recognition of Indian languages.
Keywords: transformer; wavelet; automatic speech recognition (ASR); Indian languages
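The multiscale wavelet decomposition that feeds the WTASR network can be illustrated with a single Haar analysis step, the simplest wavelet (the paper's actual wavelet choice may differ):

```python
import math

def haar_step(signal):
    """One level of the Haar wavelet transform (assumes even-length input).

    Returns approximation (low-pass) and detail (high-pass) coefficients;
    applying further steps to the approximation yields the multiscale analysis.
    """
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal), 2)]
    return approx, detail

a, d = haar_step([4.0, 2.0, 5.0, 5.0])
print(a, d)  # halved-length smooth trend plus local differences
```

Recursing on the approximation coefficients gives the coarse-to-fine pyramid that lets the network see both slow and fast spectral changes.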
11. Performance of Text-Independent Automatic Speaker Recognition on a Multicore System
Authors: Rand Kouatly, Talha Ali Khan. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2024, No. 2, pp. 447-456.
This paper studies a high-speed text-independent Automatic Speaker Recognition (ASR) algorithm based on a Gaussian Mixture Model (GMM) on a multicore system. The high speed is achieved through parallel implementation of the feature extraction and aggregation methods during the training and testing procedures. Shared-memory parallel programming techniques using both the OpenMP and PThreads libraries are developed to accelerate the code and improve the performance of the algorithm. The experimental results show speed-ups of around 3.2 on a personal laptop with an Intel i5-6300HQ (2.3 GHz, four cores without hyper-threading, and 8 GB of RAM). In addition, a remarkable 100% speaker recognition accuracy is achieved.
Keywords: automatic speaker recognition (ASR); Gaussian mixture model (GMM); shared-memory parallel programming; PThreads; OpenMP
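The parallel feature extraction described above can be sketched with a worker pool; the per-frame "feature" here is a stand-in energy value, not the paper's actual extractor, and the pool size is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_features(frame):
    """Stand-in per-frame feature extractor (real systems compute MFCCs here)."""
    return sum(s * s for s in frame)  # frame energy

def parallel_extract(frames, workers=4):
    """Fan frame-level feature extraction out across a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_features, frames))

frames = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]
print(parallel_extract(frames))  # → [5.0, 25.0, 0.5]
```

Frames are independent, so the work partitions cleanly; the paper's OpenMP/PThreads versions exploit the same embarrassingly parallel structure in C on shared memory.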
12. Peripheral Nonlinear Time Spectrum Features Algorithm for Large Vocabulary Mandarin Automatic Speech Recognition (cited 1 time)
Authors: Fadhil H. T. Al-dulaimy, 王作英. Tsinghua Science and Technology (SCIE, EI, CAS), 2005, No. 2, pp. 174-182.
This work describes an improved feature extractor algorithm that extracts the peripheral features of a point x(ti, fj) using a nonlinear algorithm to compute the nonlinear time spectrum (NL-TS) pattern. The algorithm observes n×n neighborhoods of the point in all directions, and then incorporates the peripheral features into the Mel frequency cepstrum components (MFCCs)-based feature extractor of the Tsinghua electronic engineering speech processing (THEESP) Mandarin automatic speech recognition (MASR) system as replacements for the dynamic features, with different feature combinations. In this algorithm, the orthogonal bases are extracted directly from the speech data using discrete cosine transformation (DCT) with 3×3 blocks on the NL-TS pattern as the peripheral features. The new primal bases are then selected and simplified in the form of a dp operator in the time direction and a dp operator in the frequency direction. The algorithm achieves a 23.29% relative error rate improvement over the standard MFCC feature set and dynamic features in tests using THEESP with the duration distribution-based hidden Markov model (DDBHMM) based MASR system.
Keywords: large vocabulary speech recognition; Mandarin automatic speech recognition (MASR); duration distribution-based hidden Markov model (DDBHMM); feature identification
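The DCT used above to extract orthogonal bases from 3×3 blocks is, along each direction, a 1-D DCT-II; a sketch of that transform (the 3-point input is illustrative):

```python
import math

def dct2_1d(block_row):
    """1-D DCT-II, the transform applied along each direction of a small block."""
    N = len(block_row)
    return [sum(block_row[n] * math.cos(math.pi / N * (n + 0.5) * k)
                for n in range(N))
            for k in range(N)]

coeffs = dct2_1d([1.0, 1.0, 1.0])
print([round(c, 6) for c in coeffs])  # a constant row concentrates into k=0
```

A constant input maps entirely onto the first (DC) basis vector, which is why DCT coefficients compactly describe the local structure of a spectro-temporal block.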
13. Investigation of Knowledge Transfer Approaches to Improve the Acoustic Modeling of Vietnamese ASR System (cited 5 times)
Authors: Danyang Liu, Ji Xu, Pengyuan Zhang, Yonghong Yan. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2019, No. 5, pp. 1187-1195.
It is well known that automatic speech recognition (ASR) is a resource-consuming task: training a state-of-the-art deep neural network acoustic model takes a sufficient amount of data. For some low-resource languages where scripted speech is difficult to obtain, data sparsity is the main problem limiting the performance of a speech recognition system. In this paper, several knowledge transfer methods are investigated to overcome the data sparsity problem with the help of high-resource languages. The first is a pre-training and fine-tuning (PT/FT) method, in which the parameters of the hidden layers are initialized from a well-trained neural network. Second, progressive neural networks (Prognets) are investigated; thanks to lateral connections in the network architecture, Prognets are immune to the forgetting effect and superior at knowledge transfer. Finally, bottleneck features (BNF) are extracted using cross-lingual deep neural networks and serve as enhanced features to improve the performance of the ASR system. Experiments are conducted on a low-resource Vietnamese dataset. The results show that all three methods yield significant gains over the baseline system, with the Prognets acoustic model performing best; further improvements are obtained by combining the Prognets model and bottleneck features.
Keywords: bottleneck feature (BNF); cross-lingual automatic speech recognition (ASR); progressive neural networks (Prognets); model transfer learning
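The PT/FT method above initializes hidden layers from a well-trained high-resource network while language-specific layers start fresh; a toy parameter-copying sketch (layer names and sizes are illustrative, not from the paper):

```python
def init_from_pretrained(pretrained, target_layers):
    """PT/FT sketch: copy shared hidden layers from a high-resource model;
    layers absent from it (e.g. the target-language output layer) start fresh."""
    params = {}
    for name in target_layers:
        if name in pretrained:
            params[name] = list(pretrained[name])  # transferred weights
        else:
            params[name] = [0.0] * 4               # freshly initialized (toy size)
    return params

# Hypothetical source model with its own output layer, which is NOT transferred
pretrained = {"hidden1": [0.1, 0.2], "hidden2": [0.3, 0.4], "output_src": [9.9]}
target = init_from_pretrained(pretrained, ["hidden1", "hidden2", "output_vi"])
print(sorted(target))  # → ['hidden1', 'hidden2', 'output_vi']
```

Fine-tuning then updates all of `target` on the low-resource data, starting from the transferred hidden representations rather than from scratch.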
14. Design and Reliability Study of an ASR-Based Call Center System
Author: 郭瑞. 《环境技术》, 2010, No. 2, pp. 34-38, 51.
Taking an IT operations-and-maintenance fault reporting system as an example, this paper describes how to design an ASR (automatic speech recognition) based call center using the Nuance Recognizer 9.0 speech recognition system and the Donjin (东进) D081A analog-trunk voice-card telephony system. The paper covers the key steps of the design process and discusses the system's reliability in depth, including how to design grammar files appropriately to improve the speech recognition rate, and how to provide synchronization safeguards during operation so that the system gradually approaches completeness.
Keywords: automatic speech recognition (ASR); call center; grammar files; synchronization assurance
15. Speech Signal Recovery Based on Source Separation and Noise Suppression
Authors: Zhe Wang, Haijian Zhang, Guoan Bi. Journal of Computer and Communications, 2014, No. 9, pp. 112-120.
In this paper, a speech signal recovery algorithm is presented for a personalized voice-command automatic recognition system in vehicle and restaurant environments. This novel algorithm is able to separate a mixed speech source from multiple speakers, detect the presence or absence of speakers by tracking the higher-magnitude portion of the speech power spectrum, and adaptively suppress noise. An automatic speech recognition (ASR) process to handle the multi-speaker task is designed and implemented. Evaluation tests were carried out using the NOIZEUS speech database, and the experimental results show that the proposed algorithm achieves impressive performance improvements.
Keywords: speech recovery; time-frequency source separation; adaptive noise suppression; automatic speech recognition
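The noise suppression described above operates on the speech power spectrum; a classic spectral-subtraction step conveys the idea (not necessarily the paper's exact method; the floor constant is an illustrative guard against negative power):

```python
def spectral_subtract(power_spec, noise_est, floor=0.01):
    """Spectral-subtraction sketch: remove estimated noise power per frequency bin,
    flooring the result to avoid negative power (a guard against "musical noise")."""
    return [max(p - n, floor * p) for p, n in zip(power_spec, noise_est)]

# Three toy bins: mostly speech, borderline, and noise-dominated
clean = spectral_subtract([10.0, 4.0, 1.0], [2.0, 3.5, 2.0])
print(clean)  # → [8.0, 0.5, 0.01]
```

The noise estimate is typically updated adaptively during detected speech-absent intervals, which matches the speaker presence/absence tracking the abstract describes.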
16. Design of a Voice-Controlled Lighting System Based on ASR and Arduino
Author: 胡芷晗. 《电声技术》, 2019, No. 5, pp. 56-57, 63.
Based on an in-depth study of the Arduino board combined with a high-performance ASR speech recognition chip, this work introduces speech recognition into lighting system design. The overall structure of the voice control system, the main control module, and the speech recognition hardware and software were designed, realizing an Arduino-based voice control system. Final testing achieved real-time remote control of a desk lamp, improving the system's degree of intelligence.
Keywords: speech recognition; voice control; ASR; Arduino
17. Real-Time Speech-Based Integrated Development Environment for C Program
Authors: Bharathi Bhagavathsingh, Kavitha Srinivasan, Mariappan Natrajan. Circuits and Systems, 2016, No. 3, pp. 69-82.
Automatic Speech Recognition (ASR) is the process of converting an acoustic signal captured by a microphone into written text. The motivation of this paper is to create a speech-based Integrated Development Environment (IDE) for C programs. The paper proposes a technique that enables visually impaired people, or people with arm injuries, who have excellent programming skills to write C programs through voice input. The proposed system accepts a C program as voice input and produces a compiled C program as output. The user utters each line of the C program; the voice input is first recognized as text, which is then converted into a C program using the syntactic constructs of the C language and fed into the IDE. IDE commands such as open, save, close, compile, and run are also given through voice input. If any error occurs during compilation, it is corrected through voice input by specifying the line number. The performance of the speech recognition system is analyzed by varying the vocabulary size as well as the number of mixture components in the HMM.
Keywords: automatic speech recognition; integrated development environment; hidden Markov model; Mel frequency cepstral coefficients
18. Design and Implementation of a Substation Operation and Maintenance System Based on AR and ASR
Authors: 梁日才, 刘文平, 罗海鑫, 王晓强. 《通信电源技术》, 2022, No. 13, pp. 99-103.
Power companies must carry out inspection, maintenance, overhaul, and emergency repair of substation equipment, but traditional substation operation and maintenance (O&M) work suffers from insufficient skill levels, poor communication, and limited intelligent support. Because substation O&M is complex and comprehensive, providing on-site personnel with real-time expert support is an important direction for its development. To improve the efficiency and quality of expert consultation, help experts quickly understand the site and give accurate guidance, shorten the defect-elimination cycle, and raise defect-elimination efficiency, an interactive substation O&M system based on augmented reality (AR) and automatic speech recognition (ASR) was designed. It enables rapid remote expert consultation, efficiently assists in solving on-site problems, significantly improves O&M efficiency, and further safeguards the personal safety of substation workers. Successful application of the system at a substation management office verified its practicality and effectiveness.
Keywords: substation defect elimination; augmented reality (AR); automatic speech recognition (ASR); interactive system; remote video consultation
19. A Performance Evaluation Method for Air Traffic Control Speech Recognition Systems
Authors: 潘卫军, 王梓璇, 蒋培元, 王壮. 《科学技术与工程》 (北大核心), 2024, No. 33, pp. 14278-14286.
With the development of air traffic management, more and more artificial intelligence techniques are being applied in air traffic control (ATC). Among them, automatic speech recognition is used for controller-instruction error correction, readback consistency checking, and similar tasks to improve flight safety and efficiency. To address the uneven performance of ASR systems, a performance evaluation method for ATC speech recognition systems is proposed and applied to evaluate and analyze three systems under test. First, ATC speech is collected in proportion to a set of control scenarios and annotated, building a test corpus for ATC speech recognition systems. Second, an evaluation index system for ATC speech recognition is designed, and the index weights are computed with the analytic hierarchy process (AHP). Finally, three ATC speech recognition systems are trained and evaluated. The results show that the method supports comprehensive evaluation, reveals how a system performs under different control scenarios, and suggests scenario-specific performance improvements. It provides an intuitive way to evaluate ATC speech recognition systems and is expected to offer strong guidance for future research.
Keywords: automatic speech recognition; air traffic control; performance evaluation; analytic hierarchy process
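The AHP weighting mentioned above can be approximated by column normalization and row averaging of the pairwise comparison matrix, a common stand-in for the principal eigenvector; a sketch with an illustrative judgment matrix (not the paper's actual indices):

```python
def ahp_weights(pairwise):
    """AHP weights via the column-normalization (row-average) approximation
    of the principal eigenvector of a pairwise comparison matrix."""
    n = len(pairwise)
    col_sums = [sum(pairwise[i][j] for i in range(n)) for j in range(n)]
    normalized = [[pairwise[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return [sum(row) / n for row in normalized]

# Toy 3-criterion comparison matrix (reciprocal judgments, illustrative)
M = [[1.0, 3.0, 5.0],
     [1 / 3, 1.0, 3.0],
     [1 / 5, 1 / 3, 1.0]]
w = ahp_weights(M)
print([round(x, 3) for x in w])  # weights sum to 1; criterion 1 dominates
```

A full AHP application would also check the consistency ratio of the judgment matrix before accepting the weights.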
20. Design of an Automatic Control System for a Translation Robot Based on Intelligent Speech (cited 2 times)
Authors: 杨维, 秦波涛. 《计算机测量与控制》, 2024, No. 5, pp. 102-108.
To improve automatic control and speed up translation, an automatic control system for a translation robot based on intelligent speech is designed. External speech signals are captured and converted into digital signals by an A/D converter; a voice wake-up module activates the translation robot; dictation mode recognizes complex speech signals while command mode recognizes simple ones, producing text recognition results. A deep-learning keyword-detection method then extracts keywords as the robot's automatic control instructions, which are recognized by a microcontroller. Experimental results show that the system effectively captures external speech signals, whose amplitude is small between 0.6 s and 2 s. The shortest system runtime is 5.6 s, the response speed is about 11 m/s, the minimum control error is 5.1%, the highest BLEU score reaches 42.75, and the control accuracy reaches 95.7%, completing automatic control of the translation robot from extracted speech keywords.
Keywords: intelligent speech; translation robot; automatic control; speech recognition; minimum classification error; deep learning