Journal Articles
860 articles found
Evaluation of Speech Intelligibility of a Cleft Palate Patient with Speech Prosthesis before and after Palatographical Analysis and Prosthesis Modification
1
Authors: Nikhil S. Rajan, Lovely Muthiah, Jovita D’souza, Biji Thomas George. Journal of Biosciences and Medicines, 2017, Issue 6, pp. 60-69.
The aim of the study was to evaluate the alterations in speech intelligibility in a cleft palate patient, before and after extending and modifying the palatal contour of the existing prosthesis using a correctable wax recording. An eight-year-old girl studying in second grade, with a velopharyngeal defect and using an obturator, reported to the outpatient clinic complaining of a lack of clarity in speech. The existing prosthesis lacked a speech bulb; hence, it was decided to add a speech bulb to the existing prosthesis and evaluate the speech. Even after the use of the speech bulb, it was observed that she was unable to pronounce vowels and words such as shoe, vision, cheer, etc. clearly. Hence, palatography was done using a correctable wax technique and the existing prosthesis was altered accordingly. Great improvement in speech, mastication, and velopharyngeal function was achieved after the palatography-based alteration of the existing prosthesis.
Keywords: speech prosthesis, speech intelligibility, palatography, phonetics, cleft palate
Speech intelligibility and auditory perception of pre-school children with hearing aid, cochlear implant and typical hearing (Cited: 2)
2
Author: Mohammad Ashori. Journal of Otology, CSCD, 2020, Issue 2, pp. 62-66.
Purpose: There is a growing interest in the speech intelligibility and auditory perception of deaf children. The aim of the present study was to compare speech intelligibility and auditory perception of pre-school children with Hearing Aid (HA), Cochlear Implant (CI), and Typical Hearing (TH). Methods: The research design was descriptive-analytic and comparative. The participants comprised 75 male pre-school children aged 4-6 years during 2017-2018 from Tehran, Iran. The participants were divided into three groups, each consisting of 25 children. The first and second groups were selected from pre-school children with HA and CI, respectively, using the convenience sampling method, while the third group was selected from pre-school children with TH by random sampling. All children completed the Speech Intelligibility Rating and Categories of Auditory Performance questionnaires. Results: The findings indicated that the mean scores of speech intelligibility and auditory perception of the group with TH were significantly higher than those of the other groups (P<0.0001). The mean scores of speech intelligibility in the group with CI did not significantly differ from those of the group with HA (P<0.38). Also, the mean scores of auditory perception in the group with CI were significantly higher than those of the group with HA (P<0.002). Conclusion: The results showed that auditory perception in children with CI was significantly higher than in children with HA. This finding highlights the importance of cochlear implantation at a younger age and its significant impact on auditory perception in deaf children.
Keywords: speech intelligibility, auditory perception, hearing aid, cochlear implant
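The group comparison reported in this abstract (TH vs. HA vs. CI questionnaire scores) is the kind of analysis that standard statistical tooling reproduces directly. Below is a minimal Python sketch using SciPy's one-way ANOVA and pairwise t-tests on made-up placeholder arrays; the original paper does not specify its exact test procedure, so this is illustrative only.

```python
# Illustrative comparison of questionnaire scores across three hypothetical groups.
# The arrays below are made-up placeholders, NOT data from the cited study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
th = rng.normal(4.5, 0.4, 25)   # hypothetical Typical Hearing group scores
ci = rng.normal(3.6, 0.6, 25)   # hypothetical Cochlear Implant group scores
ha = rng.normal(3.4, 0.7, 25)   # hypothetical Hearing Aid group scores

# Omnibus test: do the three group means differ?
f_stat, p_omnibus = stats.f_oneway(th, ci, ha)
print(f"one-way ANOVA: F={f_stat:.2f}, p={p_omnibus:.4f}")

# Pairwise follow-up comparisons (uncorrected, for illustration only)
for name, a, b in [("TH vs CI", th, ci), ("TH vs HA", th, ha), ("CI vs HA", ci, ha)]:
    t_stat, p = stats.ttest_ind(a, b)
    print(f"{name}: t={t_stat:.2f}, p={p:.4f}")
```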
Speech Intelligibility Enhancement Algorithm Based on Multi-Resolution Power-Normalized Cepstral Coefficients (MRPNCC) for Digital Hearing Aids
3
Authors: Xia Wang, Xing Deng, Hongming Shen, Guodong Zhang, Shibing Zhang. Computer Modeling in Engineering & Sciences, SCIE, EI, 2021, Issue 2, pp. 693-710.
Speech intelligibility enhancement in noisy environments is still one of the major challenges for the hearing impaired in everyday life. Recently, machine-learning based approaches to speech enhancement have shown great promise for improving speech intelligibility. Two key issues of these approaches are the acoustic features extracted from noisy signals and the classifiers used for supervised learning. This paper focuses on features. Multi-resolution power-normalized cepstral coefficients (MRPNCC) are proposed as a new feature to enhance speech intelligibility for the hearing impaired. The new feature is constructed by combining four cepstra at different time-frequency (T-F) resolutions in order to capture both local and contextual information. MRPNCC vectors and binary masking labels calculated from signals passed through a gammatone filterbank are used to train a support vector machine (SVM) classifier, which aims to identify the binary masking values of the T-F units in the enhancement stage. The enhanced speech is synthesized using the estimated masking values and Wiener-filtered T-F units. Objective experimental results demonstrate that the proposed feature is superior to competing features in terms of HIT-FA, STOI, HASPI and PESQ, and that the proposed algorithm not only improves speech intelligibility but also slightly improves speech quality. Subjective tests validate the effectiveness of the proposed algorithm for the hearing impaired.
Keywords: speech intelligibility enhancement, multi-resolution power-normalized cepstral coefficients, binary masking value, hearing impaired
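The core idea named in this abstract, concatenating cepstral features computed at several time-frequency resolutions so that both local and contextual cues are captured, can be illustrated with a simplified sketch. The snippet below computes MFCC-like cepstra at two window lengths with librosa and stacks them frame by frame; it is not the authors' MRPNCC pipeline (which uses power-normalized cepstra, a gammatone filterbank and an SVM mask estimator), only a minimal illustration of the multi-resolution stacking idea, with all parameter values assumed.

```python
# Minimal multi-resolution cepstral feature sketch (illustration only,
# not the MRPNCC feature described in the paper).
import numpy as np
import librosa

def multires_cepstra(y, sr, win_lengths=(512, 2048), n_coeff=13):
    """Stack cepstral coefficients computed at several window lengths."""
    hop = 256                      # shared hop so frames line up across resolutions
    feats = []
    for n_fft in win_lengths:
        c = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_coeff,
                                 n_fft=n_fft, hop_length=hop)
        feats.append(c)
    # Trim to the shortest frame count, then stack along the feature axis.
    n_frames = min(f.shape[1] for f in feats)
    return np.vstack([f[:, :n_frames] for f in feats])  # (n_coeff*len(win_lengths), n_frames)

# Example usage with a synthetic stand-in signal.
sr = 16000
y = np.random.randn(2 * sr).astype(np.float32)   # 2 s of noise as a placeholder
features = multires_cepstra(y, sr)
print(features.shape)
```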
Obstacles for Chinese EFL Learners to Improve Speech Intelligibility
4
Author: 郭紫悦. 《海外英语》 (Overseas English), 2021, Issue 5, pp. 255-258.
As the primary means of communication, speech is an essential aspect of how humans interact and build connections in the social world. Speech intelligibility is critical in social communication; unintelligibility may lead to confusion, misunderstanding, and frustration. Many Chinese learners of English find it challenging to apply English in social interaction and reach mutual intelligibility with international communicators. This article analyzes the obstacles impeding Chinese EFL learners' speech intelligibility development from the aspects of phonology (segmental and suprasegmental features) and pragmatics. Some strategies are proposed to help Chinese learners ameliorate phonology and pragmatics problems and improve speech intelligibility in English communication.
Keywords: Chinese EFL learners, speech intelligibility, communication, segmental phonology, suprasegmental phonology, pragmatics
Intelligibility of Reverberant Speech with Amplification: Limitation of Speech Intelligibility Metrics, and a Preliminary Examination of an Alternative Approach (Cited: 1)
5
Authors: Doheon Lee, Eunju Gong, Densil Cabrera, Manuj Yadav, William L. Martens. Journal of Applied Mathematics and Physics, 2015, Issue 2, pp. 177-186.
This study examines the effect of speech level on intelligibility in different reverberation conditions, and explores the potential of the loudness-based reverberation parameters proposed by Lee et al. [J. Acoust. Soc. Am., 131(2), 1194-1205 (2012)] to explain the effect of speech level on intelligibility in various reverberation conditions. Listening experiments were performed with three speech levels (LAeq of 55 dB, 65 dB and 75 dB) and three reverberation conditions (T20 of 1.0 s, 1.9 s and 4.0 s), and subjects listened to speech stimuli through headphones. Collected subjective data were compared with two conventional speech intelligibility parameters (Speech Intelligibility Index and Speech Transmission Index) and two loudness-based reverberation parameters (EDTN and TN). Results reveal that the effect of speech level on intelligibility changes with a room's reverberation conditions, and that increased level results in reduced intelligibility in highly reverberant conditions. EDTN and TN explain this finding better than do STI and SII, because they consider many psychoacoustic phenomena important for modeling the effect of speech level varying with reverberation.
Keywords: speech intelligibility, speech level, room reverberation, loudness
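For orientation, the Speech Transmission Index mentioned above is conventionally derived from the modulation transfer function m(F): each octave-band/modulation-frequency combination is converted to an apparent signal-to-noise ratio, clipped, normalized to a transmission index, and averaged with band weights. The standard formulation from the STI literature is reproduced below; it is background, not this paper's specific method.

```latex
% Apparent SNR from the modulation transfer function m(F), clipped to +/- 15 dB
\mathrm{SNR}_{\mathrm{app}}(F) = 10 \log_{10}\!\frac{m(F)}{1 - m(F)},
\qquad \mathrm{SNR}_{\mathrm{app}} \in [-15, 15]~\mathrm{dB}

% Transmission index per band/modulation frequency, then a weighted average over octave bands k
\mathrm{TI}(F) = \frac{\mathrm{SNR}_{\mathrm{app}}(F) + 15}{30},
\qquad \mathrm{STI} = \sum_{k} w_k \,\overline{\mathrm{TI}}_k, \quad \sum_k w_k = 1
```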
The Problem of Identifying Possible Signals of Extra-Terrestrial Civilizations in the Framework of the Information-Based Method (Cited: 1)
6
Author: Boris Menin. Journal of Applied Mathematics and Physics, 2019, Issue 10, pp. 2157-2168.
Aims: The purpose of this work is to formulate the requirements for future methods of searching for extra-terrestrial civilizations by use of the concepts of information theory and a theoretically grounded method. Methodology: To realize it, the number of dimensionless criteria contained in the International System of Units (SI) has been calculated. This value, without additional assumptions, allows us to present a formula for calculating the comparative uncertainty of the model of any physical phenomenon. Based on these formulas, the magnitude of the inevitable threshold of misunderstanding between two civilizations in the universe is determined. Results: New theoretical recommendations for choosing the most effective methods to search for the technosignatures of extra-terrestrial civilizations are formulated. Conclusion: Using the calculated amount of information embedded in the model, we showed that the most promising methods for finding potential residents of the Universe should combine frequency radiation with thermal or electromagnetic quantities.
Keywords: extra-terrestrial intelligence, amount of information, SETI, SI, technosignature, uncertainty
Constructing a Simple Verbal Compiler
7
Authors: Ahmed Laarfi, Veton Kepuska. International Journal of Intelligence Science, 2020, Issue 4, pp. 83-91.
The paper's purpose is to design and program a four-operation calculator that receives voice instructions and runs them in either a voice or text phase. The calculator simulates the work of a compiler. The paper is a practical example programmed to support the claim that it is possible to construct a verbal compiler.
Keywords: speech recognition, artificial intelligence, programming languages, compiler construction, verbal programming
Primary discussion on speech intelligibility of Chinese and the speech transmission index
8
Author: SHEN Hao (Institute of Acoustics, Academia Sinica). Chinese Journal of Acoustics, 1990, Issue 1, pp. 74-81.
The relation between the speech intelligibility of Chinese and the speech transmission index (STI) is discussed, based on some useful properties of the modulation transfer function (MTF) and the results obtained by articulation tests under different signal-to-noise ratios.
Keywords: STI, speech intelligibility of Chinese, speech transmission index
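The "useful properties of the modulation transfer function" referred to here are usually captured by the classical closed form for a room with exponential reverberation (reverberation time T, in seconds) and steady noise, given below for context rather than as this paper's exact derivation:

```latex
% MTF at modulation frequency F under reverberation time T and signal-to-noise ratio SNR (dB)
m(F) = \frac{1}{\sqrt{1 + \left( \dfrac{2\pi F\, T}{13.8} \right)^{2}}}
       \cdot \frac{1}{1 + 10^{-\mathrm{SNR}/10}}
```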
A Multi-Level Edge-Intelligent Control Mode for Work Safety Based on the SI-SB System Safety Model (Cited: 1)
9
Authors: 张充, 张伟, 李泽亚, 赵挺生, 张耀庭. 《中国安全科学学报》 (China Safety Science Journal), CAS, CSCD, PKU Core, 2024, Issue 1, pp. 17-26.
To explore innovative work-safety control modes enabled by information and intelligent technologies, the characteristics of information flow in the safety control process were analyzed from the perspective of safety informatics, and a multi-level edge-intelligent control mode for work safety was proposed. Based on the safety information-safety behavior (SI-SB) system safety model, the mechanisms of deviation and delay in safety decision-making during safety control were analyzed, and ideas for improving the performance of the safety control system were put forward. Combining the characteristics of work-safety organizational management systems with the advantages of digital technology, the enabling basis of digital technology in three aspects (information sensing and transmission, safety information interpretation, and safety behavior guidance) and the enabling paths in three aspects (digital sensing, intelligent decision-making, and multi-level control) were elaborated, and a multi-level edge-intelligent work-safety control mode featuring intelligent decision-making, agile response, elastic scalability, and human-machine collaboration was proposed. In three types of scenarios (emergency events, short-cycle control, and long-cycle control), the timeliness of safety-event responses before and after applying the intelligent control mode was calculated and compared. The results show that the proposed multi-level edge-intelligent control mode can significantly improve safety control effectiveness.
Keywords: safety information-safety behavior (SI-SB) system safety model, multi-level edge-intelligent control, control mode, work safety, safety informatics
Relationship between Chinese speech intelligibility and speech transmission index in rooms using dichotic listening (Cited: 3)
10
Author: PENG JianXin. Chinese Science Bulletin, SCIE, EI, CAS, 2008, Issue 18, pp. 2748-2752.
Speech intelligibility (SI) is an important index for the design and assessment of halls intended for speech. The relationship between Chinese speech intelligibility scores in rooms and the speech transmission index (STI) under the diotic listening condition was studied in a previous paper using monaural room impulse responses obtained from the room acoustical simulation software Odeon. The present study employs simulated binaural room impulse responses and auralization techniques to obtain subjective Chinese speech intelligibility scores using a rhyme test. The relationship between Chinese speech intelligibility scores and STI is built and validated in rooms using dichotic (binaural) listening. The result shows that there is a high correlation between Chinese speech intelligibility scores and STI using dichotic listening. The relationship between Chinese speech intelligibility scores and STI under diotic and dichotic listening conditions is also analyzed. Compared with diotic listening, dichotic (binaural) listening (an actual listening situation) can improve the signal-to-noise ratio for Mandarin Chinese speech intelligibility by 2.7 dB. The STI method can predict and evaluate the speech intelligibility of Mandarin Chinese in rooms for dichotic (binaural) listening.
Keywords: dichotic listening technique, binaural hearing, stereophonic sound, impulse response
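The auralization step described above, generating binaural stimuli by convolving anechoic speech with simulated binaural room impulse responses, reduces to a pair of convolutions. A minimal SciPy sketch follows; the file names are hypothetical placeholders, and the study itself used Odeon-simulated impulse responses and rhyme-test material.

```python
# Minimal binaural auralization sketch: anechoic speech * left/right room impulse responses.
# File names are hypothetical placeholders, not assets from the cited study.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

speech, fs = sf.read("anechoic_speech.wav")      # mono anechoic recording
brir, fs_ir = sf.read("binaural_rir.wav")        # 2-channel binaural room impulse response
assert fs == fs_ir, "sampling rates must match"

left = fftconvolve(speech, brir[:, 0])
right = fftconvolve(speech, brir[:, 1])

binaural = np.stack([left, right], axis=1)
binaural /= np.max(np.abs(binaural))             # normalize to avoid clipping
sf.write("auralized_binaural.wav", binaural, fs)
```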
Exploring Sequential Feature Selection in Deep Bi-LSTM Models for Speech Emotion Recognition
11
Authors: Fatma Harby, Mansor Alohali, Adel Thaljaoui, Amira Samy Talaat. Computers, Materials & Continua, SCIE, EI, 2024, Issue 2, pp. 2689-2719.
Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they encounter a formidable obstacle in accurately discerning a speaker's emotional state. The examination of the emotional states of speakers holds significant importance in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) due to their ability to capture the periodic nature of audio signals effectively. Although these traits may improve the ability to perceive and interpret emotional depictions appropriately, MFCCs have some limitations. This study therefore aims to tackle the aforementioned issue by systematically picking multiple audio cues, enhancing the classifier model's efficacy in accurately discerning human emotions. The utilized dataset is taken from the EMO-DB database. Preprocessing of the input speech is done using a 2D Convolutional Neural Network (CNN), which applies convolutional operations to spectrograms, as they afford a visual representation of how the audio signal's frequency content changes over time. The next step is spectrogram data normalization, which is crucial for Neural Network (NN) training as it aids faster convergence. Then the five auditory features MFCCs, Chroma, Mel-Spectrogram, Contrast, and Tonnetz are extracted from the spectrogram sequentially. The aim of feature selection is to retain only dominant features by excluding irrelevant ones. In this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques were employed for multiple audio cue feature selection. Finally, the feature sets composed from the hybrid feature extraction methods are fed into a deep Bidirectional Long Short-Term Memory (Bi-LSTM) network to discern emotions. Since the deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling, it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content present in speech signals. The effectiveness and resilience of the proposed SER model were evaluated by experiments, comparing it to state-of-the-art SER techniques. The results indicated that the model achieved accuracy rates of 90.92%, 93%, and 92% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional Speech (EMO-DB), and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings signify a prominent enhancement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model in advancing the SER field.
Keywords: artificial intelligence application, multi-feature sequential selection, speech emotion recognition, deep Bi-LSTM
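The five auditory cues named in this abstract all have off-the-shelf implementations, so the front end of such a pipeline can be sketched compactly. The snippet below extracts MFCC, chroma, mel-spectrogram, spectral contrast and tonnetz features with librosa, concatenates them per frame, and defines a small bidirectional LSTM in Keras; the sequential feature-selection stage, the 2D-CNN preprocessing and the EMO-DB handling from the paper are omitted, and all layer sizes are assumed, so treat this as a simplified outline rather than the authors' model.

```python
# Simplified SER front end + Bi-LSTM classifier sketch (not the paper's exact architecture).
import numpy as np
import librosa
import tensorflow as tf

def extract_feature_sequence(path, sr=22050):
    """Per-frame concatenation of five auditory cues -> array of shape (n_frames, feat_dim)."""
    y, sr = librosa.load(path, sr=sr)
    cues = [
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),
        librosa.feature.chroma_stft(y=y, sr=sr),
        librosa.feature.melspectrogram(y=y, sr=sr),
        librosa.feature.spectral_contrast(y=y, sr=sr),
        librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr),
    ]
    n = min(c.shape[1] for c in cues)                    # align frame counts across cues
    return np.concatenate([c[:, :n] for c in cues]).T    # (n_frames, feat_dim)

def build_bilstm(feat_dim, n_classes=7):
    """Tiny Bi-LSTM classifier over the per-frame feature sequence."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None, feat_dim)),   # variable-length sequences
        tf.keras.layers.Masking(),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

# Usage (paths and labels are hypothetical):
# seqs = [extract_feature_sequence(p) for p in wav_paths]
# model = build_bilstm(seqs[0].shape[-1])
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```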
ACOUSTICAL EVALUATION OF SIX ‘GREEN’ OFFICE BUILDINGS
12
Author: Murray Hodgson. Journal of Green Building, 2008, Issue 4, pp. 108-118.
To explain the reactions of the building occupants to their acoustical environments, meetings with the designers, walk-through surveys, and detailed acoustical measurements were done. The objective was to determine how design decisions affect office acoustical environments, and how to improve the acoustical design of ‘green’ office buildings. Design-performance criteria were established. Measurements were made of noise level, reverberation time, speech intelligibility index (SII), and noise isolation. Noise levels were atypically low in unoccupied buildings with no mechanical ventilation, but excessive in areas near external walls next to noisy external noise sources (especially with windows open for ventilation) and in occupied buildings. Reverberation times were excessive in areas with large volumes and insufficient sound absorption. Speech intelligibility was generally adequate, but speech privacy was inadequate in shared and open-office areas, and into private offices with the doors open for ventilation. Improving the acoustical design of ‘green’ buildings must include increasing the external-internal noise isolation and that between workplaces, and the use of adequate sound absorption to control reverberation and noise.
Keywords: ‘green’ building, offices, acoustical environment, occupant satisfaction, noise levels, reverberation, speech intelligibility, speech privacy, noise isolation
A computational model for assessment of speech intelligibility in informational masking
13
Authors: Xihong WU, Jing CHEN. Frontiers of Electrical and Electronic Engineering in China, CSCD, 2012, Issue 1, pp. 107-115.
The existing auditory computational models for evaluating speech intelligibility can only account for energetic masking, and the effect of informational masking is rarely described in these models. This study aimed to build a computational model that considers the mechanism of informational masking. Several psychoacoustic experiments were conducted to test the effect of informational masking on speech intelligibility by manipulating the number of masking talkers, the speech rate, and the similarity of F0 contour between target and masker. The results showed that the speech reception threshold for the target increased as the F0 contours of the masker became more similar to those of the target, suggesting that the difficulty in segregating the target harmonics from the masker harmonics may underlie the informational masking effect. Based on these studies, a new auditory computational model was built by introducing the auditory function of harmonic extraction into the traditional speech intelligibility index (SII) model, named the harmonic extraction (HF) model. The predictions of the HF model are highly consistent with the experimental results.
Keywords: auditory computational model, speech intelligibility, informational masking, F0 contour, harmonic extraction
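The traditional SII model that the proposed HF model extends is, at its core, a band-importance-weighted sum of band audibilities (ANSI S3.5). The standard form is given below for orientation; the paper's harmonic-extraction extension is not reproduced:

```latex
% Speech Intelligibility Index as a band-importance-weighted sum of band audibility,
% where I_i is the band-importance function and A_i the audibility of band i
\mathrm{SII} = \sum_{i=1}^{n} I_i \, A_i,
\qquad 0 \le A_i \le 1, \quad \sum_{i=1}^{n} I_i = 1
```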
The influence of dummy head on measuring speech intelligibility
14
Authors: ZHANG Siyu, ZHENG Xiaolin, MENG Zihou. Chinese Journal of Acoustics, CSCD, 2016, Issue 2, pp. 178-192.
In order to investigate the influence of a dummy head on measuring speech intelligibility, objective and subjective speech intelligibility evaluation experiments were carried out for different spatial configurations of a target source and a noise source in the horizontal plane. The differences between standard STIPA measured without a dummy head and binaural STIPA measured with a dummy head were compared, and the correlation between subjective speech intelligibility and objective STIPA was analyzed. It is shown that the position of the sound source significantly affects binaural STIPA and the subjective intelligibility measured with a dummy head or in a real-life scenario. The standard STIPA is closer to the lower of the two binaural STIPA values. Speech intelligibility is higher for a single ear that is on the same side as the target source or on the opposite side from the noise source. Binaural speech intelligibility is always lowest when the target and noise sources are at the same place, but once they are apart the speech intelligibility increases sharply. It is also found that the subjective intelligibility measured with a dummy head or in a real-life scenario is uncorrelated with standard STIPA, but highly correlated with STIPA measured with a dummy head. The subjective intelligibility of one single ear is highly correlated with STIPA measured at the same ear, and binaural speech intelligibility is in good agreement with the higher of the two binaural STIPA values.
Keywords: dummy head, speech intelligibility measurement
Investigation of hearing aid users' speech understanding in noise and their spectral-temporal resolution skills
15
Authors: Mert Kılıç, Eyyup Kara. Journal of Otology, CAS, CSCD, 2023, Issue 3, pp. 146-151.
Purpose: Our study aims to compare speech understanding in noise and spectral-temporal resolution skills with regard to the degree of hearing loss, age, hearing aid use experience and gender of hearing aid users. Methods: Our study included sixty-eight hearing aid users aged between 40-70 years, with bilateral mild and moderate symmetrical sensorineural hearing loss. The random gap detection test, Turkish matrix test and spectral-temporally modulated ripple test were administered to the participants with bilateral hearing aids. The test results were compared statistically according to different variables and the correlations were examined. Results: No statistically significant differences were observed for speech-in-noise recognition or spectral-temporal resolution between older and younger adult hearing aid users (p>0.05). No statistically significant difference was found among test outcomes with regard to different degrees of hearing loss (p>0.05). Higher temporal resolution performance was obtained in male participants and in participants with more hearing aid use experience (p<0.05). Significant correlations were obtained between the results of the speech-in-noise recognition, temporal resolution and spectral resolution tests performed with hearing aids (p<0.05). Conclusion: Our findings emphasize the importance of regular hearing aid use and show that some auditory skills can be improved with hearing aids. The observed correlations among the speech-in-noise recognition, temporal resolution and spectral resolution tests reveal that these skills should be evaluated as a whole to maximize the patient's communication abilities.
Keywords: hearing aids, speech in noise, spectral resolution, speech intelligibility, temporal resolution
Automated Speech Recognition System to Detect Babies’ Feelings through Feature Analysis
16
Authors: Sana Yasin, Umar Draz, Tariq Ali, Kashaf Shahid, Amna Abid, Rukhsana Bibi, Muhammad Irfan, Mohammed A. Huneif, Sultan A. Almedhesh, Seham M. Alqahtani, Alqahtani Abdulwahab, Mohammed Jamaan Alzahrani, Dhafer Batti Alshehri, Alshehri Ali Abdullah, Saifur Rahman. Computers, Materials & Continua, SCIE, EI, 2022, Issue 11, pp. 4349-4367.
Diagnosing a baby's feelings poses a challenge for both doctors and parents because babies cannot explain their feelings through expression or speech. Understanding the emotions of babies and their associated expressions during different sensations such as hunger, pain, etc., is a complicated task. In infancy, all communication and feelings are propagated through cry-speech, which is a natural phenomenon. Several clinical methods can be used to diagnose a baby's diseases, but nonclinical methods of diagnosing a baby's feelings are lacking. As such, in this study, we aimed to identify babies' feelings and emotions through their cry using a nonclinical method. Changes in the cry sound can be identified using our method and used to assess the baby's feelings. We considered the frequency of the cries from the energy of the sound. The feelings represented by the infant's cry are judged to represent certain sensations expressed by the child, using the optimal frequency for recognition of a real-world audio sound. We used machine learning and artificial intelligence to distinguish cry tones in real time through feature analysis. The experimental group consisted of 50% each male and female babies, and we determined the relevancy of the results against different parameters. This application produced real-time results after recognizing a child's cry sounds. The novelty of our work is that we, for the first time, successfully derived the feelings of young children through the cry-speech of the child, showing promise for end-user applications.
Keywords: cry-to-speak, machine learning, artificial intelligence, cry speech detection, babies
Speech Recognition via CTC-CNN Model
17
Authors: Wen-Tsai Sung, Hao-Wei Kang, Sung-Jung Hsiao. Computers, Materials & Continua, SCIE, EI, 2023, Issue 9, pp. 3833-3858.
In a speech recognition system, the acoustic model is an important underlying model, and its accuracy directly affects the performance of the entire system. This paper introduces the construction and training process of the acoustic model in detail, studies the Connectionist Temporal Classification (CTC) algorithm, which plays an important role in the end-to-end framework, and establishes a convolutional neural network (CNN) combined with a CTC acoustic model to improve the accuracy of speech recognition. This study uses a sound sensor, the ReSpeaker Mic Array v2.0.1, to convert the collected speech signals into text or corresponding speech signals in order to improve communication and reduce noise and hardware interference. The baseline acoustic model in this study faces challenges such as long training time, a high error rate, and a certain degree of overfitting. The model is trained through continuous design and improvement of the relevant parameters of the acoustic model, and finally an excellent model is selected according to the evaluation indices, which reduces the error rate to about 18% and thus improves the accuracy rate. Finally, comparative verification was carried out on the selection of acoustic feature parameters, the selection of modeling units, and the speaker's speech rate, which further verified the excellent performance of the CTCCNN_5+BN+Residual model structure. For the experiments, to train and verify the CTC-CNN baseline acoustic model, this study uses the THCHS-30 and ST-CMDS speech datasets as training data; after 54 epochs of training, the word error rate on the acoustic model training set is 31%, and the word error rate on the test set stabilizes at about 43%. This experiment also considers surrounding environmental noise. At a noise level of 80-90 dB, the accuracy rate is 88.18%, the worst performance among all levels. In contrast, at 40-60 dB, the accuracy is as high as 97.33% due to less noise pollution.
Keywords: artificial intelligence, speech recognition, speech to text, convolutional neural network, automatic speech recognition
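The combination described above, a convolutional acoustic model trained with the Connectionist Temporal Classification objective, can be outlined briefly. The sketch below applies PyTorch's built-in CTC loss to a toy 1-D CNN over spectrogram frames; the CTCCNN_5+BN+Residual structure and the THCHS-30/ST-CMDS data pipeline are not reproduced, so this is an assumed minimal illustration of the training objective only.

```python
# Toy CNN + CTC loss sketch in PyTorch (illustration of the objective, not the paper's model).
import torch
import torch.nn as nn

class TinyCTCModel(nn.Module):
    def __init__(self, n_feats=80, n_classes=100):   # n_classes includes the CTC blank (index 0)
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_feats, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.out = nn.Linear(256, n_classes)

    def forward(self, x):                 # x: (batch, n_feats, time)
        h = self.conv(x).transpose(1, 2)  # (batch, time, 256)
        return self.out(h).log_softmax(dim=-1)

model = TinyCTCModel()
ctc = nn.CTCLoss(blank=0)

# Dummy batch: 4 utterances, 80-dim features, 120 frames, random label sequences of length 20.
feats = torch.randn(4, 80, 120)
targets = torch.randint(1, 100, (4, 20))            # avoid the blank index 0 in targets
input_lens = torch.full((4,), 120, dtype=torch.long)
target_lens = torch.full((4,), 20, dtype=torch.long)

log_probs = model(feats).transpose(0, 1)            # CTCLoss expects (time, batch, classes)
loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()
print(float(loss))
```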
Improving Speech Enhancement Framework via Deep Learning
18
Authors: Sung-Jung Hsiao, Wen-Tsai Sung. Computers, Materials & Continua, SCIE, EI, 2023, Issue 5, pp. 3817-3832.
Speech plays an extremely important role in social activities. Many individuals suffer from a “speech barrier,” which limits their communication with others. In this study, an improved speech recognition method is proposed that addresses the needs of speech-impaired and deaf individuals. A basic improved connectionist temporal classification convolutional neural network (CTC-CNN) architecture acoustic model was constructed by combining a speech database with a deep neural network. Acoustic sensors were used to convert the collected voice signals into text or corresponding voice signals to improve communication. The method can be extended to modern artificial intelligence techniques, with multiple applications such as meeting minutes, medical reports, and verbatim records for cars, sales, etc. For the experiments, a modified CTC-CNN was used to train an acoustic model, which showed better performance than earlier common algorithms. Thus a CTC-CNN baseline acoustic model was constructed and optimized, which reduced the error rate to about 18% and improved the accuracy rate.
Keywords: artificial intelligence, speech recognition, speech to text, CTC-CNN
A Survey of Speech Recognition Algorithms for Dysarthria (Cited: 1)
19
Authors: 宋伟, 张杨豪. 《计算机工程与应用》 (Computer Engineering and Applications), CSCD, PKU Core, 2024, Issue 11, pp. 62-74.
Dysarthria is a difficult medical condition, and current mainstream speech recognition technology does not adapt well to the needs of this field. Speech recognition techniques targeting dysarthria combine pre-training with personalized training and further improve algorithm performance in a data-driven way, further reducing the character error rate; however, dysarthric speech recognition is still some distance from practical commercial use, and its development is constrained by data scale and by technology. To date, no survey article on dysarthric speech recognition has appeared, and there is an urgent need to compare and analyze the construction methods of the various datasets and the advanced techniques in this field, so that researchers entering it can quickly acquire this knowledge. This paper surveys existing datasets, mainstream algorithms, and evaluation methods, and summarizes the scale, form, and characteristics of mainstream dysarthria datasets at home and abroad. It analyzes the mainstream algorithms for dysarthric speech recognition and presents the performance and characteristics of different algorithms. Finally, it examines performance evaluation metrics for algorithm models based on the severity level of dysarthric patients and discusses future research directions, in the hope of helping researchers engaged in dysarthric speech recognition and accelerating the development of this field.
Keywords: dysarthria, speech recognition, deep learning, artificial intelligence
Correlation between Articulator Motor Function and Speech Intelligibility in Patients with Post-Stroke Dysarthria
20
Authors: 罗薇, 何怡, 张庆苏. 《中国康复理论与实践》 (Chinese Journal of Rehabilitation Theory and Practice), CSCD, PKU Core, 2024, Issue 7, pp. 818-822.
Objective: To explore the relationship between articulator motor function and speech intelligibility (SI) in patients with post-stroke dysarthria. Methods: Sixty-seven inpatients with post-stroke dysarthria admitted to Beijing Bo'ai Hospital from November 2020 to October 2023 were assessed with the SI test and the Frenchay Dysarthria Assessment (FDA); patients with SI ≤ 65% formed the low-SI group and those with SI > 65% the high-SI group. Results: Except for jaw position, all FDA item scores in the high-SI group were lower than those in the low-SI group (Z > 1.543, P < 0.05). All FDA item scores were negatively correlated with SI (r < -0.343, P < 0.001), with the strongest correlations for the lip movement, tongue movement, and laryngeal control scores (r < -0.6). Conclusion: Articulator motor function is correlated with SI in patients with post-stroke dysarthria, especially the function of the lips, tongue, and larynx.
Keywords: stroke, dysarthria, articulators, motor function, speech intelligibility
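The correlation analysis summarized above (FDA item scores against SI, with r values around -0.3 to -0.6) corresponds to a routine rank or product-moment correlation. A minimal SciPy sketch follows; the arrays are hypothetical placeholders, and the abstract does not state which correlation coefficient the authors used.

```python
# Illustrative correlation between an articulator-function score and speech intelligibility.
# The arrays are hypothetical placeholders, not data from the cited study.
import numpy as np
from scipy import stats

si = np.array([42, 55, 60, 68, 72, 80, 85, 90, 93, 96])                   # speech intelligibility, %
lip_score = np.array([4.5, 4.0, 3.8, 3.2, 3.0, 2.5, 2.2, 1.8, 1.5, 1.2])  # higher = more impaired

rho, p_s = stats.spearmanr(lip_score, si)
r, p_p = stats.pearsonr(lip_score, si)
print(f"Spearman rho={rho:.2f} (p={p_s:.4f}), Pearson r={r:.2f} (p={p_p:.4f})")
```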