Journal Articles
16 articles found
1. A coherent method for finding arrival directions of speech signals and its application for noise reduction in microphone array
Authors: Zheng Liu and Fumitada Itakura (Department of Electrical Engineering, Faculty of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-01, Japan). Chinese Journal of Acoustics, 1997, No. 3, pp. 214-228 (15 pages).
Research on finding the arrival directions of speech signals with a microphone array is presented. We first analyze the uniform microphone array and give a design for a microphone array applied to hands-free speech recognition. Combining the traditional direction-finding technique of MUltiple SIgnal Classification (MUSIC) with the focusing-matrix method, we improve the resolving power of the microphone array for multiple speech sources. As one application of finding the Direction of Arrival (DOA), a new microphone-array system for noise reduction is proposed. The new system is based on a maximum-likelihood estimation technique that reconstructs superimposed signals from different directions by using DOA information. The DOA information is obtained with the focusing MUSIC method, which has been shown to outperform the conventional MUSIC method for speaker localization [1].
Keywords: IEEE ASSP
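The focusing variant described above builds on the standard narrowband MUSIC estimator. A minimal numpy sketch of the conventional MUSIC pseudospectrum for a uniform linear array (an illustration only, not the paper's focusing method; the half-wavelength spacing and angle grid are assumptions):

```python
import numpy as np

def music_spectrum(X, n_sources, d=0.5, n_grid=361):
    """MUSIC pseudospectrum for a uniform linear array.

    X: (n_mics, n_snapshots) complex array of narrowband snapshots.
    d: inter-microphone spacing in wavelengths (assumed half-wavelength).
    """
    n_mics = X.shape[0]
    R = X @ X.conj().T / X.shape[1]            # sample correlation matrix
    w, V = np.linalg.eigh(R)                    # eigenvalues in ascending order
    En = V[:, :n_mics - n_sources]              # noise-subspace eigenvectors
    angles = np.linspace(-90.0, 90.0, n_grid)
    P = np.empty(n_grid)
    for i, theta in enumerate(angles):
        # steering vector for arrival angle theta
        a = np.exp(-2j * np.pi * d * np.arange(n_mics) * np.sin(np.radians(theta)))
        # pseudospectrum peaks where a(theta) is orthogonal to the noise subspace
        P[i] = 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)
    return angles, P
```

Peaks of the returned pseudospectrum give the estimated arrival directions.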
2. Heart Rate Extraction from Vowel Speech Signals (cited by 5)
Authors: Abdelwadood Mesleh, Dmitriy Skopin, Sergey Baglikov, Anas Quteishat. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2012, No. 6, pp. 1243-1251 (9 pages).
This paper presents a novel non-contact heart rate extraction method from vowel speech signals. The proposed method is based on modeling the relationship between the production of vowel speech signals and human heart activity, where it is observed that each heartbeat causes a short increment (evolution) of the vowel speech formants. The short-time Fourier transform (STFT) is used to detect the formant maximum peaks so as to accurately estimate the heart rate. Compared with a traditional contact pulse oximeter, the average accuracy of the proposed non-contact method exceeds 95%. The proposed method is expected to play an important role in modern medical applications.
Keywords: electrocardiogram; feature extraction; heart rate; short-time Fourier transform; vowel speech signal
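The core signal-processing step, locating the dominant spectral peak in each STFT frame, can be sketched as follows (a generic illustration, not the paper's formant tracker; the frame and hop sizes are arbitrary choices):

```python
import numpy as np

def stft_peak_track(x, fs, frame_len=256, hop=128):
    """Return the dominant spectral peak frequency (Hz) of each STFT frame."""
    window = np.hanning(frame_len)
    peaks = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window
        spec = np.abs(np.fft.rfft(frame))            # magnitude spectrum
        peaks.append(np.argmax(spec) * fs / frame_len)  # bin index -> Hz
    return np.array(peaks)
```

Frame-to-frame variations of such peak tracks are the kind of short-time evolution the paper exploits.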
3. Implementation of Hybrid Deep Reinforcement Learning Technique for Speech Signal Classification
Authors: R. Gayathri, K. Sheela Sobana Rani. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 7, pp. 43-56 (14 pages).
Classification of speech signals is a vital part of speech signal processing systems. With the advent of speech coding and synthesis, classification of the speech signal has become more accurate and faster. Conventional methods are considered inaccurate because of the uncertainty and diversity of real speech signals. In this paper, we perform efficient speech signal classification using a series of neural network classifiers with reinforcement learning operations. Prior to classification, the study extracts the essential features from the speech signal using cepstral analysis: the speech waveform is converted to a parametric representation to obtain a relatively low data rate. To improve the precision of classification, Generative Adversarial Networks are used, classifying the speech signal after features are extracted via the cepstral coefficients. The classifiers are first trained with these features, and the best classifier is chosen to perform classification on new datasets. Validation on test sets is evaluated using reinforcement learning, which provides feedback to the classifiers. Finally, at the user interface, the signals are played back by decoding the signal retrieved from the classifier based on the input query. The results are evaluated in terms of accuracy, recall, precision, F-measure, and error rate; the generative adversarial network attains a higher accuracy rate than the other methods: Multi-Layer Perceptron, Recurrent Neural Networks, Deep Belief Networks, and Convolutional Neural Networks.
Keywords: neural network (NN); reinforcement learning (RL); cepstral coefficient; speech signal classification
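Cepstral analysis, the feature-extraction step above, computes the inverse transform of the log magnitude spectrum. A minimal real-cepstrum sketch (illustrative only; practical recognizers usually use mel-frequency cepstral coefficients rather than the raw real cepstrum):

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of one frame: IFFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small floor avoids log(0)
    return np.real(np.fft.ifft(log_mag))
```

The low-quefrency coefficients summarize the spectral envelope and serve as a compact parametric representation of the frame.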
4. Compressed Speech Signal Sensing Based on the Structured Block Sparsity with Partial Knowledge of Support (cited by 1)
Authors: Ji Yunyun, Yang Zhen, Xu Qian. Journal of Electronics (China), 2012, No. 1, pp. 62-71 (10 pages).
Structural and statistical characteristics of signals can improve the performance of Compressed Sensing (CS). Two features of the Discrete Cosine Transform (DCT) coefficients of voiced speech signals are discussed in this paper. The first is the block sparsity of the DCT coefficients of voiced speech, established from two different aspects: the distribution of the DCT coefficients and a comparison of reconstruction performance between the mixed program and Basis Pursuit (BP). This block sparsity means that block-sparse CS algorithms can be used to improve the recovery performance of speech signals, as demonstrated by the simulation results of the mixed program. The second is the well-known concentration of the large DCT coefficients of voiced speech at low frequencies. In line with this feature, a special Gaussian and Partial Identity Joint (GPIJ) matrix is constructed as the sensing matrix for voiced speech signals. Simulation results show that the GPIJ matrix outperforms the classical Gaussian matrix for speech signals of male and female adults.
Keywords: Compressed Sensing (CS); speech signals; sensing matrix; block sparsity
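The GPIJ idea, sampling the dominant low-frequency coordinates directly with identity rows and covering the rest with Gaussian rows, might be sketched like this (a hypothetical construction for illustration; the paper's exact row layout and scaling may differ):

```python
import numpy as np

def gpij_matrix(m, n, k, seed=None):
    """Sketch of a Gaussian and Partial Identity Joint sensing matrix.

    The first k rows pick out the first k (low-frequency DCT) coordinates
    directly; the remaining m - k rows are i.i.d. Gaussian.
    """
    rng = np.random.default_rng(seed)
    identity_part = np.eye(n)[:k]                          # partial identity
    gaussian_part = rng.standard_normal((m - k, n)) / np.sqrt(m)
    return np.vstack([identity_part, gaussian_part])
```

Measuring with such a matrix preserves the large low-frequency coefficients exactly while the Gaussian rows capture the residual support.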
5. Enhanced Frequency-Domain Frost Algorithm Using Conjugate Gradient Techniques for Speech Enhancement (cited by 1)
Authors: Shengkui Zhao, Douglas L. Jones. Journal of Electronic Science and Technology (CAS), 2012, No. 2, pp. 158-162 (5 pages).
In this paper, the frequency-domain Frost algorithm is enhanced with conjugate gradient techniques for speech enhancement. Unlike the non-adaptive approach of computing the optimum minimum variance distortionless response (MVDR) solution by inverting the correlation matrix, the Frost algorithm, which implements the stochastic constrained least mean square (LMS) algorithm, can adaptively converge to the MVDR solution in the mean-square sense, but with a very slow convergence rate. We propose a frequency-domain constrained conjugate gradient (FDCCG) algorithm to speed up convergence. The devised FDCCG algorithm avoids matrix inversion and exhibits fast convergence. Speech enhancement experiments with a target speech signal corrupted by two and by five interfering speech signals, using a four-channel acoustic-vector-sensor (AVS) microphone array, demonstrate the superior performance.
Keywords: adaptive signal processing; convergence; correlation; speech enhancement; microphone arrays
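The non-adaptive MVDR solution that Frost's constrained LMS converges to has the closed form w = R^{-1} d / (d^H R^{-1} d), where R is the input correlation matrix and d the constraint (steering) vector. A small numpy sketch (the R and d used in the test are illustrative stand-ins):

```python
import numpy as np

def mvdr_weights(R, d):
    """Closed-form MVDR beamformer: w = R^{-1} d / (d^H R^{-1} d).

    Uses a linear solve instead of an explicit matrix inverse.
    """
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)
```

The resulting weights satisfy the distortionless constraint w^H d = 1 while minimizing output power; the paper's contribution is reaching this solution iteratively without the solve.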
6. Speech Signal Detection Based on Bayesian Estimation by Observing Air-Conducted Speech under Existence of Surrounding Noise with the Aid of Bone-Conducted Speech (cited by 1)
Authors: Hisako Orimoto, Akira Ikuta, Kouji Hasegawa. Intelligent Information Management, 2021, No. 4, pp. 199-213 (15 pages).
To apply speech recognition systems in real circumstances where handwriting is difficult, such as inspection and maintenance operations in industrial factories or recording and reporting routines at construction sites, countermeasures against surrounding noise are indispensable. In this study, a signal detection method that removes the noise from actual speech signals is proposed, using Bayesian estimation with the aid of bone-conducted speech. More specifically, by introducing Bayes' theorem based on the observation of air-conducted speech contaminated by surrounding background noise, a new type of algorithm for noise removal is theoretically derived. The proposed method utilizes bone-conducted speech to obtain a precise estimate of the speech signal. Its effectiveness is experimentally confirmed by applying it to air- and bone-conducted speech measured in a real environment with surrounding background noise.
Keywords: speech signal detection; Bayesian estimation; air- and bone-conducted speech; surrounding noise
7. Enhancing Parkinson's Disease Diagnosis Accuracy Through Speech Signal Algorithm Modeling
Authors: Omar M. El-Habbak, Abdelrahman M. Abdelalim, Nour H. Mohamed, Habiba M. Abd-Elaty, Mostafa A. Hammouda, Yasmeen Y. Mohamed, Mohanad A. Taifor, Ali W. Mohamed. Computers, Materials & Continua (SCIE, EI), 2022, No. 2, pp. 2953-2969 (17 pages).
Parkinson's disease (PD), one of whose symptoms is dysphonia, is a prevalent neurodegenerative disease. The use of outdated diagnosis techniques, which yield inaccurate and unreliable results, remains an obstacle to early-stage detection and diagnosis for clinical professionals. To address this, the study proposes using machine learning and deep learning models to analyze processed speech signals from patients' voice recordings. Datasets of these processed speech signals were used to train random forest and logistic regression classifiers. Results were highly successful: 90% accuracy with the random forest classifier and 81.5% with the logistic regression classifier. Furthermore, a deep neural network was implemented to investigate whether such a change of method could improve on these findings. It proved effective, yielding an accuracy of nearly 92%. These results suggest that early-stage PD can be accurately diagnosed merely by testing patients' voices. This research calls for a new diagnostic approach in decision support systems and is a first step toward market-wide healthcare software dedicated to aiding clinicians in the early diagnosis of PD.
Keywords: early diagnosis; logistic regression; neural network; Parkinson's disease; random forest; speech signal processing algorithms
8. Speech Encryption with Fractional Watermark
Authors: Yan Sun, Cun Zhu, Qi Cui. Computers, Materials & Continua (SCIE, EI), 2022, No. 10, pp. 1817-1825 (9 pages).
Research on the features of speech and image signals is carried out from two perspectives: the time domain and the frequency domain. Speech and image signals are non-stationary, so the ordinary Fourier transform does not suit their non-stationary characteristics. When short-term stationary speech is obtained by windowing and framing, the subsequent processing of the signal is completed by the Discrete Fourier Transform (DFT). The fast DFT is a commonly used frequency-domain analysis method for speech and image signal processing, but it requires adjusting the window size to achieve the desired resolution. The Fractional Fourier Transform, in contrast, offers both time-domain and frequency-domain processing capabilities. This paper performs speech encryption by combining speech with an image under the Fractional Fourier Transform. A watermark image processed by the fractional transform is embedded in the speech signal; the embedded watermark has the effect of rotation and superposition, which improves the security of the speech. The results show that the proposed speech encryption method achieves a higher security level through the Fractional Fourier Transform, and the technique is easy to extend to practical applications.
Keywords: Fractional Fourier Transform; watermark; speech signal processing; image processing
9. Blind Speech Separation for Robots with Intelligent Human-Machine Interaction
Authors: Huang Yulei, Ding Zhizhong, Dai Lirong, Chen Xiaoping. Journal of Electronics (China), 2012, No. 3, pp. 286-293 (8 pages).
Speech recognition rates deteriorate greatly in human-machine interaction when the speaker's speech mixes with a bystander's voice. This paper proposes a time-frequency approach to Blind Source Separation (BSS) for intelligent Human-Machine Interaction (HMI). The main idea of the algorithm is to simultaneously diagonalize the correlation matrices of the pre-whitened signals at different time delays for every frequency bin in the time-frequency domain. The proposed method has two merits: (1) fast convergence and (2) a high signal-to-interference ratio in the separated signals. Numerical evaluations compare the performance of the proposed algorithm with two other deconvolution algorithms. An efficient algorithm to resolve the permutation ambiguity is also proposed. With properly selected parameters, the proposed algorithm saves more than 10% of computation time and performs well on both simulated convolutive mixtures and real room-recorded speech.
Keywords: Blind Source Separation (BSS); blind deconvolution; speech signal processing; human-machine interaction; simultaneous diagonalization
10. A Distributed Compressed Sensing Approach for Speech Signal Denoising
Authors: Ji Yunyun, Yang Zhen. Journal of Electronics (China), 2011, No. 4, pp. 509-517 (9 pages).
Compressed sensing, an area of signal processing that has risen in recent years, seeks to minimize the number of samples that must be taken from a signal for precise reconstruction. The precondition of compressed sensing theory is the sparsity of signals. In this paper, two methods to estimate the sparsity level of a signal are formulated, and an approach to estimate the sparsity level directly from the noisy signal is presented. Moreover, a scheme based on distributed compressed sensing for speech signal denoising is described, which exploits multiple measurements of the noisy speech signal to construct block-sparse data and then reconstructs the original speech signal using the block-sparse model-based Compressive Sampling Matching Pursuit (CoSaMP) algorithm. Simulation results demonstrate the accuracy of the estimated sparsity level and show that this denoising system achieves favorable performance, especially when speech signals suffer severe noise.
Keywords: distributed compressed sensing; sparsity estimation; speech signal; denoising
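One simple, purely illustrative way to quantify a sparsity level (not necessarily either of the paper's estimators) is to count how many transform coefficients are needed to capture a fixed fraction of the signal energy:

```python
import numpy as np

def sparsity_level(coeffs, energy_frac=0.99):
    """Number of largest-magnitude coefficients capturing energy_frac of the energy.

    coeffs: transform-domain coefficients (e.g. DCT of a speech frame).
    """
    c = np.sort(np.abs(np.asarray(coeffs)))[::-1]   # magnitudes, descending
    cum = np.cumsum(c ** 2)                          # cumulative energy
    return int(np.searchsorted(cum, energy_frac * cum[-1]) + 1)
```

For CoSaMP-style algorithms this count would feed directly into the sparsity parameter of the recovery step.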
11. Analysis of Deaf Speakers' Speech Signal for Understanding the Acoustic Characteristics by Territory Specific Utterances
Authors: Nirmaladevi Jaganathan, Bommannaraja Kanagaraj. Circuits and Systems, 2016, No. 8, pp. 1709-1721 (13 pages).
An important concern within the deaf community is the inability to hear partially or totally. This may affect the development of language during childhood, which limits everyday life. Consequently, to support such deaf speakers through an assistive mechanism, an effort has been made to understand the acoustic characteristics of deaf speakers by evaluating territory-specific utterances. Speech signals were acquired from 32 normal and 32 deaf speakers uttering ten native Indian Tamil words. Speech parameters such as pitch, formants, signal-to-noise ratio, energy, intensity, jitter, and shimmer were analyzed. From the results, it was observed that the acoustic characteristics of deaf speakers differ significantly, and their quantitative measures dominate those of normal speakers for the words considered. The study also reveals that the informative part of speech in normal and deaf speakers may be identified using the acoustic features. In addition, these attributes may be used for differential correction of a deaf speaker's speech signal, helping listeners understand the conveyed information.
Keywords: deaf speaker; hard of hearing; deaf speech processing; assistive mechanism for deaf speakers; speech correction; speech signal processing
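Among the parameters analyzed, pitch is commonly estimated with the autocorrelation method: find the lag at which a voiced frame best matches a shifted copy of itself. A minimal sketch (the sampling rate and 60-400 Hz search range are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) from the autocorrelation peak in a plausible lag range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]  # lags >= 0
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag range for fmin..fmax
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag
```

Jitter and shimmer measures then quantify cycle-to-cycle variation of the pitch period and amplitude, respectively.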
12. TC-Net: A Modest & Lightweight Emotion Recognition System Using Temporal Convolution Network
Authors: Muhammad Ishaq, Mustaqeem Khan, Soonil Kwon. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 9, pp. 3355-3369 (15 pages).
Speech signals play an essential role in communication and provide an efficient way to exchange information between humans and machines. Speech Emotion Recognition (SER) is one of the critical sources for human evaluation and is applicable in many real-world applications such as healthcare, call centers, robotics, safety, and virtual reality. This work develops a novel TCN-based emotion recognition system that uses speech signals and a spatial-temporal convolution network to recognize the speaker's emotional state. The authors designed a Temporal Convolutional Network (TCN) core block to recognize long-term dependencies in speech signals, then feed these temporal cues to a dense network to fuse the spatial features and recognize global information for final classification. The proposed network extracts valid sequential cues automatically from speech signals and performs better than state-of-the-art (SOTA) and traditional machine learning algorithms. The proposed method shows a high recognition rate compared with SOTA methods: final unweighted accuracies of 80.84% and 92.31% on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Berlin Emotional Database (EMO-DB) datasets indicate the robustness and efficiency of the designed model.
Keywords: affective computing; deep learning; emotion recognition; speech signal; temporal convolutional network
13. The Laboratory of Acoustics, Speech and Signal Processing at the Institute of Acoustics (cited by 1)
Chinese Journal of Acoustics, 1990, No. 4, pp. 372-374 (3 pages).
The Laboratory of Acoustics, Speech and Signal Processing (LASSP), the unique and superior national key laboratory of ASSP in China, has been founded at the Institute of Acoustics, Academia Sinica, Beijing, PRC. After three years of effort, the construction of the LASSP has been completed successfully, and a certain capability for performing frontier research projects in the fundamental theory and applied technology of sound fields and acoustic signal processing has been formed. A flexible and complete experimental acoustic signal processing system has been set up in the LASSP. With the remarkable advantages of real-time signal processing and resource sharing, a wide range of research projects in the field of ASSP can be conducted in the laboratory. The Signal Processing Center of the LASSP is well equipped with many computer research facilities, including the …
Keywords: ASSP
14. LSB Steganalysis of Speech Data Based on Distance Measure and ML Decision
Authors: DENG Zong-yuan, SHAO Xi, YANG Zhen. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2007, No. 3, pp. 103-107 (5 pages).
Steganalysis classifies an object according to whether or not it contains hidden information. This article presents a novel approach to detecting the presence of least significant bit (LSB) steganographic messages in a voice secure-communication system. A distance measure, proven sensitive to LSB steganography by analysis of variance (ANOVA), is used to estimate the difference between the host signal and the stego signal. A maximum likelihood (ML) decision is then combined with it to form the classifier. Statistical experiments show that the proposed approach has a high accuracy rate and low computational complexity.
Keywords: speech signal processing; LSB steganography; steganalysis; ML decision
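LSB steganography, the embedding scheme this detector targets, replaces each sample's least significant bit with a message bit, changing the sample by at most one quantization step. A minimal embed/extract sketch on integer PCM samples (illustrative only; real systems add keying, spreading, and sync):

```python
import numpy as np

def lsb_embed(samples, bits):
    """Embed a bit sequence into the LSBs of the first len(bits) samples."""
    out = samples.copy()
    out[:len(bits)] = (out[:len(bits)] & ~1) | np.asarray(bits)  # clear then set LSB
    return out

def lsb_extract(samples, n):
    """Read back the first n embedded bits."""
    return samples[:n] & 1
```

The detector in the paper works because even this one-step perturbation measurably shifts sample statistics relative to the host signal.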
15. Speech Recognition for Parkinson's Disease Based on Improved Genetic Algorithm and Data Enhancement Technology
Authors: Jing Qin, Tong Liu, Zumin Wang, Qijie Zou, Liming Chen, Chang Hong. 《国际计算机前沿大会会议论文集》, 2022, No. 1, pp. 273-286 (14 pages).
Parkinson's disease is one of the most destructive diseases of the nervous system, and speech disorder is one of its typical symptoms. Approximately 90% of Parkinson's patients develop some degree of speech disorder, which affects speech function faster than any other subsystem of the body. Screening for Parkinson's disease by voice is a very effective method that has attracted a growing number of researchers over the past decade. Patients with Parkinson's disease can be identified by recording the sound signal of their pronunciation of words, extracting appropriate features, and identifying the disturbances in their voices. This paper proposes an improved genetic algorithm combined with a data enhancement method for Parkinson's speech signal recognition. Specifically, the method first extracts representative speech signal features through an L1-regularized SVM and then augments the representative feature data with the SMOTE algorithm. Both original and augmented features are used to train an SVM classifier for speech signal recognition, with an improved genetic algorithm applied to find the optimal parameters of the SVM. The effectiveness of the proposed model is demonstrated on the Parkinson's disease audio dataset from the UCI machine learning repository; compared with the most advanced methods, the proposed method has the best performance.
Keywords: Parkinson's disease; speech signal detection; support vector machine; SMOTE algorithm; genetic algorithm
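The SMOTE augmentation step can be sketched as interpolation between minority-class neighbours (a minimal illustration; library implementations such as imbalanced-learn add edge-case handling and class balancing logic):

```python
import numpy as np

def smote(X_minority, n_new, k=5, seed=None):
    """Synthesize n_new minority-class points by interpolating each chosen
    sample toward one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_minority)
    synthetic = []
    for _ in range(n_new):
        i = int(rng.integers(n))
        dists = np.linalg.norm(X_minority - X_minority[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]   # skip the point itself
        j = int(rng.choice(neighbours))
        gap = rng.random()                         # interpolation fraction in [0, 1)
        synthetic.append(X_minority[i] + gap * (X_minority[j] - X_minority[i]))
    return np.array(synthetic)
```

Each synthetic point lies on a segment between two real minority samples, so the augmented set stays inside the minority class's feature region.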
16. 4th National Conference on Speech, Image, Communication, and Signal Processing, held in Beijing, 25-27 October 1989
Author: ZHANG Jialu. Chinese Journal of Acoustics, 1990, No. 2, p. 183 (1 page).
The 4th National Conference on Speech, Image, Communication and Signal Processing, sponsored by the Institute of Speech, Hearing, and Music Acoustics of the Acoustical Society of China and the Institute of Signal Processing of the Electronic Society of China, was held 25-27 October 1989 at the Beijing Institute of Post and Telecommunication. The conference drew a registration of 150 from different parts of the country, making it the largest such conference in the last eight years. The president of the Institute of Speech, Hearing, and Music Acoustics, ASC, professor ZHANG Jialu, gave an opening speech at the opening session, and the honorary president of the Acoustical Society of China, professor MAA Dah-You, and the president of …
Keywords: National Conference on Speech, Image, Communication, and Signal Processing; Beijing; 25-27 October 1989