Journal Articles
8 articles found
An Efficient Text-Independent Speaker Identification Using Feature Fusion and Transformer Model
1
Authors: Arfat Ahmad Khan, Rashid Jahangir, Roobaea Alroobaea, Saleh Yahya Alyahyan, Ahmed H. Almulhi, Majed Alsafyani, Chitapong Wechtaisong. Computers, Materials & Continua (SCIE, EI), 2023, No. 5, pp. 4085-4100 (16 pages)
Automatic Speaker Identification (ASI) is the process of distinguishing the speakers in an audio stream that contains utterances from numerous speakers. Common factors, such as framework differences, overlapping sound events, and the presence of various sound sources during recording, make the ASI task considerably more complex. This research proposes a deep learning model that improves the accuracy of the ASI system and reduces model training time under limited computational resources, and it investigates the performance of the transformer model. Seven audio features (chromagram, Mel-spectrogram, tonnetz, Mel-Frequency Cepstral Coefficients (MFCCs), delta MFCCs, delta-delta MFCCs, and spectral contrast) are extracted from the ELSDSR, CSTR VCTK, and Ar-DAD datasets. Across the experiments, the proposed transformer model using all seven audio features achieved the best performance on every dataset, with highest accuracies of 0.99, 0.97, and 0.99 on ELSDSR, CSTR VCTK, and Ar-DAD, respectively. The experimental results indicate that the proposed technique achieves the best performance among the ASI approaches evaluated.
Keywords: speaker identification; signal processing; Arabic; deep learning; transformer
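The delta and delta-delta MFCCs in the feature list above are commonly computed with a regression over neighbouring frames. A minimal NumPy sketch follows, assuming a regression half-width of 2 frames and edge padding at the utterance boundaries. The `fuse_features` helper, which pools each feature matrix into mean and standard-deviation statistics and concatenates them, is a hypothetical illustration of feature fusion, not the paper's transformer input pipeline.

```python
import numpy as np

def deltas(feats, n=2):
    """Regression-based delta coefficients along the time axis.

    feats: array of shape (frames, dims). Edges repeat the first/last
    frame (an assumption; other padding schemes exist).
    """
    denom = 2 * sum(k * k for k in range(1, n + 1))
    padded = np.pad(feats, ((n, n), (0, 0)), mode="edge")
    frames = feats.shape[0]
    out = np.zeros(feats.shape, dtype=float)
    for k in range(1, n + 1):
        # c[t + k] - c[t - k] for every frame t, weighted by k
        out += k * (padded[n + k:n + k + frames] - padded[n - k:n - k + frames])
    return out / denom

def fuse_features(feature_mats):
    """Concatenate per-feature time statistics into one utterance-level vector.

    Each matrix is (frames, dims); the mean and standard deviation over
    time are stacked (a hypothetical pooling choice).
    """
    stats = []
    for m in feature_mats:
        stats.append(m.mean(axis=0))
        stats.append(m.std(axis=0))
    return np.concatenate(stats)
```

For example, given an MFCC matrix `mfcc` of shape (frames, 13), `d = deltas(mfcc)` and `dd = deltas(d)` produce the two derived features, and `fuse_features([mfcc, d, dd])` yields a single 78-dimensional vector.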
Optical Ciphering Scheme for Cancellable Speaker Identification System
2
Authors: Walid El-Shafai, Marwa A. Elsayed, Mohsen A. Rashwan, Moawad I. Dessouky, Adel S. El-Fishawy, Naglaa F. Soliman, Amel A. Alhussan, Fathi E. Abd El-Samie. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 4, pp. 563-578 (16 pages)
Most current security and authentication systems are based on personal biometrics, and security is a major issue for biometric systems. The problem arises because databases store the original biometrics: if these databases are attacked, the biometrics are lost forever, since a person cannot replace them. Protecting privacy is therefore the most important goal of cancelable biometrics. To protect privacy, cancelable biometrics should be non-invertible, so that no information can be recovered from the cancelable biometric templates stored in personal identification/verification databases. One way to achieve non-invertibility is to employ non-invertible transforms. This work proposes an encryption process for cancellable speaker identification using a hybrid encryption system that combines the 3D Jigsaw transform and the Fractional Fourier Transform (FrFT). The proposed scheme is compared with the optical Double Random Phase Encoding (DRPE) encryption process. Simulation results for the cancellable biometrics show that the proposed algorithm is secure, authoritative, and feasible, with good encryption and cancelability performance. The scheme also offers the security and robustness levels recommended for building efficient cancellable biometric systems.
Keywords: cancellable biometrics; jigsaw transform; FrFT; DRPE; speaker identification
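Both the proposed scheme and the DRPE baseline scramble a speech feature representation under secret keys. The sketch below is a simplified stand-in, not the paper's algorithm: it applies a 2D block jigsaw (the paper uses a 3D jigsaw) to a feature matrix such as a spectrogram, followed by DRPE implemented with ordinary 2D FFTs (the paper's own scheme uses the FrFT instead). The seeds, block size, and function names are hypothetical.

```python
import numpy as np

def jigsaw(img, block, seed, inverse=False):
    """Shuffle non-overlapping block x block tiles with a seeded permutation."""
    h, w = img.shape
    nr, nc = h // block, w // block
    assert nr * block == h and nc * block == w, "matrix must tile exactly"
    tiles = img.reshape(nr, block, nc, block).swapaxes(1, 2).reshape(nr * nc, block, block)
    perm = np.random.default_rng(seed).permutation(nr * nc)
    if inverse:
        out = np.empty_like(tiles)
        out[perm] = tiles          # undo the forward tiles[perm] reordering
    else:
        out = tiles[perm]
    return out.reshape(nr, nc, block, block).swapaxes(1, 2).reshape(h, w)

def make_phase_masks(shape, seed):
    """Two random unit-modulus phase masks (the DRPE keys)."""
    rng = np.random.default_rng(seed)
    return (np.exp(2j * np.pi * rng.random(shape)),
            np.exp(2j * np.pi * rng.random(shape)))

def drpe_encrypt(x, masks):
    m1, m2 = masks
    # spatial-domain mask, then Fourier-domain mask
    return np.fft.ifft2(np.fft.fft2(x * m1) * m2)

def drpe_decrypt(y, masks):
    m1, m2 = masks
    # masks have unit modulus, so dividing equals multiplying by the conjugate
    return np.fft.ifft2(np.fft.fft2(y) * np.conj(m2)) * np.conj(m1)
```

Issuing a new seed or new phase masks produces a new template from the same voice, which is the cancelability idea. With the keys known this sketch is fully invertible, so it does not demonstrate the non-invertibility the abstract calls for.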
Application of formant instantaneous characteristics to speech recognition and speaker identification
3
Authors: 侯丽敏, 胡晓宁, 谢娟敏. Journal of Shanghai University (English Edition) (CAS), 2011, No. 2, pp. 123-127 (5 pages)
This paper proposes a new phase feature derived from formant instantaneous characteristics for speech recognition (SR) and speaker identification (SI) systems. Using the Hilbert transform (HT), formant characteristics can be represented by the instantaneous frequency (IF) and instantaneous bandwidth, together called formant instantaneous characteristics (FIC). To explore the importance of FIC in both SR and SI, the paper derives different features from FIC for SR and SI systems. Combining these new features with conventional parameters achieves a higher identification rate than using Mel-frequency cepstral coefficient (MFCC) parameters alone. The experimental results show that the new features are effective characteristic parameters and can complement conventional parameters for SR and SI.
Keywords: instantaneous frequency (IF); Hilbert transform (HT); speech recognition; speaker identification; Mel-frequency cepstral coefficients (MFCC)
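The Hilbert-transform step described above can be sketched by building the analytic signal z(t) = A(t)e^{jφ(t)} and differentiating its phase and log-amplitude. This minimal NumPy version assumes the input has already been band-pass filtered around one formant (the filter design is not given in the abstract) and uses (1/2π)|d ln A/dt| as the instantaneous bandwidth, which is one common definition and an assumption here.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (same construction as scipy.signal.hilbert)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0          # double positive frequencies, drop negative ones
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

def instantaneous_characteristics(x, fs):
    """Instantaneous frequency and bandwidth (both in Hz), one value per sample step."""
    z = analytic_signal(x)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)
    log_amp = np.log(np.abs(z) + 1e-12)
    inst_bw = np.abs(np.diff(log_amp)) * fs / (2 * np.pi)
    return inst_freq, inst_bw
```

For a steady 500 Hz tone sampled at 8 kHz, the instantaneous frequency stays near 500 Hz and the bandwidth near zero; on a real formant band both tracks fluctuate, and statistics of those tracks would form the features.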
Speaker Identification Based on Fractal Dimensions
4
Authors: 侯丽敏, 王朔中. Journal of Shanghai University (English Edition) (CAS), 2003, No. 1, pp. 60-63 (4 pages)
This paper discusses the application of fractal dimensions to speech processing. Generalized dimensions of arbitrary order and associated fractal parameters are used for speaker identification: a characteristic vector is formed from these parameters, and a recognition criterion is defined to identify individual speakers. Experimental results show that fractal dimensions are useful for characterizing speaker identity.
Keywords: speaker identification; chaos; fractal dimension
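The generalized dimensions mentioned above can be estimated by box counting: partition the points into boxes of side ε, compute the box probabilities p_i, and fit the slope of (1/(q-1)) log Σ p_i^q against log ε (using Σ p_i log p_i when q = 1). The sketch assumes a speech frame is first turned into a point set by time-delay embedding; the embedding dimension, delay, and scale range are hypothetical choices, and the paper's exact parameters and recognition criterion are not reproduced.

```python
import numpy as np

def delay_embed(x, dim=2, tau=1):
    """Time-delay embedding of a 1-D signal into dim-dimensional points."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau:i * tau + n] for i in range(dim)], axis=1)

def generalized_dimension(points, q, scales):
    """Box-counting estimate of the generalized (Renyi) dimension D_q."""
    mins = points.min(axis=0)
    span = np.ptp(points, axis=0).astype(float)
    span[span == 0] = 1.0
    # scale into [0, 1) so every box index is below 1 / eps
    unit = (points - mins) / (span * (1 + 1e-9))
    log_eps, values = [], []
    for eps in scales:
        boxes = np.floor(unit / eps).astype(np.int64)
        _, counts = np.unique(boxes, axis=0, return_counts=True)
        p = counts / counts.sum()
        if q == 1:
            val = np.sum(p * np.log(p))            # information-dimension limit
        else:
            val = np.log(np.sum(p ** q)) / (q - 1)
        log_eps.append(np.log(eps))
        values.append(val)
    slope, _ = np.polyfit(log_eps, values, 1)      # D_q is the slope vs log(eps)
    return slope
```

A characteristic vector for one frame could then collect `generalized_dimension(delay_embed(frame, 3, 5), q, scales)` for several orders q, with `frame` and `scales` supplied by the caller.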
Combination of Pitch Synchronous Analysis and Fisher Criterion for Speaker Identification
5
Authors: Zeng Yumin, Wu Zhenyang. Journal of Electronics (China), 2007, No. 6, pp. 828-834 (7 pages)
A novel text-independent speaker identification system is proposed. In this system, 12th-order perceptual linear predictive (PLP) cepstral coefficients and their delta coefficients over a span of five frames are extracted from speech segmented by pitch synchronous analysis. The Fisher ratios of the original coefficients are then calculated, and the coefficients with the largest Fisher ratios are selected to form 13-dimensional speaker feature vectors. A Gaussian mixture model (GMM) is used to model each speaker. Experimental results show that the identification accuracy of the proposed system is clearly better than that of systems based on other conventional coefficients, such as linear predictive cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC).
Keywords: speaker identification; perceptual linear predictive; pitch synchronous analysis; Fisher criterion
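The Fisher-ratio selection step described above can be sketched per feature dimension as the variance of the speaker means divided by the average within-speaker variance, keeping the k dimensions with the largest ratios. The abstract does not give the exact ratio formula, so this unweighted per-dimension form is an assumption, and the function names are hypothetical.

```python
import numpy as np

def fisher_ratios(X, y):
    """Per-dimension Fisher ratio.

    X: (frames, dims) features pooled over all speakers; y: speaker label per frame.
    Ratio = variance of the per-speaker means / mean within-speaker variance.
    """
    classes = np.unique(y)
    class_means = np.array([X[y == c].mean(axis=0) for c in classes])
    within = np.array([X[y == c].var(axis=0) for c in classes]).mean(axis=0)
    between = class_means.var(axis=0)
    return between / (within + 1e-12)

def select_by_fisher(X, y, k=13):
    """Sorted indices of the k dimensions with the largest Fisher ratios."""
    ratios = fisher_ratios(X, y)
    return np.sort(np.argsort(ratios)[::-1][:k])
```

The selected indices would be computed once on training data, for example `keep = select_by_fisher(X_train, y_train, k=13)`, and `X[:, keep]` then passed to the per-speaker GMMs.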
Statistical Feature of Pitch Frequency Distributions for Robust Speaker Identification
6
Authors: Zhang Linghua, Zheng Baoyu, Yang Zhen. Journal of Electronics (China), 2005, No. 4, pp. 437-442 (6 pages)
This letter proposes an effective and robust speech feature extraction method for speaker identification based on statistical analysis of Pitch Frequency Distributions (PFD). Compared with the conventional cepstrum, PFD is relatively insensitive to Additive White Gaussian Noise (AWGN), but on its own it does not perform well for speaker identification, even in clean environments. To compensate for this shortcoming, PFD and the conventional cepstrum are combined to make the final decision, instead of relying on a single kind of feature. Experimental results indicate that the hybrid approach substantially improves text-independent speaker identification in noisy environments corrupted by AWGN.
Keywords: speaker identification; feature extraction; pitch frequency; Gaussian Mixture Model (GMM)
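The combination of PFD and cepstral evidence described above can be sketched as score-level fusion. In this minimal version, every specific is a hypothetical choice rather than the letter's method: each speaker's PFD is a normalized histogram of voiced pitch values, PFD similarity is histogram intersection, the cepstral scores are assumed to come from per-speaker GMM log-likelihoods computed elsewhere, and both score sets are z-normalized and mixed with a fixed weight.

```python
import numpy as np

def pitch_distribution(f0, bins):
    """Normalized histogram of voiced pitch values (f0 == 0 marks unvoiced frames)."""
    voiced = f0[f0 > 0]
    hist, _ = np.histogram(voiced, bins=bins)
    return hist / max(hist.sum(), 1)

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return np.minimum(h1, h2).sum()

def zscore(scores):
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / (s.std() + 1e-12)

def fused_decision(cepstral_scores, pfd_scores, weight=0.7):
    """Weighted sum of z-normalized per-speaker scores; returns the best speaker index."""
    fused = weight * zscore(cepstral_scores) + (1 - weight) * zscore(pfd_scores)
    return int(np.argmax(fused))
```

With `weight=1.0` the decision uses the cepstral scores alone; lowering the weight lets the noise-insensitive pitch evidence override a small cepstral margin, which is the kind of trade-off the hybrid approach targets under AWGN.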
Improved MFCC-Based Feature for Robust Speaker Identification (cited by 7)
7
Authors: 吴尊敬, 曹志刚. Tsinghua Science and Technology (SCIE, EI, CAS), 2005, No. 2, pp. 158-161 (4 pages)
The Mel-frequency cepstral coefficient (MFCC) is the most widely used feature in speech and speaker recognition. However, MFCC is very sensitive to noise interference, which tends to drastically degrade the performance of recognition systems because of mismatches between training and testing conditions. In this paper, the logarithmic transformation in standard MFCC analysis is replaced by a combined function to reduce sensitivity to noise. The proposed feature extraction process is also combined with speech enhancement methods, such as spectral subtraction and median filtering, to further suppress noise. Experiments show that the proposed robust MFCC-based feature significantly reduces the recognition error rate over a wide range of signal-to-noise ratios.
Keywords: Mel-frequency cepstral coefficient (MFCC); robust speaker identification; feature extraction
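The abstract does not specify the combined function that replaces the logarithm, so the sketch below covers only the enhancement front end it names: magnitude spectral subtraction followed by a median filter along time, applied before the Mel filterbank. The frame length, hop, over-subtraction factor `alpha`, spectral floor, and the assumption that the first frames contain only noise are all hypothetical choices.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Hann-windowed short-time Fourier transform, frames along axis 0."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def spectral_subtraction(mag, noise_frames=10, alpha=1.0, floor=0.01):
    """Subtract a noise magnitude estimate from the leading (assumed noise-only)
    frames, keeping at least floor * original magnitude to avoid negative values."""
    noise = mag[:noise_frames].mean(axis=0)
    return np.maximum(mag - alpha * noise, floor * mag)

def median_filter_time(mag, width=3):
    """Median over `width` neighbouring frames for each frequency bin."""
    pad = width // 2
    padded = np.pad(mag, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([np.median(padded[t:t + width], axis=0) for t in range(mag.shape[0])])
```

A typical chain would be `median_filter_time(spectral_subtraction(np.abs(stft(x))))`, after which the Mel filterbank, the compression function, and the DCT of the MFCC pipeline would follow; those later stages are not shown.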
Maximum Likelihood A Priori Knowledge Interpolation-Based Handset Mismatch Compensation for Robust Speaker Identification
8
Authors: 廖元甫, 庄智显, 杨智合. Tsinghua Science and Technology (SCIE, EI, CAS), 2008, No. 4, pp. 528-532 (5 pages)
Unseen handset mismatch is the major source of performance degradation for speaker identification in telecommunication environments. To alleviate this problem, a handset mismatch compensation approach based on maximum likelihood a priori knowledge interpolation (ML-AKI) is proposed. It first collects the characteristics of a set of seen handsets as a priori knowledge representing the space of handsets. During evaluation, the characteristics of an unknown test handset are optimally estimated by interpolation from this a priori knowledge. Experimental results on the HTIMIT database show that, compared with conventional maximum a posteriori (MAP)-adapted Gaussian mixture models, the ML-AKI method improves the average speaker identification rate from 60.0% to 74.6%. ML-AKI is therefore a promising method for robust speaker identification.
Keywords: robust speaker identification; maximum likelihood estimation; handset mismatch compensation; Gaussian mixture model; maximum a posteriori
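The interpolation idea described above can be illustrated under a deliberately simplified model that is an assumption, not the paper's GMM-based formulation: each seen handset is summarized by a cepstral bias vector, an unseen handset's bias is a weighted combination of those vectors, and test frames are a known speaker mean plus that bias plus isotropic Gaussian noise. Under this model the maximum likelihood weights reduce to a least-squares fit on the average observed shift. Function names are hypothetical.

```python
import numpy as np

def estimate_handset_weights(test_feats, speaker_mean, handset_biases):
    """ML interpolation weights for an unseen handset (simplified model).

    test_feats: (frames, dims); speaker_mean: (dims,);
    handset_biases: (n_handsets, dims), one bias per seen handset.
    With isotropic Gaussian noise, maximizing the likelihood over w is
    least squares on the mean residual.
    """
    B = np.asarray(handset_biases).T                    # (dims, n_handsets)
    residual = test_feats.mean(axis=0) - speaker_mean   # average observed shift
    w, *_ = np.linalg.lstsq(B, residual, rcond=None)
    return w

def compensate(test_feats, handset_biases, w):
    """Remove the interpolated handset bias from every frame."""
    return test_feats - np.asarray(handset_biases).T @ w
```

In identification the speaker is unknown, so this estimate would be repeated against each candidate speaker's model before scoring; the real method works with full GMMs rather than a single mean.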