Journal Articles
16 articles found
1. Rethinking multi-spatial information for transferable adversarial attacks on speaker recognition systems
Authors: Junjian Zhang, Hao Tan, Le Wang, Yaguan Qian, Zhaoquan Gu. CAAI Transactions on Intelligence Technology, SCIE EI, 2024, Issue 3, pp. 620-631.
Adversarial attacks have been posing significant security concerns to intelligent systems, such as speaker recognition systems (SRSs). Most attacks assume the neural networks in the systems are known beforehand, while black-box attacks are proposed without such information to meet practical situations. Existing black-box attacks improve transferability by integrating multiple models or training on multiple datasets, but these methods are costly. Motivated by the optimisation strategy with spatial information on the perturbed paths and samples, we propose a Dual Spatial Momentum Iterative Fast Gradient Sign Method (DS-MI-FGSM) to improve the transferability of black-box attacks against SRSs. Specifically, DS-MI-FGSM needs only a single data sample and one model as input; by extending to the data and model neighbouring spaces, it generates adversarial examples against the integrated models. To reduce the risk of overfitting, DS-MI-FGSM also introduces gradient masking to improve transferability. The authors conduct extensive experiments on the speaker recognition task, and the results demonstrate the effectiveness of their method, which can achieve up to a 92% attack success rate on the victim model in black-box scenarios with only one known model.
Keywords: speaker recognition, spoofing attacks
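The momentum-iterative FGSM family that DS-MI-FGSM extends can be sketched as follows. The dual-spatial sampling over data and model neighbourhoods is the paper's contribution and is not reproduced here; the toy gradient function and all parameter values below are illustrative assumptions, not the paper's setup.

```python
def sign(v):
    return (v > 0) - (v < 0)

def mi_fgsm(x0, grad_fn, eps=0.3, steps=10, mu=1.0):
    """Momentum Iterative FGSM: accumulate the L1-normalized gradient into
    a momentum term, take signed ascent steps, and stay in the eps-ball."""
    alpha = eps / steps                      # per-iteration step size
    x, g = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        grad = grad_fn(x)
        l1 = sum(abs(v) for v in grad) or 1.0
        g = [mu * gi + vi / l1 for gi, vi in zip(g, grad)]    # momentum update
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]   # signed ascent
        x = [min(max(xi, x0i - eps), x0i + eps)               # project to eps-ball
             for xi, x0i in zip(x, x0)]
    return x
```

With a toy quadratic loss whose gradient is `2x`, the perturbation simply walks each coordinate to the edge of the eps-ball, which is the expected behaviour of a signed-gradient attack.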
2. Voice Response Questionnaire System for Speaker Recognition Using Biometric Authentication Interface
Authors: Chang-Yi Kao, Hao-En Chueh. Intelligent Automation & Soft Computing, SCIE, 2023, Issue 1, pp. 913-924.
The use of voice for biometric authentication is an important technological development, because it is a non-invasive identification method and does not require special hardware, so it is less likely to arouse user resistance. This study applies voice recognition technology to a speech-driven interactive voice response questionnaire system, aiming to upgrade the traditional speech system to an intelligent voice response questionnaire network so that the new device may offer enterprises more precise data for customer relationship management (CRM). The intelligent voice response gadget is becoming a new mobile channel, with questionnaire functions built in for the convenience of collecting information on local preferences that can be used for localized promotion and publicity. The authors propose a framework using voice recognition and intelligent analysis models to identify target customers through voice messages gathered in the voice response questionnaire system; that is, transforming the traditional speech system into an intelligent voice complex. The speaker recognition system discussed here employs volume as the acoustic feature in endpoint detection, as the computational load of this method is usually low. To correct two types of errors found in endpoint detection because of ambient noise, this study suggests ways to improve the situation. First, to reach high accuracy, this study follows a dynamic time warping (DTW) based method for speaker identification. Second, it avoids errors in endpoint detection by filtering noise from voice signals before recognition and deleting any test utterances that might negatively affect the recognition results, so that the recognition rate is improved. According to the experimental results, the proposed method has a high recognition rate on both personal-level and industrial-level computers and can reach the standard of practical application. Therefore, the voice management system in this research can serve as virtual customer service staff.
Keywords: biometric authentication, customer relationship management, speaker recognition, questionnaire
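The two ingredients this entry names, volume-based endpoint detection and DTW matching, can be sketched minimally as follows. The energy threshold and the 1-D features are illustrative assumptions, not the paper's configuration.

```python
def endpoints(frames, threshold):
    """Volume-based endpoint detection: the utterance spans the first to the
    last frame whose energy (volume) exceeds the threshold."""
    active = [i for i, fr in enumerate(frames) if sum(s * s for s in fr) > threshold]
    return (active[0], active[-1]) if active else None

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences;
    identification picks the enrolled template with the smallest distance."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[-1][-1]
```

DTW tolerates the tempo differences that rigid frame-by-frame comparison would penalize: a sequence aligned against a time-stretched copy of itself still scores a distance of zero.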
3. Emotional speaker recognition based on prosody transformation (Cited: 1)
Authors: 宋鹏, 赵力, 邹采荣. Journal of Southeast University (English Edition), EI CAS, 2011, Issue 4, pp. 357-360.
A novel emotional speaker recognition system (ESRS) is proposed to compensate for emotion variability. First, emotion recognition is adopted as a pre-processing step to classify neutral and emotional speech. Then, the recognized emotional speech is adjusted by prosody modification. Different methods, including Gaussian normalization, the Gaussian mixture model (GMM) and support vector regression (SVR), are adopted to define the mapping rules of F0 between emotional and neutral speech, and the average linear ratio is used for duration modification. Finally, the modified emotional speech is employed for speaker recognition. The experimental results show that the proposed ESRS can significantly improve the performance of emotional speaker recognition, and its identification rate (IR) is higher than that of the traditional recognition system. The emotional speech with F0 and duration modifications is closer to the neutral one.
Keywords: emotion recognition, speaker recognition, F0 transformation, duration modification
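The Gaussian normalization mapping of F0 and the average-linear-ratio duration adjustment named in this entry can be sketched as below. The F0 statistics shown are hypothetical values chosen for illustration, not measurements from the paper.

```python
def gaussian_normalize_f0(f0_values, mu_emo, sigma_emo, mu_neu, sigma_neu):
    """Map F0 values from the emotional distribution onto the neutral one
    by matching mean and standard deviation (Gaussian normalization)."""
    return [mu_neu + (f - mu_emo) * (sigma_neu / sigma_emo) for f in f0_values]

def adjust_duration(duration, avg_linear_ratio):
    """Average-linear-ratio duration modification: scale the emotional
    segment duration by the neutral/emotional duration ratio."""
    return duration * avg_linear_ratio

# Hypothetical statistics in Hz: angry speech with raised F0 mapped to neutral
mapped = gaussian_normalize_f0([260.0, 280.0, 300.0],
                               mu_emo=280.0, sigma_emo=20.0,
                               mu_neu=200.0, sigma_neu=15.0)
```

The mapped contour keeps the speaker's relative F0 movements while shifting and rescaling them into the neutral range, which is what lets the downstream speaker model treat the modified speech as near-neutral.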
4. Performance of Text-Independent Automatic Speaker Recognition on a Multicore System
Authors: Rand Kouatly, Talha Ali Khan. Tsinghua Science and Technology, SCIE EI CAS CSCD, 2024, Issue 2, pp. 447-456.
This paper studies a high-speed text-independent Automatic Speaker Recognition (ASR) algorithm based on the Gaussian Mixture Model (GMM) on a multicore system. The high speed is achieved using parallel implementation of the feature extraction and aggregation methods during the training and testing procedures. Shared-memory parallel programming techniques using both the OpenMP and PThreads libraries are developed to accelerate the code and improve the performance of the ASR algorithm. The experimental results show speed-up improvements of around 3.2 on a personal laptop with an Intel i5-6300HQ (2.3 GHz, four cores without hyper-threading, and 8 GB of RAM). In addition, a remarkable 100% speaker recognition accuracy is achieved.
Keywords: automatic speaker recognition (ASR), Gaussian mixture model (GMM), shared-memory parallel programming, PThreads, OpenMP
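The paper parallelizes feature extraction with OpenMP/PThreads in native code; as a language-neutral illustration of the same partition-and-aggregate structure, a parallel-for over frames looks roughly like this. The energy feature is a stand-in for the real spectral features, and the worker count is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

def frame_energy(frame):
    """Toy per-frame feature; the real system extracts spectral features here."""
    return sum(s * s for s in frame)

def parallel_features(frames, workers=4):
    """Distribute per-frame feature extraction across worker threads and
    aggregate the results in input order, mirroring a parallel-for loop."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(frame_energy, frames))
```

`Executor.map` preserves input order, so the aggregated feature sequence is identical to the serial result regardless of which worker finishes first, the same correctness property an OpenMP `parallel for` with an ordered output array provides.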
5. Adaptive bands filter bank optimized by genetic algorithm for robust speech recognition system (Cited: 5)
Authors: 黄丽霞, G. Evangelista, 张雪英. Journal of Central South University, SCIE EI CAS, 2011, Issue 5, pp. 1595-1601.
Perceptual auditory filter banks such as the Bark-scale filter bank are widely used as front-end processing in speech recognition systems. However, the problem of designing optimized filter banks that provide higher accuracy in recognition tasks is still open. Owing to spectral analysis in feature extraction, an adaptive bands filter bank (ABFB) is presented. The design adopts flexible bandwidths and center frequencies for the frequency responses of the filters and utilizes a genetic algorithm (GA) to optimize the design parameters. The optimization process is realized by combining the front-end filter bank with the back-end recognition network in the performance evaluation loop. The deployment of ABFB together with the zero-crossing peak amplitude (ZCPA) feature as a front-end process for a radial basis function (RBF) system shows significant improvement in robustness compared with the Bark-scale filter bank. In ABFB, several sub-bands are still concentrated toward lower frequencies, but their exact locations are determined by performance rather than by perceptual criteria. For ease of optimization, only symmetrical bands are considered here, which still provide satisfactory results.
Keywords: perceptual filter banks, Bark scale, speaker-independent speech recognition systems, zero-crossing peak amplitude, genetic algorithm
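The evaluate-in-the-loop GA structure this entry describes, evolving real-valued filter parameters against a fitness computed by the back-end recognizer, can be sketched as below. The toy fitness function stands in for recognition accuracy, and population size, mutation scale and selection scheme are all illustrative assumptions.

```python
import random

def genetic_optimize(fitness, n_params, pop=20, gens=30, seed=0):
    """Tiny GA sketch: evolve real-valued design parameters (e.g. filter
    center frequencies and bandwidths) against a fitness evaluated in the
    recognition loop. Truncation selection + mean crossover + Gaussian
    mutation; higher fitness is better."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_params)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 2]             # keep the better half
        children = []
        for _ in range(pop - len(parents)):
            a, b = rng.sample(parents, 2)
            children.append([(x + y) / 2.0 + rng.gauss(0.0, 0.05)  # crossover + mutation
                             for x, y in zip(a, b)])
        population = parents + children
    return max(population, key=fitness)
```

With a toy fitness peaked at 0.5 in every coordinate, the population converges near that optimum within a few dozen generations, mirroring how the paper's loop pulls sub-band parameters toward recognition-optimal positions.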
6. Automatic Speaker Recognition Using Mel-Frequency Cepstral Coefficients Through Machine Learning (Cited: 1)
Authors: Uğur Ayvaz, Hüseyin Gürüler, Faheem Khan, Naveed Ahmed, Taegkeun Whangbo, Abdusalomov Akmalbek Bobomirzaevich. Computers, Materials & Continua, SCIE EI, 2022, Issue 6, pp. 5511-5521.
Automatic speaker recognition (ASR) systems belong to the field of human-machine interaction, and scientists have been using feature extraction and feature matching methods to analyze and synthesize these signals. One of the most commonly used methods for feature extraction is Mel-Frequency Cepstral Coefficients (MFCCs). Recent research shows that MFCCs are successful in processing voice signals with high accuracy. MFCCs represent a sequence of voice-signal-specific features. This experimental analysis is proposed to distinguish Turkish speakers by extracting MFCCs from speech recordings. Since human perception of sound is not linear, after the filterbank step in the MFCC method, we converted the obtained log filterbanks into decibel (dB) feature-based spectrograms without applying the Discrete Cosine Transform (DCT). A new dataset was created with the converted spectrograms as 2-D arrays. Several learning algorithms were implemented with 10-fold cross-validation to detect the speaker. The highest accuracy of 90.2% was achieved using a Multi-layer Perceptron (MLP) with the tanh activation function. The most important output of this study is the inclusion of the human voice as a new feature set.
Keywords: automatic speaker recognition, human voice recognition, spatial pattern recognition, MFCCs, spectrogram, machine learning, artificial intelligence
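The step this entry highlights, keeping dB-scaled filterbank outputs instead of applying the DCT that would produce MFCCs, amounts to a power-to-decibel conversion of the mel filterbank energies. A minimal sketch, where the reference level and numerical floor are assumptions:

```python
import math

def filterbank_to_db(filterbank_energies, ref=1.0, floor=1e-10):
    """Convert mel filterbank energies (one list per frame) to decibels.
    Stopping here, without the usual DCT step, leaves a 2-D dB spectrogram
    that can be fed to a classifier such as an MLP."""
    return [[10.0 * math.log10(max(e, floor) / ref) for e in frame]
            for frame in filterbank_energies]
```

Each factor of 10 in energy becomes a fixed 10 dB step, which compresses the large dynamic range of speech into a scale closer to loudness perception.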
7. Comparative Study on VQ-Based Efficient Mandarin Speech Recognition Method
Authors: 谢湘, 赵军辉, 匡镜明. Journal of Beijing Institute of Technology, EI CAS, 2002, Issue 3, pp. 266-270.
A VQ-based efficient speech recognition method is introduced, and the key parameters of this method are comparatively studied. This method is especially designed for Mandarin speaker-dependent small-vocabulary recognition. It has lower complexity and lower resource consumption but a higher accurate recognition rate (ARR) compared with the traditional HMM or NN approaches. A large-scale test on the task of recognizing 11 Mandarin digits shows that the word error rate (WER) can reach 3.86%. This method is suitable for embedding in PDAs (personal digital assistants), mobile phones and similar devices to perform voice control such as digit dialing, name dialing, calculating, and voice commanding.
Keywords: speech recognition, vector quantization (VQ), speaker dependent, digits recognition
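The core of VQ-based recognition is choosing the class whose codebook quantizes the utterance with the least distortion. A minimal sketch, with hypothetical 2-D feature vectors and codebooks (the paper's codebooks would be trained per word or speaker):

```python
def distortion(utterance, codebook):
    """Average squared distance from each feature vector to its nearest codeword."""
    total = 0.0
    for v in utterance:
        total += min(sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook)
    return total / len(utterance)

def recognize(utterance, codebooks):
    """VQ-based recognition: pick the class whose codebook yields the
    lowest quantization distortion for the utterance."""
    return min(codebooks, key=lambda name: distortion(utterance, codebooks[name]))
```

Because scoring is just nearest-codeword lookups, both memory and computation stay far below an HMM decoder, which is what makes the approach attractive for PDA- and phone-class hardware.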
8. Novel pseudo-divergence of Gaussian mixture models based speaker clustering method
Authors: Wang Bo, Xu Yiqiong. Chinese Journal of Scientific Instrument (仪器仪表学报), EI CAS CSCD, 2006, Supplement 1, pp. 712-714, 732.
A serial structure is applied to speaker recognition to reduce algorithm delay and computational complexity. The speech is first assigned to a speaker class, and then the most likely speaker is searched for inside that class. The difference between Gaussian mixture models (GMMs) is widely applied in speaker classification. The paper proposes a novel pseudo-divergence measure, the ratio of inter-model dispersion to intra-model dispersion, to represent the difference between GMMs and perform speaker clustering. The weights, means and variances of the GMM components are involved in the dispersion. Experiments indicate that the measure represents the difference between GMMs well and improves the performance of speaker clustering.
Keywords: serial structure, speaker recognition, pseudo-divergence, GMMs
9. Fractal Dimension of Voice-Signal Waveforms (Cited: 3)
Authors: Xie Yuqiong, Wen Zhixiong (Nonlinear Science Center, Wuhan University, Wuhan 430072, Hubei, China). Wuhan University Journal of Natural Sciences, CAS, 2002, Issue 4, pp. 399-402.
The fractal dimension is one important parameter that characterizes waveforms. In this paper, we derive a new method to calculate the fractal dimension of digital voice-signal waveforms. We show that the fractal dimension is an efficient tool for speaker recognition or speech recognition: it can be used to identify different speakers or distinguish speech. We apply our results to Chinese speaker recognition, and numerical experiments show that the fractal dimension is an efficient parameter for characterizing individual Chinese speakers. We have developed a semiautomatic voiceprint analysis system based on the theory of this paper and former research.
Keywords: fractal dimension, voiceprint analysis, speaker recognition, speech recognition, biometric authentication
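The paper derives its own fractal-dimension estimator, which the abstract does not spell out. As a stand-in, Katz's well-known estimator shows how a waveform's fractal dimension is computed from curve length and diameter: a smooth line gives dimension 1, and rougher signals give larger values.

```python
import math

def katz_fd(samples):
    """Katz fractal dimension of a uniformly sampled waveform:
    D = log10(n) / (log10(n) + log10(d / L)), where L is the total curve
    length, d the maximum distance from the first sample, and n the number
    of steps. This is one standard estimator, not the paper's method."""
    n = len(samples) - 1
    L = sum(math.hypot(1.0, samples[i + 1] - samples[i]) for i in range(n))
    d = max(math.hypot(i, samples[i] - samples[0]) for i in range(1, n + 1))
    return math.log10(n) / (math.log10(n) + math.log10(d / L))
```

A straight ramp yields exactly 1.0 because its curve length equals its diameter, while an oscillating signal of the same sample count yields a value above 1, which is the roughness contrast a speaker-characterizing feature exploits.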
10. Forensic Automatic Speaker Recognition Based on Likelihood Ratio Using Acoustic-phonetic Features Measured Automatically (Cited: 4)
Authors: Huapeng Wang, Cuiling Zhang. Journal of Forensic Science and Medicine, 2015, Issue 2, pp. 119-123.
Forensic speaker recognition is experiencing a remarkable paradigm shift in terms of the evaluation framework and the presentation of voice evidence. This paper proposes a new method of forensic automatic speaker recognition using the likelihood ratio framework to quantify the strength of voice evidence. The proposed method uses a reference database to calculate the within- and between-speaker variability. Some acoustic-phonetic features are extracted automatically using the software VoiceSauce. The effectiveness of the approach was tested using two Mandarin databases: a mobile telephone database and a landline database. The experimental results indicate that these acoustic-phonetic features do have some discriminating potential and are worth trying in discrimination. The automatic acoustic-phonetic features have acceptable discriminative performance and can provide more reliable results in evidence analysis when fused with other kinds of voice features.
Keywords: acoustic-phonetic, speaker recognition, evidence evaluation, forensic speaker recognition, likelihood ratio
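The likelihood-ratio framework weighs the evidence under the same-speaker hypothesis against the different-speaker hypothesis. With within- and between-speaker variability modelled as univariate Gaussians estimated from a reference database, a sketch looks like this; all distribution parameters below are hypothetical:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood_ratio(score, mu_same, sigma_same, mu_diff, sigma_diff):
    """LR > 1 supports the same-speaker hypothesis; LR < 1 supports the
    different-speaker hypothesis. The two Gaussians stand for within- and
    between-speaker variability fitted on a reference database."""
    return gaussian_pdf(score, mu_same, sigma_same) / gaussian_pdf(score, mu_diff, sigma_diff)
```

Reporting the LR itself, rather than a hard accept/reject decision, is what lets the court combine the voice evidence with other evidence, which is the paradigm shift the entry describes.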
11. Histogram equalization using a reduced feature set of background speakers' utterances for speaker recognition (Cited: 1)
Authors: Myung-jae Kim, Il-ho Yang, Min-seok Kim, Ha-jin Yu. Frontiers of Information Technology & Electronic Engineering, SCIE EI CSCD, 2017, Issue 5, pp. 738-750.
We propose a method for histogram equalization using supplement sets to improve the performance of speaker recognition when the training and test utterances are very short. The supplement sets are derived using outputs of selection or clustering algorithms from the background speakers' utterances. The proposed approach is used as a feature normalization method for building histograms when there are insufficient input utterance samples. In addition, the proposed method is used as an i-vector normalization method in an i-vector-based probabilistic linear discriminant analysis (PLDA) system, which is the current state of the art for speaker verification. The ranks of sample values for histogram equalization are estimated in ascending order from both the input utterances and the supplement set. New ranks are obtained by computing the sum of different kinds of ranks. Subsequently, the proposed method determines the cumulative distribution function of the test utterance using the newly defined ranks. The proposed method is compared with conventional feature normalization methods, such as cepstral mean normalization (CMN), cepstral mean and variance normalization (MVN), histogram equalization (HEQ), and the European Telecommunications Standards Institute (ETSI) advanced front-end methods. In addition, performance is compared for the case in which the greedy selection algorithm is used with fuzzy C-means and K-means algorithms. The YOHO and Electronics and Telecommunications Research Institute (ETRI) databases are used in an evaluation in the feature space. The test sets are simulated by the Opus VoIP codec. We also use the 2008 National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) corpus for the i-vector system. The results of the experimental evaluation demonstrate that the average system performance is improved when the proposed method is used, compared to the conventional feature normalization methods.
Keywords: speaker recognition, histogram equalization, i-vector
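The rank-based construction this entry outlines, pooling the short test utterance with a supplement set drawn from background speakers before estimating the CDF for histogram equalization, can be sketched as below. This is a simplified reading of the idea (the paper additionally sums several kinds of ranks), with toy values:

```python
import bisect

def heq_cdf(test_values, supplement_values):
    """Estimate each test sample's CDF value from ranks computed over the
    pooled test + supplement data, so the histogram is built from more
    samples than the short test utterance alone provides."""
    pooled = sorted(test_values + supplement_values)
    n = len(pooled)
    # rank of each test sample within the pooled, ascending-sorted data
    return [bisect.bisect_right(pooled, v) / n for v in test_values]
```

With only two test samples, the CDF estimated from them alone would be a crude two-step function; the supplement samples fill in the intermediate ranks, which is exactly the shortage the method compensates for.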
12. Mismatched feature detection with finer granularity for emotional speaker recognition (Cited: 1)
Authors: Li Chen, Ying-chun Yang, Zhao-hui Wu. Journal of Zhejiang University-Science C (Computers and Electronics), SCIE EI, 2014, Issue 10, pp. 903-916.
The shapes of speakers' vocal organs change under different emotional states, which leads to the deviation of the emotional acoustic space of short-time features from the neutral acoustic space and thereby to degradation of speaker recognition performance. Features deviating greatly from the neutral acoustic space are considered mismatched features, and they negatively affect speaker recognition systems. Emotion variation produces different feature deformations for different phonemes, so it is reasonable to build a finer model to detect mismatched features under each phoneme. However, given the difficulty of phoneme recognition, three sorts of acoustic class recognition (phoneme classes, a Gaussian mixture model (GMM) tokenizer, and a probabilistic GMM tokenizer) are proposed to replace phoneme recognition. We propose feature pruning and feature regulation methods to process the mismatched features and improve speaker recognition performance. For the feature regulation method, a strategy of maximizing the between-class distance and minimizing the within-class distance is adopted to train the transformation matrix that regulates the mismatched features. Experiments conducted on the Mandarin affective speech corpus (MASC) show that our feature pruning and feature regulation methods increase the identification rate (IR) by 3.64% and 6.77%, respectively, compared with the baseline GMM-UBM (universal background model) algorithm. Corresponding IR increases of 2.09% and 3.32% are obtained when our methods are applied to the state-of-the-art i-vector algorithm.
Keywords: emotional speaker recognition, mismatched feature detection, feature regulation
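The regulation criterion this entry states, maximize the between-class distance while minimizing the within-class distance, is the Fisher criterion; in one dimension it reduces to the ratio below. The toy class data are illustrative, not the paper's matrix formulation.

```python
def fisher_ratio(class_a, class_b):
    """Between-class over within-class scatter for a 1-D feature: the
    quantity that grows when class means separate and class variances
    shrink, i.e. the objective a discriminative transform maximizes."""
    mu_a = sum(class_a) / len(class_a)
    mu_b = sum(class_b) / len(class_b)
    var_a = sum((x - mu_a) ** 2 for x in class_a) / len(class_a)
    var_b = sum((x - mu_b) ** 2 for x in class_b) / len(class_b)
    return (mu_a - mu_b) ** 2 / (var_a + var_b)
```

A transformation matrix trained to maximize this ratio pushes mismatched emotional features back toward compact, well-separated speaker classes, which is the intended effect of the paper's feature regulation.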
13. Robust Feature Extraction for Speaker Recognition Based on Constrained Nonnegative Tensor Factorization
Authors: 吴强, 张丽清, 石光川. Journal of Computer Science & Technology, SCIE EI CSCD, 2010, Issue 4, pp. 783-792.
How to extract robust features is an important research topic in the machine learning community. In this paper, we investigate robust feature extraction for speech signals based on tensor structure and develop a new method called constrained Nonnegative Tensor Factorization (cNTF). A novel feature extraction framework based on the cortical representation in the primary auditory cortex (A1) is proposed for robust speaker recognition. Motivated by the neural firing rate model in A1, the speech signal is first represented as a general higher-order tensor; cNTF is used to learn basis functions from multiple interrelated feature subspaces and find a robust sparse representation of the speech signal. Computer simulations are given to evaluate the performance of our method, and comparisons with existing speaker recognition methods are also provided. The experimental results demonstrate that the proposed method achieves higher recognition accuracy in noisy environments.
Keywords: pattern recognition, speaker recognition, nonnegative tensor factorization, feature extraction, auditory perception
14. Advances in SVM-Based System Using GMM Super Vectors for Text-Independent Speaker Verification
Authors: 赵剑, 董远, 赵贤宇, 杨浩, 陆亮, 王海拉. Tsinghua Science and Technology, SCIE EI CAS, 2008, Issue 4, pp. 522-527.
For text-independent speaker verification, the Gaussian mixture model (GMM) using a universal background model strategy and the GMM combined with support vector machines are the two most commonly used methodologies. Recently, a new SVM-based speaker verification method using GMM super vectors has been proposed. This paper describes the construction of a new speaker verification system and investigates the use of nuisance attribute projection and test normalization to further enhance performance. Experiments were conducted on the core test of the 2006 NIST speaker recognition evaluation corpus. The experimental results indicate that an SVM-based speaker verification system using GMM super vectors can achieve appealing performance. With the use of nuisance attribute projection and test normalization, system performance can be significantly improved, with the equal error rate reduced from 7.78% to 4.92% and the detection cost function from 0.0376 to 0.0251.
Keywords: support vector machines, Gaussian mixture model, super vector, nuisance attribute projection, test normalization, speaker verification, NIST 2006 speaker recognition evaluation
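Two pieces of such a system are easy to sketch: stacking the adapted GMM mean vectors into a super vector for the SVM, and test normalization (t-norm) of the resulting scores against an impostor cohort. Both are minimal illustrations with toy numbers, not the paper's implementation:

```python
def gmm_supervector(adapted_means):
    """Concatenate the MAP-adapted mixture mean vectors of an utterance's
    GMM into one fixed-length 'super vector', the SVM input."""
    return [x for mean in adapted_means for x in mean]

def t_norm(raw_score, cohort_scores):
    """Test normalization: standardize the raw verification score against
    scores the same test utterance obtains on a cohort of impostor models,
    making the decision threshold comparable across utterances."""
    mu = sum(cohort_scores) / len(cohort_scores)
    var = sum((s - mu) ** 2 for s in cohort_scores) / len(cohort_scores)
    return (raw_score - mu) / (var ** 0.5)
```

Because the super vector has the same length for every utterance (mixtures times feature dimension), variable-length speech is reduced to a fixed-size point that a standard SVM can separate.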
15. Latent discriminative representation learning for speaker recognition
Authors: Duolin Huang, Qirong Mao, Zhongchen Ma, Zhishen Zheng, Sidheswar Routryar, Elias-Nii-Noi Ocquaye. Frontiers of Information Technology & Electronic Engineering, SCIE EI CSCD, 2021, Issue 5, pp. 697-708.
Extracting discriminative speaker-specific representations from speech signals and transforming them into fixed-length vectors are key steps in speaker identification and verification systems. In this study, we propose a latent discriminative representation learning method for speaker recognition, meaning that the learned representations are not only discriminative but also relevant. Specifically, we introduce an additional speaker-embedded lookup table to explore the relevance between different utterances from the same speaker. Moreover, a reconstruction constraint intended to learn a linear mapping matrix is introduced to make the representations discriminative. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods on the Apollo dataset used in the Fearless Steps Challenge at INTERSPEECH 2019 and on the TIMIT dataset.
Keywords: speaker recognition, latent discriminative representation learning, speaker embedding lookup table, linear mapping matrix
16. An Empirical Study of Exploring Nonphonetic Forensic Speaker Recognition Features
Authors: Xin Guan. Journal of Forensic Science and Medicine, 2018, Issue 3, pp. 142-149.
So far, phonetic features have been the main type of forensic speaker recognition feature studied and used in practice. One problem with phonetic forensic speaker recognition features is that they are affected dramatically by real-world conditions, which results in within-speaker variations and consequently reduces the reliability of forensic speaker recognition results. In this context, supported by Sapir's description of the structure of speech behavior and by discourse information theory, natural conversations are adopted as experimental materials to explore nonphonetic features that are supposed to be less affected by real-world conditions. The experimental results show, first, that nonphonetic features exist besides phonetic features and, moreover, that the nonphonetic features are less affected by real-world conditions, as expected.
Keywords: forensic speaker recognition, natural conversations, nonphonetic, real-world conditions