Speaker adaptive test normalization (ATnorm) is the most effective approach of the widely used score normalization in text-flldependent speaker verification, which selects speaker adaptive impostor cohorts with an e...Speaker adaptive test normalization (ATnorm) is the most effective approach of the widely used score normalization in text-flldependent speaker verification, which selects speaker adaptive impostor cohorts with an extra development corpus in order to enhance the recognition performance. In this paper, an improved implementation of ATnorm that can offer overall significant advantages over the original ATnorm is presented. This method adopts a novel cross similarity measurement in speaker adaptive cohort model selection without an extra development corpus. It can achieve a comparable performance with the original ATnorm and reduce the computation complexity moderately. With the full use of the saved extra development corpus, the overall system performance can be improved significantly. The results are presented on NIST 2006 Speaker Recognition Evaluation data corpora where it is shown that this method provides significant improvements in system performance, with relatively 14.4% gain on equal error rate (EER) and 14.6% gain on decision cost function (DCF) obtained as a whole.展开更多
For text-independent speaker verification, the Gaussian mixture model (GMM) using a universal background model strategy and the GMM using support vector machines are the two most commonly used methodologies. Recentl...For text-independent speaker verification, the Gaussian mixture model (GMM) using a universal background model strategy and the GMM using support vector machines are the two most commonly used methodologies. Recently, a new SVM-based speaker verification method using GMM super vectors has been proposed. This paper describes the construction of a new speaker verification system and investigates the use of nuisance attribute projection and test normalization to further enhance performance. Experiments were conducted on the core test of the 2006 NIST speaker recognition evaluation corpus. The experimental results indicate that an SVM-based speaker verification system using GMM super vectors can achieve appealing performance. With the use of nuisance attribute projection and test normalization, the system performance can be significantly improved, with improvements in the equal error rate from 7.78% to 4.92% and detection cost function from 0.0376 to 0.0251.展开更多
In this paper, a manifold subspace learning algorithm based on locality preserving discriminant projection (LPDP) is used for speaker verification. LPDP can overcome the deficiency of the total variability factor anal...In this paper, a manifold subspace learning algorithm based on locality preserving discriminant projection (LPDP) is used for speaker verification. LPDP can overcome the deficiency of the total variability factor analysis and locality preserving projection (LPP). LPDP can effectively use the speaker label information of speech data. Through optimization, LPDP can maintain the inherent manifold local structure of the speech data samples of the same speaker by reducing the distance between them. At the same time, LPDP can enhance the discriminability of the embedding space by expanding the distance between the speech data samples of different speakers. The proposed method is compared with LPP and total variability factor analysis on the NIST SRE 2010 telephone-telephone core condition. The experimental results indicate that the proposed LPDP can overcome the deficiency of LPP and total variability factor analysis and can further improve the system performance.展开更多
The performance of speaker verification systems is often compromised under real world environments. For example, variations in handset characteristics could cause severe performance degradation. This paper presents a...The performance of speaker verification systems is often compromised under real world environments. For example, variations in handset characteristics could cause severe performance degradation. This paper presents a novel method to overcome this problem by using a non linear handset mapper. Under this method, a mapper is constructed by training an elliptical basis function network using distorted speech features as inputs and the corresponding clean features as the desired outputs. During feature recuperation, clean features are recovered by feeding the distorted features to the feature mapper. The recovered features are then presented to a speaker model as if they were derived from clean speech. Experimental evaluations based on 258 speakers of the TIMIT and NTIMIT corpuses suggest that the feature mappers improve the verification performance remarkably.展开更多
Due to differences in the distribution of scores for different trials, the performance of a speaker verification system will be seriously diminished if raw scores are directly used for detection with a unified thresho...Due to differences in the distribution of scores for different trials, the performance of a speaker verification system will be seriously diminished if raw scores are directly used for detection with a unified threshold value. As such, the scores must be normalized. To tackle the shortcomings of score normalization methods, we propose a speaker verification system based on log-likelihood normalization (LLN). Without a priori knowledge, LLN increases the separation between scores of target and non-target speaker models, so as to improve score aliasing of “same-speaker” and “different-speaker” trials corresponding to the same test speech, enabling better discrimination and decision capability. The experiment shows that LLN is an effective method of scoring normalization.展开更多
In this paper a new text-independent speaker verification method GSMSV is proposed based on likelihood score normalization. In this novel method a global speaker model is established to represent the universal feature...In this paper a new text-independent speaker verification method GSMSV is proposed based on likelihood score normalization. In this novel method a global speaker model is established to represent the universal features of speech and normalize the likelihood score. Statistical analysis demonstrates that this normaliza- tion method can remove common factors of speech and bring the differences between speakers into prominence. As a result the equal error rate is decreased significantly, verification procedure is accelerated and system adaptability to speaking speed is improved.展开更多
In recent years,various speech embedding methods based on deep learning have been proposed and have shown better performance in speaker verification.Those new technologies will inevitably promote the development of fo...In recent years,various speech embedding methods based on deep learning have been proposed and have shown better performance in speaker verification.Those new technologies will inevitably promote the development of forensic speaker verification.We propose a new forensic speaker verification method based on embeddings trained with loss function called generalized end-to-end(GE2E)loss.First,a long short-term memory(LSTM)based deep neural network(DNN)is trained as the embedding extractor,then the cosine similarity scores between embeddings from same speaker comparison pairs and different speaker comparison pairs are trained to represent within-speaker model and between-speaker model respectively,and finally,the cosine similarity scores between the questioned embeddings and enrolled embeddings are evaluated in the above two models to get the likelihood ratio(LR)value.On the subset of LibriSpeech,test-other-500,we achieve a new state of the art.Both all the same speaker comparison pairs and different speaker comparison pairs get correct results and can provide considerable strong evidence strength for courts.展开更多
基金supported by France Telecom Research and Development Center, Beijing
文摘Speaker adaptive test normalization (ATnorm) is the most effective approach of the widely used score normalization in text-flldependent speaker verification, which selects speaker adaptive impostor cohorts with an extra development corpus in order to enhance the recognition performance. In this paper, an improved implementation of ATnorm that can offer overall significant advantages over the original ATnorm is presented. This method adopts a novel cross similarity measurement in speaker adaptive cohort model selection without an extra development corpus. It can achieve a comparable performance with the original ATnorm and reduce the computation complexity moderately. With the full use of the saved extra development corpus, the overall system performance can be improved significantly. The results are presented on NIST 2006 Speaker Recognition Evaluation data corpora where it is shown that this method provides significant improvements in system performance, with relatively 14.4% gain on equal error rate (EER) and 14.6% gain on decision cost function (DCF) obtained as a whole.
文摘For text-independent speaker verification, the Gaussian mixture model (GMM) using a universal background model strategy and the GMM using support vector machines are the two most commonly used methodologies. Recently, a new SVM-based speaker verification method using GMM super vectors has been proposed. This paper describes the construction of a new speaker verification system and investigates the use of nuisance attribute projection and test normalization to further enhance performance. Experiments were conducted on the core test of the 2006 NIST speaker recognition evaluation corpus. The experimental results indicate that an SVM-based speaker verification system using GMM super vectors can achieve appealing performance. With the use of nuisance attribute projection and test normalization, the system performance can be significantly improved, with improvements in the equal error rate from 7.78% to 4.92% and detection cost function from 0.0376 to 0.0251.
文摘In this paper, a manifold subspace learning algorithm based on locality preserving discriminant projection (LPDP) is used for speaker verification. LPDP can overcome the deficiency of the total variability factor analysis and locality preserving projection (LPP). LPDP can effectively use the speaker label information of speech data. Through optimization, LPDP can maintain the inherent manifold local structure of the speech data samples of the same speaker by reducing the distance between them. At the same time, LPDP can enhance the discriminability of the embedding space by expanding the distance between the speech data samples of different speakers. The proposed method is compared with LPP and total variability factor analysis on the NIST SRE 2010 telephone-telephone core condition. The experimental results indicate that the proposed LPDP can overcome the deficiency of LPP and total variability factor analysis and can further improve the system performance.
文摘The performance of speaker verification systems is often compromised under real world environments. For example, variations in handset characteristics could cause severe performance degradation. This paper presents a novel method to overcome this problem by using a non linear handset mapper. Under this method, a mapper is constructed by training an elliptical basis function network using distorted speech features as inputs and the corresponding clean features as the desired outputs. During feature recuperation, clean features are recovered by feeding the distorted features to the feature mapper. The recovered features are then presented to a speaker model as if they were derived from clean speech. Experimental evaluations based on 258 speakers of the TIMIT and NTIMIT corpuses suggest that the feature mappers improve the verification performance remarkably.
文摘Due to differences in the distribution of scores for different trials, the performance of a speaker verification system will be seriously diminished if raw scores are directly used for detection with a unified threshold value. As such, the scores must be normalized. To tackle the shortcomings of score normalization methods, we propose a speaker verification system based on log-likelihood normalization (LLN). Without a priori knowledge, LLN increases the separation between scores of target and non-target speaker models, so as to improve score aliasing of “same-speaker” and “different-speaker” trials corresponding to the same test speech, enabling better discrimination and decision capability. The experiment shows that LLN is an effective method of scoring normalization.
基金the National Natural Science Foundation of China.
文摘In this paper a new text-independent speaker verification method GSMSV is proposed based on likelihood score normalization. In this novel method a global speaker model is established to represent the universal features of speech and normalize the likelihood score. Statistical analysis demonstrates that this normaliza- tion method can remove common factors of speech and bring the differences between speakers into prominence. As a result the equal error rate is decreased significantly, verification procedure is accelerated and system adaptability to speaking speed is improved.
基金Supported by the National Key Research and Development Projects(2017YFC0821000)Guangzhou Science and Technology Project(2019030004)Key Lab of Forensic Science,Ministry of Justice,China(KF202117)。
文摘In recent years,various speech embedding methods based on deep learning have been proposed and have shown better performance in speaker verification.Those new technologies will inevitably promote the development of forensic speaker verification.We propose a new forensic speaker verification method based on embeddings trained with loss function called generalized end-to-end(GE2E)loss.First,a long short-term memory(LSTM)based deep neural network(DNN)is trained as the embedding extractor,then the cosine similarity scores between embeddings from same speaker comparison pairs and different speaker comparison pairs are trained to represent within-speaker model and between-speaker model respectively,and finally,the cosine similarity scores between the questioned embeddings and enrolled embeddings are evaluated in the above two models to get the likelihood ratio(LR)value.On the subset of LibriSpeech,test-other-500,we achieve a new state of the art.Both all the same speaker comparison pairs and different speaker comparison pairs get correct results and can provide considerable strong evidence strength for courts.