Adversarial attacks have been posing significant security concerns to intelligent systems, such as speaker recognition systems (SRSs). Most attacks assume the neural networks in the systems are known beforehand, while black-box attacks are proposed without such information to meet practical situations. Existing black-box attacks improve transferability by integrating multiple models or training on multiple datasets, but these methods are costly. Motivated by the optimisation strategy with spatial information on the perturbed paths and samples, we propose a Dual Spatial Momentum Iterative Fast Gradient Sign Method (DS-MI-FGSM) to improve the transferability of black-box attacks against SRSs. Specifically, DS-MI-FGSM needs only a single data sample and one model as input; by extending to the data and model neighbouring spaces, it generates adversarial examples against the integrated models. To reduce the risk of overfitting, DS-MI-FGSM also introduces gradient masking to improve transferability. The authors conduct extensive experiments on the speaker recognition task, and the results demonstrate the effectiveness of the method, which achieves up to a 92% attack success rate on the victim model in black-box scenarios with only one known model.
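The dual-spatial extension itself is not detailed in the abstract, but its MI-FGSM backbone is standard. A minimal sketch of the momentum-iterative step is given below; the `grad_fn` callback, `eps`, and `steps` names are illustrative stand-ins for the model's loss gradient and the attack budget, not the authors' interface.

```python
import numpy as np

def mi_fgsm(x, grad_fn, eps=0.01, steps=10, mu=1.0):
    """Momentum Iterative FGSM: accumulate an L1-normalised gradient
    in a momentum buffer and step along its sign, staying inside the
    eps-ball around the original input."""
    alpha = eps / steps          # per-step budget
    g = np.zeros_like(x)         # momentum accumulator
    x_adv = x.copy()
    for _ in range(steps):
        grad = grad_fn(x_adv)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the eps-ball
    return x_adv
```

DS-MI-FGSM would additionally average gradients over neighbourhoods of the input and of the model before this update; the loop structure stays the same.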
Most current security and authentication systems are based on personal biometrics. Security is a major issue in the field of biometric systems because the original biometrics are stored in databases: if those databases are attacked, the biometrics are lost forever. Protecting privacy is the most important goal of cancelable biometrics. To protect privacy, cancelable biometrics should therefore be non-invertible, so that no information can be recovered from the cancelable biometric templates stored in personal identification/verification databases. One methodology to achieve non-invertibility is the employment of non-invertible transforms. This work proposes an encryption process for cancellable speaker identification using a hybrid encryption system comprising the 3D Jigsaw transform and the Fractional Fourier Transform (FrFT). The proposed scheme is compared with the optical Double Random Phase Encoding (DRPE) encryption process. The simulation results show that the proposed algorithm is secure and feasible: the encryption and cancelability effects are good, and the scheme offers the security and robustness levels recommended for efficient cancellable biometric systems.
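The DRPE baseline mentioned above has a compact structure worth seeing. As a rough illustration of the double-phase-mask idea only, the sketch below uses an ordinary FFT in place of the FrFT and 1-D signals for brevity; the function names are hypothetical.

```python
import numpy as np

def drpe_encrypt(signal, phase1, phase2):
    """Classical Double Random Phase Encoding: apply a random phase
    mask in the signal domain, transform, apply a second mask in the
    transform domain, and transform back."""
    masked = signal * np.exp(2j * np.pi * phase1)
    spectrum = np.fft.fft(masked) * np.exp(2j * np.pi * phase2)
    return np.fft.ifft(spectrum)

def drpe_decrypt(cipher, phase1, phase2):
    """Invert the two phase masks with their complex conjugates,
    undoing the transforms in reverse order."""
    spectrum = np.fft.fft(cipher) * np.exp(-2j * np.pi * phase2)
    return np.fft.ifft(spectrum) * np.exp(-2j * np.pi * phase1)
```

The round trip is exact because each phase mask is unitary; replacing the FFT pair with fractional-order transforms adds the transform orders as extra keys.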
The use of voice for biometric authentication is an important technological development because it is a non-invasive identification method and does not require special hardware, so it is less likely to provoke user resistance. This study applies voice recognition technology to a speech-driven interactive voice response questionnaire system, aiming to upgrade the traditional speech system to an intelligent voice response questionnaire network so that the new device may offer enterprises more precise data for customer relationship management (CRM). The intelligent voice response device is becoming a new mobile channel, with built-in questionnaire functions for conveniently collecting information on local preferences that can be used for localized promotion and publicity. The authors propose a framework using voice recognition and intelligent analysis models to identify target customers through voice messages gathered in the voice response questionnaire system; that is, transforming the traditional speech system into an intelligent voice complex. The speaker recognition system discussed here employs volume as the acoustic feature in endpoint detection, as the computational load of this method is usually low. To correct two types of errors found in endpoint detection caused by ambient noise, this study suggests two improvements. First, to reach high accuracy, it uses a dynamic time warping (DTW) based method for speaker identification. Second, it avoids endpoint-detection errors by filtering noise from the voice signals before recognition and deleting any test utterances that might negatively affect the recognition results, with the aim of improving the recognition rate. According to the experimental results, the proposed method has a high recognition rate on both personal-level and industrial-level computers and reaches the standard for practical application. The voice management system in this research can therefore serve as virtual customer service staff.
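The DTW matching at the core of the identification step is a classic recurrence. A minimal sketch over 1-D feature sequences (real systems would compare frame-level feature vectors, and the absolute-difference cost here is illustrative):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences, filled
    with the classic O(len(a) * len(b)) dynamic-programming recurrence."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of a match, an insertion, or a deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the warping path may repeat frames, a sequence with a held value (e.g. `[0, 1, 1, 2]`) still aligns perfectly with its unheld template `[0, 1, 2]`, which is exactly why DTW tolerates speaking-rate variation.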
In an audio stream containing multiple speakers, speaker diarization helps ascertain "who spoke when". This is an unsupervised task, as there is no prior information about the speakers. It labels the speech signal according to the identity of the speaker; that is, the input audio stream is partitioned into homogeneous segments. In this work, we present a novel speaker diarization system using the Tangent-weighted Mel-frequency cepstral coefficient (TMFCC) as the feature parameter and the Lion algorithm for clustering the voice-activity-detected audio streams into speaker groups. The two main tasks of speaker indexing, speaker segmentation and speaker clustering, are thereby improved. The TMFCC exploits low-energy as well as high-energy frames more effectively, improving the performance of the proposed system. Experiments on audio signals from the ELSDSR corpus with three, four, and five speakers are analyzed, using tracking distance and tracking time as the evaluation metrics. The results show that the diarization system with TMFCC parameterization and Lion-based clustering is superior to existing diarization systems, with 95% tracking accuracy.
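The Lion-based clustering step is not specified in the abstract; as a generic stand-in that shows the shape of the clustering task, segments can be grouped by a plain agglomerative merge on per-segment feature vectors (Euclidean centroid distance, illustrative only).

```python
import numpy as np

def agglomerate(segments, n_speakers):
    """Greedy agglomerative clustering of per-segment feature vectors:
    repeatedly merge the two clusters with the closest centroids until
    n_speakers clusters remain; returns lists of segment indices."""
    clusters = [[i] for i in range(len(segments))]
    cents = [np.asarray(s, float) for s in segments]
    while len(clusters) > n_speakers:
        best, pair = np.inf, None
        for i in range(len(cents)):
            for j in range(i + 1, len(cents)):
                d = np.linalg.norm(cents[i] - cents[j])
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        merged = clusters[i] + clusters[j]
        cents[i] = np.mean([segments[k] for k in merged], axis=0)
        clusters[i] = merged
        del clusters[j], cents[j]
    return clusters
```

A metaheuristic such as the Lion algorithm would instead search over cluster assignments or centroids with a population-based objective, but the input (segment features) and output (speaker groups) are the same.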
The aim of this paper is to report the accuracy and timing of a text-independent automatic speaker recognition (ASR) system based on Mel-Frequency Cepstrum Coefficients (MFCC) and Gaussian Mixture Models (GMM), developed for a security access control gate. 450 speakers were randomly extracted from the Voxforge.org audio database; their utterances were enhanced with spectral subtraction, MFCCs were extracted, and the coefficients were modelled with a GMM to build each speaker profile. For each speaker, two different speech files were used: the first to build the profile database, the second to test system performance. The accuracy achieved by the proposed approach is greater than 96%, and a single test run, implemented in Matlab, takes about 2 seconds on a common PC.
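The train-profiles/score/argmax structure of GMM speaker identification can be shown in miniature. The sketch below simplifies each speaker's GMM to a single diagonal Gaussian (a one-component special case, not the paper's actual model), which is enough to illustrate likelihood-based identification.

```python
import numpy as np

def fit_speaker(frames):
    """Fit a single diagonal Gaussian to a speaker's MFCC frames
    (a one-component simplification of the GMM profile)."""
    frames = np.asarray(frames, float)
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def log_likelihood(frames, model):
    """Total diagonal-Gaussian log-likelihood of the test frames."""
    mu, var = model
    frames = np.asarray(frames, float)
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mu) ** 2 / var)
    return ll.sum()

def identify(test_frames, models):
    """Return the index of the speaker model that best explains
    the test frames."""
    scores = [log_likelihood(test_frames, m) for m in models]
    return int(np.argmax(scores))
```

A real system replaces `fit_speaker` with EM training of a multi-component GMM, but scoring and the argmax decision are unchanged.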
In emotional speaker recognition, an emotion mismatch between training and testing causes system performance to decline sharply. A natural way to address this problem is to normalize the emotion of the test speech. The proposed method starts from an analysis of the differences between each kind of emotional speech and neutral speech, taking the baseband mismatch caused by emotional changes as the main thread, and gives corresponding algorithms for four technical points: emotional expansion, emotional shielding, emotional normalization, and score compensation. Compared with the traditional GMM-UBM method, the recognition rate on the MASC and EPST corpora increased by 3.80% and 8.81%, respectively.
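Of the four technical points, emotional normalization is the most self-contained: pitch statistics of the test speech are mapped toward neutral statistics. A hedged sketch of Gaussian moment matching on F0 follows; the `neutral_stats` pair is assumed to come from enrollment data, and the exact mapping used in the paper may differ.

```python
import numpy as np

def gaussian_normalise_f0(f0_emotional, neutral_stats):
    """Map emotional F0 values toward neutral speech by matching the
    first two moments: standardise with the emotional mean/std, then
    rescale with the neutral mean/std."""
    f0 = np.asarray(f0_emotional, float)
    mu_n, sigma_n = neutral_stats
    mu_e, sigma_e = f0.mean(), f0.std() + 1e-12
    return (f0 - mu_e) / sigma_e * sigma_n + mu_n
```

After this transform, the test F0 track has exactly the neutral mean and standard deviation, which is the sense in which the emotional speech is "pulled toward" the neutral training condition.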
This paper lies in the field of digital signal processing. It presents a speaker recognition system that identifies different speakers based on deep learning. The method consists of the following steps. First, voice data are collected from different people. Second, the selected data are preprocessed by extracting their Mel-Frequency Cepstral Coefficients (MFCCs) and randomly divided into a training set and a test set. Third, the training set is cut into batches and fed to a convolutional neural network consisting of convolutional layers, max-pooling layers, and fully connected layers. After repeatedly tuning network parameters such as the learning rate, dropout rate, and decay rate, the model reaches its optimal performance. Finally, the test set is also cut into batches and passed through the trained network. The final recognition accuracy is 70.23%. In brief, the system can automatically and efficiently recognize different speakers.
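The "cut into batches" step in the pipeline above is framework-independent. A small sketch of shuffled mini-batch construction (the function name and fixed seed are illustrative, not the authors' code):

```python
import numpy as np

def make_batches(features, labels, batch_size, seed=0):
    """Shuffle the training set once and cut it into mini-batches,
    as done before feeding the convolutional network; the last batch
    may be smaller than batch_size."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    idx = np.random.default_rng(seed).permutation(len(features))
    features, labels = features[idx], labels[idx]
    return [(features[i:i + batch_size], labels[i:i + batch_size])
            for i in range(0, len(features), batch_size)]
```

Shuffling before batching matters here: consecutive utterances from the same speaker would otherwise produce highly correlated gradient steps.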
Automatic Speaker Identification (ASI) involves distinguishing an audio stream associated with numerous speakers' utterances. Common aspects such as framework differences, the overlapping of different sound events, and the presence of various sound sources during recording make the ASI task much more complicated. This research proposes a deep learning model to improve the accuracy of the ASI system and reduce model training time under limited computation resources. The performance of the transformer model is investigated. Seven audio features, chromagram, Mel-spectrogram, tonnetz, Mel-Frequency Cepstral Coefficients (MFCCs), delta MFCCs, delta-delta MFCCs, and spectral contrast, are extracted from the ELSDSR, CSTR VCTK, and Ar-DAD datasets. The experiments demonstrate that the best performance is achieved by the proposed transformer model using all seven audio features on all datasets. For ELSDSR, CSTR VCTK, and Ar-DAD, the highest attained accuracies are 0.99, 0.97, and 0.99, respectively. The results reveal that the proposed technique achieves the best performance for ASI problems.
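Two of the seven features, delta and delta-delta MFCCs, are derived from the static MFCCs by a standard regression over neighbouring frames. A sketch of that derivation (the window size `N=2` is a common default, not necessarily the paper's setting):

```python
import numpy as np

def delta(features, N=2):
    """Standard regression-based delta features over a +/-N frame
    window; edges are handled by repeating the boundary frame."""
    feats = np.asarray(features, float)
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feats)
    for t in range(len(feats)):
        for n in range(1, N + 1):
            out[t] += n * (padded[t + N + n] - padded[t + N - n])
    return out / denom

def stack_mfcc(mfcc):
    """Stack static, delta, and delta-delta MFCCs along the feature
    axis; delta-delta is simply the delta of the delta stream."""
    d = delta(mfcc)
    return np.hstack([mfcc, d, delta(d)])
```

On a linearly increasing coefficient track, the interior delta values come out as exactly the slope, which is a quick sanity check for the window weights.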
Previous studies have investigated the efficiency of teaching listener and speaker repertoires to children diagnosed with autism spectrum disorder (ASD). Some investigations focused on listener responding by function, feature, and class (LRFFC) and intraverbals by function, feature, and class (FFC). For some children, teaching intraverbal FFC was more efficient because it resulted in a better emergence effect for a related untaught repertoire (LRFFC). For other children, teaching LRFFC along with tacting pictures was more efficient, resulting in a better emergence effect for a related untaught repertoire (intraverbal FFC). In those cases, it was not clear whether the tact increased the efficiency of LRFFC training, because no comparison was made with a condition in which tacts were not required. This investigation is a replication with two children diagnosed with ASD. Three instructional sequences were compared: teaching LRFFC and probing intraverbals; teaching LRFFC plus tacts and probing intraverbals; and teaching intraverbals and probing LRFFC. For one child, all sequences were equally efficient, because all related untaught repertoires emerged without errors; however, the acquisition of intraverbals during training was variable. For the second child, the most efficient sequence was teaching intraverbals, which resulted in the emergence of LRFFC without errors. In both LRFFC-teaching conditions, the emergence of related intraverbals was partial and acquisition of the trained repertoires was variable, and the condition that did not demand tact responses was slightly more efficient. The data are discussed in terms of the best instructional sequence varying from learner to learner.
A novel emotional speaker recognition system (ESRS) is proposed to compensate for emotion variability. First, emotion recognition is adopted as a preprocessing step to separate neutral from emotional speech. Then, the recognized emotional speech is adjusted by prosody modification. Different methods, including Gaussian normalization, the Gaussian mixture model (GMM), and support vector regression (SVR), are adopted to define the mapping rules for F0 between emotional and neutral speech, and the average linear ratio is used for duration modification. Finally, the modified emotional speech is employed for speaker recognition. The experimental results show that the proposed ESRS significantly improves the performance of emotional speaker recognition, with an identification rate (IR) higher than that of the traditional recognition system. Emotional speech with F0 and duration modifications is closer to neutral speech.
On December 9, 2023, I was privileged to participate in the Dr. Chi Chao Chan Symposium on Global Collaboration of Eye Research, held as the Global Eye Genetic Consortium (GEGC) session of the 16th Congress of the Asia-Pacific Vitreo-Retina Society (APVRS) in Hong Kong. Along with my talk on "Global collaboration of eye research: personal experience", other prominent international speakers provided their own perspectives on opportunities for networking, collaboration, and the exchange of ideas with global leaders and experts in ophthalmic practice, research, and education.
This paper studies a high-speed text-independent Automatic Speaker Recognition (ASR) algorithm based on the Gaussian Mixture Model (GMM) on a multicore system. The high speed is achieved through parallel implementation of the feature extraction and aggregation methods during the training and testing procedures. Shared-memory parallel programming techniques using both the OpenMP and PThreads libraries are developed to accelerate the code and improve the performance of the ASR algorithm. The experimental results show a speed-up of around 3.2 on a personal laptop with an Intel i5-6300HQ (2.3 GHz, four cores without hyper-threading, and 8 GB of RAM). In addition, a remarkable 100% speaker recognition accuracy is achieved.
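The paper's parallel loops are written with OpenMP/PThreads in a compiled language; as a language-neutral illustration of the same fan-out pattern, per-frame feature extraction can be mapped over a thread pool. The `frame_energy` function is a toy stand-in for MFCC extraction, not the paper's feature.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def frame_energy(frame):
    """Toy per-frame feature (log energy) standing in for MFCC
    extraction; each frame is processed independently."""
    frame = np.asarray(frame, float)
    return float(np.log(np.sum(frame ** 2) + 1e-12))

def extract_parallel(frames, workers=4):
    """Fan per-frame feature extraction out over a thread pool;
    pool.map preserves frame order, mirroring the index-parallel
    shared-memory loop in the OpenMP version."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(frame_energy, frames))
```

Because the frames share no mutable state, the parallel result is bit-identical to the serial loop; that independence is what makes the extraction stage embarrassingly parallel in the first place.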
This week promises a rich variety of thought-provoking sessions, covering emerging trends, global challenges, and future opportunities. Inspiring speakers from across business, government, and the public and private sectors spark passionate discussions and debates. I hope that, like me, you are excited at the prospect of encountering fresh ideas, opinions, and perspectives, and are ready to consider the role of standards in achieving the 2030 Agenda for Sustainable Development.
Chinese Telecom Companies Eye African Market
Chinese telecom companies, showcasing innovative technologies and products at the Africa Tech Festival 2024 in South Africa, have set ambitious goals to expand their footprint in the rapidly growing African market. Held from 12 to 14 November at the Cape Town International Convention Centre in the country's legislative capital, the festival, Africa's largest and most influential telecom and technology event, attracted 15,000 attendees, over 300 exhibitors, and 450 speakers.
Quechua is a language family with nine variants, numbering more than 10 million speakers across several countries: Chile, Ecuador, Bolivia, Colombia, Argentina, and Peru. To understand the origins of the Quechua language, we have to go back in time to a territory, currently in Peru and Ecuador, known as Chinchay Suyu.
This paper argues that in the age of 'World Englishes', it is not necessary to differentiate native-speaker teachers from non-native-speaker teachers. It concludes that non-native-speaker teachers can be as effective as their native colleagues and have an equal chance of achieving professional success, even though native-speaker teachers have advantages in some aspects. It is time for employers, as well as ELT professionals, to look past the differences between native- and non-native-speaker teachers and optimize both as unique resources.
Much language teaching and learning aims to make students approximate native speakers, on the view that the only rightful speakers of a language are its native speakers. Contrary to these contemporary views, however, this paper argues that the obligation of the language teacher is to help students use the L2 effectively, not simply to imitate native speakers. A successful L2 user, coming from the group of L2 learners, can be a model for students. Therefore, non-native teachers with a high degree of language proficiency and good teaching skills can be ideal and qualified language teachers.
Perceptual auditory filter banks such as the Bark-scale filter bank are widely used as front-end processing in speech recognition systems. However, the design of optimized filter banks that provide higher accuracy in recognition tasks remains an open problem. Focusing on the spectral analysis stage of feature extraction, an adaptive-bands filter bank (ABFB) is presented. The design adopts flexible bandwidths and center frequencies for the frequency responses of the filters and uses a genetic algorithm (GA) to optimize the design parameters. The optimization combines the front-end filter bank with the back-end recognition network in the performance evaluation loop. Deploying the ABFB together with the zero-crossing peak amplitude (ZCPA) feature as a front end for a radial basis function (RBF) system shows a significant improvement in robustness compared with the Bark-scale filter bank. In the ABFB, several sub-bands are still concentrated toward lower frequencies, but their exact locations are determined by performance rather than by perceptual criteria. For ease of optimization, only symmetrical bands are considered, which still provides satisfactory results.
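The GA-in-the-evaluation-loop idea can be sketched with a tiny evolutionary loop over normalised band parameters. Everything here is illustrative: `fitness` stands in for the recognition-accuracy evaluation, the elitist selection scheme and mutation scale are assumptions, and real band constraints (ordering, bandwidths) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve_bands(fitness, n_bands=4, pop=20, gens=30, sigma=0.05):
    """Tiny elitist genetic loop over filter-bank parameters in [0, 1]:
    score the population, keep the fitter half, refill with mutated
    copies of the elite, and return the best individual found."""
    population = rng.random((pop, n_bands))
    for _ in range(gens):
        scores = np.array([fitness(ind) for ind in population])
        order = np.argsort(scores)[::-1]          # best first
        elite = population[order[: pop // 2]]
        children = elite + rng.normal(0, sigma, elite.shape)
        population = np.clip(np.vstack([elite, children]), 0.0, 1.0)
    scores = np.array([fitness(ind) for ind in population])
    return population[int(np.argmax(scores))]
```

In the paper's setting, evaluating `fitness` means running the full front-end-plus-recogniser pipeline, which is why each generation is expensive and the band search space is kept small (e.g. symmetrical bands only).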
This paper reports part of the findings of a large-scale study exploring the viewpoints of Chinese ELT stakeholders (students, teachers, and administrators) on native speakerism, in order to find out whether current EFL education in China is still affected by this chauvinistic ideology. Analysis of the data through a critical lens reveals that the vast majority of participants conferred upon NS products (teacher, language, culture, and teaching methodology) a status superior to that granted to their NNS counterparts, and failed to see the linguacultural and epistemological inequalities between the English-speaking West and traditional NNS countries, inter alia China. These findings suggest that the three participant groups as a whole succumb to native speakerism and, by extension, that ELT in China is still haunted to a great degree by this ideology. Given that this study treats each participant group separately, future studies are expected to explore inter-group interactions in ideology.
Funding (DS-MI-FGSM study): The Major Key Project of PCL (Grant No. PCL2022A03); the National Natural Science Foundation of China (Grant Nos. 61976064, 62372137); and the Zhejiang Provincial Natural Science Foundation of China (Grant No. LZ22F020007).
Funding (ASI transformer study): The authors are grateful for the Taif University Researchers Supporting Project (No. TURSP-2020/36), Taif University, Taif, Saudi Arabia.
Abstract: Previous studies have investigated the efficiency of teaching listener and speaker repertoires to children diagnosed with autism spectrum disorder (ASD). Some investigations focused on listener responding by function, feature, and class (LRFFC) and on intraverbals by function, feature, and class (FFC). For some children, teaching intraverbal FFC was more efficient because it resulted in a better emergence effect of a related untaught repertoire (LRFFC). For other children, teaching LRFFC along with tacting pictures was more efficient, resulting in a better emergence effect of a related untaught repertoire (intraverbal FFC). In those cases, it is not clear whether the tact increased the efficiency of LRFFC training, because no comparison with a condition in which tacts were not required was conducted. This investigation consisted of a replication with two children diagnosed with ASD. Three instructional sequences were compared: teaching LRFFC and probing intraverbals; teaching LRFFC plus tacts and probing intraverbals; and teaching intraverbals and probing LRFFC. For one child, all sequences were equally efficient because all related untaught repertoires emerged without errors; however, the acquisition of intraverbals during training occurred with variability. For the second child, the most efficient sequence consisted of teaching intraverbals, resulting in the emergence of LRFFC without errors. In both cases of teaching LRFFC, the emergence of related intraverbals was partial, and acquisition of the trained repertoires occurred with variability. The sequence that did not demand tact responses was slightly more efficient. The data are discussed in the sense that the best instructional sequence may vary from learner to learner.
Funding: The National Natural Science Foundation of China (No. 60872073, 60975017, 51075068); the Natural Science Foundation of Guangdong Province (No. 10252800001000001); the Natural Science Foundation of Jiangsu Province (No. BK2010546).
Abstract: A novel emotional speaker recognition system (ESRS) is proposed to compensate for emotion variability. First, emotion recognition is adopted as a pre-processing stage to classify neutral and emotional speech. Then, the recognized emotional speech is adjusted by prosody modification. Different methods, including Gaussian normalization, the Gaussian mixture model (GMM), and support vector regression (SVR), are adopted to define the mapping rules for F0 between emotional and neutral speech, and the average linear ratio is used for duration modification. Finally, the modified emotional speech is employed for speaker recognition. The experimental results show that the proposed ESRS can significantly improve the performance of emotional speaker recognition, and its identification rate (IR) is higher than that of the traditional recognition system. Emotional speech with F0 and duration modifications is closer to neutral speech.
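The two simplest mapping rules mentioned above can be sketched directly: Gaussian normalization matches the F0 mean and standard deviation of emotional speech to the speaker's neutral statistics, and duration is rescaled by an average linear ratio. This is an illustrative sketch under assumed conventions (zero F0 marking unvoiced frames), not the paper's implementation:

```python
import numpy as np

def gaussian_normalize_f0(f0_emotional, neutral_mean, neutral_std):
    # Map the emotional-speech F0 distribution onto the neutral one by
    # matching mean and standard deviation; zeros are assumed to mark
    # unvoiced frames and are left untouched.
    out = f0_emotional.astype(float).copy()
    voiced = out > 0
    m, s = out[voiced].mean(), out[voiced].std()
    out[voiced] = (out[voiced] - m) / s * neutral_std + neutral_mean
    return out

def modify_duration(duration, neutral_avg, emotional_avg):
    # Average linear ratio for duration modification.
    return duration * (neutral_avg / emotional_avg)
```

The GMM- and SVR-based rules replace this single global Gaussian with piecewise or learned regressions, but follow the same emotional-to-neutral mapping idea.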
Abstract: On December 9, 2023, I was privileged to be honored and to participate in the Dr. Chi Chao Chan Symposium on Global Collaboration of Eye Research as part of the Global Eye Genetic Consortium (GEGC) session, held during the 16th Congress of the Asia-Pacific Vitreo-Retina Society (APVRS) in Hong Kong. Along with my talk, "Global collaboration of eye research: personal experience", other prominent international speakers provided their own perspectives on opportunities for networking, collaboration, and the exchange of ideas with global leaders and experts in ophthalmic practice, research, and education.
Abstract: This paper studies a high-speed, text-independent Automatic Speaker Recognition (ASR) algorithm based on the Gaussian Mixture Model (GMM) on a multicore system. The high speed is achieved through parallel implementation of the feature extraction and aggregation methods during the training and testing procedures. Shared-memory parallel programming techniques using both the OpenMP and PThreads libraries are developed to accelerate the code and improve the performance of the ASR algorithm. The experimental results show a speed-up of around 3.2 on a personal laptop with an Intel i5-6300HQ (2.3 GHz, four cores without hyper-threading, and 8 GB of RAM). In addition, a remarkable 100% speaker recognition accuracy is achieved.
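The parallelization pattern described, splitting per-utterance feature work across shared-memory workers, can be illustrated loosely in Python. This is only an analogy to the paper's OpenMP/PThreads C code: `extract_features` is a hypothetical stand-in for MFCC extraction, and threads overlap here only because heavyweight NumPy/FFT routines release the GIL.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def extract_features(utterance):
    # Hypothetical stand-in for per-utterance feature extraction;
    # NumPy's FFT releases the GIL, letting threads overlap this work.
    spectrum = np.abs(np.fft.rfft(utterance))
    return np.log(spectrum + 1e-10)

def parallel_extract(utterances, workers=4):
    # Shared-memory data parallelism over the utterance list, loosely
    # analogous to the paper's parallel training/testing feature passes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_features, utterances))
```

The results are identical to a sequential loop; only the wall-clock time changes, which is exactly the property the paper's 3.2x speed-up claim rests on.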
Abstract: This week promises a rich variety of thought-provoking sessions, covering emerging trends, global challenges, and future opportunities. Inspiring speakers from across business, government, and the public and private sectors spark passionate discussions and debates. I hope that, like me, you are excited at the prospect of encountering fresh ideas, opinions, and perspectives, and are ready to consider the role of standards in achieving the 2030 Agenda for Sustainable Development.
Abstract: Chinese Telecom Companies Eye African Market. Chinese telecom companies, showcasing innovative technologies and products at the Africa Tech Festival 2024 in South Africa, have set ambitious goals to expand their footprint in the rapidly growing African market. Held from 12 to 14 November at the Cape Town International Convention Centre in the country's legislative capital, the festival, Africa's largest and most influential telecom and technology event, attracted 15,000 attendees, over 300 exhibitors, and 450 speakers.
Abstract: Quechua is a language family with nine variants, numbering more than 10 million speakers across several countries: Chile, Ecuador, Bolivia, Colombia, Argentina, and Peru. To understand the origins of the Quechua language, we have to go back in time to a territory, currently in Peru and Ecuador, known as Chinchay Suyu.
Abstract: This paper argues that in the age of 'World Englishes', it is not necessary to differentiate native speaker teachers from non-native speaker teachers. It concludes that non-native speaker teachers can be as effective as their native colleagues and have an equal chance of achieving professional success, even though native speaker teachers have great advantages over non-native teachers in some respects. It is time for employers, as well as ELT professionals, to shut their eyes to the glaring differences between native and non-native speaker teachers and to optimize such unique resources.
Abstract: The target of much language teaching and learning is to make students approximate native speakers, and the only rightful speakers of a language are held to be its native speakers. Contrary to these contemporary views, however, this paper argues that the obligation of the language teacher is to help students use the L2 effectively, not simply to imitate native speakers. A successful L2 user who comes from the group of L2 learners can be a model for students. Therefore, non-native teachers with a high degree of language proficiency and good teaching skills can be ideal and qualified language teachers.
Funding: Project (61072087) supported by the National Natural Science Foundation of China; Project (20093048) supported by the Shanxi Provincial Graduate Innovation Fund of China.
Abstract: Perceptual auditory filter banks such as the Bark-scale filter bank are widely used as front-end processing in speech recognition systems. However, the problem of designing optimized filter banks that provide higher accuracy in recognition tasks is still open. Building on the spectral analysis performed in feature extraction, an adaptive bands filter bank (ABFB) is presented. The design adopts flexible bandwidths and center frequencies for the frequency responses of the filters and utilizes a genetic algorithm (GA) to optimize the design parameters. The optimization process is realized by combining the front-end filter bank with the back-end recognition network in the performance evaluation loop. Deploying ABFB together with the zero-crossing peak amplitude (ZCPA) feature as a front-end process for a radial basis function (RBF) system shows a significant improvement in robustness compared with the Bark-scale filter bank. In ABFB, several sub-bands are still concentrated toward lower frequencies, but their exact locations are determined by performance rather than by perceptual criteria. For ease of optimization, only symmetrical bands are considered here, which still provide satisfactory results.
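The GA-in-the-evaluation-loop idea above can be sketched generically: evolve candidate sets of band center frequencies, scoring each with a user-supplied fitness callable. In ABFB the fitness would be the recognition accuracy of the back-end network; the placeholder fitness in the usage below, the population size, mutation scale, and generation count are all illustrative assumptions, not values from the paper.

```python
import numpy as np

def optimize_bands(fitness, n_bands=8, sr=16000, pop_size=20, gens=30, seed=0):
    # Toy genetic search over filter-bank centre frequencies: truncation
    # selection keeps the fittest half each generation, and Gaussian
    # mutation of the survivors produces the children.
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.0, sr / 2.0, size=(pop_size, n_bands))
    for _ in range(gens):
        scores = np.array([fitness(np.sort(ind)) for ind in pop])
        parents = pop[np.argsort(scores)[-(pop_size // 2):]]       # fittest half
        children = parents + rng.normal(0.0, 100.0, parents.shape)  # mutate
        pop = np.vstack([parents, np.clip(children, 0.0, sr / 2.0)])
    best = max(pop, key=lambda ind: fitness(np.sort(ind)))
    return np.sort(best)
```

With a fitness such as `lambda centers: -centers.sum()` (a crude proxy rewarding low-frequency concentration, which is the tendency the abstract reports), the search drifts the bands toward lower frequencies over the generations.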
Abstract: This paper reports part of the findings of a large-scale study exploring the viewpoints of Chinese ELT stakeholders (students, teachers, and administrators) on native speakerism, in order to find out whether current EFL education in China is still affected by this chauvinistic ideology. Analysis of the data through a critical lens reveals that the vast majority of participants conferred upon NS products (teacher, language, culture, and teaching methodology) a status superior to that granted to their NNS counterparts, and failed to see the linguacultural and epistemological inequalities between the English-speaking West and traditional NNS countries, inter alia China. These findings suggest that the three participant groups as a whole succumb to native speakerism and, by extension, that ELT in China is still haunted to a great degree by this ideology. Given that this study treats each participant group separately, future studies are expected to explore inter-group interactions in ideology.