Abstract: A sinusoidal representation of speech and a cochlear model are used to extract speech parameters in this paper, and a speech analysis/synthesis system controlled by the auditory spectrum is developed with the model. Computer simulation shows that speech can be synthesized with only 12 parameters per frame on average. The method has the advantages of few parameters, low complexity, and high-quality speech representation. The synthetic speech is highly intelligible.
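As a rough illustration of what a sinusoidal representation means, the sketch below rebuilds one frame as a sum of cosines from (amplitude, frequency, phase) triples. The function name `synthesize_frame`, the sample rate, the frame length, and the partial values are all hypothetical; the abstract does not say how its 12 parameters per frame are divided, and the cochlear-model analysis stage is not shown here.

```python
import math

def synthesize_frame(partials, num_samples, sample_rate):
    """Sum of sinusoids: s[n] = sum_k A_k * cos(2*pi*f_k*n/sample_rate + phi_k)."""
    frame = []
    for n in range(num_samples):
        t = n / sample_rate  # time of sample n in seconds
        frame.append(sum(a * math.cos(2.0 * math.pi * f * t + phi)
                         for a, f, phi in partials))
    return frame

# Hypothetical frame: four harmonics of 200 Hz as (amplitude, Hz, phase) triples.
partials = [(1.0, 200.0, 0.0), (0.5, 400.0, 0.0),
            (0.25, 600.0, 0.0), (0.125, 800.0, 0.0)]
frame = synthesize_frame(partials, num_samples=160, sample_rate=8000)
```

With these assumed values the frame covers 20 ms of audio and repeats every 40 samples, the period of the 200 Hz fundamental.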
Abstract: According to Reiss’s Text Type theory, a key part of the functionalist approach in translation studies, the source text can be assigned to a text type and to a genre. In making this assignment, the translator can decide on the hierarchy of postulates which has to be observed during target-text production (Mona, 2005). This essay conducts a linguistic and stylistic analysis of the Chinese translation of Obama’s speech to explore the general approach of the translator (if there is one), comparing the respective results of the two analyses from the perspective of Katharina Reiss’s Text Type theory. In doing so, critical judgments are made as to whether such an approach is justifiable.
Abstract: Twitter has become very popular among celebrities. It is the main platform they use to publish press releases and, especially, to reach out to their fans. Given the pervasiveness of celebrities on the site, people with related interests may be especially likely to start using the service because of the perception of direct access to a famous person. For the celebrities, it is a way of being close to the public and giving them an insight into the life of a celebrity. Although most celebrity Twitter accounts are used only for promotion, many celebrities use their personal accounts to communicate with their fans, friends and other celebrities. These celebrities tweet personal photos and share their inner thoughts for various reasons and to different audiences. Thus in this study I ask: What are celebrity speech patterns on Twitter? Are they talking mostly to fans, and if not, who are they talking to? How are they talking to these different audiences? I address these questions by analyzing the tweets publicly available on four active celebrities’ Twitter timelines. The findings indicate that these celebrities indeed address different audiences on Twitter, including fans, friends, family and other celebrities. The findings further reveal that celebrities tend to use different speech acts when talking to these different audiences. In light of this evidence, I attempt to highlight patterns that may be relevant with regard to the celebrities’ gender.
Funding: National Natural Science Foundation of China (No. 19834040).
Abstract: A nonlinear dynamics method is used to study Chinese spoken at normal speed, and an improved correlation dimension algorithm is developed to characterize the speech signal. The reconstructed phase space and correlation dimension curves of unvoiced fricative consonants and vowels are also given. It is found that the correlation dimension algorithm can distinguish fricatives from vowels because of their different production mechanisms. The study also shows that it can provide information for distinguishing the four basic tones in Mandarin.
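For readers unfamiliar with the terms, phase-space reconstruction and the correlation dimension can be sketched as follows. This is the textbook Grassberger-Procaccia estimate, not the improved algorithm the abstract refers to, which is not described here. The function names, the test signal, the embedding settings, and the two radii are illustrative assumptions.

```python
import math

def delay_embed(signal, dim, delay):
    """Reconstruct phase-space points (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    count = len(signal) - (dim - 1) * delay
    return [tuple(signal[i + j * delay] for j in range(dim)) for i in range(count)]

def correlation_sum(points, radius):
    """Fraction of distinct point pairs closer than `radius`."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < radius:
                close += 1
    return 2.0 * close / (n * (n - 1))

def correlation_dimension(points, r_small, r_large):
    """Slope of log C(r) against log r between two radii.

    Assumes both correlation sums are non-zero.
    """
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    return (math.log(c_large) - math.log(c_small)) / (math.log(r_large) - math.log(r_small))

# Hypothetical test signal: a pure sine, whose 2-D embedding is a closed curve.
signal = [math.sin(0.1 * n) for n in range(600)]
points = delay_embed(signal, dim=2, delay=16)
estimate = correlation_dimension(points, r_small=0.1, r_large=0.4)
```

A closed curve is a one-dimensional object, so the estimate for this sine should come out close to 1. Noise-like sounds such as fricatives would be expected to fill the reconstructed space more densely and give larger values, which is the kind of contrast the abstract reports between fricatives and vowels.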
Funding: Supported by the European Community’s Seventh Framework Programme (No. 338164) (ERC Starting Grant iHEARu).
Abstract: In this contribution, we present iHEARu-PLAY, an online, multi-player platform for crowdsourced database collection and labelling, including the voice analysis application (VoiLA), a free web-based speech classification tool designed to educate iHEARu-PLAY users about state-of-the-art speech analysis paradigms. In addition, via this associated speech analysis web interface, VoiLA encourages users to take an active role in improving the service by providing labelled speech data. The platform allows users to record and upload voice samples directly from their browser, which are then analysed in a state-of-the-art classification pipeline. A set of pre-trained models targeting a range of speaker states and traits such as gender, valence, arousal, dominance, and 24 different discrete emotions is employed. The analysis results are visualised in a way that is easily interpretable by laypeople, giving users unique insights into how their voice sounds. We assess the effectiveness of iHEARu-PLAY and its integrated VoiLA feature via a series of user evaluations, which indicate that it is fun and easy to use and that it provides accurate and informative results.
Funding: Supported by the Tencent and Shanghai Jiao Tong University Joint Project.
Abstract: The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition (ASR) systems. In this overview paper, we review the techniques proposed in the last two decades for attacking this problem. We focus our discussion on the speech separation problem, given its central role in the cocktail party environment, and describe the conventional single-channel techniques such as computational auditory scene analysis (CASA), non-negative matrix factorization (NMF) and generative models, the conventional multi-channel techniques such as beamforming and multi-channel blind source separation, and the newly developed deep learning-based techniques, such as deep clustering (DPCL), the deep attractor network (DANet), and permutation invariant training (PIT). We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment. We argue that effectively exploiting information in the microphone array, the acoustic training set, and the language itself using a more powerful model, together with better optimization objectives and techniques, will be the approach to solving the cocktail party problem.
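To make the idea behind permutation invariant training concrete, the following minimal sketch scores every assignment of estimated sources to reference sources and keeps the cheapest one. The mean-squared-error criterion, the function names `mse` and `pit_loss`, and the toy signals are assumptions for illustration; the abstract does not specify the loss, and a practical system would evaluate this on network outputs rather than plain lists.

```python
from itertools import permutations

def mse(estimate, reference):
    """Mean squared error between two equal-length sequences."""
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)

def pit_loss(estimates, references):
    """Return (lowest mean loss, best permutation).

    best_perm[i] is the index of the reference assigned to estimates[i].
    """
    num_sources = len(references)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(num_sources)):
        loss = sum(mse(estimates[i], references[p])
                   for i, p in enumerate(perm)) / num_sources
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Toy example: the estimates are the references in swapped order.
references = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
estimates = [[0.5, 0.5, 0.5], [1.0, 0.0, -1.0]]
loss, perm = pit_loss(estimates, references)
```

Because the loss is taken over the best assignment, the separator is not penalized for emitting the speakers in a different order from the labels. The exhaustive search grows factorially with the number of sources, which is acceptable for the two- or three-speaker mixtures this sketch has in mind.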