9 articles found
Web Voice Browser Based on an ISLPC Text-to-Speech Algorithm
1
Authors: LIAO Rikun, JI Yuefeng, LI Hui. Wuhan University Journal of Natural Sciences (CAS), 2006, No. 5, pp. 1157-1160 (4 pages)
A Web voice browser based on improved synchronous linear predictive coding (ISLPC), a text-to-speech (TTS) algorithm, and Internet applications is proposed. The paper analyzes the features of a TTS system with ISLPC speech synthesis and discusses the design and implementation of an ISLPC TTS-based Web voice browser. The browser integrates Web technology, Chinese information processing, artificial intelligence, and the key technology of Chinese ISLPC speech synthesis. It is a visual and audible Web browser that can improve information precision for network users. The evaluation results show that the ISLPC-based TTS model performs better than other browsers in voice quality and in its ability to identify Chinese characters.
Keywords: improved synchronous linear predictive coding (ISLPC); text-to-speech (TTS); Web voice browser; voice quality
Trainable prosodic model for standard Chinese Text-to-Speech system (Cited by: 1)
2
Authors: TAO Jianhua, CAI Lianhong, ZHAO Shixia (Department of Computer Science and Technology, Tsinghua University, Beijing 100084). Chinese Journal of Acoustics, 2001, No. 3, pp. 257-265 (9 pages)
Putonghua prosody is characterized by its hierarchical structure when influenced by linguistic environments. Based on this, a neural network with specially weighted factors and optimized outputs is described and applied to construct the Putonghua prosodic model in a Text-to-Speech (TTS) system. Extensive tests show that the structure of the neural network characterizes Putonghua prosody more exactly than traditional models. Learning is accelerated and computational precision is improved, which makes the whole prosodic model more efficient. Furthermore, the paper stylizes the Putonghua syllable pitch contours with SPiS parameters (Syllable Pitch Stylized Parameters) and analyzes their use in adjusting the syllable pitch. It shows that the SPiS parameters effectively characterize the Putonghua syllable pitch contours and facilitate the establishment of the network model and prosodic control.
Keywords: trainable prosodic model for standard Chinese text-to-speech system; text
A Unified Framework for Multilingual Text-to-Speech Synthesis with SSML Specification as Interface
3
Authors: 吴志勇, 曹光琦, 蒙美玲, 蔡莲红. Tsinghua Science and Technology (SCIE, EI, CAS), 2009, No. 5, pp. 623-630 (8 pages)
This paper describes the design of a unified framework for a multilingual text-to-speech (TTS) synthesis engine, Crystal. The unified framework defines the common TTS modules for different languages and/or dialects. The interfaces between consecutive modules conform to the speech synthesis markup language (SSML) specification for standardization, interoperability, multilinguality, and extensibility. Detailed module divisions and implementation technologies for the unified framework are introduced, together with possible extensions for algorithm research and evaluation of TTS synthesis. Implementation of a mixed-language TTS system for Chinese Putonghua, Chinese Cantonese, and English demonstrates the feasibility of the proposed unified framework.
Keywords: text-to-speech (TTS) synthesis; multilingual; unified framework; speech synthesis markup language (SSML)
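Analysis and Synthesis of Voiced Fundamental Frequency Contours for Improving the Naturalness of Synthesized Chinese (Cited by: 1)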
4
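Authors: 田岚, 陆小珊, 杨霓清. Journal of Shandong University (Engineering Science) (《山东大学学报(工学版)》) (CAS), 2003, No. 4, pp. 413-416 (4 pages)
The fundamental frequency (F0) contour of voiced segments in continuous speech is an important factor affecting the naturalness and expressiveness of synthesized speech. Using a classification-statistics method based on position order and tone value, this paper systematically analyzes the dynamic tonal characteristics of continuous Chinese speech and proposes a mathematical model for analyzing and hierarchically generating the F0 parameters of continuous Chinese speech. The model takes full account of the characteristics of Chinese pronunciation, summarizes the possible tonal variations in linguistic expression, and sets corresponding control and adjustment parameters, giving a relatively complete and practical representation of the correspondence between linguistic knowledge and F0 parameters. Simulation experiments on some typical natural sentences show that the synthesized F0 contours generated under the model's control agree satisfactorily with the test targets, and that the model clearly helps improve the naturalness of speech synthesis in TTS systems.
Keywords: text-to-speech conversion (文语转换); text-to-speech; prosodic features; fundamental frequency; speech naturalness; voiced fundamental frequency contour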
Cloud Enabled Text Reader for Individuals with Vision Impairment
5
Authors: Abul K. M. Azad, Mohammed Misbahuddin. Advances in Internet of Things, 2017, No. 4, pp. 97-111 (15 pages)
The paper describes the development of a text reader for people with vision impairments. The system is designed to extract the content of written documents or commercially printed materials. In terms of hardware, it utilizes a camera, a small embedded processor board, and an Alexa Echo Dot. The software involves an open source text detection library called Tesseract along with Leptonica and OpenCV. The system in its current version can only work with English text. By using the Amazon cloud web services, a skill set was deployed, which would read aloud the detected text utilizing an OpenCV program via the Alexa Echo Dot. For this development, a Raspberry Pi was utilized as the embedded processor system.
Keywords: text-to-speech converter; cloud computing; image processing; embedded processor; Internet of Things
A HMM-Based System To Diacritize Arabic Text
6
Author: M. S. Khorsheed. Journal of Software Engineering and Applications, 2012, No. 12, pp. 124-127 (4 pages)
The Arabic language comes under the category of Semitic languages, with an entirely different sentence structure in terms of Natural Language Processing. In such languages, two different words may have identical spelling whereas their pronunciations and meanings are totally different. To remove this ambiguity, special marks are put above or below the spelling characters to determine the correct pronunciation. These marks are called diacritics, and a language that uses them is called a diacritized language. This paper presents a system for Arabic language diacritization using Hidden Markov Models (HMMs). The system employs the renowned HMM Tool Kit (HTK). Each single diacritic is represented as a separate model. The concatenation of output models is coupled with the input character sequence to form the fully diacritized text. The performance of the proposed system is assessed using a data corpus that includes more than 24,000 sentences.
Keywords: Arabic; hidden Markov models; text-to-speech; diacritization
Control Emotion Intensity for LSTM-Based Expressive Speech Synthesis
7
Authors: Xiaolian Zhu, Liumeng Xue. Proceedings of the International Computer Frontier Conference (《国际计算机前沿大会会议论文集》), 2019, No. 2, pp. 654-656 (3 pages)
To improve the performance of human-computer interaction interfaces, emotion is considered to be one of the most important factors. The major objective of expressive speech synthesis is to inject various expressions reflecting different emotions into the synthesized speech. To effectively model and control the emotion, emotion intensity is introduced so that the expressive speech synthesis model can generate speech that conveys delicate and complicated emotional states. The system is composed of an emotion analysis module, whose goal is to extract a control emotion intensity vector, and a speech synthesis module responsible for mapping text characters to a speech waveform. The proposed continuous variable “perception vector” is a data-driven approach to controlling the model to synthesize speech with different emotion intensities. Compared with the system using a one-hot vector to control emotion intensity, the model using the perception vector is able to learn high-level emotion information from low-level acoustic features. In terms of model controllability and flexibility, both the objective and subjective evaluations demonstrate that the perception vector outperforms the one-hot vector.
Keywords: emotion intensity; expressive speech synthesis; controllable text-to-speech; neural networks
An Approach to Intelligent Speech Production System
8
Authors: 陈芳, 袁保宗. Journal of Computer Science & Technology (SCIE, EI, CSCD), 1997, No. 2, pp. 185-188 (4 pages)
In this paper, an intelligent speech production system is established using language information processing technology. The concept of bi-directional grammar is proposed for Chinese language information processing, and a corresponding Chinese characteristic network is completed. Correct text can be generated through grammar parsing and some additional rules. From the generated text, the system generates speech of good naturalness and intelligibility using a Chinese Text-to-Speech Conversion System.
Keywords: language information processing; grammar; text generation; text-to-speech conversion
A Synthesis Instance Pruning Approach Based on Virtual Non-uniform Replacements
9
Authors: 张巍, 凌震华, 胡国平, 王仁华. Tsinghua Science and Technology (SCIE, EI, CAS), 2008, No. 4, pp. 515-521 (7 pages)
The employment of non-uniform processes greatly assists corpus-based text-to-speech (TTS) systems in synthesizing natural speech. However, tailoring a TTS voice font, or pruning redundant synthesis instances, usually results in the loss of non-uniform synthesis instances. To solve this problem, we propose the concept of virtual non-uniform instances. Based on this concept and the synthesis frequency of each instance, an algorithm named StaRp-VPA is constructed to make up for the loss of non-uniform instances. In experimental testing, the naturalness scored by the mean opinion score (MOS) remains almost unchanged when fewer than 50% of the instances are pruned, and the MOS is only slightly degraded for reduction rates above 50%. The test results show that the StaRp-VPA algorithm is effective.
Keywords: text-to-speech system; speech synthesis; synthesis instance pruning; non-uniform unit