Abstract: Some remarks are made on the use of the Abadie constraint qualification, the Guignard constraint qualifications, and the Guignard regularity condition in obtaining weak and strong Kuhn-Tucker type optimality conditions in differentiable vector optimization problems.
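Abstract: Pre-trained models that learn representations through self-supervised learning have achieved major breakthroughs in voice conversion (VC) on non-parallel corpora. With the widespread use of self-supervised pre-trained representations (SSPR), the features extracted by pre-trained models have been shown to contain more content information. A VC model based on SSPR that combines vector quantization (VQ) and connectionist temporal classification (CTC) is proposed. The SSPR extracted by the pre-trained model is used as the input to an end-to-end model in order to improve the quality of one-shot voice conversion. Effectively disentangling the content representation from the speaker representation is a key problem in voice conversion. SSPR is used as preliminary content information, and VQ is used to disentangle the content and speaker representations from speech. However, VQ alone can only discretize the content information and can hardly separate a pure content representation from speech. To further remove the speaker's invariant information from the content information, a CTC loss is proposed to guide the content encoder. CTC not only serves as an auxiliary network that speeds up model convergence; its additional text supervision can also be optimized jointly with VQ, so that the two complement each other and a pure content representation is learned. The speaker representation is learned through style embedding, and the two representations are fed to the system for voice conversion. The proposed method is evaluated on the open-source CMU dataset and the VCTK corpus. Experimental results show that the method achieves an objective mel-cepstral distortion (MCD) of 8.896 dB and subjective mean opinion scores (MOS) of 3.29 for speech naturalness and 3.22 for speaker similarity, all better than the baseline models. The method achieves the best performance in both voice conversion quality and speaker similarity.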
Abstract: An efficient VQ-based speech recognition method is introduced, and its key parameters are studied comparatively. The method is designed specifically for speaker-dependent recognition of small Mandarin word sets. Compared with traditional HMM or NN approaches, it has lower complexity and lower resource consumption but a higher ARR (accurate recognition rate). A large-scale test on the task of recognizing 11 Mandarin digits shows that the WER (word error rate) can reach 3.86%. The method is suitable for embedding in a PDA (personal digital assistant), a mobile phone, and similar devices to perform voice control such as digit dialing, name dialing, calculation, and voice commands.
Abstract: Vector quantization (VQ) is an important data compression method. The key step in VQ encoding is to find, among N vectors, the one closest to a given feature vector. Many classical linear search algorithms take O(N) steps of computing the distance between two vectors. This paper presents the quantum VQ iteration and a corresponding quantum VQ encoding algorithm that takes O(√N) steps. The unitary operation of distance computation can be performed on a number of vectors simultaneously because the quantum state exists in a superposition of states. The quantum VQ iteration comprises three oracles; by contrast, many quantum algorithms, such as Shor's factorization algorithm and Grover's algorithm, have only one oracle. An entangled state is generated and used; by contrast, the state in Grover's algorithm is not entangled. The quantum VQ iteration is a rotation over a subspace, whereas the Grover iteration is a rotation over the global space. The quantum VQ iteration extends the Grover iteration to more complex searches that require more oracles. The method of the quantum VQ iteration is universal.
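For orientation, the classical O(N) baseline that the abstract contrasts with the quantum algorithm can be sketched as a plain linear search over the codebook. This is a minimal illustration, not the paper's method: the function names `squared_distance` and `vq_encode`, the squared Euclidean distance measure, and the sample codebook are all assumptions introduced here.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(vector, codebook):
    """Return the index of the codeword closest to `vector`.

    One distance computation is performed per codeword, so the search
    takes O(N) distance computations for a codebook of N vectors.
    """
    best_index = 0
    best_dist = math.inf
    for i, codeword in enumerate(codebook):
        d = squared_distance(vector, codeword)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index


# Hypothetical 2-D codebook, for illustration only.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0), (10.0, 0.0)]
index = vq_encode((3.2, 3.9), codebook)
print(index)
```

The encoder transmits only the index, and the decoder recovers the approximation by looking up `codebook[index]`.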
Abstract: Image subbands can be obtained by using a filter bank. The traditional compression method applies direct entropy coding to each subband. After studying the energy distribution in image subbands, we propose a vector quantization (VQ) coding algorithm for image subbands. In the algorithm, vector quantizers are designed adaptively for the high-frequency bands of an image. In particular, the edges of the image are examined and fewer bits are assigned to high-energy regions. Experimental results show that the algorithm achieves a higher SNR and a higher compression ratio than traditional subband coding, JPEG, and JPEG 2000.
Abstract: In this paper, we present a theoretical codebook design method for a VQ-based fast face recognition algorithm to improve recognition accuracy. Based on a systematic analysis and classification of code patterns, we first create a systematically organized codebook theoretically. Combined with another codebook created by Kohonen’s Self-Organizing Maps (SOM) method, an optimized codebook consisting of 2×2 codevectors for facial images is generated. Experimental results show that face recognition using such a codebook is more efficient than using the codebook consisting of 4×4 codevectors employed in the conventional algorithm. The highest average recognition rate of 98.6% is obtained for 400 images of 40 persons from the publicly available face database of AT&T Laboratories Cambridge, which contains variations in lighting, pose, and expression. A table look-up (TLU) method is also proposed to speed up recognition processing. By applying this method in the quantization step, the total recognition processing time is reduced to only 28 msec, enabling real-time face recognition.
Funding: Supported by the National Natural Science Foundation of China (No. 60506007).
Abstract: A novel approach for near-lossless compression of Color Filtering Array (CFA) data in a wireless endoscopy capsule is proposed in this paper. The compression method is based on pre-processing and vector quantization. First, the CFA raw data are low-pass filtered and rearranged during pre-processing. Then, pairs of pixels are vector quantized into macros of 9 bits by applying block partition and index mapping in succession. Finally, these macros are entropy compressed by the Joint Photographic Experts Group-Lossless Standard (JPEG-LS). The complex step of codeword searching in Vector Quantization (VQ) is avoided by a predefined partition rule, which is suitable for hardware implementation. By controlling the pre-processor and the VQ scheme, either high-quality compression in the unfiltered case or high-ratio compression in the filtered case can be realized, with an average Peak Signal-to-Noise Ratio (PSNR) of more than 43 dB and 37 dB respectively. Compared with the state-of-the-art method and the previously proposed method, our approach performs better in both compression performance and flexibility.
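As a reading aid for the PSNR figures quoted above, the standard definition PSNR = 10·log10(peak² / MSE) can be sketched as follows. This is a minimal sketch rather than the authors' evaluation code: the function name `psnr`, the flat-list pixel representation, and the default peak value of 255 (8-bit samples) are assumptions made here, since the abstract does not state the sample bit depth.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `peak` is the maximum possible sample value; 255 assumes 8-bit data,
    which is an assumption and not stated in the abstract.
    """
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: lossless reconstruction
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical samples: every pixel is off by exactly 1, so MSE = 1.
value = psnr([10, 20, 30, 40], [11, 19, 31, 39])
print(round(value, 2))
```

Under this definition a higher PSNR means a smaller reconstruction error, which is why the unfiltered (high-quality) case reports the larger of the two figures.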