The use of hidden conditional random fields (HCRFs) for tone modeling is explored. Tone recognition performance is improved with HCRFs by exploiting intra-syllable dynamic, inter-syllable dynamic, and duration features. To integrate the tone model into continuous speech recognition, discriminative model weight training (DMWT) is proposed: acoustic and tone scores are scaled by model weights that are trained discriminatively under the minimum phone error (MPE) criterion. Two weight training schemes are evaluated, and a smoothing technique is used to make training robust to overtraining. Experiments show that the HCRF-based tone model improves the accuracy of both tone recognition and large vocabulary continuous speech recognition (LVCSR). Compared with the global weight scheme, the discriminatively trained weight combinations improve continuous speech recognition.
Video object segmentation is important for video surveillance, object tracking, video object recognition, and video editing. An adaptive video segmentation algorithm based on hidden conditional random fields (HCRFs) is proposed, which models the spatio-temporal constraints of a video sequence. To improve segmentation quality, the weights of the spatio-temporal constraints are adaptively updated by on-line learning for the HCRFs. Shadows also degrade segmentation quality. To separate foreground objects from the shadows they cast, a linear transform of the Gaussian distribution of the background is adopted to model the shadow. The experimental results demonstrate that the error ratio of the algorithm is reduced by 23% and 19% compared with the Gaussian mixture model (GMM) and spatio-temporal Markov random fields (MRFs), respectively.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 60473106, 60273060 and 60333010), the Ministry of Education of China (No. 20030335064), and the Education Department of Zhejiang Province, China (No. G20030433).