Funding: Supported by the National Natural Science Foundation of China (No. 61977049) and the Advanced Innovation Center for Language Resource and Intelligence (KYR17005).
Abstract: Synchronized acoustic-articulatory data is the basis of various applications, such as exploring the fundamental mechanisms of speech production, acoustic-to-articulatory inversion (AAI), and articulatory-to-acoustic mapping (AAM). Numerous studies have been conducted on synchronized ElectroMagnetic Articulography (EMA) data and acoustic data. Hence, it is necessary to clarify whether EMA-synchronized speech and stand-alone speech differ, and if so, how the difference affects the performance of applications built on synchronized acoustic-articulatory data. In this study, we compare EMA-synchronized speech and stand-alone speech from the perspective of speech recognition, based on data from a male speaker. It is found that: i) the overall error rate of EMA-synchronized speech is much higher than that of stand-alone speech; ii) apical vowels and apical/blade consonants are more strongly affected by the presence of EMA coils; iii) some vowel and consonant tokens are confused with sounds produced by the same or a nearby articulator, such as confusion among apical vowels and confusion between apical and blade consonants; iv) the confusion of labial tokens shows a more diverse pattern.
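A minimal sketch of the kind of comparison the abstract describes: computing overall and per-phone error rates, plus a confusion tally, from aligned reference/recognized phone pairs for the two recording conditions. The file names and the one-pair-per-line text format are assumptions for illustration, not details from the study.

# Sketch only; input format and file names are hypothetical.
from collections import Counter, defaultdict

def load_aligned_pairs(path):
    """Each line: '<reference_phone> <recognized_phone>' from an aligned recognition output."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.split()) for line in f if line.strip()]

def error_rate_and_confusions(pairs):
    errors = Counter()                  # reference phone -> misrecognized tokens
    totals = Counter()                  # reference phone -> total tokens
    confusions = defaultdict(Counter)   # reference phone -> recognized-phone counts
    for ref, hyp in pairs:
        totals[ref] += 1
        confusions[ref][hyp] += 1
        if ref != hyp:
            errors[ref] += 1
    overall = sum(errors.values()) / sum(totals.values())
    per_phone = {p: errors[p] / totals[p] for p in totals}
    return overall, per_phone, confusions

ema = error_rate_and_confusions(load_aligned_pairs("ema_synchronized_pairs.txt"))
alone = error_rate_and_confusions(load_aligned_pairs("stand_alone_pairs.txt"))
print(f"overall error rate: EMA {ema[0]:.3f} vs stand-alone {alone[0]:.3f}")

Comparing the per-phone rates of the two conditions would highlight which categories (e.g., apical vowels, apical/blade consonants) degrade most when EMA coils are present.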
Abstract: In this paper, we introduce a video post-processing method that enhances the rhythm of a dancing performance, in the sense that the dancing movements become better timed to the beat of the music. The dancing performance observed in a video is analyzed and segmented into motion intervals delimited by motion beats. We present an image-space method to extract the motion beats of a video by detecting frames at which there is a significant change of direction or the motion stops. The motion beats are then synchronized with the music beats such that as many beats as possible are matched while introducing as little time-warping distortion to the video as possible. We show two applications of this cross-media synchronization: one where a given dance performance is enhanced to be better synchronized with its original music, and one where a given dance video is automatically adapted to be synchronized with different music.
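A minimal sketch, under stated assumptions, of the image-space motion-beat idea described above: frames where the frame-wide motion nearly stops or sharply changes direction are marked as motion beats. OpenCV's Farneback dense optical flow is used here as a stand-in motion estimate; the thresholds and parameter values are illustrative, not the paper's.

# Sketch only; requires opencv-python and numpy.
import cv2
import numpy as np

def motion_beats(video_path, stop_quantile=0.1, angle_thresh=np.pi / 2):
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    mags, dirs = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mean_flow = flow.reshape(-1, 2).mean(axis=0)     # average image-space motion vector
        mags.append(np.linalg.norm(mean_flow))
        dirs.append(np.arctan2(mean_flow[1], mean_flow[0]))
        prev_gray = gray
    cap.release()
    mags, dirs = np.array(mags), np.array(dirs)
    stop_level = np.quantile(mags, stop_quantile)
    beats = []
    for t in range(1, len(mags) - 1):
        # motion stop: local minimum of motion magnitude below a low quantile
        stops = mags[t] <= stop_level and mags[t] < mags[t - 1] and mags[t] < mags[t + 1]
        # direction change: wrapped angular difference across the frame exceeds the threshold
        turn = abs(np.angle(np.exp(1j * (dirs[t + 1] - dirs[t - 1])))) > angle_thresh
        if stops or turn:
            beats.append(t + 1)   # flow at index t describes the transition into frame t+1
    return beats

The detected beat frames could then be paired with music beats (e.g., from an audio beat tracker) and the video locally time-warped so that matched beats coincide, which is the cross-media synchronization step the abstract describes.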