Abstract: Montage was originally a term in French architecture. Visual media borrowed it to refer to the pictorial technique in which cut-out illustrations and fragments are rearranged to produce a new composite picture made of several different images. Literature applies this technique as a vehicle for creating works, especially stream-of-consciousness fiction, which flourished in the early 20th century. Drawing on the similarity between montage and literature, this article divides the writing technique into narrative montage and space montage and dissects the use of montage in William Faulkner's The Sound and the Fury, which is distinguished by its space-skipping and mixed expression.
Abstract: Video data are composed of multimodal information streams, including visual, auditory and textual streams. This paper therefore describes an approach to story segmentation for news video based on multimodal analysis. The proposed approach detects topic-caption frames and integrates them with silence-clip detection results and shot segmentation results to locate news story boundaries. Integrating audio-visual features with text information overcomes the weakness of approaches that use only image analysis. On test data of 135,400 frames, story boundary detection achieves an accuracy rate of 85.8% and a recall rate of 97.5%. The experimental results show that the approach is valid and robust.
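The abstract does not state how the three detectors are combined, so the following is only a minimal sketch of one plausible fusion rule. The function name, the `Interval` type, and the two tolerance parameters are hypothetical and do not come from the paper: a shot cut is kept as a story boundary when a silence clip lies near it and a topic-caption frame follows shortly after.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # first frame of a silence clip (inclusive)
    end: int    # last frame of a silence clip (inclusive)


def locate_story_boundaries(shot_cuts, silences, caption_frames,
                            silence_tol=25, caption_window=250):
    """Return the shot cuts that look like news story boundaries.

    Assumed rule (not taken from the paper): keep a shot cut when a
    silence clip lies within `silence_tol` frames of it AND a
    topic-caption frame appears within `caption_window` frames after it.
    """
    boundaries = []
    for cut in sorted(shot_cuts):
        near_silence = any(
            s.start - silence_tol <= cut <= s.end + silence_tol
            for s in silences
        )
        caption_follows = any(
            cut <= frame <= cut + caption_window
            for frame in caption_frames
        )
        if near_silence and caption_follows:
            boundaries.append(cut)
    return boundaries
```

Requiring agreement between cues is one way to reflect the abstract's point that image analysis alone is weak; a real system would more likely weight the cues than demand all of them.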
Abstract: This paper offers a thorough review of audio-visual translation studies in China and abroad. Reviewing foreign achievements in this field of translation studies can shed light on audio-visual translation practice and research in China. The review of Chinese scholars' audio-visual translation studies aims to identify potential directions for development and to offer guidelines for neglected studies and aspects. Based on a summary of the relevant studies, possible topics for further research are proposed.
Funding: Supported by the National Natural Science Foundation of China (60905006) and the NSFC-Guangdong Joint Fund (U1035004).
Abstract: Emotion recognition has become an important task in modern human-computer interaction. This paper presents a multilayer boosted HMM (MBHMM) classifier for automatic audio-visual emotion recognition. A modified Baum-Welch algorithm is proposed for component HMM learning, and adaptive boosting (AdaBoost) is used to train ensemble classifiers for the different layers (cues). Except for the first layer, the initial weights of the training samples in the current layer are determined by the recognition results of the ensemble classifier in the layer above. Training on the current cue can thus focus more on the samples that were difficult for the previous cue. The MBHMM classifier combines these ensemble classifiers and exploits the complementary information from multiple cues and modalities. Experimental results on audio-visual emotion data collected in Wizard of Oz scenarios and labeled under two types of emotion category sets demonstrate that the approach is effective and promising.
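The layer-to-layer weight hand-off described above can be illustrated with a small sketch. The abstract does not give the exact weighting rule, so the AdaBoost-style exponential factor and the names `init_layer_weights` and `boost` below are assumptions made purely for illustration.

```python
import math


def init_layer_weights(prev_correct, boost=1.0):
    """Initial sample weights for the next layer (cue).

    prev_correct[i] is True when the previous layer's ensemble
    classified training sample i correctly. Misclassified samples are
    up-weighted by exp(boost), an AdaBoost-style factor assumed here;
    the paper's actual rule is not stated in the abstract.
    """
    if not prev_correct:
        return []
    raw = [1.0 if ok else math.exp(boost) for ok in prev_correct]
    total = sum(raw)
    # Normalize so the weights form a distribution over the samples.
    return [w / total for w in raw]
```

With this initialization, boosting on the next cue starts out concentrating on the samples the previous cue got wrong, which is the behavior the abstract describes.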
Abstract: From the perspective of contemporary aesthetics, My Memories of Old Beijing uses the mechanism of audio-video montage to explore the position of people in a society of overlapping contexts spanning the pre-modern, modern and post-modern eras. Its aesthetic significance lies in its representation of one way of revolting against fate in the course of Chinese modernization, in its effective change of real people's emotional structure, and hence in one modern form of tragic humanism.
Abstract: On February 10, 2019 (US Central Time), the China National Peking Opera Company (CNPOC) and the Hubei Chime Bells National Chinese Orchestra presented a fantastic audio-visual performance of Chinese Peking Opera and Chinese chime bells for an American audience at the world's top-level Buntrock Hall at Symphony Center.
Funding: This paper is an interim research result of the Basic Research Project of Beijing Institute of Graphic Communication, "Research on the Artistic, Modern Communication and Publishing of Dian-shi Zhai Pictorial (1884-1898)" (Serial Number Eb202008).
Abstract: Mongolian audio-visual works are an important vehicle for exploring the true significance of this national culture. This paper argues that, through the influence of film, television and music, the Mongolian people of Inner Mongolia continually strengthen individual identification with the ethnic group as a whole, and on this basis continually develop a new culture suited to modern and contemporary life, further strengthening their sense of belonging to the ethnic nation.
Abstract: Based on the current state of college audio-visual English teaching in China, this article points out that avoidance in class is a serious problem in such teaching. After further analysis, and taking into account the characteristics of college audio-visual English teaching in China, it proposes applying the task-based teaching method to college audio-visual English teaching, sets out the steps involved, and attempts to alleviate students' avoidance through this method.
Abstract: In film, we often see scenes from different times and places, different stories and different viewpoints change organically from shot to shot and converge into a complete film; this cutting and joining of a series of shots is the montage technique. The same method often appears in ancient literature, where different scenes and leaping thoughts form a series of montages in the mind that is then expressed through poetry. The same holds in art: characters from different times and spaces appear in one scene within a single picture, and a series of symbols distinguishes them and joins the scenes together. Therefore, although montage was named after the birth of film in the twentieth century, it has existed since ancient times and has always existed in the fine arts. Such thinking played an important role in the development of Chinese art, especially in the practice of joining and presenting scenes within a picture.
Abstract: This article analyses how montage theory creates meaningful and engaging narrative in The Mark on the Wall, where the dialectical relationship between space and setting that governs montage shapes the narrative structure. The research question is: what role can montage play in a text? The purpose of the thesis is to show that montage and stream of consciousness can serve each other, and the methodology is textual analysis that places montage in a specific context. The paper aims to demonstrate that the concept of montage within a literary text is ultimately an issue of research methods, determined and defined by the constructed space. The analysis finds that the text builds a narrative space in which to narrate the stream of thoughts, and that metaphorical montage is also a vital part of the text.
Abstract: This paper investigates object-based scalable coding in MPEG-4 and proposes a prioritized transmission scheme for MPEG-4 audio-visual objects (AVOs) over a DiffServ network with QoS guarantees. MPEG-4 AVOs are extracted and classified into groups according to their priority values and scalable layers (visual importance). These priority values are mapped to IP DiffServ per-hop behaviors (PHBs). The scheme can selectively discard low-importance packets to avoid network congestion. Simulation results show that, compared with best-effort delivery, the quality of the received video adapts gracefully to the network state. In addition, because the content provider can define the priority of each audio-visual object, the adaptive transmission of object-based scalable video can be customized to the content.
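As a rough illustration of mapping object priorities to per-hop behaviors, the sketch below uses the standard DSCP code points for EF, AF41 to AF43 and best effort. The mapping rule itself, the discard ordering, and the names `phb_for_object` and `shed` are hypothetical, since the abstract does not specify which priority values map to which PHB.

```python
# Standard DSCP code points (RFC 2474, RFC 2597, RFC 3246).
DSCP = {"EF": 46, "AF41": 34, "AF42": 36, "AF43": 38, "BE": 0}

# Assumed discard order for this sketch: higher rank is dropped first.
DROP_RANK = {"EF": 0, "AF41": 1, "AF42": 2, "AF43": 3, "BE": 4}


def phb_for_object(priority, layer):
    """Pick a PHB for a packet of an audio-visual object.

    priority: 0 is the most important object class (assumed scale).
    layer: 0 is the base layer, higher values are enhancement layers.
    """
    if priority == 0 and layer == 0:
        return "EF"
    if priority >= 3:
        return "BE"
    # Enhancement layers get a higher AF drop precedence.
    return ("AF41", "AF42", "AF43")[min(layer, 2)]


def shed(packets, capacity):
    """Keep at most `capacity` packets, discarding the least important.

    `packets` is a list of PHB names; arrival order is preserved.
    """
    ranked = sorted(range(len(packets)), key=lambda i: DROP_RANK[packets[i]])
    keep = set(ranked[:capacity])
    return [p for i, p in enumerate(packets) if i in keep]
```

In a real DiffServ router the discard decision is made by the queue management of each PHB rather than by a single sorted list; the `shed` helper only mimics the effect that enhancement-layer packets are lost before base-layer ones.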
Abstract: Existing pre-trained models such as DistilHuBERT excel at uncovering hidden patterns and facilitating accurate recognition across diverse data types, such as audio and visual information. We harnessed this capability to develop a deep learning model that uses DistilHuBERT to jointly learn these combined features for speech emotion recognition (SER). Our experiments highlight its distinct advantages: it significantly outperforms wav2vec 2.0 in both offline and real-time accuracy on the RAVDESS and BAVED datasets. Although it slightly trails HuBERT in offline accuracy, DistilHuBERT delivers comparable performance at a fraction of the model size, making it well suited to resource-constrained environments such as mobile devices. The smaller size does come with a slight trade-off. In offline evaluation, DistilHuBERT achieved 96.33% accuracy on the BAVED database and 87.01% on the RAVDESS database. In real-time evaluation, accuracy decreased to 79.3% on the BAVED database and 77.87% on the RAVDESS database. This decrease is likely a result of the challenges of real-time processing, including latency and noise, but the results still demonstrate strong performance in practical scenarios. DistilHuBERT therefore emerges as a compelling choice for SER, especially when accuracy is prioritized over real-time processing. Its compact size further enhances its potential for resource-limited settings, making it a versatile tool for a wide range of applications.
Funding: This work was supported by the National Natural Science Foundation of China (No. 62176006) and the National Key Research and Development Program of China (No. 2022YFF0902302).
Abstract: In recent years, computing art has developed rapidly through in-depth cross-disciplinary study of artificial intelligence generated content (AIGC) and the main features of artworks. Audio-visual content generation has gradually been applied to various practical tasks, including video and game scoring, assisting artists in creation, and art education, which demonstrates a broad application prospect. In this paper, we introduce innovative achievements in audio-visual content generation from the perspectives of visual art generation and auditory art generation based on artificial intelligence (AI). We outline the development trends of image and music datasets, visual and auditory content modelling, and related automatic generation systems. The objective and subjective evaluation of generated samples plays an important role in measuring algorithm performance. We provide a cogeneration mechanism for audio-visual content in multimodal tasks from image to music and present the construction of specific stylized datasets. There are still many new opportunities and challenges in the field of audio-visual synesthesia generation, and we provide a comprehensive discussion of them.