Journal Articles: 52,536 articles found
1. Automatic recognition of depression based on audio and video: A review
Authors: Meng-Meng Han, Xing-Yun Li, Xin-Yu Yi, Yun-Shao Zheng, Wei-Li Xia, Ya-Fei Liu, Qing-Xiang Wang. 《World Journal of Psychiatry》 SCIE, 2024, No. 2, pp. 225-233 (9 pages)
Depression is a common mental health disorder. With current depression detection methods, specialized physicians often engage in conversations and physiological examinations based on standardized scales as auxiliary measures for depression assessment. Non-biological markers, typically classified as verbal or non-verbal and deemed crucial evaluation criteria for depression, have not been effectively utilized. Specialized physicians usually require extensive training and experience to capture changes in these features. Advancements in deep learning technology have provided technical support for capturing non-biological markers. Several researchers have proposed automatic depression estimation (ADE) systems based on sounds and videos to assist physicians in capturing these features and conducting depression screening. This article summarizes commonly used public datasets and recent research on audio- and video-based ADE from three perspectives: datasets, deficiencies in existing research, and future development directions.
Keywords: Depression recognition; Deep learning; Automatic depression estimation system; Audio processing; Image processing; Feature fusion; Future development
2. Audio Description for Educational Videos on COVID-19 Response: A Corpus-Based Study on Linguistic and Textual Idiosyncrasies
Author: XIONG Ling-song. 《Journal of Literature and Art Studies》 2023, No. 4, pp. 276-285 (10 pages)
Audio description (AD), unlike interlingual translation and interpretation, is subject to unique constraints as a spoken text. Facilitated by AD, educational videos on COVID-19 anti-virus measures are made accessible to the visually disadvantaged. In this study, a corpus of AD of COVID-19 educational videos is developed, named the "Audio Description Corpus of COVID-19 Educational Videos" (ADCCEV). Drawing on the model of the Textual and Linguistic Audio Description Matrix (TLADM), this paper aims to identify the linguistic and textual idiosyncrasies of AD themed on COVID-19 response released by the New Zealand Government. This study finds that, linguistically, the AD script uses a mix of complete sentences and phrases, the majority being in the Present Simple tense. Present participles and the "with" structure are used for brevity. Vocabulary is diverse, with simpler words for animated explainers. Third-person pronouns are common in educational videos. Color words are a salient feature of AD, where "yellow" denotes urgency, and "red" indicates importance, negativity, and hostility. On textual idiosyncrasies, coherence is achieved through intermodal components that align with the video's mood and style. AD style varies depending on the video's purpose, from informative to narrative or expressive.
Keywords: Audio description; COVID-19 educational videos; Corpus-based study
3. Interview with 贾毅阳 (Jia Yiyang), Sales Lead for Sennheiser Professional Audio (Audio for Video) in Mainland China, and 储海涛 (Chu Haitao), Sales Lead for Neumann in Mainland China
Author: 曹徐洋 (Cao Xuyang). 《现代电视技术》 2023, No. 9, pp. 48-49 (2 pages)
During BIRTV2023, in the 《现代电视技术》 live interview room at the China Media Group booth, this journal interviewed Jia Yiyang, sales lead for Sennheiser's professional Audio for Video business in mainland China, and Chu Haitao, sales lead for Neumann in mainland China. The interview covered the two brands' product highlights, strengths, and market positioning. Cao Xuyang: At this year's BIRTV exhibition, the Sennheiser and Neumann booths both displayed a large number of excellent products. Which of them are the key launches? Please walk us through their main highlights.
Keywords: Professional audio; Sennheiser; BIRTV; Live interview; Market positioning; Audio; Video; China Media Group
4. Integrating Audio-Visual Features and Text Information for Story Segmentation of News Video [Cited by 1]
Authors: Liu Hua-yong, Zhou Dong-ru (School of Computer, Wuhan University, Wuhan 430072, Hubei, China). 《Wuhan University Journal of Natural Sciences》 CAS, 2003, No. 04A, pp. 1070-1074 (5 pages)
Video data are composed of multimodal information streams, including visual, auditory, and textual streams, so an approach to story segmentation for news video using multimodal analysis is described in this paper. The proposed approach detects topic-caption frames and integrates them with silence clip detection results, as well as shot segmentation results, to locate the news story boundaries. The integration of audio-visual features and text information overcomes the weakness of approaches using only image analysis techniques. On test data with 135,400 frames, when the boundaries between news stories are detected, an accuracy rate of 85.8% and a recall rate of 97.5% are obtained. The experimental results show the approach is valid and robust.
Keywords: News video; Story segmentation; Audio-visual features analysis; Text detection
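The pipeline above fuses topic-caption detection, shot segmentation, and silence clip detection to locate story boundaries. As a rough illustration of the silence-detection stage only, the sketch below flags low-energy audio frames with a short-time energy threshold; the frame length, threshold, and minimum clip duration are illustrative assumptions, not values from the paper.

```python
import numpy as np

def detect_silence_clips(samples, sr, frame_ms=25, energy_thresh=1e-4, min_clip_s=0.5):
    """Flag contiguous runs of low-energy frames as candidate silence clips.

    samples: mono audio in [-1, 1]; sr: sample rate in Hz. The defaults are
    illustrative, not taken from the paper.
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)   # short-time energy per frame
    silent = energy < energy_thresh

    clips, start = [], None
    for i, is_silent in enumerate(silent):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            if (i - start) * frame_ms / 1000 >= min_clip_s:
                clips.append((start * frame_len / sr, i * frame_len / sr))
            start = None
    if start is not None and (n_frames - start) * frame_ms / 1000 >= min_clip_s:
        clips.append((start * frame_len / sr, n_frames * frame_len / sr))
    return clips  # list of (start_s, end_s) candidate pause intervals

sr = 16_000
audio = np.random.randn(sr * 3) * 0.05
audio[sr:2 * sr] = 0.0                    # one second of silence
print(detect_silence_clips(audio, sr))    # ~[(1.0, 2.0)]
```

A boundary decision stage could then retain only those shot boundaries that coincide with both a detected silence clip and a topic-caption change.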
5. Content-Based Hierarchical Analysis of News Video Using Audio and Visual Information
Authors: Yu Jun-qing, Zhou Dong-ru, Jin Ye, Liu Hua-yong. 《Wuhan University Journal of Natural Sciences》 EI CAS, 2001, No. 4, pp. 779-783 (5 pages)
A schema for content-based analysis of broadcast news video is presented. First, we separate commercials from news using audiovisual features. Then, we automatically organize news programs into a content hierarchy at various levels of abstraction via effective integration of the video, audio, and text data available from the news programs. Based on these news video structure and content analysis technologies, a TV news video library is generated, from which users can retrieve specific news stories according to their demands.
Keywords: Content-based; Audio; News video; Segmentation
6. Study on an Audio and Video Network Monitoring System for Weather Modification Operation
Authors: Yilin Wang, Xueyi Xu, Desheng Xu, Changzong Miao, Gang Zhao. 《Meteorological and Environmental Research》 CAS, 2013, No. 1, pp. 5-7 (3 pages)
An audio and video network monitoring system for weather modification operation, transmitting information by 3G, ADSL, and the Internet, has been developed and applied in weather modification operations in Tai'an City. The all-in-one 3G audio and video network machine highly integrates all front-end devices used for audio and video collection, communication, power supply, and information storage, and has the advantages of wireless video transmission, clear two-way voice intercom with the command center, waterproof and dustproof function, simple operation, good portability, and long working hours. The system's compression code is transmitted with dynamic bandwidth, and the compression rate varies from 32 kbps to 4 Mbps under different network conditions. The system has a forwarding mode: monitoring information from each front-end monitoring point is transmitted to the server of the command center by 3G/ADSL, the server encodes and decodes it again, and back-end users then call images from the server, which addresses 3G network stoppage caused by many users calling front-end video at the same time. In addition, the system has been applied in surface weather modification operations in Tai'an City and has made a great contribution to transmitting operation orders in real time; monitoring, standardizing, and recording the operating process; and improving operating safety.
Keywords: Weather modification operation; Network monitoring; Audio and video; Integration; China
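The abstract notes that the stream's compression rate adapts between 32 kbps and 4 Mbps with network conditions. Below is a minimal sketch of such dynamic-bandwidth selection; the paper does not describe its adaptation rule, so the throughput probe and safety margin are assumptions.

```python
MIN_BPS = 32_000      # 32 kbps floor stated in the abstract
MAX_BPS = 4_000_000   # 4 Mbps ceiling stated in the abstract

def select_bitrate(measured_bps: float, margin: float = 0.8) -> int:
    """Clamp an encoder bitrate to the stated range, given measured throughput.

    margin leaves headroom for jitter; the 0.8 value is illustrative.
    """
    return max(MIN_BPS, min(MAX_BPS, int(measured_bps * margin)))

# A probe reporting 1.5 Mbps of 3G/ADSL throughput yields a 1.2 Mbps stream.
print(select_bitrate(1_500_000))  # -> 1200000
```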
7. Stylistic Analysis of Internet News: Taking Internet Video News and Internet Audio News as Examples
Author: 周逸轩 (Zhou Yixuan). 《海外英语》 2019, No. 9, pp. 212-213 (2 pages)
With the rapid development of the Internet around the world, the network transmits all kinds of information to human beings nowadays. Net news, also called cyber news, is affecting people's expression of daily English. A large number of cyber words, phrases, and even sentences, which are different from conventional English, have formed and become popular in the cyber world. This paper discusses different markers of net news by taking Internet video news and Internet audio news as examples so that readers can fully understand the properties of net news.
Keywords: Internet news; Internet video news; Internet audio news; Stylistics; Features of Internet news
8. 跃威 USB Video Audio Extender
Author: Shawn. 《数字世界》 2007, No. 8, p. 67 (1 page)
Honey, I've extended the computer. The pace of technology sometimes catches people off guard. While I was still debating whether to spend a "fortune" on an HDTV, I discovered that a USB Video Audio extender was all it took to extend the computer in the study into the living room, and every problem was solved at once.
Keywords: Video; Audio
9. The College Video English Visual-audio-oral Learning System
Authors: Jianghui Liu, Hongting Wang, Xiaodan Li. 《教育研究前沿(中英文版)》 2019, No. 3, pp. 183-188 (6 pages)
In order to respond to the needs of social development, cultivate international talents, and improve the current English teaching mode, this paper studies a video English visual-audio-oral learning system based on machine learning from the perspective of teaching and learning video English. It mainly analyzes the knowledge discovery process of machine learning and the design and application of the video English visual-audio-oral learning system. It is found that the video English visual-audio-oral learning system based on machine learning has a much higher level of practicality and efficiency than traditional English language teaching in real life. The application of this system can also be of great significance for changes in language learning modes and methods in the future.
Keywords: Video English; Visual-audio-oral learning; Machine learning; Learning system
10. AB005. The effect of audio quality on eye movements in a video chat
Authors: Sophie Hallot, Aaron Johnson. 《Annals of Eye Science》 2019, No. 1, p. 180 (1 page)
Background: Difficulty in hearing can occur for numerous reasons across a variety of ages in humans. To overcome this, humans can employ a number of techniques to help improve their understanding of sound in other ways. One is to use vision and attempt to lip-read in order to understand someone else in a face-to-face conversation. Audio-visual integration has a long history in perception (e.g., the McGurk effect), and researchers have shown that older adults will look at the mouth region for additional information in noisy situations. However, this concept has not been explored in the context of social media. A common way to communicate virtually that simulates a live conversation is video chatting or conferencing. It is used for a variety of reasons, including work and maintaining social interactions, and has started to be used in clinical settings. However, video chat session quality is often sub-optimal and may contain degraded audio and/or decoupled audio and video. The goal of this study is to determine whether humans use the same visual compensation mechanism, lip reading, in a digital setting as they would in a face-to-face conversation. Methods: The participants (n=116, age 18 to 41) answered a demographics questionnaire including questions about their use of video chatting software. Then, the participants viewed two videos of a video call: one with synchronized audio and video, and the other desynchronized (1-second delay). The order of the videos was randomized across participants. Binocular eye movements were monitored at 60 Hz using a Mirametrix S2 eye tracker connected to Ogama 5.0 (http://www.ogama.net/). After each video, the participants answered questions about the call quality and the content of the video. Results: There was no significant difference in the total dwell time at the eyes and the mouth of the speaker, t(116)=−1.574, P=0.059, d=−0.147, BF10=0.643. However, using the heat maps generated by Ogama, we observed that when viewing the poor-quality video, the participants looked more towards the mouth than the eyes of the speaker. As call quality decreased, the number of fixations increased from n=79.87 in the synchronous condition to n=113.4 in the asynchronous condition, and the median duration of each fixation decreased from 218.3 ms in the synchronous condition to 205 ms in the asynchronous condition. Conclusions: The above results may indicate that humans employ similar compensation mechanisms in response to a decrease in auditory comprehension, given the tendency of participants to look towards the mouth of the speaker more. However, more study is needed because of the inconsistency in the results.
Keywords: Video chat; Audio-visual integration; Social media; Visual compensation
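The dwell-time and fixation statistics reported above can be computed directly from an eye tracker's fixation list. The sketch below assumes a simple data layout (fixation centroids with durations, rectangular areas of interest); neither the field layout nor the AOI coordinates come from the study.

```python
from statistics import median

# Assumed layout: fixation = (x, y, duration_ms); AOI = (name, x0, y0, x1, y1).
fixations = [(512, 300, 210.0), (515, 420, 198.0), (510, 425, 232.0)]
aois = [("eyes", 460, 260, 560, 340), ("mouth", 470, 390, 550, 450)]

def aoi_stats(fixations, aois):
    """Total dwell time, fixation count, and median fixation duration per AOI."""
    stats = {name: {"dwell_ms": 0.0, "count": 0, "durations": []}
             for name, *_ in aois}
    for x, y, dur in fixations:
        for name, x0, y0, x1, y1 in aois:
            if x0 <= x <= x1 and y0 <= y <= y1:
                stats[name]["dwell_ms"] += dur
                stats[name]["count"] += 1
                stats[name]["durations"].append(dur)
    for s in stats.values():
        s["median_ms"] = median(s["durations"]) if s["durations"] else 0.0
        del s["durations"]
    return stats

print(aoi_stats(fixations, aois))
# e.g., {'eyes': {'dwell_ms': 210.0, 'count': 1, ...}, 'mouth': {'dwell_ms': 430.0, 'count': 2, ...}}
```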
11. Dominating the Future: An Overview of Windows Media Audio and Video 8
Author: ChinaByte.Jaro. 《计算机应用文摘》 2001, No. 2, pp. 100-101 (2 pages)
Keywords: Image processing software; Windows Media Audio and Video 8; Audio files
12. Onkyo DV-S939 DVD Video/Audio Compatible Player
Author: 管正 (Guan Zheng). 《现代音响技术》 2001, No. 7, p. 13 (1 page)
Keywords: Onkyo DV-S939; DVD Video/Audio compatible player; Video disc player
13. Research on the Audio Publishing of Classical Books Based on the Theory of Business Model Canvas: Taking Romance of the Three Kingdoms in Digital Audio Platforms as an Example
Authors: Ding Qin, Liu Mengzhi. 《Contemporary Social Sciences》 2024, No. 1, pp. 58-74 (17 pages)
Visual media have dominated sensory communications for decades, and the resulting "visual hegemony" has led to calls for an "auditory return" in order to achieve a holistic balance in cultural acceptance. Romance of the Three Kingdoms, a classic literary work in China, has received significant attention and promotion from leading audio platforms. However, the commercialization of digital audio publishing faces unprecedented challenges due to the mismatch between the dissemination of long-form content on digital audio platforms and the current trend of short and fast information reception. Drawing on Business Model Canvas theory and taking Romance of the Three Kingdoms as the main focus of analysis, this paper argues that the construction of a business model for the audio publishing of classical books should start from three aspects: the user evaluation of digital audio platforms, the establishment of value propositions based on the "creative transformation and innovative development" principle, and the improvement of the audio publishing infrastructure, so as to ensure the healthy operation and development of digital audio platforms, improve their current state of development, and expand the boundaries of cultural heritage.
Keywords: Romance of the Three Kingdoms; Audio publishing; Business Model Canvas; Digital audio platforms
14. Audio2AB: Audio-driven collaborative generation of virtual character animation
Authors: Lichao NIU, Wenjun XIE, Dong WANG, Zhongrui CAO, Xiaoping LIU. 《虚拟现实与智能硬件(中英文)》 EI, 2024, No. 1, pp. 56-70 (15 pages)
Background: Considerable research has been conducted in the areas of audio-driven virtual character gestures and facial animation with some degree of success. However, few methods exist for generating full-body animations, and the portability of virtual character gestures and facial animations has not received sufficient attention. Methods: Therefore, we propose a deep-learning-based audio-to-animation-and-blendshape (Audio2AB) network that generates gesture animations and ARKit's 52 facial expression blendshape weights based on audio, audio-corresponding text, emotion labels, and semantic relevance labels to generate parametric data for full-body animations. This parameterization method can be used to drive full-body animations of virtual characters and improve their portability. In the experiment, we first downsampled the gesture and facial data to achieve the same temporal resolution for the input, output, and facial data. The Audio2AB network then encoded the audio, audio-corresponding text, emotion labels, and semantic relevance labels, and fused the text, emotion labels, and semantic relevance labels into the audio to obtain better audio features. Finally, we established links between the body, gesture, and facial decoders and generated the corresponding animation sequences through our proposed GAN-GF loss function. Results: By using audio, audio-corresponding text, and emotion and semantic relevance labels as input, the trained Audio2AB network could generate gesture animation data containing blendshape weights. Therefore, different 3D virtual character animations could be created through parameterization. Conclusions: The experimental results showed that the proposed method could generate significant gestures and facial animations.
Keywords: Audio-driven; Virtual character; Full-body animation; Audio2AB; Blendshape; GAN-GF
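The experiment above first downsamples the gesture and facial streams to a common temporal resolution. A minimal sketch of that alignment step using linear interpolation follows; the frame rates and feature dimensions are assumptions, since the abstract does not specify them.

```python
import numpy as np

def resample_sequence(seq: np.ndarray, src_fps: float, dst_fps: float) -> np.ndarray:
    """Linearly resample a (frames, channels) feature sequence to dst_fps."""
    n_src = seq.shape[0]
    n_dst = int(round(n_src / src_fps * dst_fps))
    t_src = np.arange(n_src) / src_fps
    t_dst = np.arange(n_dst) / dst_fps
    return np.stack([np.interp(t_dst, t_src, seq[:, c])
                     for c in range(seq.shape[1])], axis=1)

# Illustrative shapes: 10 s of 60 fps gesture features aligned to 30 fps
# facial blendshape frames (both rates are assumed, not from the paper).
gestures = np.random.randn(600, 72)
aligned = resample_sequence(gestures, src_fps=60.0, dst_fps=30.0)
print(aligned.shape)  # (300, 72)
```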
15. Should Audio and Video Recordings for Public Safety in a Ride-Sharing Vehicle Be Allowed?
《ChinAfrica》 2018, No. 7, pp. 12-13 (2 pages)
Keywords: Video recordings; Public safety
16. Customized Convolutional Neural Network for Accurate Detection of Deep Fake Images in Video Collections [Cited by 1]
Authors: Dmitry Gura, Bo Dong, Duaa Mehiar, Nidal Al Said. 《Computers, Materials & Continua》 SCIE EI, 2024, No. 5, pp. 1995-2014 (20 pages)
The motivation for this study is that the quality of deep fakes is constantly improving, which leads to the need to develop new methods for their detection. The proposed Customized Convolutional Neural Network method involves extracting structured data from video frames using facial landmark detection, which is then used as input to the CNN. The customized Convolutional Neural Network method is a data-augmentation-based CNN model used to generate "fake data" or "fake images". This study was carried out using Python and its libraries. We used 242 films from the dataset gathered by the Deep Fake Detection Challenge, of which 199 were made up and the remaining 53 were real. Ten seconds were allotted for each video. There were 318 videos used in all, 199 of which were fake and 119 of which were real. Our proposed method achieved a testing accuracy of 91.47%, a loss of 0.342, and an AUC score of 0.92, outperforming two alternative approaches, CNN and MLP-CNN. Furthermore, our method achieved greater accuracy than contemporary models such as XceptionNet, Meso-4, EfficientNet-B0, MesoInception-4, VGG-16, and DST-Net. The novelty of this investigation is the development of a new Convolutional Neural Network (CNN) learning model that can accurately detect deep fake face photos.
Keywords: Deep fake detection; Video analysis; Convolutional neural network; Machine learning; Video dataset collection; Facial landmark prediction; Accuracy models
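As a rough illustration of the pipeline described above (facial landmarks extracted per frame and fed to a CNN as structured input), the sketch below builds a small 1-D CNN over a 68-point landmark array. The landmark count, layer sizes, and synthetic data are assumptions, not the paper's customized architecture; real inputs would come from an upstream face-landmark detector.

```python
import numpy as np
import tensorflow as tf

NUM_LANDMARKS = 68  # assumed; a common count for face-landmark detectors

def build_landmark_cnn() -> tf.keras.Model:
    """Small 1-D CNN over (x, y) landmark coordinates; an illustrative
    stand-in for the paper's customized CNN, not its actual design."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(NUM_LANDMARKS, 2)),
        tf.keras.layers.Conv1D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # 0 = real, 1 = fake
    ])

model = build_landmark_cnn()
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC()])

# Synthetic landmarks and labels, for shape-checking only.
X = np.random.rand(256, NUM_LANDMARKS, 2).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))
model.fit(X, y, epochs=1, batch_size=32, verbose=0)
```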
17. Pulse rate estimation based on facial videos: an evaluation and optimization of the classical methods using both self-constructed and public datasets [Cited by 1]
Authors: Chao-Yong Wu, Jian-Xin Chen, Yu Chen, Ai-Ping Chen, Lu Zhou, Xu Wang. 《Traditional Medicine Research》 2024, No. 1, pp. 14-22 (9 pages)
Pulse rate is one of the important characteristics of traditional Chinese medicine pulse diagnosis, and it is of great significance for determining the nature of cold and heat in diseases. The prediction of pulse rate based on facial video is an exciting research field for obtaining palpation information through observation diagnosis. However, most studies focus on optimizing the algorithm based on a small sample of participants without systematically investigating multiple influencing factors. A total of 209 participants and 2,435 facial videos, based on our self-constructed Multi-Scene Sign Dataset and public datasets, were used to perform a multi-level and multi-factor comprehensive comparison. The effects of different datasets, blood volume pulse signal extraction algorithms, regions of interest, time windows, color spaces, pulse rate calculation methods, and video recording scenes were analyzed. Furthermore, we proposed a blood volume pulse signal quality optimization strategy based on the inverse Fourier transform and an improvement strategy for pulse rate estimation based on signal-to-noise ratio threshold sliding. We found that the effects of video estimation of pulse rate in the Multi-Scene Sign Dataset and the Pulse Rate Detection Dataset were better than in other datasets. Compared with the Fast independent component analysis and Single Channel algorithms, the chrominance-based and plane-orthogonal-to-skin algorithms have stronger anti-interference ability and higher robustness. The performance of the five-organs fusion area and the full-face area was better than that of single sub-regions, and fewer motion artifacts and better lighting can improve the precision of pulse rate estimation.
Keywords: Pulse rate; Heart rate; Photoplethysmography; Observation and pulse diagnosis; Facial videos
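For illustration, the sketch below estimates pulse rate from a spatially averaged RGB trace using the plane-orthogonal-to-skin (POS) projection mentioned above, then takes the dominant FFT peak in a physiological band. Applying POS over the whole trace (rather than sliding windows) and the band limits are simplifying assumptions, not the paper's settings.

```python
import numpy as np

def pos_bvp(rgb: np.ndarray) -> np.ndarray:
    """POS projection of a mean-RGB trace; rgb has shape (N, 3)."""
    cn = rgb / rgb.mean(axis=0) - 1.0           # temporal normalization
    s1 = cn[:, 1] - cn[:, 2]                    # G - B
    s2 = cn[:, 1] + cn[:, 2] - 2.0 * cn[:, 0]   # G + B - 2R
    return s1 + (s1.std() / (s2.std() + 1e-9)) * s2

def pulse_rate_bpm(bvp: np.ndarray, fps: float, lo=0.7, hi=4.0) -> float:
    """Dominant spectral peak within 0.7-4.0 Hz (42-240 bpm, assumed band)."""
    freqs = np.fft.rfftfreq(len(bvp), d=1.0 / fps)
    power = np.abs(np.fft.rfft(bvp - bvp.mean())) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return 60.0 * freqs[band][np.argmax(power[band])]

# Synthetic check: a 1.2 Hz (72 bpm) pulse hidden in noise at 30 fps.
fps = 30.0
t = np.arange(900) / fps
rgb = 0.5 + 0.01 * np.sin(2 * np.pi * 1.2 * t)[:, None] * [0.3, 0.8, 0.5]
rgb += 0.002 * np.random.randn(900, 3)
print(round(pulse_rate_bpm(pos_bvp(rgb), fps)))  # ~72
```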
18. MarkINeRV: A Robust Watermarking Scheme for Neural Representation for Videos Based on Invertible Neural Networks
Authors: Wenquan Sun, Jia Liu, Lifeng Chen, Weina Dong, Fuqiang Di. 《Computers, Materials & Continua》 SCIE EI, 2024, No. 9, pp. 4031-4046 (16 pages)
Recent research advances in implicit neural representation have shown that a wide range of video data distributions can be achieved by sharing model weights for Neural Representation for Videos (NeRV). While explicit methods exist for accurately embedding ownership or copyright information in video data, the nascent NeRV framework has yet to address this issue comprehensively. In response, this paper introduces MarkINeRV, a scheme designed to embed watermarking information into video frames using an invertible neural network watermarking approach to protect the copyright of NeRV. It models the embedding and extraction of watermarks as a pair of inverse processes of a reversible network and employs the same network for both embedding and extraction, with the information flow simply running in the opposite direction. Additionally, a video frame quality enhancement module is incorporated to mitigate watermarking information losses in the rendering process and the possibility of malicious attacks during transmission, ensuring the accurate extraction of watermarking information through the invertible network's inverse process. This paper evaluates the accuracy, robustness, and invisibility of MarkINeRV on multiple video datasets. The results demonstrate its efficacy in extracting watermarking information for the copyright protection of NeRV. MarkINeRV represents a pioneering investigation into copyright issues surrounding NeRV.
Keywords: Invertible neural network; Neural representations for videos; Watermarking; Robustness
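To make the "same network, opposite information flow" idea concrete, here is a minimal additive coupling layer of the kind invertible networks are built from; it is a generic sketch of the mechanism, not MarkINeRV's actual architecture.

```python
import numpy as np

def f(x: np.ndarray) -> np.ndarray:
    """Arbitrary (even non-invertible) transform; stands in for a small CNN."""
    return np.tanh(2.0 * x) + 0.1 * x

def couple_forward(x1, x2):
    """Embedding direction: y1 = x1, y2 = x2 + f(x1)."""
    return x1, x2 + f(x1)

def couple_inverse(y1, y2):
    """Extraction direction: the exact inverse, reusing the same f."""
    return y1, y2 - f(y1)

# Treat x1 as cover-frame features and x2 as watermark features (illustrative).
cover = np.random.randn(4, 4)
mark = np.random.randn(4, 4)
y1, y2 = couple_forward(cover, mark)
x1, x2 = couple_inverse(y1, y2)
print(np.allclose(x1, cover), np.allclose(x2, mark))  # True True
```

Because the inverse pass reuses the same f, extraction recovers the embedded information exactly, no matter how complex f is.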
19. A Hybrid Machine Learning Approach for Improvised QoE in Video Services over 5G Wireless Networks
Authors: K.B. Ajeyprasaath, P. Vetrivelan. 《Computers, Materials & Continua》 SCIE EI, 2024, No. 3, pp. 3195-3213 (19 pages)
Video streaming applications have grown considerably in recent years. As a result, video has become one of the most significant contributors to global internet traffic. According to recent studies, the telecommunications industry loses millions of dollars due to poor video Quality of Experience (QoE) for users. Among the standard proposals for standardizing the quality of video streaming over internet service providers (ISPs) is the Mean Opinion Score (MOS). However, accurately determining QoE with MOS is subjective and laborious, and it varies depending on the user. A fully automated data analytics framework is required to reduce the inter-operator variability characteristic of QoE assessment. This work addresses this concern by suggesting a novel hybrid XGBStackQoE analytical model using a two-level layering technique. Level one combines multiple Machine Learning (ML) models via a level-one Hybrid XGBStackQoE model, with the individual ML models at level one trained on the entire training data set. The level-two Hybrid XGBStackQoE model is fitted using the outputs (meta-features) of the level-one ML models. The proposed model outperformed conventional models, with an accuracy improvement of 4 to 5 percent over current traditional models. The proposed framework could significantly improve video QoE accuracy.
Keywords: Hybrid XGBStackQoE-model; Machine learning; MOS; Performance metrics; QoE; 5G video services
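A two-level stacking design like the one described above can be expressed compactly with scikit-learn. The sketch below is a generic illustration with synthetic features and labels; it substitutes scikit-learn's gradient-boosted trees for the paper's XGBoost booster.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic QoE data: network/video metrics -> good/poor experience label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                # e.g., bitrate, jitter, stalls
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Level one: base learners whose predictions become meta-features;
# level two: a meta-model fitted on those meta-features.
stack = StackingClassifier(
    estimators=[
        ("gbt", GradientBoostingClassifier()),  # stand-in for XGBoost
        ("rf", RandomForestClassifier()),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_tr, y_tr)
print(f"held-out accuracy: {stack.score(X_te, y_te):.3f}")
```

Note that scikit-learn generates the meta-features via internal cross-validation before refitting the base learners on the full training set, which matches the spirit, though not necessarily the exact procedure, of the XGBStackQoE model.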
20. Exploring Frontier Technologies in Video-Based Person Re-Identification: A Survey on Deep Learning Approach
Authors: Jiahe Wang, Xizhan Gao, Fa Zhu, Xingchi Chen. 《Computers, Materials & Continua》 SCIE EI, 2024, No. 10, pp. 25-51 (27 pages)
Video-based person re-identification (Re-ID), a subset of retrieval tasks, faces challenges like uncoordinated sample capturing, viewpoint variations, occlusions, cluttered backgrounds, and sequence uncertainties. Recent advancements in deep learning have significantly improved video-based person Re-ID, laying a solid foundation for further progress in the field. In order to enrich researchers' insights into the latest research findings and prospective developments, we offer an extensive overview and meticulous analysis of contemporary video-based person Re-ID methodologies, with a specific emphasis on network architecture design and loss function design. Firstly, we introduce methods based on network architecture design and loss function design from multiple perspectives and analyze their advantages and disadvantages. Furthermore, we provide a synthesis of prevalent datasets and key evaluation metrics utilized within this field to assist researchers in assessing methodological efficacy and establishing benchmarks for performance evaluation. Lastly, through a critical evaluation of the experimental outcomes derived from various methodologies across four prominent public datasets, we identify promising research avenues and offer valuable insights to steer future exploration and innovation in this vibrant and evolving field of video-based person Re-ID. This comprehensive analysis aims to equip researchers with the necessary knowledge and strategic foresight to navigate the complexities of video-based person Re-ID, fostering continued progress and breakthroughs in this challenging yet promising research domain.
Keywords: Video-based person Re-ID; Deep learning; Survey of video Re-ID; Loss function