Funding: This work is supported by the National Natural Science Foundation of China under Grant Nos. 61170269, 61170272, and 61202082; the Beijing Natural Science Foundation under Grant No. 4122026; the Fundamental Research Funds for the Central Universities under Grant Nos. BUPT2013RC0308 and BUPT2013RC0311; and the Scientific Research Common Program of Beijing Municipal Commission of Education under Grant Nos. KM201210015007 and KM201210015006.
Abstract: The zero-watermark technique, which embeds a watermark without modifying the carrier, has been broadly applied to copyright protection of images. However, there is little research on audio zero-watermarking. This paper proposes an audio zero-watermark scheme based on the energy relationship between adjacent audio sections. Using the discrete wavelet transform (DWT), it obtains power approximations, or energies, of audio segments. It then extracts the audio profile, i.e. the zero-watermark, according to the relative size of the energies of consecutive fragments. The experimental results demonstrate that the proposed scheme is robust against general malicious attacks, including noise addition, resampling, low-pass filtering, etc., and that the approach effectively resolves the conflict between inaudibility and robustness.
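The energy-comparison idea in this abstract can be illustrated with a minimal sketch. The abstract gives no parameters, so everything concrete here is an assumption: a Haar wavelet, two decomposition levels, 256-sample segments, the rule "bit is 1 when a segment's energy is at least that of the next segment", and the synthetic test signal. The function names (`haar_approx`, `segment_energies`, `zero_watermark`, `bit_error_rate`) are hypothetical.

```python
import math
import random


def haar_approx(samples, levels):
    """Approximation coefficients after `levels` Haar DWT steps (assumed wavelet)."""
    approx = list(samples)
    for _ in range(levels):
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2)
                  for i in range(0, len(approx) - 1, 2)]
    return approx


def segment_energies(signal, seg_len, levels=2):
    """Energy of the DWT approximation of each non-overlapping segment."""
    energies = []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        approx = haar_approx(signal[start:start + seg_len], levels)
        energies.append(sum(c * c for c in approx))
    return energies


def zero_watermark(signal, seg_len, levels=2):
    """One bit per pair of consecutive segments; the audio itself is never modified."""
    e = segment_energies(signal, seg_len, levels)
    return [1 if e[i] >= e[i + 1] else 0 for i in range(len(e) - 1)]


def bit_error_rate(a, b):
    """Fraction of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Demo on synthetic data (hypothetical, not taken from the paper):
# a tone whose amplitude changes from segment to segment, plus mild noise.
rng = random.Random(0)
amps = [rng.uniform(0.2, 1.0) for _ in range(32)]
clean = [amps[n // 256] * math.sin(2 * math.pi * 8 * n / 256)
         for n in range(32 * 256)]
noisy = [s + rng.gauss(0, 0.005) for s in clean]

registered = zero_watermark(clean, seg_len=256)
recovered = zero_watermark(noisy, seg_len=256)
print("bit error rate:", bit_error_rate(registered, recovered))
```

Because each bit depends only on the ordering of two energies, uniform gain changes leave the bits unchanged, which is one plausible reason such a feature tolerates common signal processing.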
Funding: Supported by the Open Foundation of Jiangsu Engineering Center of Network Monitoring (Nanjing University of Information Science & Technology) (Grant No. KJR1509), the PAPD fund, and the CICAEET fund.
Abstract: Traditional watermark embedding schemes inevitably modify the data in a host audio signal and degrade the host signal. In this paper, a novel audio zero-watermarking algorithm based on the discrete wavelet transform (DWT), the discrete cosine transform (DCT), and singular value decomposition (SVD) is presented. The watermark is registered by performing SVD on the coefficients generated through DWT and DCT, which avoids data modification and host signal degradation. Simulation results show that the proposed zero-watermarking algorithm is strongly robust to common signal processing operations such as requantization, MP3 compression, resampling, addition of white Gaussian noise, and low-pass filtering.
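A rough sketch of a DWT, DCT, SVD feature pipeline follows. The abstract does not say how the singular values become watermark bits, so the rest is assumed: one Haar level, the first 16 DCT coefficients arranged as a 4x4 block, the largest singular value per frame (estimated here by power iteration rather than a full SVD), thresholding against the median, and XOR with the watermark to form the registered key. All names are hypothetical.

```python
import math
import random
import statistics


def haar_approx(frame):
    """One-level Haar DWT approximation coefficients (assumed wavelet)."""
    return [(frame[i] + frame[i + 1]) / math.sqrt(2)
            for i in range(0, len(frame) - 1, 2)]


def dct(x):
    """Unnormalised DCT-II."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def largest_singular_value(m, iters=50):
    """Largest singular value via power iteration on M^T M."""
    rows, cols = len(m), len(m[0])
    v = [1.0] * cols
    for _ in range(iters):
        u = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(m[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
    u = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(c * c for c in u))


def feature_bits(audio, frame_len=64):
    """One bit per frame: is the frame's largest singular value above the median?"""
    feats = []
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        coeffs = dct(haar_approx(audio[start:start + frame_len]))[:16]
        block = [coeffs[r * 4:(r + 1) * 4] for r in range(4)]
        feats.append(largest_singular_value(block))
    med = statistics.median(feats)
    return [1 if f > med else 0 for f in feats]


def register(audio, watermark, frame_len=64):
    """Key = audio feature XOR watermark; the audio is left untouched."""
    return [f ^ w for f, w in zip(feature_bits(audio, frame_len), watermark)]


def extract(audio, key, frame_len=64):
    """Recover the watermark from (possibly processed) audio and the key."""
    return [f ^ k for f, k in zip(feature_bits(audio, frame_len), key)]


# Demo on random data (hypothetical, not taken from the paper).
rng = random.Random(1)
audio = [rng.uniform(-1, 1) for _ in range(16 * 64)]
watermark = [rng.randint(0, 1) for _ in range(16)]
key = register(audio, watermark)
print(extract(audio, key) == watermark)
```

In this kind of design the key, not the audio, carries the watermark, so robustness depends entirely on how stable the feature bits remain after processing.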
Funding: National Natural Science Foundation of China, Grant/Award Numbers: 62063004, 62350410483; Key Research and Development Project of Hainan Province, Grant/Award Number: ZDYF2021SHFZ093; Zhejiang Provincial Postdoctoral Science Foundation, Grant/Award Number: ZJ2021028.
Abstract: In the intricate network environment, the secure transmission of medical images faces challenges such as information leakage and malicious tampering, which significantly affect the accuracy of disease diagnoses by medical professionals. To address this problem, the authors propose a robust feature watermarking algorithm for encrypted medical images based on the multi-stage discrete wavelet transform (DWT), the Daisy descriptor, and the discrete cosine transform (DCT). The algorithm first encrypts the original medical image through DWT-DCT and Logistic mapping. A 3-stage DWT is then applied to the encrypted medical image, with the centre point of the LL3 sub-band within its low-frequency component serving as the sampling point, and the Daisy descriptor matrix for this point is computed. Finally, a DCT is performed on the Daisy descriptor matrix, and the low-frequency portion is processed using the perceptual hashing algorithm to generate a 32-bit binary feature vector for the medical image. The scheme utilises cryptographic knowledge and the zero-watermarking technique to embed watermarks without modifying medical images, and it can extract the watermark from test images without the original image, which meets the basic requirements of medical image watermarking. The embedding and extraction of watermarks are accomplished in a mere 0.160 s and 0.411 s, respectively, with minimal computational overhead. Simulation results demonstrate the robustness of the algorithm against both conventional attacks and geometric attacks, with notable performance in resisting rotation attacks.
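The last stage of this pipeline (DCT of a descriptor matrix, a 32-bit perceptual hash of the low-frequency part, and a chaotic sequence for encryption) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Daisy descriptor and the image encryption step are not computed, an arbitrary 8x8 matrix stands in for the descriptor matrix, the hash thresholds a 4x8 low-frequency block against its median, the Logistic map uses mu = 4 with initial value 0.2, and the key is formed by XOR. All names are hypothetical.

```python
import math
import random
import statistics


def dct_1d(x):
    """Unnormalised DCT-II of a sequence."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def dct_2d(matrix):
    """Separable 2D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in matrix]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def feature_bits(descriptor, rows=4, cols=8):
    """32-bit hash: low-frequency DCT coefficients thresholded at their median."""
    coeffs = dct_2d(descriptor)
    low = [coeffs[i][j] for i in range(rows) for j in range(cols)]
    med = statistics.median(low)
    return [1 if c > med else 0 for c in low]


def logistic_bits(n, x0=0.2, mu=4.0):
    """Binary chaotic sequence from the Logistic map x -> mu * x * (1 - x)."""
    bits, x = [], x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        bits.append(1 if x > 0.5 else 0)
    return bits


def xor(a, b):
    return [p ^ q for p, q in zip(a, b)]


def register(descriptor, watermark, x0=0.2):
    """Key = feature XOR chaotically scrambled watermark; the image is not modified."""
    scrambled = xor(watermark, logistic_bits(len(watermark), x0))
    return xor(feature_bits(descriptor), scrambled)


def extract(descriptor, key, x0=0.2):
    """Recover the watermark from a test descriptor matrix, the key, and x0."""
    scrambled = xor(feature_bits(descriptor), key)
    return xor(scrambled, logistic_bits(len(key), x0))


# Demo with a random stand-in matrix (hypothetical, not a real Daisy descriptor).
rng = random.Random(2)
descriptor = [[rng.uniform(0, 1) for _ in range(8)] for _ in range(8)]
watermark = [rng.randint(0, 1) for _ in range(32)]
key = register(descriptor, watermark)
print(extract(descriptor, key) == watermark)
```

Extraction needs only the test image's feature vector, the key, and the chaotic seed, which matches the abstract's point that the original image is not required.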
Funding: This study is a phased achievement of the “Research on Innovative Communication of Romance of the Three Kingdoms under Audio Empowerment” project (No. 23ZGL16) funded by the Zhuge Liang Research Center, a key research base of social sciences in Sichuan Province.
Abstract: Visual media have dominated sensory communications for decades, and the resulting “visual hegemony” has led to the call for an “auditory return” in order to achieve a holistic balance in cultural acceptance. Romance of the Three Kingdoms, a classic literary work in China, has received significant attention and promotion from leading audio platforms. However, the commercialization of digital audio publishing faces unprecedented challenges due to the mismatch between the dissemination of long-form content on digital audio platforms and the current trend of short and fast information reception. Drawing on the Business Model Canvas Theory and taking Romance of the Three Kingdoms as the main focus of analysis, this paper argues that the construction of a business model for the audio publishing of classical books should start from three aspects: the user evaluation of digital audio platforms, the establishment of value propositions based on the “creative transformation and innovative development” principle, and the improvement of the audio publishing infrastructure. Together these would ensure the healthy operation and development of digital audio platforms, improve their current state of development, and expand the boundaries of cultural heritage.
Funding: Supported by the National Natural Science Foundation of China (62277014), the National Key Research and Development Program of China (2020YFC1523100), and the Fundamental Research Funds for the Central Universities of China (PA2023GDSK0047).
Abstract: Background: Considerable research has been conducted in the areas of audio-driven virtual character gestures and facial animation, with some degree of success. However, few methods exist for generating full-body animations, and the portability of virtual character gestures and facial animations has not received sufficient attention. Methods: We therefore propose a deep-learning-based audio-to-animation-and-blendshape (Audio2AB) network that generates gesture animations and ARKit's 52 facial expression parameter blendshape weights based on audio, audio-corresponding text, emotion labels, and semantic relevance labels, producing parametric data for full-body animations. This parameterization method can be used to drive full-body animations of virtual characters and improve their portability. In the experiment, we first downsampled the gesture and facial data to achieve the same temporal resolution for the input, output, and facial data. The Audio2AB network then encoded the audio, audio-corresponding text, emotion labels, and semantic relevance labels, and fused the text, emotion labels, and semantic relevance labels into the audio to obtain better audio features. Finally, we established links between the body, gesture, and facial decoders and generated the corresponding animation sequences through our proposed GAN-GF loss function. Results: Using audio, audio-corresponding text, and emotional and semantic relevance labels as input, the trained Audio2AB network could generate gesture animation data containing blendshape weights. Different 3D virtual character animations could therefore be created through parameterization. Conclusions: The experimental results showed that the proposed method could generate significant gestures and facial animations.
Funding: Supported by Shandong Province Key R and D Program, No. 2021SFGC0504; Shandong Provincial Natural Science Foundation, No. ZR2021MF079; Science and Technology Development Plan of Jinan (Clinical Medicine Science and Technology Innovation Plan), No. 202225054.
Abstract: Depression is a common mental health disorder. With current depression detection methods, specialized physicians often engage in conversations and physiological examinations based on standardized scales as auxiliary measures for depression assessment. Non-biological markers, typically classified as verbal or non-verbal and deemed crucial evaluation criteria for depression, have not been effectively utilized. Specialized physicians usually require extensive training and experience to capture changes in these features. Advancements in deep learning technology have provided technical support for capturing non-biological markers. Several researchers have proposed automatic depression estimation (ADE) systems based on sounds and videos to assist physicians in capturing these features and conducting depression screening. This article summarizes commonly used public datasets and recent research on audio- and video-based ADE from three perspectives: datasets, deficiencies in existing research, and future development directions.
Funding: Supported by the National Natural Science Foundation of China (No. 62001100), the Fundamental Research Funds for the Central Universities (No. 2232019D3-52), and the Shanghai Sailing Program (No. 19YF1402000).
Abstract: With the prevalence of multimedia technology, digital copyright disputes are becoming increasingly serious, and digital watermarking techniques for preventing copyright infringement urgently need to be improved. Among the proposed technologies, zero-watermarking has been favored recently. To improve the robustness of zero-watermarking, a novel robust audio zero-watermarking method based on sparse representation is proposed. The scheme uses the K-singular value decomposition (K-SVD) algorithm to construct an optimal overcomplete dictionary from the background audio signal. The orthogonal matching pursuit (OMP) algorithm is then used to calculate the sparse coefficients of the segmented test audio and generate the corresponding sparse coefficient matrix. Next, the mean value of the absolute sparse coefficients in the sparse matrix of the segmented speech is calculated and selected, and the mean absolute coefficient of each speech segment is compared with the average value of the selected coefficients to realize the embedding of the zero-watermark. Experimental results show that the proposed audio zero-watermarking algorithm based on sparse representation performs effectively in resisting various common attacks. Compared with the baseline works, the proposed method has better robustness.
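The sparse-coding step and the mean-comparison rule described here can be sketched as below. Several parts are assumptions: a fixed random unit-norm dictionary stands in for the K-SVD-trained dictionary (no dictionary learning is performed), segments are 16 samples long, the sparsity level is 3, and a bit is 1 when a segment's mean absolute coefficient is at least the average over all segments. All function names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(a, b):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def omp(atoms, y, sparsity):
    """Orthogonal matching pursuit; returns one coefficient per dictionary atom."""
    residual, support, coef = list(y), [], []
    for _ in range(sparsity):
        scores = [0.0 if j in support else abs(dot(a, residual))
                  for j, a in enumerate(atoms)]
        j = max(range(len(atoms)), key=scores.__getitem__)
        if scores[j] < 1e-12:
            break
        support.append(j)
        # Least-squares refit on the selected atoms via the normal equations.
        gram = [[dot(atoms[p], atoms[q]) for q in support] for p in support]
        coef = solve(gram, [dot(atoms[p], y) for p in support])
        residual = [yi - sum(c * atoms[p][i] for c, p in zip(coef, support))
                    for i, yi in enumerate(y)]
    dense = [0.0] * len(atoms)
    for c, p in zip(coef, support):
        dense[p] = c
    return dense


def random_dictionary(n_atoms, dim, seed=0):
    """Random unit-norm atoms: a stand-in for a K-SVD-trained dictionary."""
    rng = random.Random(seed)
    atoms = []
    for _ in range(n_atoms):
        a = [rng.gauss(0, 1) for _ in range(dim)]
        norm = math.sqrt(dot(a, a))
        atoms.append([v / norm for v in a])
    return atoms


def zero_watermark_bits(audio, atoms, sparsity=3):
    """One bit per segment: mean |coefficient| compared with the overall average."""
    dim = len(atoms[0])
    means = []
    for start in range(0, len(audio) - dim + 1, dim):
        coef = omp(atoms, audio[start:start + dim], sparsity)
        means.append(sum(abs(c) for c in coef) / len(coef))
    avg = sum(means) / len(means)
    return [1 if m >= avg else 0 for m in means]


# Demo on random data (hypothetical, not taken from the paper).
atoms = random_dictionary(32, 16, seed=0)
one_sparse = omp(atoms, [2.0 * v for v in atoms[3]], sparsity=1)
rng = random.Random(5)
audio = [rng.uniform(-1, 1) for _ in range(16 * 8)]
bits = zero_watermark_bits(audio, atoms)
print(bits)
```

A production version would learn the dictionary from audio and use a numerically sturdier least-squares solver; the normal-equation solve here is only adequate for the tiny supports in this sketch.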