Funding: Project (17KJB510029) supported by the Natural Science Foundation of the Jiangsu Higher Education Institutions, China; Project (GXL2017004) supported by the Scientific Research Foundation of Nanjing Forestry University, China; Project (202102210132) supported by the Important Project of Science and Technology of Henan Province, China; Project (B2019-51) supported by the Scientific Research Foundation of Henan Polytechnic University, China; Project (51521003) supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China; Project (KQTD2016112515134654) supported by the Shenzhen Science and Technology Program, China.
Abstract: A filter algorithm based on cochlear mechanics and the neuron filter mechanism is proposed from the viewpoint of vibration. It addresses the problem that non-linear amplification is rarely considered in studies of auditory filters. A cochlear mechanical transduction model is built to illustrate how audio signals are processed in the cochlea, and the neuron filter mechanism is then modeled to indirectly obtain outputs with the cochlear properties of frequency tuning and non-linear amplification. The mathematical description of the proposed algorithm is derived from the two models. The parameter space, the parameter selection rules and the error correction of the proposed algorithm are discussed. The unit impulse responses in the time domain and the frequency domain are simulated and compared to examine the characteristics of the proposed algorithm. A 24-channel filter bank is then built on the proposed algorithm and applied to the enhancement of audio signals. The experiments and comparisons verify that the proposed algorithm can effectively separate audio signals by frequency, significantly enhance the high-frequency parts, and improve speech enhancement performance in different noise environments, especially for babble noise and Volvo noise.
Abstract: Because audio signals contain non-stationarities and discontinuities, their segmentation and classification are challenging tasks. Automatic music classification and annotation also remains challenging because of the difficulty of extracting and selecting the optimal audio features. Hence, this paper proposes an efficient approach for segmentation, feature extraction and classification of audio signals. Feature extraction based on Enhanced Mel Frequency Cepstral Coefficients (EMFCC) and Enhanced Power Normalized Cepstral Coefficients (EPNCC) is applied to the audio signal. Multi-level classification is then performed to classify the audio signal as musical or non-musical. The proposed approach achieves better performance in terms of precision, Normalized Mutual Information (NMI), F-score and entropy. The PNN classifier shows high False Rejection Rate (FRR), False Acceptance Rate (FAR), Genuine Acceptance Rate (GAR), sensitivity, specificity and accuracy with respect to the number of classes.
Funding: Supported financially by "Kunlun Talents High-end Innovation and Entrepreneurship Talents" of Qinghai Province in 2022, the National Natural Science Foundation of China (Nos. 22322401 and 82073816), and the Beijing Nova Program (No. 20220484055).
Abstract: Foods are often contaminated by multiple foodborne pathogens, which threatens human health. In this work, we developed a microfluidic biosensor for multiplex immunoassay of foodborne bacteria with agitation driven by programmed audio signals. This agitation, powered by the vibration of a speaker cone during music playback, accelerated mass transport in the incubation process so that bacterial complexes formed within 10 min. Immunoassay reagents for the two target bacteria (Escherichia coli O157:H7 and Salmonella typhimurium) were preloaded into the corresponding fore-vacuum storage chambers on the chip and released, by piercing the chambers, to take part in the subsequent immune analysis. All detection processes were integrated into a single microfluidic chip and controlled by a smartphone through Bluetooth. Under selected conditions, wide linear ranges and low limits of detection (LODs < 2 CFU/mL) were obtained, and real food samples were successfully analyzed within 30 min. This biosensing method can be extended to a wide range of applications by loading different recognition reagents.
Funding: This research was partially supported by the National Natural Science Foundation of China under Grant 52105268; the Natural Science Foundation of Guangdong Province under Grant 2022A1515011409; the Key Platforms and Major Scientific Research Projects of Universities in Guangdong under Grants 2019KTSCX161 and 2019KTSCX165; the Key Projects of Natural Science Research Projects of Shaoguan University under Grants SZ2020KJ02 and SZ2021KJ04; and the Science and Technology Program of Shaoguan City of China under Grants 2019sn056, 200811094530423, 200811094530805, and 200811094530811.
Abstract: Audio signal separation is an open and challenging issue in the classical "Cocktail Party Problem". In a reverberant environment in particular, separating mixed signals is more difficult because of the influence of reverberation and echo. To solve this problem, we propose a determined reverberant blind source separation algorithm. The main innovation of the algorithm lies in the estimation of the mixing matrix. A new cost function, which measures the gap between the prediction and the actual data, is built to obtain an accurate demixing matrix. The update rule of the demixing matrix is then derived using the Newton gradient descent method. The identity matrix is employed as the initial demixing matrix to avoid the local optima problem. Through real-time iterative updates of the demixing matrix, frequency-domain sources are obtained. Time-domain sources can then be obtained using an inverse short-time Fourier transform. Experimental results on a series of source separation tasks with mixed speech and music signals demonstrate that the proposed algorithm achieves better separation performance than state-of-the-art methods. In particular, its advantage is much larger in highly reverberant environments.
Abstract: A large part of our daily lives is spent with audio information. The colossal amounts of acoustic information and the very short processing times required frequently present massive obstacles. This creates a need for applications and methodologies that can analyze such content automatically. These technologies can be applied in automatic content analysis and emergency response systems. Breaks in manual communication usually occur in emergencies, leading to accidents and equipment damage. An audio signal serves well here: a signal sent from underground warrants action from an emergency management team at the surface. This paper, therefore, seeks to design and simulate an audio signal alerting and automatic control system using Unity Pro XL to replace manual communication of emergencies and manual control of equipment. Sound data were trained using the neural network technique of machine learning. The metrics used are Fast Fourier transform magnitude, zero crossing rate, root mean square, and percentage error. Sounds were detected with an error of approximately 17%; thus, the system can detect sounds with an accuracy of 83%. With more training data, the system can detect sounds with minimal or no error. The paper, therefore, has critical policy implications for communication, safety, and health in underground mines.
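The four metrics named in the abstract above can be illustrated with a minimal Python sketch. The function names, the naive DFT (standing in for a fast Fourier transform), and the reading of percentage error as the share of misclassified sounds are assumptions made for illustration; they are not taken from the paper.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def root_mean_square(frame):
    """Square root of the mean squared amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def dft_magnitude(frame):
    """Magnitude spectrum for bins 0..n/2, computed with a naive O(n^2) DFT.

    A real system would use an FFT; the result is the same, only slower here.
    """
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def percentage_error(predicted, actual):
    """Share of labels that were predicted wrongly, in percent (assumed definition)."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)


# Hypothetical test tone: 4 cycles of a sine wave across 64 samples.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
spectrum = dft_magnitude(tone)
peak_bin = spectrum.index(max(spectrum))
```

Under this definition, an error of 17% corresponds directly to the 83% accuracy quoted in the abstract (100 minus the percentage error).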
Funding: the National Natural Science Foundation of China, No. 60672001; the Special Fund of Education Department of Shaanxi Province, No. 05JC0
Abstract: Determining the frequency range over which the derma nerve responds to audio current is fundamental for the development of skin-hearing technology. Previous studies have shown that the range of the derma nerve response to audio current is 15-15 000 Hz, because audio amplification is not separated from the step-up transformer. Therefore, the present study used a signal generator that directly drives plane electrodes, simplified the original experimental environment for skin-hearing, measured the lower-limit voltage by frequency at which the derma nerve receives pulse current signals, and revealed that the frequency range of the human derma nerve response is as wide as 0.1-30 000 Hz. The results demonstrate that the human derma nerve receives audio signals and infrasound within a wide frequency range.
Funding: This work was partially supported by the National Natural Science Foundation of China under Grants No. 11161140319 and No. 61001188; the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No. 20101101110020; the Fund for Basic Research from Beijing Institute of Technology under Grant No. 20120542011; and the Fund for Beijing Higher Education Young Elite Teacher Project under Grant No. YETP1202.
Abstract: Multichannel audio signals are more difficult to compress than mono and stereo ones. A novel multichannel audio signal compression method based on tensor representation and decomposition is proposed in this paper. The multichannel audio is represented in a 3-order tensor space and decomposed into a core tensor and three factor matrices along the channel, time and frequency modes. Only the truncated core tensor is transmitted; it is multiplied by the pre-trained factor matrices to reconstruct the original tensor space. Objective and subjective experiments show a very noticeable compression capability with acceptable output quality. The novelty of the proposed compression method is that it combines high compression capability and backward compatibility with limited audible signal distortion.
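The reconstruction step described above (a core tensor multiplied by one factor matrix per mode) can be sketched in plain Python. This is only an illustration of the general Tucker-style product, assuming nested lists for the tensor and the hypothetical name `reconstruct_from_core`; how the paper trains the factor matrices or truncates the core is not shown.

```python
def reconstruct_from_core(core, channel_f, time_f, freq_f):
    """Rebuild X[i][j][k] = sum_{p,q,r} core[p][q][r] * A[i][p] * B[j][q] * C[k][r].

    core      : P x Q x R nested list (the truncated core tensor that is transmitted)
    channel_f : I x P factor matrix A (assumed pre-trained, known to the receiver)
    time_f    : J x Q factor matrix B
    freq_f    : K x R factor matrix C
    """
    p_dim, q_dim, r_dim = len(core), len(core[0]), len(core[0][0])
    return [
        [
            [
                sum(
                    core[p][q][r] * a_row[p] * b_row[q] * c_row[r]
                    for p in range(p_dim)
                    for q in range(q_dim)
                    for r in range(r_dim)
                )
                for c_row in freq_f
            ]
            for b_row in time_f
        ]
        for a_row in channel_f
    ]


# Hypothetical toy example: a 1 x 1 x 1 core expands to a 2 x 2 x 1 tensor.
core = [[[2.0]]]
A = [[1.0], [2.0]]   # 2 channels
B = [[1.0], [0.0]]   # 2 time frames
C = [[3.0]]          # 1 frequency bin
X = reconstruct_from_core(core, A, B, C)
```

The compression gain in such a scheme comes from the core holding far fewer entries than the full tensor, since the factor matrices do not need to be sent with every signal.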
Abstract: The hiding efficiency of traditional audio information hiding methods is always low because perceptual similarity cannot be guaranteed. This letter proposes a new audio information hiding method that exploits the insensitivity of hearing to audio phase and hides information through a specific algorithm that modifies the local phase within the limits of auditory perception. The algorithm introduces "set 1" and "set 0" operations for every phase vector, so that after modification each phase must lie on the boundary of a phase area. If it lies on the "1" boundary, it results from a set 1 operation. If it lies on the "0" boundary, it results from a set 0 operation. The results show that, compared with the legacy method, the proposed method has better auditory similarity, larger information embedding capacity and a lower code error rate. As a blind detection method, it suits application scenarios without channel interference.
Funding: This work was supported by the High-grade, Precision and Advanced Discipline Construction Project of Beijing Universities; the Major Projects of National Social Science Fund of China (No. 21ZD19); and the Nation Culture and Tourism Technological Innovation Engineering Project of China.
Abstract: Audio mixing is a crucial part of music production. For analyzing or recreating audio mixing, it is of great importance to study the estimation of the mixing parameters used to create mixdowns from music recordings, i.e., audio mixing inversion. However, approaches to audio mixing inversion are rarely explored. A method of estimating mixing parameters from raw tracks and a stereo mixdown via embodied self-supervised learning is presented. In this work, several commonly used audio effects, including gain, pan, equalization, reverb, and compression, are taken into consideration. By iteratively sampling and training, the method learns an inference neural network that takes a stereo mixdown and the raw audio sources as input and estimates the mixing parameters used to create the mixdown. During the sampling step, the inference network predicts a set of mixing parameters, which is sampled and fed to an audio-processing framework to generate audio data for the training step. During the training step, the same network used in the sampling step is optimized with the data generated in the sampling step. This method explicitly models the mixing process in an interpretable way instead of using a black-box neural network model. A set of objective measures is used for evaluation. The experimental results show that this method performs better than current state-of-the-art methods.
Funding: the National Key Research and Development Program of China (Grant No. 2021YFB2012600).
Abstract: Diamond-based quantum sensing is a fast-emerging field with both scientific and technological significance. The nitrogen–vacancy (NV) center, a crystal defect in diamond, has become a unique object for microwave sensing applications due to its excellent stability, long spin coherence time, and optical properties at ambient conditions. In this work, we use a diamond NV center as an atomic receiver to demodulate on–off keying (OOK) signals transmitted over a broad frequency range (2 GHz–14 GHz in a portable benchtop setup). We propose a unique voltage discrimination algorithm and demonstrate audio signal transceiving with fidelity above 99%. The diamond receiver is attached to the end of a tapered fiber and is all-optical in nature, so it will find important applications in data transmission tasks under extreme conditions such as strong electromagnetic interference, high temperatures, and high corrosion.
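The abstract does not spell out its voltage discrimination algorithm, so the following is only a generic sketch of how OOK bits can be recovered from a sampled detector voltage with a fixed threshold. The function names, the per-bit averaging, and the polarity (a higher voltage meaning "on") are assumptions; a real NV-center readout may well invert that polarity.

```python
def ook_demodulate(samples, samples_per_bit, threshold):
    """Decide one bit per symbol period by comparing the mean voltage to a threshold.

    Assumes the sample stream is already aligned to bit boundaries and that a
    mean above `threshold` means the carrier was on (bit 1).
    """
    bits = []
    for start in range(0, len(samples) - samples_per_bit + 1, samples_per_bit):
        window = samples[start:start + samples_per_bit]
        mean_voltage = sum(window) / samples_per_bit
        bits.append(1 if mean_voltage > threshold else 0)
    return bits


def bit_fidelity(sent, received):
    """Fraction of received bits that match the transmitted bits."""
    matches = sum(1 for s, r in zip(sent, received) if s == r)
    return matches / len(sent)


# Hypothetical noisy voltage trace carrying the bits 1, 0, 1 at 3 samples per bit.
trace = [0.9, 1.0, 1.1, 0.1, 0.0, 0.2, 0.8, 1.0, 0.9]
decoded = ook_demodulate(trace, samples_per_bit=3, threshold=0.5)
```

Averaging over the symbol period before thresholding is one simple way to suppress sample-level noise; a fidelity figure such as the 99% quoted above would be measured with something like `bit_fidelity` over a long transmitted sequence.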
Abstract: Recognizing and retrieving identical videos by combing through entire video files requires a great deal of time and memory. Therefore, most current video-matching methods analyze only a part of each video's image frame information. All these methods, however, share the critical problem of erroneously categorizing identical videos as different if they have merely been altered in resolution or converted with a different codec. This paper deals instead with an identical-video-retrieval method using the low-peak feature of audio data. The low-peak feature remains relatively stable even with changes in bit-rate or codec. The proposed method showed a search success rate of 93.7% in a video matching experiment. This approach could provide a technique for recognizing identical content on video file-sharing sites.
Abstract: Recently, many audio search sites, headed by Google, have used audio fingerprinting technology to search for the same audio and protect music copyright using one part of the audio data. However, if there are many fingerprints per audio file, the amount of query data for the audio search increases. In this paper, we propose a novel method that reduces the number of fingerprints while providing a level of performance similar to that of existing methods. The proposed method uses the difference of Gaussians, which is often used for feature extraction in image signal processing. In the experiment, we combine the proposed method with dynamic time warping and search for the same audio with a success rate of 90%. The proposed method, therefore, can be used for effective audio search.
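The two techniques named in this abstract, difference of Gaussians and dynamic time warping, can each be sketched in a few lines of Python. This is a textbook 1-D version under assumed parameters (kernel radius of three sigma, clamped edges, absolute-difference cost); how the paper turns the filtered signal into fingerprints is not described in the abstract and is not reproduced here.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Gaussian smoothing with edge samples repeated at the borders."""
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k - radius, 0), last)]
            for k, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def difference_of_gaussians(signal, sigma_narrow, sigma_wide):
    """Band-pass response: narrow smoothing minus wide smoothing."""
    narrow = smooth(signal, sigma_narrow)
    wide = smooth(signal, sigma_wide)
    return [a - b for a, b in zip(narrow, wide)]


def dtw_distance(a, b):
    """Classic dynamic time warping cost with absolute-difference local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[len(a)][len(b)]


# Hypothetical inputs: a single spike, and two short feature sequences.
impulse = [0.0] * 10 + [1.0] + [0.0] * 10
dog = difference_of_gaussians(impulse, 1.0, 2.0)
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

The difference of Gaussians responds most strongly at local peaks of the signal, which is why it lends itself to picking a small number of salient points, and dynamic time warping tolerates sequences that are locally stretched in time, as the zero cost for `stretched` illustrates.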