A large part of our daily lives is spent with audio information. Massive obstacles are frequently presented by the colossal amounts of acoustic information and the incredibly quick processing times. This results in th...A large part of our daily lives is spent with audio information. Massive obstacles are frequently presented by the colossal amounts of acoustic information and the incredibly quick processing times. This results in the need for applications and methodologies that are capable of automatically analyzing these contents. These technologies can be applied in automatic contentanalysis and emergency response systems. Breaks in manual communication usually occur in emergencies leading to accidents and equipment damage. The audio signal does a good job by sending a signal underground, which warrants action from an emergency management team at the surface. This paper, therefore, seeks to design and simulate an audio signal alerting and automatic control system using Unity Pro XL to substitute manual communication of emergencies and manual control of equipment. Sound data were trained using the neural network technique of machine learning. The metrics used are Fast Fourier transform magnitude, zero crossing rate, root mean square, and percentage error. Sounds were detected with an error of approximately 17%;thus, the system can detect sounds with an accuracy of 83%. With more data training, the system can detect sounds with minimal or no error. The paper, therefore, has critical policy implications about communication, safety, and health for underground mine.展开更多
A state machine can make program designing quicker,simpler and more efficient. This paper describes in detail the model for a state machine and the idea for its designing and gives the design process of the state mach...A state machine can make program designing quicker,simpler and more efficient. This paper describes in detail the model for a state machine and the idea for its designing and gives the design process of the state machine through an example of audio signal generator system based on Labview. The result shows that the introduction of the state machine can make complex design processes more clear and the revision of programs easier.展开更多
Due to the presence of non-stationarities and discontinuities in the audio signal, segmentation and classification of audio signal is a really challenging task. Automatic music classification and annotation is still c...Due to the presence of non-stationarities and discontinuities in the audio signal, segmentation and classification of audio signal is a really challenging task. Automatic music classification and annotation is still considered as a challenging task due to the difficulty of extracting and selecting the optimal audio features. Hence, this paper proposes an efficient approach for segmentation, feature extraction and classification of audio signals. Enhanced Mel Frequency Cepstral Coefficient (EMFCC)-Enhanced Power Normalized Cepstral Coefficients (EPNCC) based feature extraction is applied for the extraction of features from the audio signal. Then, multi-level classification is done to classify the audio signal as a musical or non-musical signal. The proposed approach achieves better performance in terms of precision, Normalized Mutual Information (NMI), F-score and entropy. The PNN classifier shows high False Rejection Rate (FRR), False Acceptance Rate (FAR), Genuine Acceptance rate (GAR), sensitivity, specificity and accuracy with respect to the number of classes.展开更多
Multichannel audio signal is more difficult to be compressed than mono and stereo ones.A novel multichannel audio signal compression method based on tensor representation and decomposition is proposed in this paper.Th...Multichannel audio signal is more difficult to be compressed than mono and stereo ones.A novel multichannel audio signal compression method based on tensor representation and decomposition is proposed in this paper.The multichannel audio is represented with 3-order tensor space and is decomposed into core tensor with three factor matrices in the way of channel,time and frequency.Only the truncated core tensor is transmitted which will be multiplied by the pre-trained factor matrices to reconstruct the original tensor space.Objective and subjective experiments have been done to show a very noticeable compression capability with an acceptable output quality.The novelty of the proposed compression method is that it enables both high compression capability and backward compatibility with limited signal distortion to the hearing.展开更多
Audio signal separation is an open and challenging issue in the classical“Cocktail Party Problem”.Especially in a reverberation environment,the separation of mixed signals is more difficult separated due to the infl...Audio signal separation is an open and challenging issue in the classical“Cocktail Party Problem”.Especially in a reverberation environment,the separation of mixed signals is more difficult separated due to the influence of reverberation and echo.To solve the problem,we propose a determined reverberant blind source separation algorithm.The main innovation of the algorithm focuses on the estimation of the mixing matrix.A new cost function is built to obtain the accurate demixing matrix,which shows the gap between the prediction and the actual data.Then,the update rule of the demixing matrix is derived using Newton gradient descent method.The identity matrix is employed as the initial demixing matrix for avoiding local optima problem.Through the real-time iterative update of the demixing matrix,frequency-domain sources are obtained.Then,time-domain sources can be obtained using an inverse short-time Fourier transform.Experi-mental results based on a series of source separation of speech and music mixing signals demonstrate that the proposed algorithm achieves better separation performance than the state-of-the-art methods.In particular,it has much better superiority in the highly reverberant environment.展开更多
This work is concerned with the development and optimization of a signal model for scalable perceptual audio coding at low bit rates. A complementary two-part signal model consisting of Sines plus Noise (SN) is descri...This work is concerned with the development and optimization of a signal model for scalable perceptual audio coding at low bit rates. A complementary two-part signal model consisting of Sines plus Noise (SN) is described. The paper presents essentially a fundamental enhancement to the sinusoidal modeling component. The enhancement involves an audio signal scheme based on carrying out overlap-add sinusoidal modeling at three successive time scales,large, medium, and small. The sinusoidal modeling is done in an analysis-by-synthesis overlapadd manner across the three scales by using a psychoacoustically weighted matching pursuits.The sinusoidal modeling residual at the first scale is passed to the smaller scales to allow for the modeling of various signal features at appropriate resolutions. This approach greatly helps to correct the pre-echo inherent in the sinusoidal model. This improves the perceptual audio quality upon our previous work of sinusoidal modeling while using the same number of sinusoids. The most obvious application for the SN model is in scalable, high fidelity audio coding and signal modification.展开更多
Determining the frequency range of derma nerve that responds to audio current is fundamental for the development of skin-hearing technology. Previous studies have shown that the range of derma nerve responding to audi...Determining the frequency range of derma nerve that responds to audio current is fundamental for the development of skin-hearing technology. Previous studies have shown that the range of derma nerve responding to audio current is 15-15 000 Hz, because audio amplification is not separated from the step-up transformer. Therefore, the present study used a signal generator which directly drives plane electrodes, simplified the original experimental environment for skin-hearing, measured lower limit voltage of frequency for derma nerve receiving pulse current signals, and revealed that the frequency range of human derma nerve response was as wide as 0.1-30 000 Hz. Results demonstrate that human derma nerve receives audio signals and infrasound within a wide frequency range.展开更多
随着中波转播台技术的发展,音频信号的高质量传输和监测变得至关重要。光电转换技术为信号传输带来高保真和低干扰的优势,但在传输过程中也可能遇到挑战,如介质不均匀性导致的细小干扰。音频比对监测技术能有效监控和保证广播内容的准...随着中波转播台技术的发展,音频信号的高质量传输和监测变得至关重要。光电转换技术为信号传输带来高保真和低干扰的优势,但在传输过程中也可能遇到挑战,如介质不均匀性导致的细小干扰。音频比对监测技术能有效监控和保证广播内容的准确性和真实性,利用先进的算法如梅尔频率倒谱系数(Mel Frequency Cepstral Coefficents,MFCC)来分析信号的质量。文章主要分析了音频比对监测的功能、原理及其在中波转播台的应用,以期通过中波转播平台进行音频监测来减少人为错误,降低故障率。展开更多
文摘A large part of our daily lives is spent with audio information. Massive obstacles are frequently presented by the colossal amounts of acoustic information and the incredibly quick processing times. This results in the need for applications and methodologies that are capable of automatically analyzing these contents. These technologies can be applied in automatic contentanalysis and emergency response systems. Breaks in manual communication usually occur in emergencies leading to accidents and equipment damage. The audio signal does a good job by sending a signal underground, which warrants action from an emergency management team at the surface. This paper, therefore, seeks to design and simulate an audio signal alerting and automatic control system using Unity Pro XL to substitute manual communication of emergencies and manual control of equipment. Sound data were trained using the neural network technique of machine learning. The metrics used are Fast Fourier transform magnitude, zero crossing rate, root mean square, and percentage error. Sounds were detected with an error of approximately 17%;thus, the system can detect sounds with an accuracy of 83%. With more data training, the system can detect sounds with minimal or no error. The paper, therefore, has critical policy implications about communication, safety, and health for underground mine.
文摘A state machine can make program designing quicker,simpler and more efficient. This paper describes in detail the model for a state machine and the idea for its designing and gives the design process of the state machine through an example of audio signal generator system based on Labview. The result shows that the introduction of the state machine can make complex design processes more clear and the revision of programs easier.
文摘Due to the presence of non-stationarities and discontinuities in the audio signal, segmentation and classification of audio signal is a really challenging task. Automatic music classification and annotation is still considered as a challenging task due to the difficulty of extracting and selecting the optimal audio features. Hence, this paper proposes an efficient approach for segmentation, feature extraction and classification of audio signals. Enhanced Mel Frequency Cepstral Coefficient (EMFCC)-Enhanced Power Normalized Cepstral Coefficients (EPNCC) based feature extraction is applied for the extraction of features from the audio signal. Then, multi-level classification is done to classify the audio signal as a musical or non-musical signal. The proposed approach achieves better performance in terms of precision, Normalized Mutual Information (NMI), F-score and entropy. The PNN classifier shows high False Rejection Rate (FRR), False Acceptance Rate (FAR), Genuine Acceptance rate (GAR), sensitivity, specificity and accuracy with respect to the number of classes.
基金This work was partially supported by the National Natural Science Foundation of China under Grants No.11161140319,No.61001188,the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No.20101101110020,the Fund for Basic Research from Beijing Institute of Technology under Grant No.20120542011,the Fund for Beijing Higher Education Young Elite Teacher Project under Grant No.YETP1202
文摘Multichannel audio signal is more difficult to be compressed than mono and stereo ones.A novel multichannel audio signal compression method based on tensor representation and decomposition is proposed in this paper.The multichannel audio is represented with 3-order tensor space and is decomposed into core tensor with three factor matrices in the way of channel,time and frequency.Only the truncated core tensor is transmitted which will be multiplied by the pre-trained factor matrices to reconstruct the original tensor space.Objective and subjective experiments have been done to show a very noticeable compression capability with an acceptable output quality.The novelty of the proposed compression method is that it enables both high compression capability and backward compatibility with limited signal distortion to the hearing.
基金This research was partially supported by the National Natural Science Foundation of China under Grant 52105268Natural Science Foundation of Guangdong Province under Grant 2022A1515011409+2 种基金Key Platforms and Major Scientific Research Projects of Universities in Guangdong under Grants 2019KTSCX161 and 2019KTSCX165Key Projects of Natural Science Research Projects of Shaoguan University under Grants SZ2020KJ02 and SZ2021KJ04the Science and Technology Program of Shaoguan City of China under Grants 2019sn056,200811094530423,200811094530805,and 200811094530811.
文摘Audio signal separation is an open and challenging issue in the classical“Cocktail Party Problem”.Especially in a reverberation environment,the separation of mixed signals is more difficult separated due to the influence of reverberation and echo.To solve the problem,we propose a determined reverberant blind source separation algorithm.The main innovation of the algorithm focuses on the estimation of the mixing matrix.A new cost function is built to obtain the accurate demixing matrix,which shows the gap between the prediction and the actual data.Then,the update rule of the demixing matrix is derived using Newton gradient descent method.The identity matrix is employed as the initial demixing matrix for avoiding local optima problem.Through the real-time iterative update of the demixing matrix,frequency-domain sources are obtained.Then,time-domain sources can be obtained using an inverse short-time Fourier transform.Experi-mental results based on a series of source separation of speech and music mixing signals demonstrate that the proposed algorithm achieves better separation performance than the state-of-the-art methods.In particular,it has much better superiority in the highly reverberant environment.
基金Supported by the National Natural Science Foundation of China(No.69802007)Motorola China Research Center(No.B38300)Natural Science Foundation of Guangdong(No.011611)
文摘This work is concerned with the development and optimization of a signal model for scalable perceptual audio coding at low bit rates. A complementary two-part signal model consisting of Sines plus Noise (SN) is described. The paper presents essentially a fundamental enhancement to the sinusoidal modeling component. The enhancement involves an audio signal scheme based on carrying out overlap-add sinusoidal modeling at three successive time scales,large, medium, and small. The sinusoidal modeling is done in an analysis-by-synthesis overlapadd manner across the three scales by using a psychoacoustically weighted matching pursuits.The sinusoidal modeling residual at the first scale is passed to the smaller scales to allow for the modeling of various signal features at appropriate resolutions. This approach greatly helps to correct the pre-echo inherent in the sinusoidal model. This improves the perceptual audio quality upon our previous work of sinusoidal modeling while using the same number of sinusoids. The most obvious application for the SN model is in scalable, high fidelity audio coding and signal modification.
基金the National Natural Science Foundation of China,No.60672001the Special Fund of Education Department of Shaanxi Province,No.05JC0
文摘Determining the frequency range of derma nerve that responds to audio current is fundamental for the development of skin-hearing technology. Previous studies have shown that the range of derma nerve responding to audio current is 15-15 000 Hz, because audio amplification is not separated from the step-up transformer. Therefore, the present study used a signal generator which directly drives plane electrodes, simplified the original experimental environment for skin-hearing, measured lower limit voltage of frequency for derma nerve receiving pulse current signals, and revealed that the frequency range of human derma nerve response was as wide as 0.1-30 000 Hz. Results demonstrate that human derma nerve receives audio signals and infrasound within a wide frequency range.
文摘随着中波转播台技术的发展,音频信号的高质量传输和监测变得至关重要。光电转换技术为信号传输带来高保真和低干扰的优势,但在传输过程中也可能遇到挑战,如介质不均匀性导致的细小干扰。音频比对监测技术能有效监控和保证广播内容的准确性和真实性,利用先进的算法如梅尔频率倒谱系数(Mel Frequency Cepstral Coefficents,MFCC)来分析信号的质量。文章主要分析了音频比对监测的功能、原理及其在中波转播台的应用,以期通过中波转播平台进行音频监测来减少人为错误,降低故障率。