Abstract: A hierarchical method for scene analysis in audio sensor networks is proposed. This meth-od consists of two stages: element detection stage and audio scene analysis stage. In the former stage, the basic au...Abstract: A hierarchical method for scene analysis in audio sensor networks is proposed. This meth-od consists of two stages: element detection stage and audio scene analysis stage. In the former stage, the basic audio elements are modeled by the HMM models and trained by enough samples off-line, and we adaptively add or remove basic ele- ment from the targeted element pool according to the time, place and other environment parameters. In the latter stage, a data fusion algorithm is used to combine the sensory information of the same ar-ea, and then, a role-based method is employed to analyze the audio scene based on the fused data. We conduct some experiments to evaluate the per-formance of the proposed method that about 70% audio scenes can be detected correctly by this method. The experiment evaluations demonstrate that our method can achieve satisfactory results.展开更多
An audio and video network monitoring system for weather modification operation transmitting information by 3G, ADSL and Internet has been developed and applied in weather modification operation of Tai'an City. The a...An audio and video network monitoring system for weather modification operation transmitting information by 3G, ADSL and Internet has been developed and applied in weather modification operation of Tai'an City. The all-in-one machine of 3G audio and video network highly integrates all front-end devices used for audio and video collection, communication, power supply and information storage, and has advantages of wireless video transmission, clear two-way voice intercom with the command center, waterproof and dustproof function, simple operation, good portability, and long working hours. Compression code of the system is transmitted by dynamic bandwidth, and compression rate varies from 32 kbps to 4 Mbps under different network conditions. This system has forwarding mode, that is, monitoring information from each front-end monitoring point is trans- mitted to the server of the command center by 3G/ADSL, and the server codes'and decodes again, then beck-end users call images from the serv- er, which can address 3G network stoppage caused by many users calling front-end video at the same time. In addition, the system has been ap- plied in surface weather modification operation of Tai'an City, and has made a great contribution to transmitting operation orders in real time, monitoring, standardizing and recording operating process, and improving operating safety.展开更多
Non-blind audio bandwidth extension is a standard technique within contemporary audio codecs to efficiently code audio signals at low bitrates. In existing methods, in most cases high frequencies signal is usually gen...Non-blind audio bandwidth extension is a standard technique within contemporary audio codecs to efficiently code audio signals at low bitrates. In existing methods, in most cases high frequencies signal is usually generated by a duplication of the corresponding low frequencies and some parameters of high frequencies. However, the perception quality of coding will significantly degrade if the correlation between high frequencies and low frequencies becomes weak. In this paper, we quantitatively analyse the correlation via computing mutual information value. The analysis results show the correlation also exists in low frequency signal of the context dependent frames besides the current frame. In order to improve the perception quality of coding, we propose a novel method of high frequency coarse spectrum generation to improve the conventional replication method. In the proposed method, the coarse high frequency spectrums are generated by a nonlinear mapping model using deep recurrent neural network. The experiments confirm that the proposed method shows better performance than the reference methods.展开更多
In this paper, we present an approach to improve the accuracy of environmental sound event detection in a wireless acoustic sensor network for home monitoring. Wireless acoustic sensor nodes can capture sounds in the ...In this paper, we present an approach to improve the accuracy of environmental sound event detection in a wireless acoustic sensor network for home monitoring. Wireless acoustic sensor nodes can capture sounds in the home and simultaneously deliver them to a sink node for sound event detection. The proposed approach is mainly composed of three modules, including signal estimation, reliable sensor channel selection, and sound event detection. During signal estimation, lost packets are recovered to improve the signal quality. Next, reliable channels are selected using a multi-channel cross-correlation coefficient to improve the computational efficiency for distant sound event detection without sacrificing performance. Finally, the signals of the selected two channels are used for environmental sound event detection based on bidirectional gated recurrent neural networks using two-channel audio features. Experiments show that the proposed approach achieves superior performances compared to the baseline.展开更多
Over the past years, we have witnessed an explosive growth in the use of multimedia applications such as audio and video streaming with mobile and static devices. Multimedia streaming applications need new approaches ...Over the past years, we have witnessed an explosive growth in the use of multimedia applications such as audio and video streaming with mobile and static devices. Multimedia streaming applications need new approaches to multimedia transmissions to meet the growing volume demand and quality expectations of multimedia traffic. This paper studies network coding which is a promising paradigm that has the potential to improve the performance of networks for multimedia streaming applications in terms of packet delivery ratio (PDR), latency and jitter. This paper examines several network coding protocols for ad hoc wireless mesh networks and compares their performance on multimedia streaming applications with optimized broadcast protocols, e.g., BCast, Simplified Multicast Forwarding (SMF), and Partial Dominant Pruning (PDP). The results show that the performance increases significantly with the Random Linear Network Coding (RLNC) scheme.展开更多
With the development of multichannel audio systems, corresponding audio quality assessment techniques, especially the objective prediction models, have received increasing attention. Existing methods, such as PEAQ(Per...With the development of multichannel audio systems, corresponding audio quality assessment techniques, especially the objective prediction models, have received increasing attention. Existing methods, such as PEAQ(Perceptual Evaluation of Audio Quality) recommended by ITU, usually lead to poor results when assessing multichannel audio, which have little correlation with subjective scores. In this paper, a novel two-layer model based on Multiple Linear Regression(MLR) and Neural Network(NN) is proposed. Through the first layer, two indicators of multichannel audio, Audio Quality Score(AQS) and Spatial Perception Score(SPS) are derived, and through the second layer the overall score is output. The final results show that this model can not only improve the correlation with the subjective test score by 30.7% and decrease the Root Mean Square Error(RMSE) by 44.6%, but also add two new indicators: AQS and SPS, which can help reflect the multichannel audio quality more clearly.展开更多
Infants portray suggestive unique cries while sick, having belly pain, discomfort, tiredness, attention and desire for a change of diapers among other needs. There exists limited knowledge in accessing the infants’ n...Infants portray suggestive unique cries while sick, having belly pain, discomfort, tiredness, attention and desire for a change of diapers among other needs. There exists limited knowledge in accessing the infants’ needs as they only relay information through suggestive cries. Many teenagers tend to give birth at an early age, thereby exposing them to be the key monitors of their own babies. They tend not to have sufficient skills in monitoring the infant’s dire needs, more so during the early stages of infant development. Artificial intelligence has shown promising efficient predictive analytics from supervised, and unsupervised to reinforcement learning models. This study, therefore, seeks to develop an android app that could be used to discriminate the infant audio cries by leveraging the strength of convolution neural networks as a classifier model. Audio analytics from many kinds of literature is an untapped area by researchers as it’s attributed to messy and huge data generation. This study, therefore, strongly leverages convolution neural networks, a deep learning model that is capable of handling more than one-dimensional datasets. To achieve this, the audio data in form of a wave was converted to images through Mel spectrum frequencies which were classified using the computer vision CNN model. The Librosa library was used to convert the audio to Mel spectrum which was then presented as pixels serving as the input for classifying the audio classes such as sick, burping, tired, and hungry. The study goal was to incorporate the model as an android tool that can be utilized at the domestic level and hospital facilities for surveillance of the infant’s health and social needs status all time round.展开更多
【目的】为解决群养环境下生猪音频难以分离与识别的问题,提出基于欠定盲源分离与E C A-EfficientNetV2的生猪状态音频识别方法。【方法】以仿真群养环境下4类生猪音频信号作为观测信号,将信号稀疏表示后,通过层次聚类估计出信号混合矩...【目的】为解决群养环境下生猪音频难以分离与识别的问题,提出基于欠定盲源分离与E C A-EfficientNetV2的生猪状态音频识别方法。【方法】以仿真群养环境下4类生猪音频信号作为观测信号,将信号稀疏表示后,通过层次聚类估计出信号混合矩阵,并利用lp范数重构算法求解lp范数最小值以完成生猪音频信号重构。将重构信号转化为声谱图,分为进食声、咆哮声、哼叫声和发情声4类,利用ECA-EfficientNetV2网络模型识别音频,获取生猪状态。【结果】混合矩阵估计的归一化均方误差最低为3.266×10^(−4),分离重构的音频信噪比在3.254~4.267 dB之间。声谱图经ECA-EfficientNetV2识别检测,准确率高达98.35%;与经典卷积神经网络ResNet50和VGG16对比,准确率分别提升2.88和1.81个百分点;与原EfficientNetV2相比,准确率降低0.52个百分点,但模型参数量减少33.56%,浮点运算量(FLOPs)降低1.86 G,推理时间减少9.40 ms。【结论】基于盲源分离及改进EfficientNetV2的方法,轻量且高效地实现了分离与识别群养生猪音频信号。展开更多
基金This work was supported by the Projects of the National Nat-ura! Science Foundation of China under Crant No.U0835001 the Fundamental Research Funds for the Central Universities-2011PTB-00-28.
文摘Abstract: A hierarchical method for scene analysis in audio sensor networks is proposed. This meth-od consists of two stages: element detection stage and audio scene analysis stage. In the former stage, the basic audio elements are modeled by the HMM models and trained by enough samples off-line, and we adaptively add or remove basic ele- ment from the targeted element pool according to the time, place and other environment parameters. In the latter stage, a data fusion algorithm is used to combine the sensory information of the same ar-ea, and then, a role-based method is employed to analyze the audio scene based on the fused data. We conduct some experiments to evaluate the per-formance of the proposed method that about 70% audio scenes can be detected correctly by this method. The experiment evaluations demonstrate that our method can achieve satisfactory results.
基金Supported by the Integration and Application Project of Meteorological Key Technology of China Meteorological Administration(CMAGJ2012M30) Technology Development Projects of Tai'an Science and Technology Bureau in 2010 (201002045) and 2011
文摘An audio and video network monitoring system for weather modification operation transmitting information by 3G, ADSL and Internet has been developed and applied in weather modification operation of Tai'an City. The all-in-one machine of 3G audio and video network highly integrates all front-end devices used for audio and video collection, communication, power supply and information storage, and has advantages of wireless video transmission, clear two-way voice intercom with the command center, waterproof and dustproof function, simple operation, good portability, and long working hours. Compression code of the system is transmitted by dynamic bandwidth, and compression rate varies from 32 kbps to 4 Mbps under different network conditions. This system has forwarding mode, that is, monitoring information from each front-end monitoring point is trans- mitted to the server of the command center by 3G/ADSL, and the server codes'and decodes again, then beck-end users call images from the serv- er, which can address 3G network stoppage caused by many users calling front-end video at the same time. In addition, the system has been ap- plied in surface weather modification operation of Tai'an City, and has made a great contribution to transmitting operation orders in real time, monitoring, standardizing and recording operating process, and improving operating safety.
基金supported by the National Natural Science Foundation of China under Grant No. 61762005, 61231015, 61671335, 61702472, 61701194, 61761044, 61471271National High Technology Research and Development Program of China (863 Program) under Grant No. 2015AA016306+2 种基金 Hubei Province Technological Innovation Major Project under Grant No. 2016AAA015the Science Project of Education Department of Jiangxi Province under No. GJJ150585The Opening Project of Collaborative Innovation Center for Economics Crime Investigation and Prevention Technology, Jiangxi Province, under Grant No. JXJZXTCX-025
文摘Non-blind audio bandwidth extension is a standard technique within contemporary audio codecs to efficiently code audio signals at low bitrates. In existing methods, in most cases high frequencies signal is usually generated by a duplication of the corresponding low frequencies and some parameters of high frequencies. However, the perception quality of coding will significantly degrade if the correlation between high frequencies and low frequencies becomes weak. In this paper, we quantitatively analyse the correlation via computing mutual information value. The analysis results show the correlation also exists in low frequency signal of the context dependent frames besides the current frame. In order to improve the perception quality of coding, we propose a novel method of high frequency coarse spectrum generation to improve the conventional replication method. In the proposed method, the coarse high frequency spectrums are generated by a nonlinear mapping model using deep recurrent neural network. The experiments confirm that the proposed method shows better performance than the reference methods.
基金supported by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education (NRF2015R1D1A1A01059804)the MSIP (Ministry of Science,ICT and Future Planning),Korea,under the ITRC(Information Technology Research Center) support program (IITP-2016-R2718-16-0011) supervised by the IITP(Institute for Information & communications Technology Promotion)the present Research has been conducted by the Research Grant of Kwangwoon University in 2017
文摘In this paper, we present an approach to improve the accuracy of environmental sound event detection in a wireless acoustic sensor network for home monitoring. Wireless acoustic sensor nodes can capture sounds in the home and simultaneously deliver them to a sink node for sound event detection. The proposed approach is mainly composed of three modules, including signal estimation, reliable sensor channel selection, and sound event detection. During signal estimation, lost packets are recovered to improve the signal quality. Next, reliable channels are selected using a multi-channel cross-correlation coefficient to improve the computational efficiency for distant sound event detection without sacrificing performance. Finally, the signals of the selected two channels are used for environmental sound event detection based on bidirectional gated recurrent neural networks using two-channel audio features. Experiments show that the proposed approach achieves superior performances compared to the baseline.
文摘Over the past years, we have witnessed an explosive growth in the use of multimedia applications such as audio and video streaming with mobile and static devices. Multimedia streaming applications need new approaches to multimedia transmissions to meet the growing volume demand and quality expectations of multimedia traffic. This paper studies network coding which is a promising paradigm that has the potential to improve the performance of networks for multimedia streaming applications in terms of packet delivery ratio (PDR), latency and jitter. This paper examines several network coding protocols for ad hoc wireless mesh networks and compares their performance on multimedia streaming applications with optimized broadcast protocols, e.g., BCast, Simplified Multicast Forwarding (SMF), and Partial Dominant Pruning (PDP). The results show that the performance increases significantly with the Random Linear Network Coding (RLNC) scheme.
基金supported by the National Natural Science Foundation of China (No.61571044,No.11590772,and No.61473041)
文摘With the development of multichannel audio systems, corresponding audio quality assessment techniques, especially the objective prediction models, have received increasing attention. Existing methods, such as PEAQ(Perceptual Evaluation of Audio Quality) recommended by ITU, usually lead to poor results when assessing multichannel audio, which have little correlation with subjective scores. In this paper, a novel two-layer model based on Multiple Linear Regression(MLR) and Neural Network(NN) is proposed. Through the first layer, two indicators of multichannel audio, Audio Quality Score(AQS) and Spatial Perception Score(SPS) are derived, and through the second layer the overall score is output. The final results show that this model can not only improve the correlation with the subjective test score by 30.7% and decrease the Root Mean Square Error(RMSE) by 44.6%, but also add two new indicators: AQS and SPS, which can help reflect the multichannel audio quality more clearly.
文摘Infants portray suggestive unique cries while sick, having belly pain, discomfort, tiredness, attention and desire for a change of diapers among other needs. There exists limited knowledge in accessing the infants’ needs as they only relay information through suggestive cries. Many teenagers tend to give birth at an early age, thereby exposing them to be the key monitors of their own babies. They tend not to have sufficient skills in monitoring the infant’s dire needs, more so during the early stages of infant development. Artificial intelligence has shown promising efficient predictive analytics from supervised, and unsupervised to reinforcement learning models. This study, therefore, seeks to develop an android app that could be used to discriminate the infant audio cries by leveraging the strength of convolution neural networks as a classifier model. Audio analytics from many kinds of literature is an untapped area by researchers as it’s attributed to messy and huge data generation. This study, therefore, strongly leverages convolution neural networks, a deep learning model that is capable of handling more than one-dimensional datasets. To achieve this, the audio data in form of a wave was converted to images through Mel spectrum frequencies which were classified using the computer vision CNN model. The Librosa library was used to convert the audio to Mel spectrum which was then presented as pixels serving as the input for classifying the audio classes such as sick, burping, tired, and hungry. The study goal was to incorporate the model as an android tool that can be utilized at the domestic level and hospital facilities for surveillance of the infant’s health and social needs status all time round.