Funding: National Natural Science Foundation of China (NSFC) (No. 61671075); Major Program of National Natural Science Foundation of China (No. 61631003).
Abstract: To address the musical noise introduced by classical spectral subtraction, a short-time modulation domain (STM) spectral subtraction method has been successfully applied to single-channel speech enhancement. However, because voice activity detection (VAD) is inaccurate, the residual musical noise and the enhancement performance still need further improvement, especially in low signal-to-noise ratio (SNR) scenarios. To address this issue, an improved frame-iterative spectral subtraction method in the STM domain (IMModSSub) is proposed. Specifically, exploiting inter-frame correlation, noise subtraction is applied directly to each frame of the noisy signal in the STM domain. The noisy signal is then classified into speech or silence frames based on a predefined segmental SNR threshold. Using these classification results, a corresponding mask function is developed for the noisy speech after noise subtraction. Finally, exploiting the increased sparsity of the speech signal in the modulation domain, the orthogonal matching pursuit (OMP) technique is applied to the speech frames to improve speech quality and intelligibility. The proposed method is evaluated with three types of noise: white noise, pink noise, and HF channel noise. The results show that the proposed method outperforms some established baselines at lower SNRs (-5 to +5 dB).
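The frame classification and masking steps described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual threshold, SNR estimator, or mask function, so everything here is an assumption: the function names `classify_frames` and `apply_mask`, the 0 dB default threshold, the power-subtraction SNR estimate, and the fixed attenuation applied to silence frames are all hypothetical stand-ins, not the paper's method.

```python
import math


def classify_frames(frame_powers, noise_power, threshold_db=0.0):
    """Label each frame as speech (True) or silence (False).

    Assumption: the per-frame SNR is estimated by subtracting a known
    noise power from the noisy frame power and comparing the result,
    in dB, with a fixed threshold.
    """
    labels = []
    for power in frame_powers:
        # Subtract the noise estimate; floor the result so the log is defined.
        clean_power = max(power - noise_power, 1e-12)
        snr_db = 10.0 * math.log10(clean_power / noise_power)
        labels.append(snr_db > threshold_db)
    return labels


def apply_mask(frames, labels, attenuation=0.1):
    """Keep speech frames unchanged and scale silence frames down.

    Assumption: a simple two-level mask; the paper's mask function
    may differ.
    """
    return [
        [sample if is_speech else sample * attenuation for sample in frame]
        for frame, is_speech in zip(frames, labels)
    ]


if __name__ == "__main__":
    noisy_frame_powers = [1.0, 4.0, 1.5, 10.0]  # illustrative values only
    print(classify_frames(noisy_frame_powers, noise_power=1.0))
```

In this sketch a frame counts as speech only when its estimated clean power exceeds the noise power by more than the threshold, which is one simple way to avoid relying on a separate VAD.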
Funding: This research is supported by the Science and Technology Department of Jilin Province [20210202128NC http://kjt.jl.gov.cn], the People’s Republic of China Ministry of Science and Technology [2018YFF0213606-03 http://www.most.gov.cn], the Jilin Province Development and Reform Commission [2019C021 http://jldrc.jl.gov.cn], and the Science and Technology Bureau of Changchun City [21ZGN27 http://kjj.changchun.gov.cn].
Abstract: With the increasingly intensive and large-scale development of the sika deer breeding industry, it is crucial to assess the health status of sika deer by monitoring their behaviours. This paper proposes a machine vision-based method for sika deer behaviour recognition, with a model obtained by optimising Google Inception Net (GoogLeNet). First, the number of layers and the size of the model were reduced. Then, the 5×5 convolution was changed to two 3×3 convolutions, which reduced the number of parameters and increased the nonlinearity of the model. A 5×5 convolution kernel was used to replace the original convolution to extract coarse-grained features and improve the model’s feature extraction ability. A multi-scale module was added to enhance the model’s multi-faceted feature extraction capability. In addition, the Squeeze-and-Excitation Networks (SE-Net) module was included to strengthen channel attention and improve the model’s accuracy. The dataset’s images were rotated to reduce overfitting: the rotation angles were multiples of 30°, giving a dataset augmented with rotations of 30°, 60°, 90°, 120° and 150°. The experimental results showed that the model’s recognition rate for sika deer behaviour was 98.92%. Therefore, the model presented in this paper can be applied to the behaviour recognition of sika deer. The results will play an essential role in promoting animal behaviour recognition technology and animal health monitoring management.
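The parameter saving from replacing a 5×5 convolution with two stacked 3×3 convolutions, and the rotation angles used for augmentation, can be checked with a short calculation. This is a sketch under assumptions: the channel count of 64 and the helper name `conv_params` are hypothetical and chosen only for illustration; the abstract does not state the model's layer widths.

```python
def conv_params(kernel_size, in_channels, out_channels, bias=True):
    """Count the learnable parameters of one 2D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# Assumed width for illustration; not taken from the paper.
CHANNELS = 64

# One 5x5 layer versus two stacked 3x3 layers covering the same 5x5 receptive field.
single_5x5 = conv_params(5, CHANNELS, CHANNELS)
stacked_3x3 = 2 * conv_params(3, CHANNELS, CHANNELS)

# Rotation angles described in the abstract: multiples of 30 degrees up to 150.
rotation_angles = [k * 30 for k in range(1, 6)]

if __name__ == "__main__":
    print(single_5x5, stacked_3x3, rotation_angles)
```

With equal input and output widths, the two 3×3 layers need 18 weights per channel pair instead of 25, and a nonlinearity can be placed between them, which matches the abstract's claim of fewer parameters and increased nonlinearity.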