Funding: Supported by the National Natural Science Foundation of China (Nos. 61902158, 61673108), the Science and Technology Program of Nantong (JC2018129, MS12018082), and the Top-notch Academic Programs Project of Jiangsu Higher Education Institutions (PPZY2015B135).
Abstract: Enhancing speech intelligibility in noisy environments remains one of the major challenges that hearing-impaired listeners face in everyday life. Recently, machine-learning-based approaches to speech enhancement have shown great promise for improving speech intelligibility. Two key issues in these approaches are the acoustic features extracted from noisy signals and the classifiers used for supervised learning. This paper focuses on the features. Multi-resolution power-normalized cepstral coefficients (MRPNCC) are proposed as a new feature to enhance speech intelligibility for the hearing impaired. The new feature is constructed by combining four cepstra at different time–frequency (T–F) resolutions in order to capture both local and contextual information. MRPNCC vectors and binary masking labels, computed from signals passed through a gammatone filterbank, are used to train a support vector machine (SVM) classifier, which aims to identify the binary masking values of the T–F units in the enhancement stage. The enhanced speech is synthesized from the estimated masking values and the Wiener-filtered T–F units. Objective experimental results demonstrate that the proposed feature is superior to the other features compared in terms of HIT-FA, STOI, HASPI, and PESQ, and that the proposed algorithm not only improves speech intelligibility but also slightly improves speech quality. Subjective tests validate the effectiveness of the proposed algorithm for the hearing impaired.
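The binary masking labels and the HIT-FA metric mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state how the labels are thresholded, so the code assumes the common ideal-binary-mask rule (a T–F unit is labeled 1 when its local SNR exceeds a criterion in dB). The function names, the `lc_db` parameter, and the toy energies are hypothetical and are not taken from the paper.

```python
import math


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Label each T-F unit 1 if its local SNR (dB) exceeds lc_db, else 0.

    Assumption: the inputs are per-unit energies of the clean speech and
    the noise (for example after a gammatone filterbank), given as
    equally shaped lists of rows. lc_db is a hypothetical local criterion.
    """
    eps = 1e-12  # avoids log(0) and division by zero
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def hit_minus_fa(target_mask, estimated_mask):
    """Return HIT - FA for an estimated binary mask.

    HIT is the fraction of target-dominant units (label 1) that the
    estimate also labels 1; FA is the fraction of noise-dominant units
    (label 0) that the estimate wrongly labels 1.
    """
    hits = misses = false_alarms = correct_rejections = 0
    for t_row, e_row in zip(target_mask, estimated_mask):
        for t, e in zip(t_row, e_row):
            if t == 1 and e == 1:
                hits += 1
            elif t == 1:
                misses += 1
            elif e == 1:
                false_alarms += 1
            else:
                correct_rejections += 1
    ones = hits + misses
    zeros = false_alarms + correct_rejections
    hit_rate = hits / ones if ones else 0.0
    fa_rate = false_alarms / zeros if zeros else 0.0
    return hit_rate - fa_rate


# Toy 2x2 example with made-up energies.
speech = [[4.0, 1.0], [1.0, 9.0]]
noise = [[1.0, 4.0], [2.0, 1.0]]
target = ideal_binary_mask(speech, noise)
estimate = [[1, 1], [0, 1]]  # stands in for a classifier's output
score = hit_minus_fa(target, estimate)
```

In the pipeline described above, the estimated mask would come from the trained SVM rather than being written by hand; the sketch only shows how labels and the metric relate.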
Funding: The Natural Science Foundation of Shandong Province in China (Grant No. ZR2020MF076); the Focus on Research and Development Plan in Shandong Province (Grant No. 2019GNC106115); the National Natural Science Foundation of China (Grant No. 62072289); the Shandong Province Higher Educational Science and Technology Program (Grant No. J18KA308); the Taishan Scholar Program of Shandong Province of China.
Abstract: In complex orchard environments, efficient and accurate detection of target fruit is a basic requirement for orchard yield measurement and automatic harvesting. Target fruits can be hard to distinguish from the background because of their similar color, and detection is further complicated by the ambient light and the camera angle at which the photos were taken. These problems make it hard to detect green fruits in orchard environments. In this study, a two-stage dense-to-detection framework (D2D) was proposed to detect green fruits in orchard environments. The proposed model performed multi-scale feature extraction of the target fruit using a MobileNetV2+FPN (feature pyramid network) structure and generated region proposals for the target fruit using a Region Proposal Network (RPN) structure. In the regression branch, the offset of each local feature was calculated, and the positive and negative samples of the region proposals were predicted by a binary mask prediction to reduce the interference of the background with the prediction box. In the classification branch, features were extracted from each sub-region of the region proposal, and features with distinguishing information were obtained through adaptive weighted pooling to achieve accurate classification. The proposed model adopted an anchor-free frame design, which improves generalization ability, makes the model more robust, and reduces storage requirements. Experimental results on persimmon and green apple datasets show that the new model has the best detection performance, which can provide a theoretical reference for the detection of other green objects.
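The adaptive weighted pooling step in the classification branch can be pictured with a small sketch. The abstract does not give the weighting formula, so the code assumes one plausible form: each sub-region receives a relevance score (learned in a real model, supplied by hand here), the scores are normalized with a softmax, and the sub-region feature vectors are combined as a weighted sum. The function name and inputs are hypothetical and do not reproduce the D2D implementation.

```python
import math


def adaptive_weighted_pool(features, scores):
    """Pool K sub-region feature vectors into one vector.

    Assumption: weights are a softmax over per-sub-region scores, so
    sub-regions with higher scores contribute more to the pooled feature.
    features: list of K equal-length lists of floats.
    scores:   list of K floats (hypothetical relevance scores).
    Returns (pooled_vector, weights).
    """
    if not features or len(features) != len(scores):
        raise ValueError("features and scores must be non-empty and equal in length")
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(dim)
    ]
    return pooled, weights


# Two sub-regions with equal scores reduce to plain average pooling.
pooled_equal, weights_equal = adaptive_weighted_pool(
    [[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0]
)

# A much higher score on the second sub-region makes it dominate.
pooled_skewed, weights_skewed = adaptive_weighted_pool(
    [[1.0, 0.0], [3.0, 2.0]], [0.0, 50.0]
)
```

With equal scores the result matches average pooling, which is a convenient sanity check; the adaptive behavior only appears once the scores differ across sub-regions.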