Funding: the Open Project of the Sichuan Provincial Key Laboratory of Philosophy and Social Science for Language Intelligence in Special Education under Grant No. YYZN-2023-4; the Ph.D. Fund of Chengdu Technological University under Grant No. 2020RC002.
Abstract: The fingerprinting-based approach using the wireless local area network (WLAN) is widely used for indoor localization. However, constructing the fingerprint database is time-consuming, and updating it in real time is difficult, especially when the position of an access point (AP) or a wall changes. Location-based service (LBS) applications call for an indoor localization approach that has a low implementation cost, excellent real-time performance, and high localization accuracy, and that fully accounts for complex indoor environment factors. In this paper, we propose a fine-grained grid computing (FGGC) model to achieve decimeter-level localization accuracy. Reference points (RPs) are generated in the grid by the FGGC model. The received signal strength (RSS) values at each RP are then calculated from attenuation factors such as the frequency band, the three-dimensional propagation distance, and walls in complex environments. As a result, the fingerprint database can be established automatically without manual measurement, and the FGGC model builds it more efficiently and at lower cost than previous methods. The proposed indoor localization approach estimates the position step by step, from the approximate grid location to the fine-grained location, and can thereby achieve higher real-time performance and localization accuracy simultaneously. The mean error of the proposed model is 0.36 m, far lower than that of previous approaches. The proposed model can therefore improve the efficiency and accuracy of Wi-Fi indoor localization. It also maintains high accuracy and a fast running speed even on a large grid. The results indicate that the proposed method may also be suitable for precision marketing, indoor navigation, and emergency rescue.
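The abstract above describes computing RSS values at grid reference points from attenuation factors and then refining the position from a coarse grid to a fine one. The following is a minimal sketch of that general idea, not the FGGC model itself: it assumes a free-space path loss formula plus a fixed per-wall penalty, and the names `rss_dbm`, `build_fingerprints`, `nearest_rp`, `locate`, the `count_walls` callback, and all default values (transmit power, wall loss, grid steps) are hypothetical choices made for illustration.

```python
import math

def rss_dbm(ap, rp, walls=0, tx_power_dbm=20.0, freq_mhz=2400.0, wall_loss_db=5.0):
    """Predicted RSS at point rp from access point ap (both (x, y, z) in metres).

    Assumption: free-space path loss plus a fixed loss per intervening wall.
    """
    d = max(math.dist(ap, rp), 0.1)  # 3D distance, clamped to avoid log10(0)
    fspl_db = 20 * math.log10(d) + 20 * math.log10(freq_mhz) - 27.55
    return tx_power_dbm - fspl_db - walls * wall_loss_db

def build_fingerprints(aps, x0, x1, y0, y1, step, height=1.0,
                       count_walls=lambda ap, rp: 0):
    """Generate RPs on a regular grid and compute one RSS vector per RP.

    count_walls is a hypothetical hook; a real system would derive the
    wall count from a floor plan.
    """
    nx = int(round((x1 - x0) / step)) + 1
    ny = int(round((y1 - y0) / step)) + 1
    db = {}
    for i in range(nx):
        for j in range(ny):
            rp = (x0 + i * step, y0 + j * step, height)
            db[rp] = [rss_dbm(ap, rp, count_walls(ap, rp)) for ap in aps]
    return db

def nearest_rp(db, observed):
    """Return the RP whose RSS vector is closest (Euclidean) to the observation."""
    return min(db, key=lambda rp: math.dist(db[rp], observed))

def locate(aps, observed, x_max, y_max, coarse=2.0, fine=0.1):
    """Two-stage search: coarse grid first, then a fine grid around the best cell."""
    coarse_db = build_fingerprints(aps, 0.0, x_max, 0.0, y_max, coarse)
    cx, cy, _ = nearest_rp(coarse_db, observed)
    fine_db = build_fingerprints(
        aps,
        max(cx - coarse, 0.0), min(cx + coarse, x_max),
        max(cy - coarse, 0.0), min(cy + coarse, y_max),
        fine,
    )
    return nearest_rp(fine_db, observed)
```

Because the fine grid is only built around the coarse match, the number of RPs compared per query stays small even when the overall area is large, which is one plausible reading of how a step-by-step search can keep running time low.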
Funding: Hubei Provincial Natural Science Foundation (2022CFB449); Science Research Foundation of Education Department of Hubei Province (B2020061).
Abstract: Food image recognition, a subset of fine-grained image recognition, must cope with substantial intra-class variation and minimal inter-class differences. These challenges are compounded by the irregular and multi-scale nature of food images. To address them, our study introduces a model built on the ConvNeXt architecture that combines multiple attention mechanisms with multi-stage local fusion. The model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially mitigating the influence of background noise. It also introduces a multi-stage local fusion (MSLF) module, which establishes long-distance dependencies between feature maps at different stages. This facilitates the assimilation of complementary features across scales and significantly strengthens the model's capacity for feature extraction. In addition, we constructed a dataset named Roushi60, which consists of 60 categories of common meat dishes. Empirical evaluation on the ETH Food-101, ChineseFoodNet, and Roushi60 datasets shows that our model achieves recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively. These figures mark improvements of 1.04%, 3.42%, and 1.36% over the baseline ConvNeXt network and surpass the performance of most contemporary food image recognition methods. These results underscore the efficacy of the proposed model for food image recognition, setting a new benchmark for the field.
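Abstract: Objective: To investigate differences in horizontal sound source localization ability among normal-hearing individuals under different noise conditions. Methods: A total of 78 subjects with normal hearing who were examined at our hospital between August 2022 and August 2023 were enrolled. The root mean square error (RMSE) between the responding loudspeaker and the sounding loudspeaker, and the mean response time, were measured in quiet, in white noise at 35 dB SPL and 40 dB SPL, and in speech noise at 40 dB SPL. Results: Mean response time did not differ significantly across noise conditions (P>0.05). The RMSE in quiet was 10.21°±1.55°, significantly lower than in white noise at 35 dB SPL, white noise at 40 dB SPL, and speech noise at 40 dB SPL (P<0.05). The RMSE values in white noise at 40 dB SPL and speech noise at 40 dB SPL were 15.02°±2.22° and 15.16°±2.06°, respectively, both significantly higher than in white noise at 35 dB SPL (P<0.05). In all four conditions, mean response time did not differ significantly among low-, mid-, and high-frequency stimuli (P>0.05). In the three noise conditions, the RMSE for high-frequency stimuli was higher than for low- and mid-frequency stimuli (P<0.05), and the RMSE for mid-frequency stimuli was higher than for low-frequency stimuli (P<0.05). In all four conditions, the RMSE for frontal sound sources was significantly lower than for other directions (P<0.05), whereas mean response time did not differ significantly between frontal and other directions (P>0.05). In all four conditions, RMSE and mean response time did not differ significantly by sex or age (P>0.05). Conclusion: Noise has a marked effect on sound source localization in normal-hearing individuals, and under the different noise conditions subjects localized frontal sound sources more easily.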
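The localization metric in the abstract above is the RMSE between the azimuth of the loudspeaker the subject chose and the azimuth of the loudspeaker that actually played the stimulus. A minimal sketch of that computation follows. It assumes azimuths are given in degrees and that errors should wrap around the circle (so 350° versus 10° counts as 20°); the source does not say how the study handled wrap-around, and the function names are hypothetical.

```python
import math

def angular_error_deg(response_deg, target_deg):
    """Signed difference response - target, wrapped into [-180, 180) degrees."""
    return (response_deg - target_deg + 180.0) % 360.0 - 180.0

def rmse_deg(responses_deg, targets_deg):
    """Root mean square of the wrapped angular errors over all trials."""
    if len(responses_deg) != len(targets_deg) or not responses_deg:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [angular_error_deg(r, t) ** 2
               for r, t in zip(responses_deg, targets_deg)]
    return math.sqrt(sum(squared) / len(squared))
```

For a loudspeaker array confined to the frontal half-plane, the wrap-around step has no effect and the function reduces to the ordinary RMSE of the azimuth differences.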
Funding: The National Natural Science Foundation of China (No. 61231002, 61273266); the Ph.D. Program Foundation of Ministry of Education of China (No. 20110092130004); China Postdoctoral Science Foundation (No. 2015M571637).
Abstract: To identify speech emotion information accurately, the discriminant-cascading effect in dimensionality reduction for speech emotion recognition is investigated. Based on the existing locality preserving projections and graph embedding framework, a novel discriminant-cascading dimensionality reduction method is proposed, named discriminant-cascading locality preserving projections (DCLPP). The proposed method utilizes supervised embedding graphs and keeps the original space for the inner products of samples to retain enough information for speech emotion recognition. The kernel DCLPP (KDCLPP) is also proposed to extend the mapping form. In experiments on the EMO-DB and eNTERFACE'05 corpora with different categories of classifiers, the proposed method clearly outperforms common existing dimensionality reduction methods such as principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projections (LPP), local discriminant embedding (LDE), and graph-based Fisher analysis (GbFA).
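Abstract: To address the low localization precision of existing spoken term detection methods, a spoken term detection and fine-grained localization method based on multi-scale distance matrices (MF-STD) is proposed. The method first extracts features with a residual convolutional network and constructs distance matrices to model the correlation between inputs. It then learns localization information at different scales through multi-scale segmentation and decoupled heads. Finally, the model is optimized with a multi-scale weighted localization loss, a confidence loss, and a classification loss, yielding fine-grained predictions of keyword presence and temporal boundaries. Experiments on the LibriSpeech dataset show that MF-STD achieves a precision of 97.1% and an intersection over union (IoU) of 88.6% on in-vocabulary terms, and a precision of 96.7% and an IoU of 88.2% on out-of-vocabulary terms. Compared with existing spoken term detection and localization methods, MF-STD significantly improves detection accuracy and localization precision, which demonstrates the method's advantages as well as the effectiveness of multi-scale feature modeling and fine-grained localization constraints for spoken term detection.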
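The abstract above reports localization quality as intersection over union (IoU) between predicted and reference temporal boundaries. As a minimal sketch of how such a score is commonly computed for one-dimensional time segments (the paper's exact evaluation protocol is not given here, and the function name and the `(start, end)` tuple layout are assumptions):

```python
def temporal_iou(pred, ref):
    """IoU of two time segments given as (start, end) in seconds."""
    p_start, p_end = pred
    r_start, r_end = ref
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(p_end, r_end) - max(p_start, r_start))
    union = (p_end - p_start) + (r_end - r_start) - intersection
    return intersection / union if union > 0 else 0.0
```

A prediction that matches the reference boundaries exactly scores 1.0, and any shift or length mismatch lowers the score, which is why IoU is a stricter measure than simply detecting that the keyword is present.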
Abstract: Based on the W-disjoint orthogonality of speech mixtures, a space discriminative function was proposed to enumerate and localize competing speakers in the surrounding environment. A Wiener-like postfilter was then developed to adaptively suppress interference. Experimental results with a hands-free speech recognizer under various SNR and competing-speaker settings show that a nearly 69% error reduction can be obtained with a two-channel small-aperture microphone array relative to the conventional single-microphone baseline system. Comparisons were made against traditional delay-and-sum and Griffiths-Jim adaptive beamforming techniques to further assess the effectiveness of this method.
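The abstract above names delay-and-sum beamforming as one of the baselines. For orientation, here is a minimal sketch of that baseline only, not of the proposed space discriminative function or postfilter. It assumes the per-channel delays toward the target are already known and are whole numbers of samples; the function name and argument layout are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average the result.

    channels: list of equal-rate sample sequences, one per microphone.
    delays:   non-negative integer delay (in samples) of the target signal
              in each channel, relative to the earliest arrival.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    # Only the region where every advanced channel still has samples is kept.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + t] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(max(n, 0))
    ]
```

After alignment the target adds coherently while sounds from other directions do not, which is the source of the gain; with only two closely spaced microphones that gain is modest, which gives context for why the abstract compares against this baseline.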
Abstract: In this paper, we propose a locally enhanced PCANet neural network for fine-grained classification of vehicles. The proposed method adopts the unsupervised PCANet network, which has fewer layers and simpler parameters than the majority of state-of-the-art machine learning methods. It simplifies the calculation steps, reduces manual labeling, and enables vehicle types to be recognized without time-consuming training. Experimental results show that, compared with traditional pattern recognition methods and multi-layer CNN methods, the proposed method achieves the best balance across varying sample library sizes, angle deviations, and training speed. The results also indicate that introducing appropriate local features at scales different from that of the general feature is instrumental in improving the recognition rate. The 7-angle in 180° (12-angle in 360°) classification modeling scheme is shown to be an effective approach that mitigates the drop in recognition rate caused by angle deviations and improves recognition accuracy in practice.