As a crucial component of terrestrial ecosystems, urban forests play a pivotal role in protecting urban biodiversity by providing suitable habitats and acoustic space. Previous studies note that vegetation structure is a key factor influencing bird sounds in urban forests; hence, adjusting the frequency composition of their songs may be a strategy birds use to avoid masking by anthropogenic noise. However, it is unknown whether the response mechanisms of bird vocalizations to vegetation structure remain consistent under the impact of anthropogenic noise. It was hypothesized that anthropogenic noise in urban forests occupies the low-frequency space of bird songs, possibly reshaping the acoustic niches of forests, and that the vegetation structure of urban forests is the critical factor shaping the acoustic space for bird vocalization. Passive acoustic monitoring in various urban forests was used to record natural and anthropogenic sounds, which were classified into three acoustic scenes (bird sounds, human sounds, and bird-human sounds) to determine the interconnections between bird sounds, anthropogenic noise, and vegetation structure. Anthropogenic noise altered the acoustic niche of urban forests by intruding into the low-frequency space used by birds, and vegetation structures related to volume (trunk volume and branch volume) and density (number of branches and leaf area index) significantly affected the diversity of bird sounds. Our findings indicate that low- and high-frequency signals respond differently to vegetation structure. By clarifying this relationship, our results contribute to an understanding of how vegetation structure influences bird sounds in urban forests impacted by anthropogenic noise.
Computational fluid dynamics is employed to investigate the aerodynamic properties of vertical sound barriers exposed to high-speed train operations. The primary focus of this research is to evaluate the influence of train speed and the distance (D) from the track centerline under various operating conditions. The findings reveal a marked increase in the amplitude of the aerodynamic effect on sound barriers as train speed increases. In single-train passages, the aerodynamic effect amplitude scales with the square of the train speed. When two trains pass each other, the aerodynamic amplitude intensifies owing to an additional aerodynamic increment on the sound barrier. This increment is approximately quadratic in the speed of the oncoming train. Notably, the impact of the high-speed train on sound barrier aerodynamics exceeds that of the low-speed train, and this discrepancy grows with larger speed differences between the trains. Moreover, the train-induced aerodynamic effect diminishes significantly with greater distance (D), although the pressure coefficient (CP) exceeded the standard thresholds in some two-train passages. This study culminates in universal equations quantifying the influence of train speed and distance (D) on the aerodynamic characteristics of sound barriers across various operational scenarios.
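The quadratic scaling reported for single-train passages can be checked against one's own pressure-amplitude data with a one-parameter least-squares fit. The sketch below assumes the law A = k·v² with no intercept; the function names and the sample speeds and amplitudes are hypothetical, not the study's data or its universal equations.

```python
import numpy as np


def fit_quadratic_coefficient(speeds, amplitudes):
    """Least-squares fit of amplitude = k * v**2 with no intercept (assumed model)."""
    v2 = np.asarray(speeds, dtype=float) ** 2
    a = np.asarray(amplitudes, dtype=float)
    # Minimising sum((a - k * v2)**2) over k gives k = <v2, a> / <v2, v2>
    return float(np.dot(v2, a) / np.dot(v2, v2))


def predict_amplitude(k, speed):
    """Predicted amplitude at a given speed under the quadratic law."""
    return k * speed ** 2


# Hypothetical, noise-free data (speeds in km/h, amplitudes in Pa)
speeds = np.array([250.0, 300.0, 350.0, 400.0])
amplitudes = 0.004 * speeds ** 2
k = fit_quadratic_coefficient(speeds, amplitudes)
```

The same fit, applied to the two-train increment against the speed of the oncoming train, would test the approximately quadratic relationship described for crossing passages.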
Purpose – The vibration of the rails is a significant source of railway rolling noise, often forming the dominant component of noise in the important frequency region between 400 and 2000 Hz. The purpose of the paper is to investigate the influence of the ground profile and the presence of the train body on the sound radiation from the rail. Design/methodology/approach – Two-dimensional boundary element calculations are used, in which the rail vibration is the source. The ground profile and various different shapes of train body are introduced in the model, and results are observed in terms of sound power and sound pressure. Comparisons are also made with vibro-acoustic measurements performed with and without a train present. Findings – The sound radiated by the rail in the absence of the train body is strongly attenuated by shielding due to the ballast shoulder. When the train body is present, the sound from the vertical rail motion is reflected back down toward the track where it is partly absorbed by the ballast. Nevertheless, the sound pressure at the trackside is increased by typically 0–5 dB. For the lateral vibration of the rail, the effects are much smaller. Once the sound power is known, the sound pressure with the train present can be approximated reasonably well with simple line source directivities. Originality/value – Numerical models used to predict the sound radiation from railway rails have generally neglected the influence of the ground profile and reflections from the underside of the train body on the sound power and directivity of the rail. These effects are studied in a systematic way including comparisons with measurements.
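To illustrate what "simple line source directivities" can mean, the sketch below converts a rail's sound power per unit length into trackside SPL for an idealized 2D line source. It assumes free-field cylindrical spreading, far-field conditions (L_p ≈ L_I), and a monopole or cos² dipole pattern normalized to unit mean; it ignores the ground profile, ballast absorption, and train-body reflections that the boundary element model captures, and `line_source_spl` is a hypothetical helper.

```python
import math


def line_source_spl(lw_per_m_db, r_m, theta_rad, pattern="dipole"):
    """Approximate SPL (dB) at distance r_m from an infinite 2D line source.

    Assumptions: free field (no ground or train body), far field so that
    L_p ~ L_I, cylindrical spreading I = W' * D(theta) / (2*pi*r), and a
    directivity D(theta) normalised so that its mean over 2*pi equals 1.
    """
    if pattern == "monopole":
        d = 1.0
    elif pattern == "dipole":
        # cos^2 averages to 1/2 around the circle, hence the factor 2;
        # theta is measured from the dipole axis (e.g. vertical rail motion)
        d = 2.0 * math.cos(theta_rad) ** 2
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    if d <= 0.0:
        return float("-inf")  # exact null of the dipole
    return lw_per_m_db + 10.0 * math.log10(d) - 10.0 * math.log10(2.0 * math.pi * r_m)
```

Doubling the distance lowers the level by about 3 dB, the cylindrical-spreading rate, and the dipole is about 3 dB above the equivalent monopole on its axis.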
With the development of ultra-wide coverage technology, multibeam echo-sounder (MBES) systems place higher demands on the localization accuracy and computational efficiency of ray-tracing methods. The classical equivalent sound speed profile (ESSP) method replaces the measured sound velocity profile (SVP) with a simple constant-gradient SVP, reducing the computational workload of beam positioning. However, in deep-sea environments, the depth measurement error of this method increases rapidly from the central beam to the edge beams. Analysis of the positioning error of the ESSP method at the edge beams shows that the error increases monotonically with the incident angle and that their relationship can be expressed by a polynomial function. On this basis, an error correction algorithm based on polynomial fitting is derived. A simulation experiment on an inclined seafloor shows that the proposed algorithm has efficiency comparable to that of the original ESSP method while improving bathymetric accuracy in the edge beams by a factor of nearly eight.
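The correction idea, fitting the ESSP error as a polynomial in incident angle and subtracting it, can be sketched with NumPy's `polyfit`/`polyval`. Assumptions: the error is defined as ESSP depth minus a reference depth from full ray tracing through the measured SVP, the cubic degree is chosen arbitrarily, and the training data below are synthetic rather than from the paper's simulation.

```python
import numpy as np


def fit_error_model(incident_deg, depth_error_m, degree=3):
    """Fit ESSP depth error (ESSP minus reference depth) as a polynomial in incident angle."""
    return np.polyfit(incident_deg, depth_error_m, degree)


def correct_depth(essp_depth_m, incident_deg, coeffs):
    """Subtract the polynomial-predicted error from ESSP depths."""
    return essp_depth_m - np.polyval(coeffs, incident_deg)


# Hypothetical training data: error growing monotonically with incident angle
angles = np.linspace(0.0, 70.0, 15)
true_error = 1e-5 * angles ** 3 + 2e-4 * angles ** 2
coeffs = fit_error_model(angles, true_error, degree=3)

# Hypothetical ESSP soundings over a 4000 m deep flat seafloor, biased by the error
essp_depths = 4000.0 + true_error
corrected = correct_depth(essp_depths, angles, coeffs)
```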
Acoustic source localization (ASL) and sound event detection (SED) are two widely pursued but independent research fields. In recent years, sound event localization and detection (SELD) has become a very active research topic as a way to achieve a more complete spatial and temporal representation of the sound field. This paper presents a deep learning-based algorithm for localizing and detecting multiple overlapping sound events in three-dimensional space. Log-Mel spectra and generalized cross-correlation spectra are concatenated along the channel dimension as input features. After the network is trained, these features are classified and regressed in parallel to obtain sound recognition and localization results, respectively. A channel attention mechanism is also introduced into the network to selectively enhance features containing essential information and suppress uninformative ones. Finally, a thorough comparison confirms the efficiency and effectiveness of the proposed SELD algorithm. Field experiments show that the proposed algorithm is robust to reverberation and environmental conditions and achieves higher recognition and localization accuracy than the baseline method.
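The two ingredients named above, channel-wise concatenation of log-Mel and GCC features and channel attention, can be sketched in NumPy. This assumes a squeeze-and-excitation style attention (global average pooling, a bottleneck with reduction ratio r, and sigmoid gating); the paper's exact attention block, channel counts, and feature sizes are not given here, so the 4 log-Mel channels, 6 GCC pairs truncated to 64 lag bins, and the random untrained weights are all hypothetical.

```python
import numpy as np


def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style channel attention (assumed form).

    features: (C, T, F); w1: (C // r, C); w2: (C, C // r).
    Returns the re-weighted features and the per-channel weights.
    """
    squeezed = features.mean(axis=(1, 2))             # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)           # bottleneck + ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate in (0, 1)
    return features * weights[:, None, None], weights


rng = np.random.default_rng(0)
log_mel = rng.standard_normal((4, 100, 64))   # hypothetical: 4 mics, 100 frames, 64 mel bands
gcc = rng.standard_normal((6, 100, 64))       # hypothetical: 6 mic pairs, 64 lag bins
x = np.concatenate([log_mel, gcc], axis=0)    # stack along the channel axis -> (10, 100, 64)

C, r = x.shape[0], 2
w1 = rng.standard_normal((C // r, C)) * 0.1   # untrained placeholder weights
w2 = rng.standard_normal((C, C // r)) * 0.1
y, weights = channel_attention(x, w1, w2)
```

In a trained network the gate would learn to emphasize informative channels; here it only demonstrates the data flow and shapes.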
Dear Editor, This letter proposes a high-precision seafloor transponder positioning method based on correcting the temporal variation of the sound speed profile (SSP). In the proposed method, the ocean sound speed error is modeled as the temporal variation of a background SSP, and a linearized expression of the acoustic travel time with respect to the sound speed coefficient is derived from the ray acoustic model. Moreover, the method introduces acoustic ranging observations between seafloor transponders as a constraint and determines the relative weights of the travel time and ranging observations using Akaike's Bayesian information criterion (ABIC), thereby reducing the positioning error caused by the correlation between the sound speed and position parameters. Experimental results in the South China Sea show that the proposed method outperforms the global navigation satellite system-acoustic ranging combined positioning solver (GARPOS) [1] in terms of rigid distance errors and long-baseline positioning accuracy.
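The role of the travel-time/ranging weighting can be illustrated with a weighted least-squares solve of the two linearized observation sets. This is only a sketch: the weight ratio is passed in directly instead of being estimated with ABIC, the design matrices are random placeholders rather than the ray-acoustic partial derivatives of the letter, and `joint_weighted_solution` is a hypothetical name.

```python
import numpy as np


def joint_weighted_solution(A_tt, b_tt, A_rg, b_rg, weight_ratio):
    """Minimise ||A_tt x - b_tt||^2 + weight_ratio * ||A_rg x - b_rg||^2.

    A_tt, b_tt: linearised travel-time observation equations.
    A_rg, b_rg: linearised transponder-to-transponder ranging equations.
    weight_ratio: relative weight of ranging vs travel time. The letter selects
    it with ABIC; here it is simply an input (ABIC is not implemented).
    """
    normal = A_tt.T @ A_tt + weight_ratio * (A_rg.T @ A_rg)
    rhs = A_tt.T @ b_tt + weight_ratio * (A_rg.T @ b_rg)
    return np.linalg.solve(normal, rhs)


# Hypothetical, noise-free example with three unknowns
rng = np.random.default_rng(0)
x_true = np.array([1.0, -2.0, 0.5])
A_tt = rng.standard_normal((8, 3))
A_rg = rng.standard_normal((2, 3))
x_hat = joint_weighted_solution(A_tt, A_tt @ x_true, A_rg, A_rg @ x_true, weight_ratio=10.0)
```

With noisy, correlated observations the chosen weight shifts the solution, which is why an objective criterion such as ABIC is needed to set it.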
At present, GNSS-Acoustic (GNSS-A) combined technology is widely used for positioning seafloor geodetic stations. Based on sound velocity profile (SVP) data, the equal-gradient acoustic ray-tracing method is applied for high-precision position inversion. However, because the SVPs used in this method are discrete, it ignores the continuous temporal variation of the sound velocity structure, which degrades positioning accuracy. In this study, the temporal variation of the sound speed structure (SSS) is considered, and a cubic B-spline function is used to characterize the perturbed sound velocity. Based on ray-tracing theory, an inversion model of “stepwise iteration & progressive corrections” for both positioning and sound speed information is proposed, which gradually corrects the seafloor geodetic station coordinates and the disturbed sound velocity. Observed field data were used to test the effectiveness of the method. The root mean square (RMS) errors of the residuals of the traditional method without sound velocity correction, the method with quadratic polynomial correction, and the method with cubic B-spline correction are 1.43 ms, 0.44 ms, and 0.21 ms, respectively. The inversion model with sound velocity correction can effectively eliminate the systematic error caused by changes in the SSS and significantly improves the positioning accuracy of seafloor geodetic stations.
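A cubic B-spline representation of a time-varying sound velocity perturbation can be sketched with a uniform-knot basis and a least-squares fit. The knot spacing, the one-knot padding at each end, and the helper names are assumptions for illustration; the stepwise iteration that alternates this fit with coordinate corrections is not reproduced.

```python
import numpy as np


def cubic_bspline(x):
    """Centred uniform cubic B-spline kernel with support |x| < 2."""
    ax = np.abs(x)
    out = np.zeros_like(ax, dtype=float)
    inner = ax < 1
    outer = (ax >= 1) & (ax < 2)
    out[inner] = 2.0 / 3.0 - ax[inner] ** 2 + 0.5 * ax[inner] ** 3
    out[outer] = (2.0 - ax[outer]) ** 3 / 6.0
    return out


def design_matrix(times, t0, t1, n_intervals):
    """Basis matrix with knot spacing (t1 - t0) / n_intervals, padded by one knot per side."""
    h = (t1 - t0) / n_intervals
    centres = t0 + h * np.arange(-1, n_intervals + 2)
    return cubic_bspline((times[:, None] - centres[None, :]) / h)


def fit_perturbation(times, dc, t0, t1, n_intervals):
    """Least-squares B-spline fit of a sound-speed perturbation series dc(t)."""
    B = design_matrix(times, t0, t1, n_intervals)
    coeffs, *_ = np.linalg.lstsq(B, dc, rcond=None)
    return coeffs, B @ coeffs
```

Because the padded basis sums to one everywhere on the interval, a constant (or linear) perturbation is reproduced exactly, which is a convenient sanity check.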
Video analytics is an integral part of surveillance cameras. Compared with video analytics, audio analytics offers several benefits, including less expensive equipment and lower upkeep costs. Additionally, the volume of the audio data stream is substantially lower than that of the video camera data stream, especially for real-time operating systems, which makes it less demanding on the data channel's bandwidth. For instance, automatically starting a live video stream from the site of an explosion or gunshot to the police console using audio analytics would be exceedingly helpful for urban surveillance. Audio analytics technologies may also be used to analyze video recordings and identify incidents. This research proposes a deep learning model combining a convolutional neural network (CNN) and a recurrent neural network (RNN), known as the CNN-RNN approach. The proposed model focuses on automatically identifying impulsive sounds that indicate critical situations in audio sources. The algorithm's accuracy ranged from 81% to 95% when classifying incident sounds, including gunshots, explosions, shattered glass, sirens, cries, and dog barking. The proposed approach can be applied to provide security for citizens in open and closed locations, such as stadiums, underground areas, shopping malls, and other places.
Similar to reverberation chambers in air, non-anechoic water tanks are important acoustic measurement facilities that can be used to measure the sound power radiated by complex underwater sound sources using diffuse-field theory. However, the poor applicability of low-frequency measurements in these tanks has not yet been resolved. Therefore, we propose a low-frequency acoustic measurement method based on sound-field correction (SFC) in an enclosed space, which effectively solves the problem of measuring the sound power of complex sound sources below the Schroeder cutoff frequency in a non-anechoic tank. Using normal mode theory, the transfer relationship between the mean-square sound pressure in an underwater enclosed space and the free-field sound power of the source is established and regarded as a correction term between the sound field in this enclosed space and the free field. This correction term can be obtained from prior measurements of a known sound source. It can then be used to correct the mean-square sound pressure excited in the enclosed space by any source under test and thus obtain its equivalent free-field sound power. Experiments were carried out in a non-anechoic water tank (9.0 m × 3.1 m × 1.7 m) to confirm the validity of the SFC method. Through measurements with a spherical sound source whose free-field radiation characteristics are known, the sound-field correction term between this water tank and the free field was obtained. On this basis, the sound power radiated by a cylindrical shell model under mechanical excitation was measured. The measurement results deviated from the free-field results by at most 2.9 dB. These results show that the SFC method is applicable in the frequency band above the first-order resonant frequency of a non-anechoic tank, which greatly expands the potential low-frequency applications of non-anechoic tanks.
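In level terms, the sound-field correction reduces to per-band arithmetic: derive K(f) from the reference source, then add it to the test source's tank level. The sketch assumes mean-square pressures are already expressed as band levels in dB with consistent reference values, that the reference and test sources occupy the same position in the tank, and uses made-up numbers; the normal-mode derivation of the correction term is not reproduced.

```python
import numpy as np


def correction_term_db(lw_ref_free_db, lp_ref_tank_db):
    """K(f) = L_W,free(ref) - L_p,tank(ref) for each frequency band."""
    return np.asarray(lw_ref_free_db, dtype=float) - np.asarray(lp_ref_tank_db, dtype=float)


def equivalent_free_field_power_db(lp_test_tank_db, k_db):
    """Equivalent free-field sound power level of the test source, per band."""
    return np.asarray(lp_test_tank_db, dtype=float) + k_db


# Hypothetical band levels (dB) for three low-frequency bands
k = correction_term_db([120.0, 118.0, 115.0], [130.0, 125.0, 127.0])
lw_test = equivalent_free_field_power_db([135.0, 129.0, 131.0], k)
```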
Over the past two decades, research on noise pollution and urban soundscapes has grown significantly [1,2]. These studies have sought to better understand the urban acoustic environment, employing various methodologies and techniques to address the complexity of the topic. The research efforts have primarily revolved around two fundamental axes [3]. The first axis focused on combating noise pollution [4–6], emphasizing the reduction of unwanted sounds and compliance with sound levels set by environmental and health protection organizations [7,8].
High-fidelity reconstruction of sound speed is crucial for predicting acoustic propagation in shallow water where internal solitary waves (ISWs) are prevalent. Mapping temperature time series to spatial fields is a widely used approach for reproducing the sound speed perturbed by deformed internal waves. However, the modeling results inherently contain wave-shape distortions. This paper analyzes the formation mechanism and dynamic behavior of the distorted waveform, which is shown to arise from the mismatch between the modeled and real propagation speeds of individual solitons within an ISW packet. To mitigate the distortions, a reconstruction method incorporating the dispersion property of an ISW train is proposed. The principle is to assign each soliton the real speed observed in the experiment, so that the modeled solitons propagate at their intrinsic speeds and the packet disperses naturally with time. The method is applied to reconstruct the sound speed perturbed by ISWs in the South China Sea. The mean and median of the root-mean-square error between the reconstructed and measured sound speeds are below 2 m/s. The modeled shape deformations and packet dispersion agree well with observations, and the waveform distortion is reduced compared with the original method. This work ensures high fidelity in reconstructing the waveguide environment and facilitates future investigations of sound propagation.
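The core of the dispersion-aware reconstruction, letting each soliton keep its own observed speed, can be sketched as a per-soliton space-time mapping. Assumptions: one-dimensional propagation in +x, a constant speed per soliton, and hypothetical arrival times and speeds; mapping full temperature profiles to sound speed is omitted.

```python
import numpy as np


def soliton_positions(arrival_times_s, speeds_m_s, t_s, x_mooring_m=0.0):
    """Positions (m) at time t_s of solitons that passed a mooring at known times.

    Assumption: each soliton i travels in +x at its own constant speed
    speeds_m_s[i], passing x_mooring_m at arrival_times_s[i].
    """
    arrivals = np.asarray(arrival_times_s, dtype=float)
    speeds = np.asarray(speeds_m_s, dtype=float)
    return x_mooring_m + speeds * (t_s - arrivals)


arrivals = [0.0, 600.0, 1200.0]   # hypothetical passage times (s)
own_speeds = [1.2, 1.0, 0.8]      # hypothetical observed speeds (m/s)
early = soliton_positions(arrivals, own_speeds, 1200.0)
late = soliton_positions(arrivals, own_speeds, 3600.0)
common = soliton_positions(arrivals, [1.0, 1.0, 1.0], 3600.0)
```

With distinct speeds the spacings grow from 840 m and 600 m at t = 1200 s to 1320 m and 1080 m at t = 3600 s, so the packet disperses; with a single common speed they stay fixed at 600 m, which is the kind of mismatch the abstract identifies as the source of distortion.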
With the depletion of terrestrial metal resources, the exploitation of deep-sea polymetallic nodules has attracted wide attention around the world, so the environmental impact of deep-sea polymetallic nodule mining cannot be ignored. However, owing to the lack of stable and safe deep-sea (depth > 1000 m) vertical profile observation systems, and consequently of long-term in-situ observation data, the sound speed, dissolved oxygen, and other water environment factors in polymetallic nodule deposition areas remain poorly understood. In this study, a deep-sea in-situ observation system was designed and deployed, and water environment data from a polymetallic nodule deposition area were collected and analyzed. The results show that dissolved oxygen at depths of 0–600 m was mainly affected by biological factors, whereas that deeper than 600 m was affected by physical factors. The sound speed in the water column was mainly affected by temperature and pressure: at depths shallower than 840 m it is mainly controlled by temperature, and at depths between 840 m and 5700 m it is mainly controlled by pressure. Regression equations were derived for sound speed versus pressure and versus temperature. The resuspension of sediments rich in various metals may reduce dissolved oxygen and increase the redox potential, and the environmental impact of a single sediment resuspension event could last 24 h or more. These findings enrich the understanding of the background values of the water environment in polymetallic nodule deposition areas.
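A regression of sound speed on temperature and pressure, of the kind mentioned above, can be sketched as an ordinary least-squares fit. The linear form c = a + bT + dP, the synthetic profile, and its coefficients are assumptions for illustration, not the study's fitted equations.

```python
import numpy as np


def fit_sound_speed(temp_c, pressure_dbar, sound_speed):
    """Least-squares fit of the assumed model c = a + b*T + d*P; returns [a, b, d]."""
    temp_c = np.asarray(temp_c, dtype=float)
    X = np.column_stack([np.ones_like(temp_c), temp_c, np.asarray(pressure_dbar, dtype=float)])
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(sound_speed, dtype=float), rcond=None)
    return coeffs


# Synthetic profile (hypothetical): warm surface layer decaying with depth
depth = np.linspace(0.0, 5700.0, 60)
temp = 2.0 + 23.0 * np.exp(-depth / 500.0)
pressure = depth  # rough approximation: about 1 dbar per metre
speed = 1450.0 + 4.0 * temp + 0.016 * pressure
a, b, d = fit_sound_speed(temp, pressure, speed)
```

Fitting the shallow (temperature-dominated) and deep (pressure-dominated) segments separately would be one way to reflect the two regimes described in the abstract.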
The time-domain inverse technique based on the time-domain rotating equivalent source method has been proposed to localize and quantify rotating sound sources. However, this technique faces two problems: one is the time-consuming process of solving the transcendental equation at each time step, and the other is the difficulty of controlling the instability problem caused by the time-varying transfer matrix. In view of this, an improved technique is proposed in this paper to resolve these two problems. In the improved technique, a de-Dopplerization method in the time-domain rotating reference frame is first applied to eliminate the Doppler effect caused by the source rotation from the measured pressure signals, and the restored pressure signals without the Doppler effect are then used as the inputs of the time-domain stationary equivalent source method to locate and quantify the sound sources. Compared with the original technique, the improved technique avoids solving the transcendental equation at each time step and facilitates the treatment of the instability problem because the transfer matrix does not change with time. Numerical simulation and experimental results show that the improved technique can eliminate the Doppler effect effectively and then localize and quantify rotating nonstationary or broadband sources accurately. The results also demonstrate that the improved technique provides a more stable reconstruction and is computationally more efficient than the original one.
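Why de-Dopplerization avoids a transcendental solve per time step can be illustrated with a generic fixed-frame time-mapping sketch: for a uniform emission-time grid, reception times follow explicitly from t = τ + R(τ)/c, and the microphone signal is simply interpolated there. This is not the paper's rotating-reference-frame formulation; amplitude (1/R) and convective effects are ignored, and the geometry, speeds, and function names are hypothetical.

```python
import numpy as np


def source_position(tau, radius, omega):
    """Point source rotating at angular speed omega on a circle in the plane z = 0."""
    ang = omega * tau
    return np.stack([radius * np.cos(ang), radius * np.sin(ang), np.zeros_like(tau)], axis=-1)


def reception_times(tau, mic, radius, omega, c=343.0):
    """Explicit map from emission time tau to reception time t = tau + R(tau) / c."""
    dist = np.linalg.norm(source_position(tau, radius, omega) - mic, axis=-1)
    return tau + dist / c


def dedopplerize(p_received, t_grid, tau_grid, mic, radius, omega, c=343.0):
    """Resample the microphone signal at the reception times of a uniform emission grid.

    Amplitude (1/R) and convective corrections are omitted in this sketch.
    """
    return np.interp(reception_times(tau_grid, mic, radius, omega, c), t_grid, p_received)


# Synthetic check (hypothetical setup): 500 Hz tone, 0.5 m radius, 20 rev/s
fs, f0, radius, omega = 20000.0, 500.0, 0.5, 2 * np.pi * 20.0
mic = np.array([2.0, 0.0, 0.0])
tau_fine = np.arange(0.0, 0.2, 1.0 / (8 * fs))
t_fine = reception_times(tau_fine, mic, radius, omega)
t_grid = np.arange(t_fine[0], t_fine[-1], 1.0 / fs)
p_received = np.interp(t_grid, t_fine, np.sin(2 * np.pi * f0 * tau_fine))  # Doppler-shifted

tau_grid = np.arange(0.01, 0.19, 1.0 / fs)
restored = dedopplerize(p_received, t_grid, tau_grid, mic, radius, omega)
```

The restored samples match the emitted tone up to linear-interpolation error, so a stationary-source method can then operate on them.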
Neuromorphic systems for sound perception are in high demand for future bioinspired electronics and humanoid robots. However, sound perception based on volume, tone, and timbre remains unexplored. Herein, organic optoelectronic synapses (OOSs) are constructed for unprecedented sound recognition. The volume, tone, and timbre of sound can be emulated by appropriately regulating the input voltages, frequencies, and light intensities of the OOSs, according to the amplitude, frequency, and waveform of the sound. A quantitative relation between the recognition factor (ζ) and the postsynaptic current (I = I_light − I_dark) is established to achieve sound perception. Interestingly, the bell sound of the University of Chinese Academy of Sciences is recognized with an accuracy of 99.8%. Mechanism studies reveal that the impedance of the interfacial layers plays a critical role in the synaptic performance. This contribution presents unprecedented artificial synapses for sound perception at the hardware level.
Owing to the large amount of unused and unexplored spectrum, the so-called sub-terahertz (sub-THz) frequency bands from 100 to 300 GHz are seen as promising for the next generation of wireless communication systems. Channel modeling at sub-THz bands is essential for the design and deployment of future wireless communication systems. Channel measurement is a widely adopted method for obtaining channel characteristics and establishing mathematical channel models, and it depends on the design and construction of channel sounders. Thus, reliable channel sounding techniques and accurate channel measurements are required. In this paper, the requirements of an ideal channel sounder are discussed, and the main channel sounding techniques for sub-THz frequency bands are described. State-of-the-art sub-THz channel sounders reported in the literature and the corresponding channel measurements are presented. Moreover, a vector network analyzer (VNA)-based channel sounder supporting frequency bands from 220 to 330 GHz is presented, and its performance capability and limitations are evaluated. This paper also discusses the challenges and future outlook of sub-THz channel sounders and measurements.
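For a VNA-based sounder, the channel impulse response is commonly obtained by an inverse FFT of the swept frequency response, with delay resolution and range set by the sweep. The sketch assumes a 220–330 GHz sweep with 1101 points (a hypothetical setting), no windowing or calibration, and an idealized single-path channel placed exactly on a delay bin.

```python
import numpy as np


def vna_impulse_response(h_freq):
    """Channel impulse response from a uniformly sampled frequency response (no window)."""
    return np.fft.ifft(h_freq)


f_start, f_stop, n_points = 220e9, 330e9, 1101   # hypothetical sweep settings
freqs = np.linspace(f_start, f_stop, n_points)
df = freqs[1] - freqs[0]                         # 100 MHz frequency step
delay_bin = 1.0 / (n_points * df)                # delay spacing of the IFFT output
max_delay = 1.0 / df                             # unambiguous delay range

# Idealized single-path channel whose delay falls exactly on bin 42
tau = 42 * delay_bin
h = np.exp(-2j * np.pi * freqs * tau)
cir = vna_impulse_response(h)
peak_bin = int(np.argmax(np.abs(cir)))
```

The 100 MHz step limits the unambiguous delay to 10 ns (about 3 m of excess path), while the bin width of roughly 9 ps follows from the 110 GHz span; practical processing usually adds a window to suppress sidelobes.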
Respiratory infections in children increase the risk of fatal lung disease, making effective identification and analysis of breath sounds essential. However, most studies have focused on adults, ignoring pediatric patients, whose lungs are more vulnerable because of an immature immune system, and the scarcity of medical data has limited the development of reliable, highly accurate deep learning methods. In this work, we collected three types of breath sounds from children, normal (120 recordings), bronchitis (120 recordings), and pneumonia (120 recordings), at the posterior chest position using an off-the-shelf 3M electronic stethoscope. Three features were extracted from the wavelet-denoised signal: the spectrogram, mel-frequency cepstral coefficients (MFCCs), and delta MFCCs. The recognition model is based on transfer learning and combines a fine-tuned MobileNetV2 and a modified ResNet50 to classify breath sounds, along with software for displaying the analysis results. Extensive experiments on a real dataset demonstrate the effectiveness of the proposed model, with average accuracy, precision, recall, specificity, and F1 score of 97.96%, 97.83%, 97.89%, 98.89%, and 0.98, respectively, achieving superior performance with a small dataset. The proposed detection system, with a high-performance model and software, can help parents perform lung screening at home and also has the potential for large-scale screening of children for lung disease.
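Delta MFCCs are typically computed from the MFCC matrix with a regression over neighboring frames. The sketch below implements that standard formula with edge padding; the window half-width of 2 and the function name are assumptions, since the abstract does not state the paper's settings.

```python
import numpy as np


def delta(features, width=2):
    """Regression-based delta along the time axis.

    features: (n_frames, n_coeffs). Edge frames are padded by repetition.
    d_t = sum_{n=1}^{N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1}^{N} n^2)
    """
    n_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n: width + n + n_frames]
        behind = padded[width - n: width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom
```

Applying `delta` twice gives delta-delta coefficients; for coefficients that change linearly over time, interior frames return exactly the per-frame slope.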
It is essential to acquire sound speed profiles (SSPs) with high-precision spatiotemporal resolution for undersea acoustic activities. However, conventional observation methods cannot obtain high-resolution SSPs. Moreover, SSPs are complex and vary in time and space, especially in coastal areas. We propose a new space-time multigrid three-dimensional variational method with a weak constraint term (referred to as STC-MG3DVar) to construct high-precision, high-spatiotemporal-resolution SSPs in coastal areas, in which sound velocity is defined as the analysis variable and the Chen-Millero empirical sound velocity formula is introduced as a weak constraint term into the cost function. The STC-MG3DVar method takes into account the spatiotemporal correlation of sound velocity observations and successively extracts multi-scale information from the observations, from long waves to short waves. The weak constraint term optimizes sound velocity through the physical relationship between sound velocity and temperature-salinity, yielding more reasonable and accurate SSPs. To verify the accuracy of STC-MG3DVar, SSP observations and CTD observations (temperature and salinity) were obtained from field experiments in the northern coastal area of the Shandong Peninsula. The average root mean square error (RMSE) of the SSPs constructed by STC-MG3DVar is 0.132 m/s, and STC-MG3DVar improves the SSP construction accuracy by 10.14% over the space-time multigrid 3DVar without the weak constraint term (ST-MG3DVar) and by 44.19% over the spatial multigrid 3DVar with the weak constraint term (SC-MG3DVar). Benefiting from the constraint term and the spatiotemporal correlation information, the proposed STC-MG3DVar method outperforms both ST-MG3DVar and SC-MG3DVar in constructing SSPs with high precision and high spatiotemporal resolution.
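Stripped of the multigrid and spatiotemporal machinery, a 3DVar cost with a weak empirical constraint is quadratic in the sound speed and can be minimized in closed form. In this sketch, `c_emp` stands for sound speeds computed from CTD temperature and salinity with an empirical formula such as Chen-Millero (not implemented here), `lam` is a hypothetical constraint weight, and the small example matrices are made up.

```python
import numpy as np


def weak_constraint_3dvar(c_b, B, H, y, R, c_emp, lam):
    """Minimise the quadratic cost
        J(c) = 1/2 (c - c_b)^T B^-1 (c - c_b)
             + 1/2 (y - H c)^T R^-1 (y - H c)
             + lam/2 * ||c - c_emp||^2
    by solving its normal equations (gradient of J set to zero).
    """
    B_inv = np.linalg.inv(B)
    R_inv = np.linalg.inv(R)
    lhs = B_inv + H.T @ R_inv @ H + lam * np.eye(c_b.size)
    rhs = B_inv @ c_b + H.T @ R_inv @ y + lam * c_emp
    return np.linalg.solve(lhs, rhs)


# Hypothetical 5-point profile: background 1500 m/s, two observed points
c_b = np.full(5, 1500.0)
B = 4.0 * np.eye(5)                  # background error variance (m/s)^2
H = np.zeros((2, 5))
H[0, 1] = 1.0
H[1, 3] = 1.0
y = np.array([1502.0, 1498.0])       # sound speed observations (m/s)
R = 0.25 * np.eye(2)                 # observation error variance (m/s)^2
c_emp = np.full(5, 1501.0)           # stand-in for empirical-formula values
analysis = weak_constraint_3dvar(c_b, B, H, y, R, c_emp, lam=0.0)
pinned = weak_constraint_3dvar(c_b, B, H, y, R, c_emp, lam=1e8)
```

Setting `lam` very large pins the analysis to `c_emp`, while `lam = 0` recovers an ordinary 3DVar analysis; the weak constraint sits between these extremes.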
Nearfield acoustic holography (NAH) is a powerful tool for source identification and sound field reconstruction. Wave superposition (WS)-based NAH is suitable for spatially extended sources and does not require complex numerical integration. The equivalent source method (ESM), a classical WS approach, is widely used owing to its simplicity and efficiency. In the ESM, a virtual source surface is introduced on which virtual point sources are taken as the assumed sources, and an optimal retreat distance must be chosen. A recently proposed WS-based approach, the element radiation superposition method (ERSM), uses piston surface sources as the assumed sources and does not need a virtual source surface. To satisfy the application conditions of the piston pressure formula, the pistons must be as small as possible, which results in a large number of pistons and sampling points. In this paper, transfer matrix modes (TMMs), composed of the singular vectors of the vibro-acoustic transfer matrix, are used as a sparse basis for the piston normal velocities, and a compressive ERSM based on TMMs is proposed. Compared with the conventional ERSM, the proposed method maintains good pressure reconstruction when the numbers of both sampling points and pistons are reduced. In addition, the proposed method is compared mathematically with the compressive ESM. Both simulations and experiments on a rectangular plate demonstrate the advantage of the proposed method over existing methods.
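The compressive idea, using transfer-matrix singular vectors as a sparse basis and recovering a few coefficients with a greedy solver, can be sketched with an SVD and orthogonal matching pursuit (OMP). Assumptions: the piston velocities are expanded in the right singular vectors of the transfer matrix, OMP is used as a generic sparse solver (the abstract does not name the paper's solver), and the random transfer matrix is a placeholder for a real piston-radiation model.

```python
import numpy as np


def tmm_basis(G):
    """Right singular vectors of the transfer matrix G (p = G @ v), as columns."""
    _, _, vh = np.linalg.svd(G, full_matrices=False)
    return vh.conj().T


def omp(A, p, n_nonzero):
    """Orthogonal matching pursuit: sparse alpha with A @ alpha ~= p."""
    residual = p.astype(np.result_type(A, p))
    support = []
    coef = np.zeros(0, dtype=residual.dtype)
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(A.conj().T @ residual)))  # most correlated column
        if k not in support:
            support.append(k)
        coef, *_ = np.linalg.lstsq(A[:, support], p, rcond=None)  # refit on support
        residual = p - A[:, support] @ coef
    alpha = np.zeros(A.shape[1], dtype=residual.dtype)
    alpha[support] = coef
    return alpha


# Hypothetical example: random transfer matrix (30 field points, 20 pistons)
rng = np.random.default_rng(1)
G = rng.standard_normal((30, 20))
Psi = tmm_basis(G)
alpha_true = np.zeros(20)
alpha_true[[0, 3]] = [2.0, -1.0]
v_true = Psi @ alpha_true          # piston normal velocities
p_meas = G @ v_true                # "measured" hologram pressures
alpha_hat = omp(G @ Psi, p_meas, n_nonzero=2)
v_hat = Psi @ alpha_hat
```

In a compressive setting one would keep only a subset of rows (measurement points) of `G` when forming the dictionary; recovery then depends on the sparsity level and the number of measurements.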
Lattice structures have drawn much attention in engineering applications because of their lightweight and multifunctional properties. In this work, a mathematical design approach for functionally graded (FG) and helicoidal lattice structures based on triply periodic minimal surfaces is proposed. Four types of lattice structures, uniform, helicoidal, FG, and combined FG-helicoidal, are fabricated by additive manufacturing. The deformation behavior, mechanical properties, energy absorption, and acoustic properties of the lattice samples are thoroughly investigated. The load-bearing capability of the helicoidal lattice samples improves gradually in the plateau stage, so that the plateau stress and total energy absorption are improved by over 26.9% and 21.2%, respectively, compared with the uniform sample. This phenomenon is attributed to the helicoidal design, which reduces the gaps in unit cells and enhances fracture resistance. Regarding acoustic properties, the helicoidal design reduces the resonance frequency and raises the peak absorption coefficient, while the FG design mainly influences the peak absorption coefficient. Across a broad frequency range from 1000 to 6300 Hz, combining the FG and helicoidal designs improves the maximum absorption coefficient by 18.6%–30% and increases the number of points with an absorption coefficient above 0.6 by 55.2%–61.7%. This study provides a novel strategy for simultaneously improving energy absorption and sound absorption by controlling the internal architecture of lattice structures.
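One common way to express graded and twisted TPMS lattices mathematically is an implicit gyroid field whose level set varies with height (FG) and whose in-plane coordinates rotate with height (helicoidal). The sketch assumes the gyroid surface, linear grading of the level t(z), a linear twist about the z-axis, and a coarse sampling grid; the paper's actual TPMS type and grading functions are not given in the abstract.

```python
import numpy as np


def gyroid(x, y, z):
    """Gyroid TPMS implicit function (period 2*pi)."""
    return np.sin(x) * np.cos(y) + np.sin(y) * np.cos(z) + np.sin(z) * np.cos(x)


def graded_helicoidal_field(x, y, z, height, t_bottom, t_top, total_twist_rad):
    """Negative where solid: gyroid(x', y', z) < t(z).

    Assumptions: t(z) grows linearly from t_bottom to t_top (FG design) and
    (x', y') is (x, y) rotated about the z-axis by an angle proportional to z
    (helicoidal design).
    """
    angle = total_twist_rad * z / height
    xr = x * np.cos(angle) - y * np.sin(angle)
    yr = x * np.sin(angle) + y * np.cos(angle)
    level = t_bottom + (t_top - t_bottom) * z / height
    return gyroid(xr, yr, z) - level


def solid_fraction_per_layer(n=40, cells=2, **params):
    """Fraction of solid voxels in each z-layer of an n^3 sampling grid."""
    size = 2 * np.pi * cells
    axis = np.linspace(0.0, size, n)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    solid = graded_helicoidal_field(x, y, z, size, **params) < 0
    return solid.mean(axis=(0, 1))


fractions = solid_fraction_per_layer(t_bottom=-0.8, t_top=0.8, total_twist_rad=np.pi / 2)
```

The per-layer solid fraction rises from bottom to top because of the graded level, while the twist changes cell orientation without changing that trend; a marching-cubes step would turn the field into printable geometry.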
Environmental sound classification (ESC) is the task of identifying the environmental sounds present in an audio stream. Factors such as framework differences, overlapping sound events, and the presence of various sound sources during recording make the ESC task considerably more complex. This research proposes a deep learning model that improves the recognition rate of environmental sounds and reduces model training time under limited computational resources. The performance of transformer and convolutional neural network (CNN) models is investigated. Seven audio features, namely the chromagram, Mel-spectrogram, tonnetz, Mel-frequency cepstral coefficients (MFCCs), delta MFCCs, delta-delta MFCCs, and spectral contrast, are extracted from the UrbanSound8K, ESC-50, and ESC-10 databases. Moreover, three data augmentation methods, white noise addition, pitch tuning, and time stretching, are employed to reduce the risk of overfitting due to the limited number of audio clips. The experiments demonstrate that the best performance was achieved by the proposed transformer model using all seven audio features on the augmented database, with highest accuracies of 0.98, 0.94, and 0.97 on UrbanSound8K, ESC-50, and ESC-10, respectively. The results indicate that the proposed technique achieves the best performance among the configurations evaluated for ESC problems.
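Of the three augmentations, white-noise addition is the simplest to pin down: scale Gaussian noise so the result hits a target SNR. The sketch assumes a mono floating-point signal and an SNR in dB; the function name and the 20 dB example are hypothetical.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng=None):
    """Add Gaussian white noise scaled to give exactly the requested SNR (dB).

    Returns (noisy_signal, added_noise).
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(signal.shape)
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose scale so that p_signal / (scale**2 * p_noise) == 10**(snr_db / 10)
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise, scale * noise


# Hypothetical clip: 1 s of a 440 Hz tone at 22.05 kHz
sr = 22050
t = np.arange(sr) / sr
clip = 0.5 * np.sin(2 * np.pi * 440.0 * t)
noisy, added = add_white_noise(clip, 20.0, rng=np.random.default_rng(0))
```

Pitch tuning and time stretching are usually delegated to an audio library such as librosa (`librosa.effects.pitch_shift`, `librosa.effects.time_stretch`) rather than written by hand.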
Funding: The National Natural Science Foundation of China (32201338); Science Technology Program from the Forestry Administration of Guangdong Province (2021KJCX017); Guangzhou Municipal Science and Technology Bureau Program (2023A04J0086); Shenzhen Key Laboratory of Southern Subtropical Plant Diversity.
Abstract: As a crucial component of terrestrial ecosystems, urban forests play a pivotal role in protecting urban biodiversity by providing suitable habitats and acoustic space. Previous studies note that vegetation structure is a key factor influencing bird sounds in urban forests; hence, adjusting the frequency composition of their songs may be a strategy birds use to avoid masking by anthropogenic noise. However, it is unknown whether the response mechanisms of bird vocalizations to vegetation structure remain consistent under the influence of anthropogenic noise. It was hypothesized that anthropogenic noise in urban forests occupies the low-frequency space of bird songs, possibly reshaping the acoustic niches of forests, and that the vegetation structure of urban forests is the critical factor shaping the acoustic space for bird vocalization. Passive acoustic monitoring was used in various urban forests to record natural and anthropogenic sounds, which were classified into three acoustic scenes (bird sounds, human sounds, and bird-human sounds) to determine the interconnections between bird sounds, anthropogenic noise, and vegetation structure. Anthropogenic noise altered the acoustic niche of urban forests by intruding into the low-frequency space used by birds, and vegetation structures related to volume (trunk volume and branch volume) and density (number of branches and leaf area index) significantly affected the diversity of bird sounds. Our findings indicate that low- and high-frequency signals respond differently to vegetation structure. By clarifying this relationship, our results contribute to understanding how vegetation structure influences bird sounds in urban forests affected by anthropogenic noise.
Funding: This study was supported in part by the National Natural Science Foundation of China under Grant Nos. 52278463, 52208505, and 52202422.
Abstract: Computational fluid dynamics is employed to investigate the aerodynamic properties of vertical sound barriers exposed to high-speed train operations. The primary focus of this research is to evaluate the influence of train speed and of the distance (D) from the track centerline under various operating conditions. The findings show a marked increase in the amplitude of the aerodynamic effect on sound barriers as train speed increases. For single-train passages, the amplitude of the aerodynamic effect is proportional to the square of the train speed. When two trains pass each other, the amplitude increases because of an additional aerodynamic increment on the sound barrier; this increment is approximately quadratic in the speed of the train traveling in the opposite direction. Notably, the high-speed train affects sound barrier aerodynamics more than the low-speed train does, and this difference grows with the speed difference between the trains. Moreover, the train-induced aerodynamic effect diminishes significantly with increasing distance (D), and the pressure coefficient (CP) exceeds the standard thresholds in some dual-train passages. This study culminates in universal equations that quantify the influence of train speed and distance (D) on the aerodynamic characteristics of sound barriers across various operating scenarios.
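The single-train scaling reported above (amplitude proportional to the square of train speed) can be checked against measured or simulated peak pressures with a one-parameter least-squares fit. The sketch below is illustrative only: the synthetic data, the assumed air density, and the function names are hypothetical, and the paper's universal equations (which also involve the distance D) are not reproduced. It also computes the pressure coefficient with the conventional definition Cp = Δp / (0.5 ρ v²); if the quadratic law holds exactly, Cp is independent of speed.

```python
import numpy as np

RHO_AIR = 1.225  # kg/m^3, standard sea-level air density (assumed)


def fit_quadratic_coefficient(speeds_ms, amplitudes_pa):
    """Least-squares k for the one-parameter model amplitude = k * v**2 (through the origin)."""
    v2 = np.asarray(speeds_ms, dtype=float) ** 2
    p = np.asarray(amplitudes_pa, dtype=float)
    return float(np.dot(v2, p) / np.dot(v2, v2))


def pressure_coefficient(delta_p_pa, speed_ms, rho=RHO_AIR):
    """Conventional pressure coefficient Cp = delta_p / (0.5 * rho * v**2)."""
    return np.asarray(delta_p_pa, dtype=float) / (0.5 * rho * np.asarray(speed_ms, dtype=float) ** 2)


if __name__ == "__main__":
    speeds = np.array([250.0, 300.0, 350.0]) / 3.6   # km/h converted to m/s
    amplitudes = 0.05 * speeds ** 2                   # synthetic amplitudes in Pa, not paper data
    k = fit_quadratic_coefficient(speeds, amplitudes)
    print(f"k = {k:.4f} Pa s^2/m^2")
    print("Cp:", pressure_coefficient(amplitudes, speeds))
```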
Funding: Supported by the TRANSIT project (funded by EU Horizon 2020 and the Europe’s Rail Joint Undertaking under Grant Agreement 881771).
Abstract: Purpose – The vibration of the rails is a significant source of railway rolling noise, often forming the dominant component of noise in the important frequency region between 400 and 2000 Hz. The purpose of the paper is to investigate the influence of the ground profile and the presence of the train body on the sound radiation from the rail. Design/methodology/approach – Two-dimensional boundary element calculations are used, in which the rail vibration is the source. The ground profile and various shapes of train body are introduced in the model, and results are observed in terms of sound power and sound pressure. Comparisons are also made with vibro-acoustic measurements performed with and without a train present. Findings – In the absence of the train body, the sound radiated by the rail is strongly attenuated by shielding from the ballast shoulder. When the train body is present, the sound from the vertical rail motion is reflected back down toward the track, where it is partly absorbed by the ballast. Nevertheless, the sound pressure at the trackside is typically increased by 0–5 dB. For the lateral vibration of the rail, the effects are much smaller. Once the sound power is known, the sound pressure with the train present can be approximated reasonably well with simple line source directivities. Originality/value – Numerical models used to predict the sound radiation from railway rails have generally neglected the influence of the ground profile and of reflections from the underside of the train body on the sound power and directivity of the rail. These effects are studied here systematically, including comparisons with measurements.
Funding: The Natural Science Foundation of Shandong Province of China under contract Nos. ZR2022MA051 and ZR2020MA090; the National Natural Science Foundation of China under contract No. U22A2012; the China Postdoctoral Science Foundation under contract No. 2020M670891; the SDUST Research Fund under contract No. 2019TDJH103; the Talent Introduction Plan for Youth Innovation Team in universities of Shandong Province (innovation team of satellite positioning and navigation).
Abstract: With the development of ultra-wide coverage technology, multibeam echo-sounder (MBES) systems place higher demands on the localization accuracy and computational efficiency of ray tracing methods. The classical equivalent sound speed profile (ESSP) method replaces the measured sound velocity profile (SVP) with a simple constant-gradient SVP, reducing the computational workload of beam positioning. However, in deep-sea environments, the depth measurement error of this method increases rapidly from the central beam to the edge beams. Analysis of the positioning error of the ESSP method at the edge beams shows that the error increases monotonically with the incident angle and that the relationship between them can be expressed by a polynomial function. On this basis, an error correction algorithm based on polynomial fitting is derived. A simulation experiment on an inclined seafloor shows that the proposed algorithm is as efficient as the original ESSP method while improving bathymetric accuracy at the edge beams nearly eightfold.
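A minimal sketch of the polynomial correction idea follows, assuming the ESSP depth error has already been computed at a set of incident angles (for example, by comparing against full layer-by-layer ray tracing on a simulated profile). The polynomial degree, the sign convention (error = ESSP depth minus reference depth), and the synthetic error curve are assumptions for illustration; the paper's fitted coefficients are not given in the abstract.

```python
import numpy as np


def fit_error_model(angles_deg, depth_errors_m, degree=3):
    """Fit the ESSP depth error as a polynomial in the beam incident angle (degrees)."""
    return np.polynomial.Polynomial.fit(np.asarray(angles_deg, dtype=float),
                                        np.asarray(depth_errors_m, dtype=float), deg=degree)


def correct_depths(essp_depths_m, angles_deg, error_model):
    """Remove the predicted error, assuming error = ESSP depth - reference depth."""
    return np.asarray(essp_depths_m, dtype=float) - error_model(np.asarray(angles_deg, dtype=float))


if __name__ == "__main__":
    angles = np.linspace(0.0, 70.0, 36)
    # Synthetic error curve that grows monotonically with angle (illustrative only).
    errors = 1e-4 * angles ** 2 + 2e-6 * angles ** 3
    model = fit_error_model(angles, errors)
    true_depth = 5000.0
    corrected = correct_depths(true_depth + errors, angles, model)
    print("max residual after correction (m):", np.max(np.abs(corrected - true_depth)))
```

Because the correction costs one polynomial evaluation per beam, it adds almost nothing to the ESSP computation, which is consistent with the efficiency claim above.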
Funding: Supported by the National Natural Science Foundation of China (61877067); the Foundation of Science and Technology on Near-Surface Detection Laboratory (TCGZ2019A002, TCGZ2021C003, 6142414200511); the Natural Science Basic Research Program of Shaanxi (2021JZ-19).
Abstract: Acoustic source localization (ASL) and sound event detection (SED) are two widely pursued, independent research fields. In recent years, sound event localization and detection (SELD) has become a very active research topic because it provides a more complete spatial and temporal representation of the sound field. This paper presents a deep learning-based algorithm for localizing and detecting multiple overlapping sound events in three-dimensional space. The log-Mel spectrum and the generalized cross-correlation spectrum are concatenated along the channel dimension as input features. After training, a neural network classifies and regresses these features in parallel to obtain the sound recognition and localization results, respectively. A channel attention mechanism is also introduced in the network to selectively enhance features containing essential information and suppress useless ones. Finally, a thorough comparison confirms the efficiency and effectiveness of the proposed SELD algorithm. Field experiments show that the proposed algorithm is robust to reverberation and environmental conditions and achieves higher recognition and localization accuracy than the baseline method.
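The generalized cross-correlation spectrum used as an input feature is commonly computed with the phase transform (GCC-PHAT) weighting. The abstract does not state which weighting the authors used, so PHAT is an assumption here; the sketch below estimates the time difference of arrival between two microphone channels with NumPy's real FFT routines and also returns the correlation over the searched lags, which is the kind of vector that would be stacked as a feature.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` with PHAT-weighted cross-correlation.

    Returns (tau_seconds, cc), where cc holds the correlation for lags -max_shift..+max_shift.
    """
    sig = np.asarray(sig, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n = sig.size + ref.size                          # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12                   # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs, cc


if __name__ == "__main__":
    fs = 16000.0
    rng = np.random.default_rng(0)
    ref = rng.normal(size=1024)                      # synthetic broadband source signal
    delay = 5                                        # samples
    sig = np.concatenate((np.zeros(delay), ref[:-delay]))
    tau, _ = gcc_phat(sig, ref, fs, max_tau=0.001)
    print(f"estimated delay: {tau * 1e3:.4f} ms (true {delay / fs * 1e3:.4f} ms)")
```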
Funding: This work was supported by the Wenhai Program of the S&T Fund of Shandong Province for Pilot National Laboratory for Marine Science and Technology (Qingdao) (2021WHZZB 1003).
Abstract: Dear Editor, This letter proposes a high-precision seafloor transponder positioning method based on correcting the temporal variation of the sound speed profile (SSP). In the proposed method, the ocean sound speed error is modeled as the temporal variation of a background SSP, and a linearized expression of the acoustic travel time with respect to the sound speed coefficients is derived from the ray acoustic model. Moreover, the method introduces acoustic ranging observations between seafloor transponders as constraints and determines the weights of the travel time and ranging observations using Akaike’s Bayesian information criterion (ABIC), reducing the positioning error caused by the correlation between sound speed and position parameters. Experimental results in the South China Sea show that the proposed method outperforms the global navigation satellite system-acoustic ranging combined positioning solver (GARPOS) [1] in terms of rigid distance errors and long baseline positioning accuracy.
Funding: National Natural Science Foundation of China (Nos. 41931076, 42174020); Laoshan Laboratory (No. LSKJ202205101); State Key Laboratory of Geo-Information Engineering (No. SKLGIE2020-M-1-1).
Abstract: At present, GNSS-Acoustic (GNSS-A) combined technology is widely used for positioning seafloor geodetic stations. Based on Sound Velocity Profile (SVP) data, the equal-gradient acoustic ray-tracing method is applied for high-precision position inversion. However, because the SVPs used in this method are discrete, it ignores the continuous variation of the sound velocity structure in the time domain, which degrades positioning accuracy. In this study, the time-domain variation of the Sound Speed Structure (SSS) is considered, and a cubic B-spline function is applied to characterize the perturbed sound velocity. Based on ray-tracing theory, an inversion model of “stepwise iteration & progressive corrections” for both position and sound speed is proposed, which gradually corrects the seafloor geodetic station coordinates and the disturbed sound velocity. Field data were used to test the effectiveness of the method. The results show that the Root Mean Square (RMS) errors of the residuals are 1.43 ms for the traditional method without sound velocity correction, 0.44 ms with quadratic polynomial correction, and 0.21 ms with cubic B-spline correction. The inversion model with sound velocity correction effectively eliminates the systematic error caused by changes in the SSS and significantly improves the positioning accuracy of seafloor geodetic stations.
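To make the cubic B-spline idea concrete, the sketch below represents a sound speed perturbation δc(t) as a weighted sum of cubic B-spline basis functions in time and fits the weights by least squares. It is a simplified, hypothetical illustration: in the method described above the spline coefficients are estimated together with the station coordinates from acoustic travel times, whereas here they are fitted directly to perturbation samples, and the knot layout (clamped, with evenly spaced interior knots) is an assumption.

```python
import numpy as np


def clamped_knots(t0, t1, n_interior, degree=3):
    """Clamped knot vector with evenly spaced interior knots on [t0, t1] (assumed layout)."""
    interior = np.linspace(t0, t1, n_interior + 2)[1:-1]
    return np.concatenate(([t0] * (degree + 1), interior, [t1] * (degree + 1)))


def bspline_basis(t, knots, degree=3):
    """Evaluate every B-spline basis function at times t via the Cox-de Boor recursion.

    Returns an array of shape (len(t), len(knots) - degree - 1).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    knots = np.asarray(knots, dtype=float)
    n_spans = len(knots) - 1
    # Degree 0: indicator of the half-open knot span [k_i, k_{i+1}).
    B = np.zeros((t.size, n_spans))
    for i in range(n_spans):
        B[:, i] = (knots[i] <= t) & (t < knots[i + 1])
    # Assign the right end point to the last non-empty span so the basis sums to 1 there.
    last = np.searchsorted(knots, knots[-1], side="left") - 1
    B[t == knots[-1], last] = 1.0
    for k in range(1, degree + 1):
        B_next = np.zeros((t.size, n_spans - k))
        for i in range(n_spans - k):
            den_l = knots[i + k] - knots[i]
            den_r = knots[i + k + 1] - knots[i + 1]
            left = (t - knots[i]) / den_l * B[:, i] if den_l > 0 else 0.0
            right = (knots[i + k + 1] - t) / den_r * B[:, i + 1] if den_r > 0 else 0.0
            B_next[:, i] = left + right
        B = B_next
    return B


def fit_perturbation(times, delta_c, knots, degree=3):
    """Least-squares spline coefficients for a sound speed perturbation series delta_c(t)."""
    basis = bspline_basis(times, knots, degree)
    coef, *_ = np.linalg.lstsq(basis, np.asarray(delta_c, dtype=float), rcond=None)
    return coef


if __name__ == "__main__":
    hours = np.linspace(0.0, 12.0, 145)                 # 5-min samples over 12 h (synthetic)
    delta_c = 0.5 * np.sin(2 * np.pi * hours / 12.0)    # synthetic perturbation in m/s
    knots = clamped_knots(0.0, 12.0, n_interior=6)
    coef = fit_perturbation(hours, delta_c, knots)
    fitted = bspline_basis(hours, knots) @ coef
    print("max fit error (m/s):", np.max(np.abs(fitted - delta_c)))
```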
Funding: Funded by the project “Design and implementation of real-time safety ensuring system in the indoor environment by applying machine learning techniques” (IRN: AP14971555).
Abstract: Video analytics is an integral part of surveillance cameras. Compared with video analytics, audio analytics offers several benefits, including less expensive equipment and lower maintenance costs. Additionally, the volume of an audio data stream is substantially lower than that of a video camera data stream, especially for real-time operating systems, which makes it less demanding of data-channel bandwidth. For instance, automatic live video streaming from the site of an explosion or gunshot to the police console using audio analytics technologies would be exceedingly helpful for urban surveillance. Audio analytics technologies may also be used to analyze video recordings and identify incidents. This research proposes a deep learning model that combines a convolutional neural network (CNN) and a recurrent neural network (RNN), referred to as the CNN-RNN approach. The proposed model focuses on automatically identifying impulsive sounds that indicate critical situations in audio sources. The algorithm’s accuracy ranged from 81% to 95% when classifying incident sounds, including gunshots, explosions, shattered glass, sirens, cries, and dog barking. The proposed approach can be applied to provide security for citizens in open and closed locations, such as stadiums, underground areas, shopping malls, and other places.
Funding: The National Natural Science Foundation of China (Grant No. 11874131); Open Fund Project of Key Laboratory of Underwater Acoustic Countermeasures Technology (Grant No. 2021-JCJQ-LB033-05).
Abstract: Similar to air reverberation chambers, non-anechoic water tanks are important acoustic measurement facilities that can be used to measure the sound power radiated by complex underwater sound sources using diffuse-field theory. However, the poor applicability of low-frequency measurements in these tanks has not yet been resolved. Therefore, we propose a low-frequency acoustic measurement method based on sound-field correction (SFC) in an enclosed space that effectively solves the problem of measuring the sound power of complex sound sources below the Schroeder cutoff frequency in a non-anechoic tank. Using normal mode theory, the transfer relationship between the mean-square sound pressure in an underwater enclosed space and the free-field sound power of the sound source is established, and this relationship is regarded as a correction term for the sound field between the enclosed space and the free field. This correction term can be obtained from prior measurements of a known sound source. It can then be used to correct the mean-square sound pressure excited by any sound source under test in the enclosed space and thereby obtain its equivalent free-field sound power. Experiments were carried out in a non-anechoic water tank (9.0 m × 3.1 m × 1.7 m) to confirm the validity of the SFC method. Through measurements with a spherical sound source (whose free-field radiation characteristics are known), the correction term of the sound field between this water tank and the free field was obtained. On this basis, the sound power radiated by a cylindrical shell model under mechanical excitation was measured. The results showed a maximum deviation of 2.9 dB from the free-field results. These results show that the SFC method has good applicability in the frequency band above the first-order resonant frequency of a non-anechoic tank. This greatly expands the potential low-frequency applications of non-anechoic tanks.
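The bookkeeping of the correction step can be sketched in decibel form, assuming the correction term is stored as a per-band offset between the reference source's known free-field sound power level and the spatially averaged mean-square pressure level it produces in the tank (the offset also absorbs the different reference quantities of power and pressure levels). This is a hypothetical illustration of applying the correction, not the normal-mode derivation of the term itself; the function names and the 1 µPa reference pressure are assumptions.

```python
import numpy as np

P_REF_WATER = 1e-6  # Pa; 1 uPa, the usual underwater reference pressure (assumed convention)


def mean_square_level_db(pressures_pa):
    """Spatially and temporally averaged mean-square pressure level in dB re 1 uPa.

    pressures_pa: array of shape (n_positions, n_samples) of band-filtered hydrophone signals.
    """
    ms = np.mean(np.asarray(pressures_pa, dtype=float) ** 2)
    return 10.0 * np.log10(ms / P_REF_WATER ** 2)


def correction_term_db(ref_free_field_power_db, ref_tank_level_db):
    """Per-band correction: known free-field power level minus measured tank level (reference source)."""
    return np.asarray(ref_free_field_power_db, dtype=float) - np.asarray(ref_tank_level_db, dtype=float)


def corrected_power_db(test_tank_level_db, correction_db):
    """Equivalent free-field sound power level of a test source from its tank level."""
    return np.asarray(test_tank_level_db, dtype=float) + np.asarray(correction_db, dtype=float)


if __name__ == "__main__":
    # Hypothetical band values in dB, not measurements from the paper.
    ref_power = np.array([130.0, 132.0, 131.0])
    ref_tank = np.array([121.5, 124.0, 125.2])
    test_tank = np.array([110.3, 115.8, 118.0])
    k = correction_term_db(ref_power, ref_tank)
    print("correction (dB):", k)
    print("test source power (dB):", corrected_power_db(test_tank, k))
```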
Abstract: Over the past two decades, research on noise pollution and urban soundscapes has grown significantly [1,2]. The goal of these studies was to gain a better understanding of the urban acoustic environment by employing various methodologies and techniques to explore the complexity of this topic. These research efforts have primarily revolved around two fundamental axes [3]. On one hand, the first axis focused on combating noise pollution [4–6], emphasizing the reduction of unwanted sounds and compliance with the sound levels set by environmental and health protection organizations [7,8].
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 11534009, 11904342, and 12274348).
Abstract: High-fidelity reconstruction of sound speeds is crucial for predicting acoustic propagation in shallow water where internal solitary waves (ISWs) are prevalent. Mapping temperatures from time series to spatial fields is a widely used approach for reproducing the sound speed perturbed by deformed internal waves. However, wave-shape distortions are inherent in the modeling results. This paper analyzes the formation mechanism and dynamic behavior of the distorted waveform, which is shown to arise from the mismatch between the modeled and real propagation speeds of individual solitons within an ISW packet. To mitigate the distortions, a reconstruction method incorporating the dispersion property of an ISW train is proposed. The principle is to assign each soliton the real speed observed in the experiment, so that the modeled solitons propagate at their intrinsic speeds and the packet disperses naturally with time. The method is applied to reconstruct the sound speed perturbed by ISWs in the South China Sea. The mean and median root-mean-square errors between the reconstructed and measured sound speeds are both below 2 m/s. The modeled shape deformations and packet dispersion agree well with observations, and the waveform distortion is reduced compared with the original method. This work ensures high-fidelity reconstruction of the waveguide environment and facilitates future investigations of sound propagation.
Funding: Supported by the National Natural Science Foundation of China (No. 42107157); the Laboratory for Marine Geology, Qingdao National Laboratory for Marine Science and Technology (No. MGQNLM-KF202101); the Fundamental Research Funds for the Central Universities, SCUT (No. 21CX06016A); the Harbin Engineering University at Qingdao (No. 2022-SXZN-CXJJ-04-06+01).
Abstract: With the ongoing consumption of terrestrial metal resources, the exploitation of deep-sea polymetallic nodules has attracted worldwide attention. Therefore, the environmental impact of deep-sea polymetallic nodule mining cannot be ignored. However, due to the lack of stable and safe deep-sea (depth > 1000 m) vertical profile observation systems, and consequently of long-term in-situ observation data, the sound speed, dissolved oxygen, and other water environment factors in polymetallic nodule deposition areas remain poorly understood. In this study, a deep-sea in-situ observation system was designed and deployed, and water environment data from a polymetallic nodule deposition area were collected and analyzed. The results show that dissolved oxygen at depths of 0–600 m was mainly affected by biological factors, while below 600 m it was affected by physical factors. The sound speed in the water column was mainly affected by temperature and pressure: at depths shallower than 840 m it is mainly controlled by temperature, and at depths between 840 m and 5700 m it is mainly controlled by pressure. The correlations of sound speed with pressure and with temperature were fitted as regression equations. The resuspension of sediments rich in various metals may reduce dissolved oxygen and increase redox potential, and the environmental impact of a single sediment resuspension event could last for 24 h or more. These findings enrich the understanding of the background values of the water environment in polymetallic nodule deposition areas.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 51875147, 12174082, 51675149).
Abstract: A time-domain inverse technique based on the time-domain rotating equivalent source method has been proposed to localize and quantify rotating sound sources. However, this technique has two problems: solving a transcendental equation at each time step is time-consuming, and the instability problem is difficult to control because the transfer matrix varies with time. This paper proposes an improved technique that resolves both problems. In the improved technique, a de-Dopplerization method in the time-domain rotating reference frame is first applied to eliminate the Doppler effect caused by the source rotation from the measured pressure signals; the restored pressure signals, free of the Doppler effect, are then used as inputs to the time-domain stationary equivalent source method to locate and quantify the sound sources. Compared with the original technique, the improved technique avoids solving the transcendental equation at each time step and makes the instability problem easier to treat because the transfer matrix does not change with time. Numerical simulations and experimental results show that the improved technique effectively eliminates the Doppler effect and then accurately localizes and quantifies rotating nonstationary or broadband sources. The results also demonstrate that the improved technique provides a more stable reconstruction and is computationally more efficient than the original one.
Funding: Supported by the NSFC (51925306 and 21774130); the National Key R&D Program of China (2018FYA 0305800); the Key Research Program of the Chinese Academy of Sciences (XDPB08-2); the Strategic Priority Research Program of Chinese Academy of Sciences (XDB28000000); University of Chinese Academy of Sciences.
Abstract: Neuromorphic systems for sound perception are in high demand for future bioinspired electronics and humanoid robots. However, sound perception based on volume, tone, and timbre remains unknown. Herein, organic optoelectronic synapses (OOSs) are constructed for unprecedented sound recognition. The volume, tone, and timbre of sound can be appropriately regulated by the input voltages, frequencies, and light intensities of the OOSs, according to the amplitude, frequency, and waveform of the sound. A quantitative relation between the recognition factor (ζ) and the postsynaptic current (I = I_light − I_dark) is established to achieve sound perception. Interestingly, the bell sound of the University of Chinese Academy of Sciences is recognized with an accuracy of 99.8%. Mechanistic studies reveal that the impedance of the interfacial layers plays a critical role in the synaptic performance. This contribution presents unprecedented artificial synapses for sound perception at the hardware level.
Funding: Supported by the EURAMET European Partnership on Metrology (EPM), under the 21NRM03 Metrology for Emerging Wireless Standards (MEWS) project. The project (21NRM03 MEWS) has received funding from the EPM, co-financed from the European Union’s Horizon Europe Research and Innovation Programme, and by the Participating States.
Abstract: Owing to the large amount of unused and unexplored spectrum, the so-called sub-Terahertz (sub-THz) frequency bands from 100 to 300 GHz are seen as promising bands for the next generation of wireless communication systems. Channel modeling at sub-THz bands is essential for the design and deployment of future wireless communication systems. Channel measurement is a widely adopted method for obtaining channel characteristics and establishing mathematical channel models, and it depends on the design and construction of channel sounders. Thus, reliable channel sounding techniques and accurate channel measurements are required. In this paper, the requirements of an ideal channel sounder are discussed, and the main channel sounding techniques for the sub-THz frequency bands are described. State-of-the-art sub-THz channel sounders reported in the literature and the corresponding channel measurements are presented. Moreover, a vector network analyzer (VNA)-based channel sounder supporting frequency bands from 220 to 330 GHz is presented, and its capabilities and limitations are evaluated. This paper also discusses the challenges and future outlook of sub-THz channel sounders and measurements.
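For a VNA-based sounder, a common processing step is to convert the measured frequency response (S21) into a channel impulse response with an inverse FFT. The sketch below assumes a uniformly spaced sweep and a Hann window; it is not the authors' processing chain, and the function name is hypothetical. With a 220 to 330 GHz sweep, the delay resolution is roughly 1/(110 GHz), about 9.1 ps, and the maximum unambiguous delay is 1/Δf for a frequency step Δf.

```python
import numpy as np


def impulse_response_from_s21(s21, freqs_hz, window=True):
    """Convert a uniformly spaced VNA S21 sweep into a complex channel impulse response.

    Returns (delays_s, cir); the delay axis spans one unambiguous period 1/df.
    """
    s21 = np.asarray(s21, dtype=complex)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    if window:
        s21 = s21 * np.hanning(s21.size)        # lower sidelobes, slightly wider main lobe
    df = freqs_hz[1] - freqs_hz[0]
    cir = np.fft.ifft(s21)
    delays_s = np.arange(s21.size) / (s21.size * df)
    return delays_s, cir


if __name__ == "__main__":
    freqs = np.linspace(220e9, 330e9, 1101)     # 100 MHz steps across 110 GHz
    tau = 1e-9                                  # synthetic single path at 1 ns (about 0.3 m)
    s21 = 0.01 * np.exp(-2j * np.pi * freqs * tau)
    delays, cir = impulse_response_from_s21(s21, freqs)
    peak = delays[np.argmax(np.abs(cir))]
    print(f"peak at {peak * 1e12:.1f} ps; resolution ~ {1e12 / (freqs[-1] - freqs[0]):.1f} ps")
```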
Funding: Funded by the Scientific Research Starting Foundation of Hainan University (KYQD1882); the Flexible Introduction Scientific Research Starting Foundation of Hainan University (2020.11–2025.10).
Abstract: Respiratory infections in children increase the risk of fatal lung disease, making effective identification and analysis of breath sounds essential. However, most studies have focused on adults, ignoring pediatric patients, whose lungs are more vulnerable because of their immature immune systems, and the scarcity of medical data has limited the development of reliable, highly accurate deep learning methods. In this work, we collected three types of breath sounds from children with normal lungs (120 recordings), bronchitis (120 recordings), and pneumonia (120 recordings) at the posterior chest position using an off-the-shelf 3M electronic stethoscope. Three features were extracted from the wavelet-denoised signal: the spectrogram, mel-frequency cepstral coefficients (MFCCs), and delta MFCCs. The recognition model is based on transfer learning and combines a fine-tuned MobileNetV2 and a modified ResNet50 to classify breath sounds, together with software for displaying the analysis results. Extensive experiments on a real dataset demonstrate the effectiveness of the proposed model, with average accuracy, precision, recall, specificity, and F1 score of 97.96%, 97.83%, 97.89%, 98.89%, and 0.98, respectively, achieving superior performance with a small dataset. The proposed detection system, with a high-performance model and software, can help parents perform lung screening at home and also has potential for large-scale screening of children for lung disease.
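Delta MFCCs are typically obtained from the MFCC matrix by a regression over neighboring frames, d_t = Σ_{n=1}^{N} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1}^{N} n²). The abstract does not specify the window, so N = 2 and edge padding at the clip boundaries are assumptions in this sketch; the MFCC matrix itself is a random stand-in.

```python
import numpy as np


def delta(features, width=2):
    """Regression-based time derivative of a feature matrix shaped (n_coeffs, n_frames).

    Frames beyond the clip edges are filled by repeating the first/last frame (assumption).
    """
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[1]
    padded = np.pad(features, ((0, 0), (width, width)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[:, width + n: width + n + n_frames]    # c_{t+n}
        behind = padded[:, width - n: width - n + n_frames]   # c_{t-n}
        out += n * (ahead - behind)
    return out / denom


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mfcc = rng.normal(size=(13, 50))          # stand-in for a 13 x 50 MFCC matrix
    d1 = delta(mfcc)                          # delta MFCCs
    d2 = delta(d1)                            # delta-delta MFCCs
    features = np.vstack([mfcc, d1, d2])      # stacked (39, 50) feature matrix
    print(features.shape)
```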
Funding: Supported by the National Natural Science Foundation of China (No. 41876014); the Open Project of Tianjin Key Laboratory of Oceanic Meteorology (No. 2020TKLOMYB04).
Abstract: Acquiring sound speed profiles (SSPs) with high precision and high spatiotemporal resolution is essential for undersea acoustic activities. However, conventional observation methods cannot obtain high-resolution SSPs, and SSPs are complex and variable in time and space, especially in coastal areas. We propose a new space-time multigrid three-dimensional variational method with a weak constraint term (STC-MG3DVar) to construct high-precision, high-spatiotemporal-resolution SSPs in coastal areas, in which sound velocity is the analysis variable and the Chen-Millero empirical sound velocity formula is introduced as a weak constraint term in the cost function. The STC-MG3DVar method takes into account the spatiotemporal correlation of sound velocity observations and successively extracts multi-scale information from them, from long waves to short waves. The weak constraint term optimizes sound velocity through the physical relationship between sound velocity and temperature-salinity, yielding more reasonable and accurate SSPs. To verify the accuracy of the STC-MG3DVar, SSP observations and CTD observations (temperature and salinity) were obtained from field experiments in the northern coastal area of the Shandong Peninsula. The average root mean square error (RMSE) of the STC-MG3DVar-constructed SSPs is 0.132 m/s, and the STC-MG3DVar method improves SSP construction accuracy by 10.14% over the space-time multigrid 3DVar without a weak constraint term (ST-MG3DVar) and by 44.19% over the spatial multigrid 3DVar with a weak constraint term (SC-MG3DVar). Benefiting from the constraint term and the spatiotemporal correlation information, the proposed STC-MG3DVar method outperforms the ST-MG3DVar and the SC-MG3DVar in constructing high-precision, high-spatiotemporal-resolution SSPs.
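For orientation, a generic single-grid form of a 3DVar cost function with a weak physical constraint is shown below; it is an assumed textbook-style form, not the paper's exact STC-MG3DVar formulation (which adds the space-time multigrid decomposition). Here c is the sound velocity analysis vector, c_b the background, y the sound velocity observations with observation operator H, B and R the background and observation error covariances, f_CM(T, S, P) the Chen-Millero sound speed evaluated on the grid from temperature, salinity, and pressure (treated as a fixed vector), and λ an assumed weight on the constraint. Setting the gradient of J to zero gives the linear system on the second line.

$$
J(\mathbf{c}) = \tfrac{1}{2}(\mathbf{c}-\mathbf{c}_b)^{\mathsf T}\mathbf{B}^{-1}(\mathbf{c}-\mathbf{c}_b) + \tfrac{1}{2}(\mathbf{H}\mathbf{c}-\mathbf{y})^{\mathsf T}\mathbf{R}^{-1}(\mathbf{H}\mathbf{c}-\mathbf{y}) + \tfrac{\lambda}{2}\,\lVert \mathbf{c}-f_{\mathrm{CM}}(T,S,P)\rVert^2
$$

$$
\left(\mathbf{B}^{-1}+\mathbf{H}^{\mathsf T}\mathbf{R}^{-1}\mathbf{H}+\lambda\mathbf{I}\right)\mathbf{c} = \mathbf{B}^{-1}\mathbf{c}_b+\mathbf{H}^{\mathsf T}\mathbf{R}^{-1}\mathbf{y}+\lambda\, f_{\mathrm{CM}}(T,S,P)
$$

The weak constraint pulls the analysis toward the empirically computed sound speed without forcing exact agreement; λ controls how strongly.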
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 61701133).
Abstract: Nearfield acoustic holography (NAH) is a powerful tool for source identification and sound field reconstruction. Wave superposition (WS)-based NAH is suitable for spatially extended sources and does not require complex numerical integrals. The equivalent source method (ESM), a classical WS approach, is widely used owing to its simplicity and efficiency. In the ESM, a virtual source surface is introduced on which virtual point sources serve as the assumed sources, and an optimal retreat distance must be chosen. A recently proposed WS-based approach, the element radiation superposition method (ERSM), uses piston surface sources as the assumed sources and does not require a virtual source surface. To satisfy the application conditions of the piston pressure formula, the pistons are assumed to be as small as possible, which results in a large number of pistons and sampling points. In this paper, transfer matrix modes (TMMs), composed of the singular vectors of the vibro-acoustic transfer matrix, are used as a sparse basis for the piston normal velocities, and a compressive ERSM based on TMMs is proposed. Compared with the conventional ERSM, the proposed method maintains good pressure reconstruction when the numbers of sampling points and pistons are both reduced. In addition, the proposed method is compared mathematically with the compressive ESM. Both simulations and experiments on a rectangular plate demonstrate the advantage of the proposed method over existing methods.
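The role of the transfer matrix modes can be sketched with an SVD. Assuming a transfer matrix G that maps piston normal velocities v to field pressures p = Gv, the right singular vectors of G form an orthonormal basis, and v is expanded as v = Va. The sketch then estimates the coefficients a from a reduced set of pressure measurements. Note that it uses ordinary least squares on a fixed number of leading modes as a simplified stand-in for the compressive (sparse) solver described above; the function names and the random transfer matrix are hypothetical.

```python
import numpy as np


def transfer_matrix_modes(G):
    """Singular values and right singular vectors (as columns) of the transfer matrix G."""
    _, s, vh = np.linalg.svd(G, full_matrices=False)
    return s, vh.conj().T                     # columns ordered by decreasing singular value


def reconstruct_velocity(G_meas, p_meas, modes, n_modes):
    """Estimate piston velocities from measured pressures using only the leading n_modes modes."""
    basis = modes[:, :n_modes]
    coeffs, *_ = np.linalg.lstsq(G_meas @ basis, p_meas, rcond=None)
    return basis @ coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.normal(size=(64, 48)) + 1j * rng.normal(size=(64, 48))  # synthetic transfer matrix
    s, V = transfer_matrix_modes(G)
    v_true = V[:, :6] @ (rng.normal(size=6) + 1j * rng.normal(size=6))
    rows = rng.choice(64, size=16, replace=False)      # keep only 16 of 64 sampling points
    v_est = reconstruct_velocity(G[rows], G[rows] @ v_true, V, n_modes=6)
    print("relative error:", np.linalg.norm(v_est - v_true) / np.linalg.norm(v_true))
```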
Funding: Supported by the NUS R&G Postdoc Fellowship Program (No. A-0000065-76-00); the China Scholarship Council (No. 202006050088).
Abstract: Lattice structures have drawn much attention in engineering applications due to their light weight and multifunctional properties. In this work, a mathematical design approach for functionally graded (FG) and helicoidal lattice structures with triply periodic minimal surfaces is proposed. Four types of lattice structures (uniform, helicoidal, FG, and combined FG and helicoidal) are fabricated by additive manufacturing. The deformation behaviors, mechanical properties, energy absorption, and acoustic properties of the lattice samples are thoroughly investigated. The load-bearing capability of the helicoidal lattice samples gradually improves in the plateau stage, so that the plateau stress and total energy absorption are improved by over 26.9% and 21.2%, respectively, compared with the uniform sample. This improvement was attributed to the helicoidal design reducing the gaps in the unit cells and enhancing fracture resistance. For acoustic properties, the helicoidal design reduces the resonance frequency and raises the peak absorption coefficient, while the FG design mainly influences the peak absorption coefficient. Across a broad frequency range from 1000 to 6300 Hz, combining the FG and helicoidal designs improves the maximum absorption coefficient by 18.6%–30% and increases the number of points with an absorption coefficient above 0.6 by 55.2%–61.7%. This study provides a novel strategy to simultaneously improve energy absorption and sound absorption by controlling the internal architecture of lattice structures.
Funding: The Taif University Researchers Supporting Project number (TURSP-2020/36), Taif University, Taif, Saudi Arabia.
Abstract: Environmental sound classification (ESC) is the task of identifying the environmental sounds present in an audio stream. Factors such as framework differences, overlapping sound events, and the presence of multiple sound sources during recording make the ESC task considerably more complex. This research proposes a deep learning model that improves the recognition rate of environmental sounds and reduces model training time under limited computational resources. The performance of transformer and convolutional neural network (CNN) models is investigated. Seven audio features (chromagram, Mel-spectrogram, tonnetz, Mel-Frequency Cepstral Coefficients (MFCCs), delta MFCCs, delta-delta MFCCs, and spectral contrast) are extracted from the UrbanSound8K, ESC-50, and ESC-10 databases. In addition, three data augmentation methods (white noise, pitch tuning, and time stretching) are employed to reduce the risk of overfitting caused by the limited number of audio clips. The experiments demonstrate that the best performance was achieved by the proposed transformer model using all seven audio features on the augmented databases. For UrbanSound8K, ESC-50, and ESC-10, the highest accuracies attained are 0.98, 0.94, and 0.97, respectively. The experimental results indicate that the proposed technique achieves the best performance among the evaluated approaches for ESC problems.