Funding: Supported by the National Natural Science Foundation of China (No. 60172048).
Abstract: A simple and efficient algorithm is presented for separating concurrent speech signals. The parameters of the mixed speech are estimated by searching the neighborhood of the given pitches to minimize the error between the original and synthetic spectra. The effectiveness of the proposed algorithm in separating closely spaced frequencies is demonstrated.
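The search described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`solve`, `fit_error`, `refine_pitches`), the grid span and step, and the number of harmonics are all assumptions. The sketch fits harmonic sinusoids by least squares for every candidate pitch pair near the rough estimates and keeps the pair with the smallest residual. The residual is measured in the time domain, which by Parseval's theorem equals the full-spectrum error up to a constant factor.

```python
import math
from itertools import product


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_error(frame, fs, pitches, n_harm):
    """Least-squares fit of harmonic sinusoids at the candidate pitches; return residual energy."""
    basis = []
    for f0 in pitches:
        for h in range(1, n_harm + 1):
            w = 2 * math.pi * f0 * h / fs
            basis.append([math.cos(w * n) for n in range(len(frame))])
            basis.append([math.sin(w * n) for n in range(len(frame))])
    # Normal equations: (B^T B) coef = B^T frame
    gram = [[sum(p * q for p, q in zip(bi, bj)) for bj in basis] for bi in basis]
    rhs = [sum(p * y for p, y in zip(bi, frame)) for bi in basis]
    coef = solve(gram, rhs)
    synth = [sum(c * b[n] for c, b in zip(coef, basis)) for n in range(len(frame))]
    return sum((y - s) ** 2 for y, s in zip(frame, synth))


def refine_pitches(frame, fs, rough, span=2.0, step=0.5, n_harm=2):
    """Grid-search pitches within +/- span Hz of each rough estimate (assumed grid parameters)."""
    offsets = [i * step - span for i in range(int(round(2 * span / step)) + 1)]
    candidates = product(*[[f + d for d in offsets] for f in rough])
    return min(candidates, key=lambda p: fit_error(frame, fs, p, n_harm))
```

One caveat of this sketch: if two candidate harmonics coincide exactly, the normal equations become singular, so the search ranges of the two speakers should not overlap.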
Funding: This work was supported by the Projects of the National Natural Science Foundation of China under Grant No. U0835001 and the Fundamental Research Funds for the Central Universities (2011PTB-00-28).
Abstract: A hierarchical method for scene analysis in audio sensor networks is proposed. The method consists of two stages: an element detection stage and an audio scene analysis stage. In the former, the basic audio elements are modeled by HMMs trained off-line on sufficient samples, and basic elements are adaptively added to or removed from the targeted element pool according to the time, place, and other environment parameters. In the latter, a data fusion algorithm combines the sensory information from the same area, and a role-based method then analyzes the audio scene based on the fused data. Experiments show that about 70% of audio scenes are detected correctly by the proposed method. The experimental evaluations demonstrate that the method achieves satisfactory results.
Funding: Supported in part by the National Basic Research Program of China (973 Program) under Grant No. 2011CB302203 and the National Natural Science Foundation of China under Grant No. 61273285.
Abstract: Crowded scene analysis is currently a hot and challenging topic in the computer vision field. Analyzing motion patterns from videos is a difficult but critical part of this problem. In this paper, we propose a novel approach to the analysis of motion patterns that clusters tracklets with an unsupervised hierarchical clustering algorithm, where the similarity between tracklets is measured by the Longest Common Subsequence. The tracklets are obtained by tracking dense points under three effective rules, which enables the approach to capture the motion patterns in crowded scenes. The analysis of motion patterns is implemented in a completely unsupervised way, and the tracklets are clustered automatically through a hierarchical clustering algorithm based on a graphic model. To validate the performance of our approach, we conducted experimental evaluations on two datasets. The results reveal the precise distributions of motion patterns in current crowded videos and demonstrate the effectiveness of our approach.
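As an illustration of the similarity measure named in this abstract, the following sketch computes a Longest Common Subsequence similarity between two tracklets given as lists of (x, y) points. The matching threshold `eps`, the normalization by the shorter tracklet, and the function names are assumptions in the style of common trajectory LCS formulations, not details taken from the paper.

```python
def lcs_length(a, b, eps):
    """Length of the longest common subsequence of two point sequences.

    Two points match when both coordinate differences are within eps.
    """
    prev = [0] * (len(b) + 1)
    for pa in a:
        cur = [0]
        for j, pb in enumerate(b, 1):
            if abs(pa[0] - pb[0]) <= eps and abs(pa[1] - pb[1]) <= eps:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(a, b, eps=1.0):
    """LCS length normalized by the shorter tracklet, giving a value in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b, eps) / min(len(a), len(b))
```

A pairwise similarity matrix built from `lcs_similarity` could then be passed to any hierarchical clustering routine; the clustering step itself is not sketched here.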
Funding: Funding from the Autonomous Region of Sardinia under project XDATA. Eva Almansa, Armando Sanchez, Giorgio Vassena, and Enrico Gobbetti received funding from the European Union's H2020 research and innovation programme under grant 813170 (EVOCATION).
Abstract: We introduce a novel end-to-end deep learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to jointly process dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multiresolution features and find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy to make the model robust to different types of sparsity, including those generated by various structured light sensors and LiDAR setups. The experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy for challenging indoor data, even when trained solely on synthetic data without any fine-tuning.
Funding: Supported by the National Natural Science Foundation of China (No. 60172048).
Abstract: This letter proposes a new method for concurrent voiced speech separation. First, the Wrapped Discrete Fourier Transform (WDFT) is used to decompose the harmonic spectra of the mixed speech. Then each individual speech signal is reconstructed using the sinusoidal speech model. By taking advantage of the non-uniform frequency resolution of the WDFT, the harmonic spectral parameters can be estimated and separated accurately. Experimental results on the separation of mixed vowels show that the proposed method can recover the original speech effectively.
Funding: National Science and Technology Major Special Project (2016ZX05044), CBM Development Technology and Pilot Test in East Yunnan and Western Guizhou.
Abstract: Detailed information is provided on the design and construction of nitrogen drilling in a coal seam. Two prototype wells are considered. The Guo model is used to calculate the required minimum gas injection rate, while the Finnie, Sommerfeld, and Tulsa models are used to estimate the ensuing erosion in the pipe strings. The calculated minimum gas injection rates are 67.4 m³/min (with water) and 49.4 m³/min (without water), and the rate actually used in the field is 90–120 m³/min. The difference between the calculated injection pressure and the field value is 6.5%–15.2% (formation with water) and 0.65%–7.32% (formation without water). The results show that the Guo model represents the water-free formation more precisely in the nitrogen drilling of a coal seam. The Finnie, Sommerfeld, and Tulsa models have different sensitivities to cutting density, particle size, impact velocity and angle, and pipe string hardness.
Funding: The National Key Research and Development Plan (2017YFB0903504) and the Science and Technology Project of the SGCC (5210EF17001c).
Abstract: The optimal configuration of a battery energy storage system is key to the design of a microgrid. In this paper, an optimal configuration method for energy storage in a grid-connected microgrid is proposed. First, a two-layer decision model for allocating the storage capacity is established. The decision variables in the outer programming model are the capacity and power of the storage system, and the objective is the least investment in the battery energy storage system. The decision variable in the inner programming model is the charging and discharging power of the battery, and the objective is the lowest power fluctuation on the connection line. Then a case containing a grid-connected microgrid with wind power, photovoltaics, battery energy storage, and load is studied using the multi-scenario probabilistic method. The final energy storage configuration is calculated from the probability of each scenario.
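The last step of this abstract, combining per-scenario results through scenario probabilities, can be sketched as a probability-weighted average. This is one plausible reading, not the paper's stated formula, and the function name, tuple layout, and all numbers below are hypothetical.

```python
def expected_configuration(scenarios):
    """Combine per-scenario storage results into one configuration.

    Each scenario is (probability, capacity_kwh, power_kw); the combination
    rule (a probability-weighted mean) is an assumption of this sketch.
    """
    total = sum(p for p, _, _ in scenarios)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    capacity = sum(p * c for p, c, _ in scenarios)
    power = sum(p * w for p, _, w in scenarios)
    return capacity, power


# Hypothetical scenarios: (probability, optimal capacity in kWh, optimal power in kW)
example = [(0.5, 100.0, 20.0), (0.25, 200.0, 40.0), (0.25, 400.0, 80.0)]
capacity_kwh, power_kw = expected_configuration(example)
```

In the two-layer model, each scenario's capacity and power would come from solving the outer and inner programs for that scenario; that optimization is outside the scope of this sketch.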
Funding: Supported by the Tencent and Shanghai Jiao Tong University Joint Project.
Abstract: The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition (ASR) systems. In this overview paper, we review the techniques proposed in the last two decades for attacking this problem. We focus our discussion on the speech separation problem, given its central role in the cocktail party environment, and describe the conventional single-channel techniques such as computational auditory scene analysis (CASA), non-negative matrix factorization (NMF), and generative models; the conventional multi-channel techniques such as beamforming and multi-channel blind source separation; and the newly developed deep learning-based techniques such as deep clustering (DPCL), the deep attractor network (DANet), and permutation invariant training (PIT). We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment. We argue that effectively exploiting information in the microphone array, the acoustic training set, and the language itself, using a more powerful model and better optimization objectives and techniques, will be the approach to solving the cocktail party problem.
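Of the techniques listed in this abstract, permutation invariant training is the easiest to show in a few lines. The sketch below is a generic utterance-level illustration with a mean-squared-error criterion, not code from the reviewed papers; the names `mse` and `pit_loss` are assumptions. It evaluates the loss under every assignment of estimated sources to reference sources and keeps the smallest.

```python
from itertools import permutations


def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def pit_loss(estimates, references):
    """Return (loss, perm) for the best assignment of estimates to references.

    perm[i] is the index of the reference assigned to estimates[i]; the loss is
    the per-source average MSE under that assignment.
    """
    n = len(references)

    def total(perm):
        return sum(mse(estimates[i], references[p]) for i, p in enumerate(perm))

    best = min(permutations(range(n)), key=total)
    return total(best) / n, best
```

Enumerating all permutations costs n! evaluations, which is acceptable for the two or three speakers typical of this task but grows quickly beyond that.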
Abstract: In the acoustic world, no sounds occur entirely in isolation; they always reach the ears in combination with other sounds. How any given sound is discriminated and perceived as an independent auditory object is a challenging question in neuroscience. Although our knowledge of neural processing in the auditory pathway has expanded over the years, no good theory exists to explain how perception of auditory objects is achieved. A growing body of evidence suggests that the selectivity of neurons in the auditory forebrain is under dynamic modulation, and this plasticity may contribute to auditory object perception. We propose that stimulus-specific adaptation in the auditory forebrain of the songbird (and perhaps in other systems) may play an important role in modulating sensitivity in a way that aids discrimination, and thus can potentially contribute to auditory object perception.
Funding: Project supported by the Gümüshane University Scientific Research Projects Coordination Department (No. 15.B0311.02.01).
Abstract: In crowd analysis, segmenting the moving and non-moving regions of an image is a crucial step toward understanding crowd behavior. In many studies, similar movements were segmented according to location, adjacency, direction, and average speed. However, such segments do not necessarily indicate the same type of behavior in each region. The purpose of this study is to better understand crowd behavior by locally measuring the degree of interaction/complexity within each segment. For this purpose, the flow of motion in the image is primarily represented as a series of trajectories. The image is divided into hexagonal cells, and the finite time braid entropy (FTBE) values are calculated for the different projection angles of each cell. These values depend on the complexity of the spiral structure that the trajectories generate throughout the movement and show the degree of interaction among pedestrians. In this study, behaviors of different complexities determined within segments are pictured as similar movements on the whole. The approach has been tested on 49 different video sequences from the UCF and CUHK databases.
Abstract: Underwater soundscapes have probably played an important role in the adaptation of the ears and auditory systems of fishes throughout evolutionary time, and for all species. These sounds probably contain important information about the environment and about most objects and events that confront the receiving fish, so that appropriate behavior is possible. For example, the sounds from reefs appear to be used by at least some fishes for their orientation and migration. These sorts of environmental sounds should be considered much like "acoustic daylight," which continuously bathes all environments and contains information that all organisms can potentially use to form a sort of image of the environment. At present, however, we are generally ignorant of the nature of the ambient sound fields impinging on fishes, and of the adaptive value of processing these fields to resolve the multiple sources of sound. Our field has focused almost exclusively on the adaptive value of processing species-specific communication sounds, and has not considered the informational value of ambient "noise." Since all fishes can detect and process acoustic particle motion, including the directional characteristics of this motion, underwater sound fields are potentially more complex and information-rich than terrestrial acoustic environments. The capacities of one fish species (goldfish) to receive and make use of such sound source information have been demonstrated (sound source segregation and auditory scene analysis), and it is suggested that all vertebrate species have this capacity. A call is made to better understand underwater soundscapes and the associated behaviors they determine in fishes.