Funding: State Key Laboratory of Pulsed Power Laser Technology Funds (No. 13J1003); The Key Laboratory Funds of Optoelectronic Information Control and Security Technology of China (No. 20100713-003).
Funding: Supported by Universiti Sains Malaysia under FRGS Grant Number FRGS/1/2020/STG07/USM/02/12 (203.PKOMP.6711930) and FRGS Grant Number 304PTEKIND.6316497.USM.
Abstract: In this article, a comprehensive survey of deep learning-based (DL-based) human pose estimation (HPE) is presented to help researchers in the field of computer vision. HPE is among the fastest-growing research areas in computer vision and is applied to a wide range of practical problems. After a detailed introduction, three human body models are described, followed by the main stages of HPE and two pipelines of two-dimensional (2D) HPE. The four components of HPE are also presented in detail. The keypoint output formats of two popular 2D HPE datasets and the most-cited DL-based HPE articles starting from the breakthrough year are both summarized in tables. This study highlights the limitations of published reviews and surveys in presenting a systematic review of current DL-based solutions for 2D HPE, and provides a detailed survey to guide new and existing researchers working on DL-based 2D HPE models. Finally, future research directions in HPE, such as the limited data on persons with disabilities and multi-training DL-based models, are identified to encourage researchers and promote the growth of HPE research.
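The survey tabulates the keypoint output formats of two popular 2D HPE datasets without naming them here. As a hedged illustration, the sketch below assumes the widely used 17-keypoint COCO convention, where each person is annotated as a flat list of (x, y, visibility) triples. The helper names `parse_keypoints` and `visible_names` are hypothetical and are not taken from the survey.

```python
from typing import List, NamedTuple

# Assumed keypoint order: the 17-keypoint COCO convention. The abstract does not
# name its two datasets, so treat this layout as illustrative.
COCO_KEYPOINTS = (
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
)


class Keypoint(NamedTuple):
    name: str
    x: float
    y: float
    visibility: int  # 0 = not labeled, 1 = labeled but occluded, 2 = labeled and visible


def parse_keypoints(flat: List[float]) -> List[Keypoint]:
    """Split a flat [x1, y1, v1, x2, y2, v2, ...] annotation into named keypoints."""
    expected = 3 * len(COCO_KEYPOINTS)
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    return [
        Keypoint(name, float(flat[3 * i]), float(flat[3 * i + 1]), int(flat[3 * i + 2]))
        for i, name in enumerate(COCO_KEYPOINTS)
    ]


def visible_names(keypoints: List[Keypoint]) -> List[str]:
    """Names of keypoints that are labeled and visible (v == 2)."""
    return [kp.name for kp in keypoints if kp.visibility == 2]


# Example: an annotation in which only the nose and the left wrist are visible.
annotation = [0.0] * 51
annotation[0:3] = [120.0, 80.0, 2]    # nose (index 0 -> offset 0)
annotation[27:30] = [95.0, 210.0, 2]  # left_wrist (index 9 -> offset 27)
keypoints = parse_keypoints(annotation)
```

Keeping the order in one tuple makes the index-to-joint mapping explicit, which is exactly what differs between dataset formats (other datasets use a different joint count and order).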
Funding: Supported by the Chung-Ang University Graduate Research Scholarship in 2021.
Abstract: Speech emotion recognition (SER) is essential for frictionless human-machine interaction, in which machines respond to human instructions with context-aware actions. The properties of individuals' voices vary with culture, language, gender, and personality, and these speaker-specific variations may degrade the performance of standard representations in downstream tasks such as SER. This study demonstrates the significance of speaker-specific speech characteristics and shows how they can be leveraged to improve the performance of SER models. In the proposed approach, two wav2vec-based modules, a speaker-identification network and an emotion classification network, are trained with the ArcFace loss. The speaker-identification network uses a single attention block to encode an input audio waveform into a speaker-specific representation. The emotion classification network uses a wav2vec 2.0 backbone with four attention blocks to encode the same input waveform into an emotion representation. The two representations are then fused into a single vector containing both emotion and speaker-specific information. Experimental results show that using speaker-specific characteristics improves SER performance. Combining them with an angular margin loss such as ArcFace further improves intra-class compactness while increasing inter-class separability, as shown by t-distributed stochastic neighbor embedding (t-SNE) plots. The proposed approach outperforms previous methods that use similar training strategies, achieving a weighted accuracy (WA) of 72.14% and an unweighted accuracy (UA) of 72.97% on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset. These results demonstrate its effectiveness and its potential to enhance human-machine interaction through more accurate speech emotion recognition.
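The ArcFace loss mentioned above enlarges the angle between an embedding and its own class center by a margin m before the softmax, which is what pushes same-class embeddings closer together. The pure-Python sketch below illustrates that mechanism for a single sample. The scale `s=30.0`, margin `m=0.5`, the toy class centers, and the function names are assumptions for illustration, not the paper's settings or code, and this simplified form omits the threshold adjustment that full ArcFace implementations apply when θ + m exceeds π.

```python
import math
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def arcface_logits(embedding, class_centers, label, s=30.0, m=0.5):
    """Scaled cosine logits with an additive angular margin on the target class."""
    e = _unit(embedding)
    logits = []
    for j, center in enumerate(class_centers):
        cos_theta = sum(a * b for a, b in zip(e, _unit(center)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # keep acos inside its domain
        if j == label:
            # The target class must still "win" after its angle is enlarged by m.
            cos_theta = math.cos(math.acos(cos_theta) + m)
        logits.append(s * cos_theta)
    return logits


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single sample."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


# Hypothetical 2-D class centers for two emotions and one embedding of class 0.
centers = [[1.0, 0.0], [0.0, 1.0]]
emb = [0.9, 0.3]
plain = cross_entropy(arcface_logits(emb, centers, 0, m=0.0), 0)
margin = cross_entropy(arcface_logits(emb, centers, 0, m=0.5), 0)
```

With `m=0.0` this reduces to a normalized-softmax loss; with a positive margin the same correctly classified embedding still incurs a larger loss, so training keeps pulling it toward its class center.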
Funding: Supported by the Research Support Funds Grant (RSFG) program of Indiana University-Purdue University at Indianapolis.
Abstract: Partial least squares (PLS) regression was applied to the Lunar Soil Characterization Consortium (LSCC) dataset for spectral estimation of TiO2. The LSCC dataset was split into several subsets, including the low-Ti, high-Ti, total mare, total highland, Apollo 16, and Apollo 14 soils, to investigate the effects of interfering minerals and nonlinearity on PLS performance. The PLS weight loading vectors were analyzed through stepwise multiple regression analysis (SMRA) to identify the mineral species that drive or interfere with PLS performance. PLS performs well in estimating TiO2 for the LSCC low-Ti and high-Ti mare samples, both separately and when the two groups are analyzed together. The results suggest that, although the dominant TiO2-bearing minerals are few, additional PLS factors are required to compensate for the effects, on the important PLS factors, of minerals that are not highly correlated with TiO2; to accommodate nonlinear relationships between reflectance and TiO2; and to correct inconsistent mineral-TiO2 correlations between the high-Ti and low-Ti mare samples. Analysis of the LSCC highland soil samples indicates that the Apollo 16 soils are responsible for the large errors in TiO2 estimates when they are modeled together with other subgroups. For the LSCC Apollo 16 samples, the dominant spectral effect of plagioclase over other dark minerals is primarily responsible for the large errors in estimated TiO2. For the Apollo 14 soils, the more accurate estimation of TiO2 is attributed to the positive correlation between a major TiO2-bearing component and TiO2, which explains why the Apollo 14 soils follow the regression trend when analyzed with other soil groups.
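To make the "PLS factors" and "weight loading vectors" above concrete, here is a minimal single-response PLS (PLS1) sketch using the NIPALS algorithm in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the rows of `X` stand in for reflectance spectra, `y` for TiO2 abundance, the toy data are synthetic, and `pls1_fit` and `pls1_predict` are hypothetical names. The list `W` holds the weight vectors, one per factor.

```python
import math
from typing import Dict, List


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def pls1_fit(X: List[List[float]], y: List[float], n_components: int) -> Dict:
    """Fit a single-response PLS model with the NIPALS algorithm."""
    n, p = len(X), len(X[0])
    x_mean = [_mean([row[j] for row in X]) for j in range(p)]
    y_mean = _mean(y)
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    W, P, q = [], [], []
    for _ in range(n_components):
        # Weight vector: the direction in X-space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # y is already fully explained; stop adding factors
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_a = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q_a = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this factor.
        E = [[E[i][j] - t[i] * p_a[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q_a * t[i] for i in range(n)]
        W.append(w)
        P.append(p_a)
        q.append(q_a)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "q": q}


def pls1_predict(model: Dict, x: List[float]) -> float:
    """Predict y for one sample by replaying the same deflation steps."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p_a, q_a in zip(model["W"], model["P"], model["q"]):
        t = sum(a * b for a, b in zip(e, w))
        e = [a - t * b for a, b in zip(e, p_a)]
        y_hat += q_a * t
    return y_hat


# Toy data: three "spectral bands" per sample and a synthetic linear response.
X = [[1, 2, 0], [2, 1, 1], [3, 4, 0], [4, 3, 2], [5, 6, 1], [6, 5, 3]]
y = [2 * a + 3 * b - c + 1 for a, b, c in X]
model = pls1_fit(X, y, n_components=3)
```

With as many factors as independent bands and an exactly linear response, the fitted values reproduce `y`. With real spectra, extra factors absorb interfering minerals and nonlinearity, as the abstract describes, and the number of factors is usually chosen by cross-validation.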
Funding: Work at the Chinese University of Hong Kong (CUHK) was supported by the Open Research Fund of the Key Laboratory of Digital Earth Science, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences (CAS-RADI, No. 2014LDE010) and the National Key Basic Research Program of China (2015CB954103). Work at RADI-CAS was funded by the Strategic Priority Research Program "Climate Change: Carbon Budget and Relevant Issues" of the Chinese Academy of Sciences (No. XDA05040401). Work at the University of Toronto is supported by the global scholarship program for research excellence from CUHK to Z.-C. Zeng. The TCCON network is supported by NASA's Carbon Cycle Science Program through a grant to the California Institute of Technology. TCCON data were obtained from the TCCON Data Archive, operated by the California Institute of Technology, at http://tccon.ipac.caltech.edu/. Measurement programs at Darwin and Wollongong are supported by the Australian Research Council under grants DP140101552, DP110103118, DP0879468352, and LP0562346. Part of the work for the Saga site at JAXA was supported by the Environment Research and Technology Development Fund (A-1102) of the Ministry of the Environment, Japan. The Four Corners TCCON site was funded by LANL's LDRD Project (20110081DR).
Abstract: This study presents an approach for generating a global land mapping dataset of satellite measurements of the CO2 total column (XCO2) using spatio-temporal geostatistics, which makes full use of the joint spatial and temporal dependencies between observations. The mapping approach accounts for the latitude-zonal seasonal cycles and the spatio-temporal correlation structure of XCO2, and produces global land maps of XCO2 with a spatial grid resolution of 1° latitude by 1° longitude and a temporal resolution of 3 days. We evaluate the accuracy and uncertainty of the mapping dataset in three ways: (1) in cross-validation, the mapping approach yields a high correlation coefficient of 0.94 between predictions and observations; (2) in comparison with ground truth provided by the Total Carbon Column Observing Network (TCCON), the predicted XCO2 time series agree well with those from TCCON sites, with an overall bias of 0.01 ppm and a standard deviation of the difference of 1.22 ppm; and (3) in comparison with model simulations, the spatio-temporal variability of XCO2 in the mapping dataset is generally consistent with simulations from CT2013 and GEOS-Chem. The XCO2 mapping data generated in this study provide a new global geospatial dataset for understanding greenhouse gas dynamics and global warming.
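The first two evaluation checks above reduce to simple statistics on collocated values: a Pearson correlation coefficient between predictions and observations, and the bias (mean difference) and standard deviation of the difference against a reference site. The sketch below computes both in plain Python. The sample values are hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the abstract does not specify the estimator.

```python
import math
from typing import List, Tuple


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def pearson_r(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    ma, mb = _mean(a), _mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def bias_and_sd(predicted: List[float], reference: List[float]) -> Tuple[float, float]:
    """Mean difference (bias) and sample standard deviation of the difference."""
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = _mean(diffs)
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (len(diffs) - 1))
    return bias, sd


# Hypothetical collocated XCO2 values (ppm): mapped predictions vs. one ground site.
mapped = [400.0, 401.0, 402.0]
site = [399.0, 401.0, 403.0]
r = pearson_r(mapped, site)
bias, sd = bias_and_sd(mapped, site)
```

The toy example shows why both metrics are reported: the two series are perfectly correlated and unbiased, yet they still differ point by point, which only the standard deviation of the difference reveals.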
Funding: Supported by the National Key Research and Development Program of China under grant number 2022YFB3903404, the National Natural Science Foundation of China under grant numbers 42325105 and 42071350, and LIESMARS Special Research Funding.
Abstract: High-resolution satellite images are becoming increasingly available for multi-temporal semantic understanding of urban areas. However, few datasets can be used for land-use/land-cover (LULC) classification, binary change detection (BCD), and semantic change detection (SCD) simultaneously, because classification datasets have only one time phase and BCD datasets focus only on where changes occur, ignoring the change classes. Public SCD datasets are rare but much needed. To address these problems, a tri-temporal SCD dataset built from Gaofen-2 (GF-2) remote sensing imagery, with 11 LULC classes and 60 change directions, was constructed in this study: the Wuhan Urban Semantic Understanding (WUSU) dataset. Popular deep learning-based methods for LULC classification, BCD, and SCD were tested to verify the reliability of WUSU. A Siamese-based multi-task joint framework with a multi-task joint loss (MJ loss), named ChangeMJ, is proposed to restore object boundaries; it obtains the best results in LULC classification, BCD, and SCD compared with state-of-the-art (SOTA) methods. Finally, large-scale mapping of the central urban area of Wuhan was carried out to verify the practical application value of the WUSU dataset and the ChangeMJ framework.
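The abstract contrasts BCD, which only marks where change occurs, with SCD, which also records the from-to class pair. With 11 LULC classes there are at most 11 × 10 = 110 ordered from-to pairs, of which WUSU reports 60 change directions. The sketch below shows one hypothetical way to derive both kinds of label from two co-registered LULC label maps and apply it to each consecutive pair of a tri-temporal stack. The integer encoding `from * num_classes + to` and the function names are assumptions for illustration; this is not the ChangeMJ framework or the dataset's official label format.

```python
from collections import Counter
from typing import Dict, List, Tuple

Grid = List[List[int]]


def change_maps(before: Grid, after: Grid, num_classes: int) -> Tuple[Grid, Grid]:
    """Derive binary and semantic change maps from two co-registered LULC label maps."""
    binary, semantic = [], []
    for row_b, row_a in zip(before, after):
        binary.append([int(b != a) for b, a in zip(row_b, row_a)])
        # Encode each "from -> to" change direction as one integer; -1 means no change.
        semantic.append([b * num_classes + a if b != a else -1 for b, a in zip(row_b, row_a)])
    return binary, semantic


def direction_counts(semantic: Grid, num_classes: int) -> Dict[Tuple[int, int], int]:
    """Count pixels per (from_class, to_class) change direction."""
    codes = Counter(c for row in semantic for c in row if c >= 0)
    return {divmod(code, num_classes): n for code, n in codes.items()}


# Hypothetical 2x2 tri-temporal label maps with 11 classes (0..10).
NUM_CLASSES = 11
t1 = [[0, 1], [2, 2]]
t2 = [[0, 3], [2, 1]]
t3 = [[4, 3], [2, 1]]
bcd_12, scd_12 = change_maps(t1, t2, NUM_CLASSES)
bcd_23, scd_23 = change_maps(t2, t3, NUM_CLASSES)
```

Because the binary map is fully determined by the semantic map (any code other than -1 is a change), an SCD annotation also supplies BCD labels, which is why a single tri-temporal SCD dataset can serve all three tasks.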