Funding: Supported by the National Natural Science Foundation of China (No. 61405191) and the Jilin Province Science Foundation for Youths of China (No. 20150520102JH).
Abstract: A novel no-reference (NR) image quality assessment (IQA) method is proposed for assessing image quality across a wide range of distortion categories. The method transforms distorted images into the shearlet domain using a non-subsampled shearlet transform (NSST) and builds an image quality feature vector from natural scene statistics features: coefficient distribution, energy distribution, and structural correlation (SC) across orientations and scales. The final image quality score is obtained from distortion classification and regression models trained with a support vector machine (SVM). Experimental results on the LIVE2 IQA database indicate that the method assesses image quality effectively and that the extracted features are sensitive to both the category and the severity of distortion. Furthermore, the proposed method is database independent, and its predictions show a higher correlation with human perception and a lower root mean squared error (RMSE) than other high-performance NR IQA methods.
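The abstract compares methods by correlation and RMSE against human perception. As a minimal sketch of how such figures are typically computed, the snippet below scores predicted quality values against subjective ratings. The abstract does not say which correlation coefficient is used, so Pearson linear correlation is an assumption here, and the function names and sample scores are hypothetical, not taken from the paper.

```python
import numpy as np


def rmse(predicted, subjective):
    """Root mean squared error between predicted and subjective scores."""
    predicted = np.asarray(predicted, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    return float(np.sqrt(np.mean((predicted - subjective) ** 2)))


def pearson_correlation(predicted, subjective):
    """Pearson linear correlation (an assumed choice of 'correlation')."""
    predicted = np.asarray(predicted, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(predicted, subjective)[0, 1])


# Hypothetical scores for four images (illustrative values only).
subjective_scores = [30.0, 45.0, 60.0, 75.0]
predicted_scores = [32.0, 43.0, 62.0, 73.0]

print("RMSE:", rmse(predicted_scores, subjective_scores))
print("Pearson:", pearson_correlation(predicted_scores, subjective_scores))
```

A lower RMSE and a correlation closer to 1 both indicate closer agreement with the human ratings.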
Funding: Supported by the Tencent and Shanghai Jiao Tong University Joint Project.
Abstract: The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition (ASR) systems. In this overview paper, we review the techniques proposed over the last two decades to attack this problem. We focus our discussion on the speech separation problem, given its central role in the cocktail party environment, and describe the conventional single-channel techniques such as computational auditory scene analysis (CASA), non-negative matrix factorization (NMF), and generative models; the conventional multi-channel techniques such as beamforming and multi-channel blind source separation; and the newly developed deep learning-based techniques such as deep clustering (DPCL), the deep attractor network (DANet), and permutation invariant training (PIT). We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment. We argue that effectively exploiting the information in the microphone array, the acoustic training set, and the language itself, using more powerful models together with better optimization objectives and techniques, will be the approach to solving the cocktail party problem.
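Of the techniques the abstract lists, permutation invariant training (PIT) is the easiest to illustrate in a few lines. The core idea is to score a separation model under every possible assignment of its outputs to the reference speakers and keep the best one. The sketch below is a minimal, framework-free illustration of that idea and not the formulation from the reviewed paper: the mean squared error criterion, the function name `pit_mse_loss`, and the toy arrays are all assumptions.

```python
from itertools import permutations

import numpy as np


def pit_mse_loss(estimates, targets):
    """Return the lowest MSE over all output-to-speaker assignments.

    estimates, targets: arrays of shape (num_speakers, ...).
    The returned permutation maps estimate i to target perm[i].
    """
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if estimates.shape != targets.shape:
        raise ValueError("estimates and targets must have the same shape")

    best_loss, best_perm = np.inf, None
    for perm in permutations(range(estimates.shape[0])):
        # Reorder the targets so that targets[perm[i]] is compared with estimate i.
        loss = float(np.mean((estimates - targets[list(perm)]) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the model output has the two speakers in swapped order.
toy_targets = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
toy_estimates = toy_targets[::-1]
toy_loss, toy_perm = pit_mse_loss(toy_estimates, toy_targets)
print(toy_loss, toy_perm)
```

Enumerating every permutation costs factorial time in the number of speakers, which is acceptable for the two- or three-speaker mixtures usually considered but not for large speaker counts.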