4 articles found
Random Subspace Learning Approach to High-Dimensional Outliers Detection (cited by: 1)
1
Authors: Bohan Liu, Ernest Fokoué. Open Journal of Statistics, 2015, No. 6, pp. 618-630 (13 pages)
We introduce and develop a novel approach to outlier detection based on an adaptation of random subspace learning. Our proposed method handles both high-dimensional, low-sample-size datasets and traditional low-dimensional, high-sample-size datasets. Essentially, we avoid the computational bottleneck of techniques like Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower-dimensional subspaces. Both the theoretical and the computational development of our approach reveal that it is computationally more efficient than the regularized methods in the high-dimensional, low-sample-size setting, and that it often competes favorably with existing methods as far as the percentage of correct outlier detection is concerned.
Keywords: High-Dimensional; Robust; Outlier Detection; Contamination; Large p Small n; Random Subspace Method; Minimum Covariance Determinant
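The idea in the abstract above, working with covariance-based measures in many small random subspaces instead of the full p-dimensional space, can be illustrated with a minimal sketch. This is not the authors' algorithm: it simply averages classical squared Mahalanobis distances over random coordinate subsets and omits any MCD-style robust subset selection. The function name, the subspace dimension, the number of subspaces, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def random_subspace_outlier_scores(X, subspace_dim=5, n_subspaces=200, seed=0):
    """Average squared Mahalanobis distance over random coordinate subspaces.

    Illustrative only: uses the classical mean and covariance in each
    subspace, not a robust (MCD-type) estimate.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scores = np.zeros(n)
    for _ in range(n_subspaces):
        # Pick a small random subset of the p coordinates (subspace_dim < n,
        # so the subspace covariance matrix is invertible).
        cols = rng.choice(p, size=subspace_dim, replace=False)
        Z = X[:, cols]
        centered = Z - Z.mean(axis=0)
        cov = np.cov(Z, rowvar=False)
        # Solve cov @ s = centered^T rather than forming the inverse.
        solved = np.linalg.solve(cov, centered.T)
        scores += np.einsum("ij,ji->i", centered, solved)
    return scores / n_subspaces

# Synthetic "large p, small n" data: 60 observations, 200 variables,
# with the first 5 rows shifted to act as outliers (assumed setup).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 200))
X[:5] += 6.0

scores = random_subspace_outlier_scores(X)
flagged = np.argsort(scores)[-5:]  # indices of the 5 largest scores
print(sorted(flagged.tolist()))
```

Each covariance matrix here is only 5 x 5, so no determinant or inverse is ever computed in the full 200-dimensional space, where the sample covariance would be singular.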
Probit Normal Correlated Topic Model
2
Authors: Xingchen Yu, Ernest Fokoué. Open Journal of Statistics, 2014, No. 11, pp. 879-888 (10 pages)
The logistic normal distribution has recently been adapted via the transformation of multivariate Gaussian variables to model the topical distribution of documents in the presence of correlations among topics. In this paper, we propose a probit normal alternative approach to modelling correlated topical structures. Our use of the probit model in the context of topic discovery is novel, as many authors have so far concentrated solely on the logistic model, partly due to the formidable inefficiency of the multinomial probit model even in the case of very small topical spaces. We herein circumvent the inefficiency of multinomial probit estimation by using an adaptation of the diagonal orthant multinomial probit in the topic modeling context, resulting in the ability of our topic modeling scheme to handle corpora with a large number of latent topics. An additional and very important benefit of our method lies in the fact that, unlike the logistic normal model, whose non-conjugacy leads to the need for sophisticated sampling schemes, our approach exploits the natural conjugacy inherent in the auxiliary formulation of the probit model to achieve greater simplicity. The application of our proposed scheme to a well-known Associated Press corpus not only helps discover a large number of meaningful topics but also captures compellingly intuitive correlations among certain topics. Besides, our proposed approach lends itself to even further scalability thanks to various existing high-performance algorithms and architectures capable of handling millions of documents.
Keywords: Topic Model; Bayesian; Gibbs Sampler; Cumulative Distribution Function; Probit; Logit; Diagonal Orthant; Efficient Sampling; Auxiliary Variable; Correlation Structure; Topic; Vocabulary; Conjugate; Dirichlet; Gaussian
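The "natural conjugacy inherent in the auxiliary formulation of the probit model" mentioned in the abstract above can be made concrete with the standard auxiliary-variable construction for a binary probit regression. This is only the textbook binary case under an assumed Gaussian prior; the symbols x_i, beta, b_0 and B_0 are introduced here for illustration, and the paper's diagonal orthant multinomial construction is not reproduced.

```latex
\begin{aligned}
z_i \mid \beta &\sim \mathcal{N}(x_i^\top \beta,\, 1), \qquad y_i = \mathbf{1}\{z_i > 0\}
  \;\Longrightarrow\; \Pr(y_i = 1 \mid \beta) = \Phi(x_i^\top \beta), \\
\beta &\sim \mathcal{N}(b_0, B_0), \\
\beta \mid z &\sim \mathcal{N}\!\big(B_n (B_0^{-1} b_0 + X^\top z),\; B_n\big),
  \qquad B_n = (B_0^{-1} + X^\top X)^{-1}, \\
z_i \mid \beta, y_i &\sim \mathcal{N}(x_i^\top \beta, 1)
  \text{ truncated to } (0, \infty) \text{ if } y_i = 1,
  \text{ and to } (-\infty, 0] \text{ if } y_i = 0.
\end{aligned}
```

Both conditionals are standard distributions, so a Gibbs sampler can alternate between them without the Metropolis-type steps that a non-conjugate logistic link typically requires.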
A Comparison of Classifiers in Performing Speaker Accent Recognition Using MFCCs
3
Authors: Zichen Ma, Ernest Fokoué. Open Journal of Statistics, 2014, No. 4, pp. 258-266 (9 pages)
An algorithm involving Mel-Frequency Cepstral Coefficients (MFCCs) is provided to perform signal feature extraction for the task of speaker accent recognition. Different classifiers are then compared based on the MFCC features. For each signal, the mean vector of the MFCC matrix is used as the input vector for pattern recognition. A sample of 330 signals, containing 165 US voices and 165 non-US voices, is analyzed. In the comparison, k-nearest neighbors yields the highest average test accuracy under a cross-validation of size 500 and also requires the least computation time.
Keywords: Speaker Accent Recognition; Mel-Frequency Cepstral Coefficients (MFCCs); Discriminant Analysis; Support Vector Machines (SVMs); k-Nearest Neighbors
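The pipeline described in the abstract above (reduce each signal's MFCC matrix to its mean vector, then classify with k-nearest neighbors) might look roughly like the following sketch. MFCC extraction itself is not shown: the matrices are synthetic stand-ins, and the frame count, the number of coefficients, the class separation, and the choice k=5 are assumptions rather than values taken from the paper.

```python
import numpy as np

def mean_mfcc_vector(mfcc_matrix):
    """Collapse a (frames x coefficients) MFCC matrix to its per-coefficient mean."""
    return np.asarray(mfcc_matrix).mean(axis=0)

def knn_predict(train_X, train_y, query, k=5):
    """Majority vote among the k training vectors closest in Euclidean distance."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

rng = np.random.default_rng(0)

def fake_signal(label):
    # Hypothetical stand-in for a real MFCC matrix: 50 frames x 12 coefficients,
    # with the class label shifting the mean of every coefficient.
    return rng.normal(loc=float(label), scale=1.0, size=(50, 12))

# Labels: 0 and 1 play the role of the two accent groups.
train_y = np.array([0, 1] * 20)
train_X = np.array([mean_mfcc_vector(fake_signal(y)) for y in train_y])
test_y = np.array([0, 1] * 10)
test_X = np.array([mean_mfcc_vector(fake_signal(y)) for y in test_y])

pred = np.array([knn_predict(train_X, train_y, q) for q in test_X])
accuracy = float((pred == test_y).mean())
print(f"test accuracy: {accuracy:.2f}")
```

Averaging over frames gives every signal a fixed-length input regardless of its duration, which is what lets a distance-based classifier such as k-nearest neighbors be applied directly.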
Nonnegative Matrix Factorization with Zellner Penalty
4
Authors: Matthew A. Corsetti, Ernest Fokoué. Open Journal of Statistics, 2015, No. 7, pp. 777-786 (10 pages)
Nonnegative matrix factorization (NMF) is a relatively new unsupervised learning algorithm that decomposes a nonnegative data matrix into a parts-based, lower-dimensional, linear representation of the data. NMF has applications in image processing, text mining, recommendation systems and a variety of other fields. Since its inception, the NMF algorithm has been modified and explored by numerous authors. One such modification involves the addition of auxiliary constraints to the objective function of the factorization. The purpose of these auxiliary constraints is to impose task-specific penalties or restrictions on the objective function. Though many auxiliary constraints have been studied, none have made use of data-dependent penalties. In this paper, we propose Zellner nonnegative matrix factorization (ZNMF), which uses data-dependent auxiliary constraints. We assess the facial recognition performance of the ZNMF algorithm and several other well-known constrained NMF algorithms using the Cambridge ORL database.
Keywords: Nonnegative Matrix Factorization; Zellner g-Prior; Auxiliary Constraints; Regularization; Penalty; Classification; Image Processing; Feature Extraction
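For readers unfamiliar with the base algorithm that the abstract above builds on, here is a minimal sketch of plain, unpenalized NMF using the well-known multiplicative updates for the Frobenius-norm objective. It does not include the Zellner penalty or any other auxiliary constraint from the paper; the function name, rank, iteration count, and synthetic data are assumptions for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (n x m) as W @ H with W, H >= 0.

    Plain multiplicative updates for ||V - W H||_F; no penalty terms.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(V - W @ H)))
    return W, H, errors

# Synthetic nonnegative data with an exact rank-4 structure (assumed setup).
rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 30))

W, H, errors = nmf(V, rank=4)
print(f"reconstruction error: first iteration {errors[0]:.3f}, last {errors[-1]:.3f}")
```

A constrained variant would add a penalty term to the objective and adjust the update rules accordingly; the sketch only marks the starting point for such modifications.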