Accurate classification and prediction of future traffic conditions are essential for developing effective strategies for congestion mitigation on the highway systems. Speed distribution is one of the traffic stream p...Accurate classification and prediction of future traffic conditions are essential for developing effective strategies for congestion mitigation on the highway systems. Speed distribution is one of the traffic stream parameters, which has been used to quantify the traffic conditions. Previous studies have shown that multi-modal probability distribution of speeds gives excellent results when simultaneously evaluating congested and free-flow traffic conditions. However, most of these previous analytical studies do not incorporate the influencing factors in characterizing these conditions. This study evaluates the impact of traffic occupancy on the multi-state speed distribution using the Bayesian Dirichlet Process Mixtures of Generalized Linear Models (DPM-GLM). Further, the study estimates the speed cut-point values of traffic states, which separate them into homogeneous groups using Bayesian change-point detection (BCD) technique. The study used 2015 archived one-year traffic data collected on Florida’s Interstate 295 freeway corridor. Information criteria results revealed three traffic states, which were identified as free-flow, transitional flow condition (congestion onset/offset), and the congested condition. The findings of the DPM-GLM indicated that in all estimated states, the traffic speed decreases when traffic occupancy increases. Comparison of the influence of traffic occupancy between traffic states showed that traffic occupancy has more impact on the free-flow and the congested state than on the transitional flow condition. With respect to estimating the threshold speed value, the results of the BCD model revealed promising findings in characterizing levels of traffic congestion.展开更多
Generalized linear mixed models (GLMMs) are typically constructed by incorporating random effects into the linear predictor. The random effects are usually assumed to be normally distributed with mean zero and varianc...Generalized linear mixed models (GLMMs) are typically constructed by incorporating random effects into the linear predictor. The random effects are usually assumed to be normally distributed with mean zero and variance-covariance identity matrix. In this paper, we propose to release random effects to non-normal distributions and discuss how to model the mean and covariance structures in GLMMs simultaneously. Parameter estimation is solved by using Quasi-Monte Carlo (QMC) method through iterative Newton-Raphson (NR) algorithm very well in terms of accuracy and stabilization, which is demonstrated by real binary salamander mating data analysis and simulation studies.展开更多
Voice conversion algorithm aims to provide high level of similarity to the target voice with an acceptable level of quality.The main object of this paper was to build a nonlinear relationship between the parameters fo...Voice conversion algorithm aims to provide high level of similarity to the target voice with an acceptable level of quality.The main object of this paper was to build a nonlinear relationship between the parameters for the acoustical features of source and target speaker using Non-Linear Canonical Correlation Analysis(NLCCA) based on jointed Gaussian mixture model.Speaker indi-viduality transformation was achieved mainly by altering vocal tract characteristics represented by Line Spectral Frequencies(LSF).To obtain the transformed speech which sounded more like the target voices,prosody modification is involved through residual prediction.Both objective and subjective evaluations were conducted.The experimental results demonstrated that our proposed algorithm was effective and outperformed the conventional conversion method utilized by the Minimum Mean Square Error(MMSE) estimation.展开更多
The paper studies a generalized linear model(GLM)yt = h(xt^T β) + εt,t = l,2,...,n,where ε1 = η1,ε1 =ρεt +ηt,t = 2,3,...;n,h is a continuous differentiable function,ηt's are independent and identically...The paper studies a generalized linear model(GLM)yt = h(xt^T β) + εt,t = l,2,...,n,where ε1 = η1,ε1 =ρεt +ηt,t = 2,3,...;n,h is a continuous differentiable function,ηt's are independent and identically distributed random errors with zero mean and finite variance σ^2.Firstly,the quasi-maximum likelihood(QML) estimators of β,p and σ^2 are given.Secondly,under mild conditions,the asymptotic properties(including the existence,weak consistency and asymptotic distribution) of the QML estimators are investigated.Lastly,the validity of method is illuminated by a simulation example.展开更多
Interest in automated data classification and identification systems has increased over the past years in conjunction with the high demand for artificial intelligence and security applications.In particular,recognizin...Interest in automated data classification and identification systems has increased over the past years in conjunction with the high demand for artificial intelligence and security applications.In particular,recognizing human activities with accurate results have become a topic of high interest.Although the current tools have reached remarkable successes,it is still a challenging problem due to various uncontrolled environments and conditions.In this paper two statistical frameworks based on nonparametric hierarchical Bayesian models and Gamma distribution are proposed to solve some realworld applications.In particular,two nonparametric hierarchical Bayesian models based on Dirichlet process and Pitman-Yor process are developed.These models are then applied to address the problem of modelling grouped data where observations are organized into groups and these groups are statistically linked by sharing mixture components.The choice of the Gamma mixtures is motivated by its flexibility for modelling heavy-tailed distributions.In addition,deploying the Dirichlet process prior is justified by its advantage of automatically finding the right number of components and providing nice properties.Moreover,a learning step via variational Bayesian setting is presented in a flexible way.The priors over the parameters are selected appropriately and the posteriors are approximated effectively in a closed form.Experimental results based on a real-life applications that concerns texture classification and human actions recognition show the capabilities and effectiveness of the proposed framework.展开更多
自动的心电异常识别是一个多标签分类问题,多通过对每个标签训练一个二分类器来实现异常识别。由于异常数目多,特征和异常间以及不同异常间的相关性复杂,自动检测的效果并不理想。为了充分利用异常和特征间的依存关系,提出了一种基于异...自动的心电异常识别是一个多标签分类问题,多通过对每个标签训练一个二分类器来实现异常识别。由于异常数目多,特征和异常间以及不同异常间的相关性复杂,自动检测的效果并不理想。为了充分利用异常和特征间的依存关系,提出了一种基于异常标签共现和特征局部相关(Label Co-occurrence and Feature’s local Pertinence,LCFP)的心电异常识别方法。首先,根据标签共现性和特征局部相关性,为标签构建包含宏特征和微特征的联合特征空间。宏特征采用狄利克雷过程混合模型聚类构建,以区分不同的共现标签集;微特征是原始特征空间的一个子集,用于区分共现标签集中的各个标签。进而,在联合特征空间为每个异常训练一个一对多(One-Versus-All)的概率分类器。其次,为充分利用异常的关联,提出在概率分类器排序基础上区分相关和非相关标签,采用Beta分布自适应地学习锚阈值和相关度阈值,以确定实例的相关标签集。LCFP是一种检测多种心电异常的通用方法,提高了心电异常识别的精度。在两个真实数据集上,F1指标分别提高了4%和22.4%,验证了所提方法的有效性。展开更多
识别虚假评论有着重要的理论意义与现实价值.先前工作集中于启发式策略和传统的全监督学习算法.最近研究表明:人类无法通过先验知识有效识别虚假评论,手工标注的数据集必定存在一定数量的误例,因此简单使用传统的全监督学习算法识别虚...识别虚假评论有着重要的理论意义与现实价值.先前工作集中于启发式策略和传统的全监督学习算法.最近研究表明:人类无法通过先验知识有效识别虚假评论,手工标注的数据集必定存在一定数量的误例,因此简单使用传统的全监督学习算法识别虚假评论并不合理.容易被错误标注的样例称为间谍样例,如何确定这些样例的类别标签将直接影响分类器的性能.基于少量的真实评论和大量的未标注评论,提出一种创新的PU(positive and unlabeled)学习框架来识别虚假评论.首先,从无标注数据集中识别出少量可信度较高的负例.其次,通过整合LDA(latent Dirichlet allocation)和K-means,分别计算出多个代表性的正例和负例.接着,基于狄利克雷过程混合模型(Dirichlet process mixture model,DPMM),对所有间谍样例进行聚类,混合种群性和个体性策略来确定间谍样例的类别标签.最后,多核学习算法被用来训练最终的分类器.数值实验证实了所提算法的有效性,超过当前的基准.展开更多
文摘Accurate classification and prediction of future traffic conditions are essential for developing effective strategies for congestion mitigation on the highway systems. Speed distribution is one of the traffic stream parameters, which has been used to quantify the traffic conditions. Previous studies have shown that multi-modal probability distribution of speeds gives excellent results when simultaneously evaluating congested and free-flow traffic conditions. However, most of these previous analytical studies do not incorporate the influencing factors in characterizing these conditions. This study evaluates the impact of traffic occupancy on the multi-state speed distribution using the Bayesian Dirichlet Process Mixtures of Generalized Linear Models (DPM-GLM). Further, the study estimates the speed cut-point values of traffic states, which separate them into homogeneous groups using Bayesian change-point detection (BCD) technique. The study used 2015 archived one-year traffic data collected on Florida’s Interstate 295 freeway corridor. Information criteria results revealed three traffic states, which were identified as free-flow, transitional flow condition (congestion onset/offset), and the congested condition. The findings of the DPM-GLM indicated that in all estimated states, the traffic speed decreases when traffic occupancy increases. Comparison of the influence of traffic occupancy between traffic states showed that traffic occupancy has more impact on the free-flow and the congested state than on the transitional flow condition. With respect to estimating the threshold speed value, the results of the BCD model revealed promising findings in characterizing levels of traffic congestion.
文摘Generalized linear mixed models (GLMMs) are typically constructed by incorporating random effects into the linear predictor. The random effects are usually assumed to be normally distributed with mean zero and variance-covariance identity matrix. In this paper, we propose to release random effects to non-normal distributions and discuss how to model the mean and covariance structures in GLMMs simultaneously. Parameter estimation is solved by using Quasi-Monte Carlo (QMC) method through iterative Newton-Raphson (NR) algorithm very well in terms of accuracy and stabilization, which is demonstrated by real binary salamander mating data analysis and simulation studies.
基金Supported by the National High Technology Research and Development Program of China (863 Program,No.2006AA010102)
文摘Voice conversion algorithm aims to provide high level of similarity to the target voice with an acceptable level of quality.The main object of this paper was to build a nonlinear relationship between the parameters for the acoustical features of source and target speaker using Non-Linear Canonical Correlation Analysis(NLCCA) based on jointed Gaussian mixture model.Speaker indi-viduality transformation was achieved mainly by altering vocal tract characteristics represented by Line Spectral Frequencies(LSF).To obtain the transformed speech which sounded more like the target voices,prosody modification is involved through residual prediction.Both objective and subjective evaluations were conducted.The experimental results demonstrated that our proposed algorithm was effective and outperformed the conventional conversion method utilized by the Minimum Mean Square Error(MMSE) estimation.
基金Supported by National Natural Science Foundation of China(Grant Nos.11071022,11471105)Science and Technology Research Projects of the Educational Department of Hubei Province(Grant No.Q20132505)
文摘The paper studies a generalized linear model(GLM)yt = h(xt^T β) + εt,t = l,2,...,n,where ε1 = η1,ε1 =ρεt +ηt,t = 2,3,...;n,h is a continuous differentiable function,ηt's are independent and identically distributed random errors with zero mean and finite variance σ^2.Firstly,the quasi-maximum likelihood(QML) estimators of β,p and σ^2 are given.Secondly,under mild conditions,the asymptotic properties(including the existence,weak consistency and asymptotic distribution) of the QML estimators are investigated.Lastly,the validity of method is illuminated by a simulation example.
基金The authors would like to thank Taif University Researchers Supporting Project number(TURSP-2020/26),Taif University,Taif,Saudi ArabiaThey would like also to thank Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2022R40),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Interest in automated data classification and identification systems has increased over the past years in conjunction with the high demand for artificial intelligence and security applications.In particular,recognizing human activities with accurate results have become a topic of high interest.Although the current tools have reached remarkable successes,it is still a challenging problem due to various uncontrolled environments and conditions.In this paper two statistical frameworks based on nonparametric hierarchical Bayesian models and Gamma distribution are proposed to solve some realworld applications.In particular,two nonparametric hierarchical Bayesian models based on Dirichlet process and Pitman-Yor process are developed.These models are then applied to address the problem of modelling grouped data where observations are organized into groups and these groups are statistically linked by sharing mixture components.The choice of the Gamma mixtures is motivated by its flexibility for modelling heavy-tailed distributions.In addition,deploying the Dirichlet process prior is justified by its advantage of automatically finding the right number of components and providing nice properties.Moreover,a learning step via variational Bayesian setting is presented in a flexible way.The priors over the parameters are selected appropriately and the posteriors are approximated effectively in a closed form.Experimental results based on a real-life applications that concerns texture classification and human actions recognition show the capabilities and effectiveness of the proposed framework.
文摘自动的心电异常识别是一个多标签分类问题,多通过对每个标签训练一个二分类器来实现异常识别。由于异常数目多,特征和异常间以及不同异常间的相关性复杂,自动检测的效果并不理想。为了充分利用异常和特征间的依存关系,提出了一种基于异常标签共现和特征局部相关(Label Co-occurrence and Feature’s local Pertinence,LCFP)的心电异常识别方法。首先,根据标签共现性和特征局部相关性,为标签构建包含宏特征和微特征的联合特征空间。宏特征采用狄利克雷过程混合模型聚类构建,以区分不同的共现标签集;微特征是原始特征空间的一个子集,用于区分共现标签集中的各个标签。进而,在联合特征空间为每个异常训练一个一对多(One-Versus-All)的概率分类器。其次,为充分利用异常的关联,提出在概率分类器排序基础上区分相关和非相关标签,采用Beta分布自适应地学习锚阈值和相关度阈值,以确定实例的相关标签集。LCFP是一种检测多种心电异常的通用方法,提高了心电异常识别的精度。在两个真实数据集上,F1指标分别提高了4%和22.4%,验证了所提方法的有效性。
文摘识别虚假评论有着重要的理论意义与现实价值.先前工作集中于启发式策略和传统的全监督学习算法.最近研究表明:人类无法通过先验知识有效识别虚假评论,手工标注的数据集必定存在一定数量的误例,因此简单使用传统的全监督学习算法识别虚假评论并不合理.容易被错误标注的样例称为间谍样例,如何确定这些样例的类别标签将直接影响分类器的性能.基于少量的真实评论和大量的未标注评论,提出一种创新的PU(positive and unlabeled)学习框架来识别虚假评论.首先,从无标注数据集中识别出少量可信度较高的负例.其次,通过整合LDA(latent Dirichlet allocation)和K-means,分别计算出多个代表性的正例和负例.接着,基于狄利克雷过程混合模型(Dirichlet process mixture model,DPMM),对所有间谍样例进行聚类,混合种群性和个体性策略来确定间谍样例的类别标签.最后,多核学习算法被用来训练最终的分类器.数值实验证实了所提算法的有效性,超过当前的基准.