Abstract: Background: Although humans are exposed to complex chemical mixtures, much of the available research still focuses on a single compound or metabolite, or on a select subgroup of compounds, an approach inconsistent with the nature of human exposure. Uncertainty about how best to model chemical mixtures, coupled with the scarcity of analytic approaches, remains a formidable challenge and served as the impetus for this study. Objectives: To identify the polychlorinated biphenyl (PCB) congener(s) within a chemical mixture most strongly associated with an endometriosis diagnosis, using novel graphical modeling techniques. Methods: Bayesian Belief Network (BBN) models were developed and empirically assessed in a cohort of 84 women aged 18 to 40 years who underwent laparoscopy or laparotomy between 1999 and 2000; 79 (94%) of these women had serum concentrations of 68 PCB congeners quantified. Adjusted odds ratios (AOR) for endometriosis were estimated for individual PCB congeners using BBN models. Results: PCB congeners #114 (AOR = 3.01; 95% CI = 2.25, 3.77) and #136 (AOR = 1.79; 95% CI = 1.03, 2.55) were associated with an endometriosis diagnosis. All mixture combinations that included PCB #114 were associated with higher odds of endometriosis, underscoring its potential relation with the disease. Conclusions: BBN models identified PCB congener 114 as the most influential congener for the odds of an endometriosis diagnosis in the context of a 68-congener chemical mixture. BBN models give investigators a way to assess which compounds within a mixture may drive a human health effect.
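The abstract does not spell out how an odds ratio is read off a BBN. As a minimal sketch, assuming a hypothetical three-node network (Age -> PCB, Age -> Endo, PCB -> Endo) with invented binary probabilities, the Python below computes P(Endo = 1 | evidence) by exhaustive enumeration of the joint distribution and forms odds ratios from it. Conditioning on the covariate gives stratum-specific ORs that differ from the crude OR because Age confounds the exposure. The names `P_AGE`, `P_PCB_HIGH`, `P_ENDO`, and `odds_ratio` are illustrative only; this is not the authors' model or their adjustment procedure.

```python
from itertools import product

# Hypothetical three-node network (binary variables, invented probabilities):
#   Age -> PCB, Age -> Endo, PCB -> Endo
P_AGE = {0: 0.6, 1: 0.4}             # P(Age = a); 1 = older stratum
P_PCB_HIGH = {0: 0.3, 1: 0.5}        # P(PCB = 1 | Age = a); 1 = high serum level
P_ENDO = {                           # P(Endo = 1 | PCB = p, Age = a), keyed by (p, a)
    (0, 0): 0.20, (1, 0): 0.40,
    (0, 1): 0.30, (1, 1): 0.55,
}


def joint(age, pcb, endo):
    """P(Age, PCB, Endo) from the network's chain-rule factorization."""
    p = P_AGE[age]
    p *= P_PCB_HIGH[age] if pcb else 1 - P_PCB_HIGH[age]
    pe = P_ENDO[(pcb, age)]
    return p * (pe if endo else 1 - pe)


def p_endo(evidence):
    """P(Endo = 1 | evidence), by brute-force enumeration of the joint."""
    num = den = 0.0
    for age, pcb, endo in product((0, 1), repeat=3):
        state = {"age": age, "pcb": pcb, "endo": endo}
        if any(state[k] != v for k, v in evidence.items()):
            continue  # this assignment contradicts the evidence
        p = joint(age, pcb, endo)
        den += p
        num += p if endo else 0.0
    return num / den


def odds_ratio(adjust_for=None):
    """OR for Endo, PCB high vs low, optionally within one covariate stratum."""
    extra = adjust_for or {}
    odds = [p / (1 - p) for p in (p_endo({"pcb": s, **extra}) for s in (1, 0))]
    return odds[0] / odds[1]


crude = odds_ratio()
by_age = {a: odds_ratio({"age": a}) for a in (0, 1)}
print(f"crude OR = {crude:.3f}")
for a, value in by_age.items():
    print(f"OR within Age = {a}: {value:.3f}")
```

Enumeration is exponential in the number of nodes, so a network with 68 congeners would need a proper inference engine (for example variable elimination or junction-tree propagation); the brute-force form is used here only because it makes the conditioning explicit.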
Funding: supported by the National Natural Science Foundation of China (61202473), the Fundamental Research Funds for the Central Universities (JUSRP111A49), the "111 Project" (B12018), and the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Abstract: Fault detection and diagnosis in large-scale industrial systems raises two important issues: missing data samples and the non-Gaussian nature of the data. However, most existing data-driven methods cannot handle both. A new fault detection and diagnosis method based on a Bayesian network classifier is therefore proposed. First, a non-imputation method is presented to handle incomplete data samples: owing to the structure of the proposed Bayesian network classifier, missing values can be marginalized out directly rather than imputed. Second, a Gaussian mixture model approximates the non-Gaussian data with a finite linear combination of Gaussian components, so that the Bayesian network can process non-Gaussian data effectively. As a result, the overall method handles high-dimensional, incomplete process samples efficiently and robustly. Diagnosis results are expressed as probabilities accompanied by reliability scores. The approach is evaluated on the Tennessee Eastman process benchmark, and the simulation results demonstrate its effectiveness and robustness for fault detection and diagnosis in large-scale systems with missing measurements.
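Marginalizing missing values has a concrete form when each class is modeled by a Gaussian mixture: integrating a missing coordinate out of a Gaussian leaves a Gaussian over the observed coordinates. The sketch below assumes diagonal covariances, where that marginal simply drops the missing coordinate's factor (with full covariances one would instead keep the observed sub-block of each mean vector and covariance matrix). The class names, priors, and mixture parameters are hypothetical, `None` marks a missing measurement, and the reliability scores mentioned in the abstract are not modeled.

```python
import math

# Hypothetical class-conditional GMMs with diagonal covariances.
# Each component is (weight, means, variances) over the same feature order.
CLASS_MODELS = {
    "normal": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [0.5, 0.5])],
    "fault": [(1.0, [4.0, 4.0], [1.0, 1.0])],
}
PRIORS = {"normal": 0.9, "fault": 0.1}


def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_likelihood(sample, components):
    """p(observed part of sample | class); None entries are marginalized out.

    With diagonal covariances, integrating a missing coordinate out of each
    Gaussian component removes that coordinate's factor from the product.
    """
    total = 0.0
    for weight, means, variances in components:
        log_p = sum(
            log_normal(x, m, v)
            for x, m, v in zip(sample, means, variances)
            if x is not None
        )
        total += weight * math.exp(log_p)
    return total


def posterior(sample):
    """Class posteriors P(class | observed features) via Bayes' rule."""
    joint = {
        c: PRIORS[c] * gmm_likelihood(sample, comps)
        for c, comps in CLASS_MODELS.items()
    }
    z = sum(joint.values())
    return {c: p / z for c, p in joint.items()}


print(posterior([3.8, None]))   # second sensor missing
print(posterior([None, None]))  # nothing observed
```

With every value missing, each likelihood equals 1 and the posterior falls back to the class priors, which is the expected behavior of marginalization as opposed to imputation.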
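Abstract: A new naïve Bayesian network model for remote sensing image classification with embedded Gaussian mixture models (GMM), called GMM-NBC (GMM-based Naïve Bayesian Classifier), is proposed. Continuous naïve Bayesian network classifiers assume that each land-cover class follows a single Gaussian distribution; to overcome this limitation, the method models the distribution of each land-cover class in feature space with a Gaussian mixture model whose parameters are estimated automatically by an improved EM algorithm. Each Gaussian mixture model is embedded as a whole as a child node of the naïve Bayesian network, and its output serves as the intermediate class-posterior probability for that node (feature); these intermediate posteriors are then fused within the naïve Bayesian framework to obtain the final class-posterior probability. Classification experiments on multispectral and hyperspectral data show that the method outperforms the traditional Bayesian classifier and is highly robust.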
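The fusion step can be made explicit under the naive Bayes independence assumption: since p(x_i | c) = P(c | x_i) p(x_i) / P(c), the posterior P(c | x_1, ..., x_n), which is proportional to P(c) times the product of p(x_i | c), is also proportional to P(c)^(1-n) times the product of the per-node posteriors P(c | x_i). The abstract does not state the paper's exact fusion formula, so the Python sketch below uses this standard identity rather than claiming to reproduce the authors' rule. The classes, bands, and one-dimensional mixture parameters are hypothetical; each per-band GMM node emits an intermediate posterior, the nodes are fused, and the result can be checked against direct naive Bayes.

```python
import math


def gauss(x, mean, var):
    """1-D Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def gmm_pdf(x, components):
    """1-D Gaussian mixture density; components are (weight, mean, variance)."""
    return sum(w * gauss(x, m, v) for w, m, v in components)


# Hypothetical: two land-cover classes, two spectral bands,
# one 1-D GMM per (class, band) pair.
PRIORS = {"water": 0.3, "vegetation": 0.7}
GMMS = {
    "water": [[(0.7, 0.1, 0.01), (0.3, 0.2, 0.02)], [(1.0, 0.05, 0.01)]],
    "vegetation": [[(1.0, 0.4, 0.02)], [(0.5, 0.5, 0.03), (0.5, 0.7, 0.02)]],
}


def node_posteriors(band, x):
    """Intermediate class posterior P(c | x_band) produced by one GMM child node."""
    joint = {c: PRIORS[c] * gmm_pdf(x, GMMS[c][band]) for c in PRIORS}
    z = sum(joint.values())
    return {c: p / z for c, p in joint.items()}


def fuse(sample):
    """Naive-Bayes fusion: P(c | x) proportional to P(c)^(1-n) * prod_i P(c | x_i)."""
    n = len(sample)
    scores = {c: PRIORS[c] ** (1 - n) for c in PRIORS}
    for band, x in enumerate(sample):
        post = node_posteriors(band, x)
        for c in scores:
            scores[c] *= post[c]
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


def direct_naive_bayes(sample):
    """Reference: P(c | x) proportional to P(c) * prod_i p(x_i | c)."""
    scores = {
        c: PRIORS[c] * math.prod(gmm_pdf(x, GMMS[c][b]) for b, x in enumerate(sample))
        for c in PRIORS
    }
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


pixel = [0.12, 0.08]  # hypothetical reflectances in the two bands
print(fuse(pixel))
print(direct_naive_bayes(pixel))
```

Fusing intermediate posteriors rather than raw likelihoods lets each GMM node be trained and normalized on its own, which matches the abstract's description of the node output as a class posterior; the two functions agree exactly because the per-feature evidence terms p(x_i) cancel in the normalization.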
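Abstract: Because a Gaussian mixture model cannot determine, before training, either the best way to combine sampling points or the optimal number of mixture components, a WLAN indoor positioning method based on GMM-Boost is proposed. First, Stirling numbers of the second kind are used to enumerate the possible combinations of sampling points, and the average positioning accuracy of the Gaussian mixture model under each combination is compared to identify the best combination. Second, for each number of sample labels, the Bayesian Information Criterion (BIC) is used to select the optimal number of Gaussian mixture components. Finally, the AdaBoost algorithm is combined with the Gaussian mixture model to improve positioning accuracy. The analysis shows that the algorithm achieves a positioning accuracy of 71.2% at a positioning error of 2 m and attains a low average positioning error with small sample sizes. Compared with other algorithms, it offers better positioning accuracy and generalization ability.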
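The first two steps above have compact textbook forms. The sketch below assumes that a "combination of sampling points" means a partition of the reference points into k non-empty groups, which is exactly what Stirling numbers of the second kind count, and uses the usual BIC for a full-covariance GMM, with (K-1) mixing weights, K*d means, and K*d(d+1)/2 covariance entries as free parameters. The point labels and log-likelihood values are hypothetical placeholders rather than results from the paper, and the AdaBoost stage is not shown.

```python
import math


def stirling2(n, k):
    """Stirling number of the second kind: ways to split n items into k non-empty groups."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def partitions_into_k(items, k):
    """Yield every split of `items` into exactly k non-empty, unordered groups."""
    if k == 0:
        if not items:
            yield []
        return
    if len(items) < k:
        return
    first, rest = items[0], items[1:]
    # Case 1: `first` sits alone in a new group (mirrors the S(n-1, k-1) term).
    for groups in partitions_into_k(rest, k - 1):
        yield [[first]] + groups
    # Case 2: `first` joins one of the k existing groups (mirrors k * S(n-1, k)).
    for groups in partitions_into_k(rest, k):
        for i in range(k):
            yield groups[:i] + [[first] + groups[i]] + groups[i + 1:]


def gmm_bic(log_likelihood, n_components, n_samples, dim):
    """BIC = p * ln(N) - 2 * ln(L) for a full-covariance GMM; lower is better."""
    n_params = (
        (n_components - 1)                          # mixing weights
        + n_components * dim                        # means
        + n_components * dim * (dim + 1) // 2       # covariance matrices
    )
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


# The ways to group four hypothetical reference points into two sets.
points = ["RP1", "RP2", "RP3", "RP4"]
groupings = list(partitions_into_k(points, 2))
print(len(groupings), "groupings; S(4, 2) =", stirling2(4, 2))

# Hypothetical fitted log-likelihoods for K = 1..3 components on N = 200 RSSI samples.
fitted = {1: -520.0, 2: -470.0, 3: -466.0}
best_k = min(fitted, key=lambda k: gmm_bic(fitted[k], k, 200, dim=1))
print("BIC-selected number of components:", best_k)
```

The number of partitions grows quickly (the total over all k is the Bell number), so exhaustive enumeration of this kind is only practical for a small number of sampling points.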
Funding: supported by the General Research Fund of the Research Grants Council of Hong Kong (Project No. CUHK4180/10E) and the National Basic Research Program of China (973 Program) (No. 2009CB825404).
Abstract: A paper in a preceding issue of this journal introduced Bayesian Ying-Yang (BYY) harmony learning from the perspectives of problem solving, parameter learning, and model selection. In a complementary role, this paper offers further insights from another perspective: a co-dimensional matrix pair (co-dim matrix pair for short) forms a building unit, and a hierarchy of such units makes up the BYY system. BYY harmony learning is re-examined by exploring the nature of a co-dim matrix pair, which leads to improved learning performance through refined model selection criteria and a modified mechanism that coordinates automatic model selection and sparse learning. Besides updating typical algorithms for factor analysis (FA), binary FA (BFA), binary matrix factorization (BMF), and nonnegative matrix factorization (NMF) to share this mechanism, the analysis also leads to (a) a new parametrization that embeds a denoising property into Gaussian mixtures and local FA (LFA); (b) an alternative formulation of graph-Laplacian-based linear manifold learning; (c) a co-decomposition of data and covariance for learning regularization and data integration; and (d) a co-dim matrix pair based generalization of temporal FA and the state space model. Moreover, using a co-dim matrix pair in a Hadamard product yields a semi-supervised formulation for regression analysis and a semi-blind learning formulation for temporal FA and the state space model. Finally, these advances provide new tools for network biology studies, including learning transcriptional regulatory networks, protein-protein interaction network alignment, and network integration.
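The abstract lists NMF among the typical algorithms that the BYY mechanism updates. For readers unfamiliar with that baseline, the sketch below shows the classic Lee-Seung multiplicative-update NMF minimizing the squared Frobenius error; it is explicitly not the BYY-modified algorithm, which adds automatic model selection and sparse learning that are not shown here. The helper names (`matmul`, `nmf`, `frob_error`), the random initialization, and the small `eps` guard against division by zero are illustrative choices.

```python
import random


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-12):
    """Classic Lee-Seung multiplicative updates for V ~ WH with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


# Hypothetical 3x3 nonnegative matrix with an exact rank-2 factorization.
V = matmul([[1, 0], [0, 1], [1, 1]], [[1, 2, 0], [0, 1, 3]])
W, H = nmf(V, rank=2, iters=300)
print("squared reconstruction error:", round(frob_error(V, W, H), 4))
```

Multiplicative updates keep every entry nonnegative without projection and never increase the objective, but they fix the rank in advance; choosing that rank automatically is the kind of model-selection problem the BYY mechanism is described as addressing.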