The estimation of covariance matrices is important in many fields, including statistics. In real applications, data are frequently high-dimensional and noisy. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under the … norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. Next, the minimax lower bound is derived, which shows that the proposed estimator is rate-optimal. Finally, numerical simulations are performed. The results show that, for missing samples with sub-Gaussian noise, the hard thresholding estimator outperforms the traditional estimation method when the true covariance matrix is sparse.
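As a rough illustration of thresholding a covariance estimate built from incomplete data, the sketch below computes a pairwise-complete sample covariance and then zeroes small off-diagonal entries. It is a minimal sketch under stated assumptions, not the estimator analyzed in the paper: the function names `pairwise_covariance` and `hard_threshold`, the use of `None` for missing entries, the choice to leave the diagonal untouched, and the threshold value `lam` are all hypothetical, and no correction for the additive noise is attempted.

```python
def pairwise_covariance(rows):
    """Covariance estimate from rows that may contain None (missing).

    Assumption: each mean uses the observed entries of that variable, and
    each covariance entry averages over the rows where both variables are
    observed. Every pair of variables must be jointly observed at least once.
    """
    p = len(rows[0])
    means = []
    for j in range(p):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))

    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            pairs = [(r[j], r[k]) for r in rows
                     if r[j] is not None and r[k] is not None]
            s = sum((a - means[j]) * (b - means[k]) for a, b in pairs)
            cov[j][k] = cov[k][j] = s / len(pairs)
    return cov


def hard_threshold(cov, lam):
    """Keep the diagonal; set off-diagonal entries with |value| < lam to zero."""
    p = len(cov)
    return [[cov[j][k] if (j == k or abs(cov[j][k]) >= lam) else 0.0
             for k in range(p)]
            for j in range(p)]


# Hypothetical toy data: 3 variables, some entries missing.
data = [
    [1.0, 1.0, 0.2],
    [3.0, None, 0.1],
    [5.0, 5.0, None],
    [2.0, 2.5, 0.3],
]
sigma_hat = hard_threshold(pairwise_covariance(data), lam=0.5)
```

The threshold here is arbitrary; in practice it is usually chosen as a function of the sample size and the dimension.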
In this paper, a model averaging method is proposed for varying-coefficient models with responses missing at random, using a weight selection criterion based on cross-validation. Under certain regularity conditions, the proposed method is proved to be asymptotically optimal in the sense of achieving the minimum squared error.
Next Generation Sequencing (NGS) provides an effective basis for estimating the survival time of cancer patients, but it also produces data of very high dimension. In addition, some patients drop out of the study, so responses are missing. A method is therefore needed for estimating the mean of a response variable with missing values in ultra-high dimensional datasets. In this paper, we propose RF-SIS, a two-stage ultra-high dimensional variable screening method based on random forest regression, which addresses the difficulty of estimating missing values when the data dimension is excessive. After dimension reduction with RF-SIS, mean imputation is applied to the missing responses. Simulation results show that, compared with estimation after directly deleting the missing observations, RF-SIS-MI has significant advantages in interval coverage proportion, average interval length, and average absolute deviation.
Objective: To assess the missed opportunities in the diagnosis of bacilliferous pulmonary tuberculosis by optical microscopy compared with GeneXpert MTB/RIF between 2015 and 2019. Methods: This is a retrospective analysis of the diagnostic results for bacilliferous pulmonary tuberculosis in patients suspected of a first episode of pulmonary tuberculosis during the period. GeneXpert MTB/RIF (GeneXpert) and optical microscopy (OM) of a Ziehl-Neelsen stained smear were performed on each patient's sputum or gastric tube fluid sample. Results: Among 341 patients suspected of pulmonary tuberculosis, 229 (67%) were declared to have bacilliferous tuberculosis by the two tests: 220 by GeneXpert and 95 by OM, i.e. 64.5% versus 28% (p < …). OM missed 134 of the patients declared positive by GeneXpert, i.e. 58.5% of the positive cases detected by the two tests (134/229 patients) and 39.3% of the patients suspected of tuberculosis (134/341 patients). Conversely, among the 95 patients declared positive by OM, GeneXpert missed 9 (9.5%), i.e. 4% of all the positive cases detected by the two diagnostic tests (9/229 patients) and 3% of the patients suspected of tuberculosis (9/341 patients). The differences observed between the results of the two tests were statistically significant at the 5% threshold (p < …). Conclusion: This study reveals missed diagnostic opportunities for bacilliferous pulmonary mycobacteriosis that are statistically significantly more frequent with optical microscopy than with GeneXpert. Combining GeneXpert with optical microscopy could contribute usefully to strategies for the elimination of pulmonary tuberculosis in sub-Saharan Africa.
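The percentages in the Results can be rechecked from the four reported counts. The short script below is only an arithmetic check, assuming that "positive by the two tests" means positive by at least one test (which matches 220 + 9 = 229); the variable names and the helper `pct` are illustrative.

```python
# Counts reported in the abstract.
suspected = 341
positive_any = 229         # assumed: positive by at least one of the two tests
positive_genexpert = 220
positive_om = 95

# Cases detected by only one of the two tests.
missed_by_om = positive_any - positive_om                # GeneXpert-only cases
missed_by_genexpert = positive_any - positive_genexpert  # OM-only cases


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


print(missed_by_om, missed_by_genexpert)      # 134 9
print(pct(positive_any, suspected))           # 67.2
print(pct(positive_genexpert, suspected))     # 64.5
print(pct(positive_om, suspected))            # 27.9
print(pct(missed_by_om, positive_any))        # 58.5
print(pct(missed_by_om, suspected))           # 39.3
print(pct(missed_by_genexpert, positive_om))  # 9.5
print(pct(missed_by_genexpert, positive_any)) # 3.9
print(pct(missed_by_genexpert, suspected))    # 2.6
```

The one-decimal values agree with the figures quoted in the abstract, which rounds several of them to whole percentages (67%, 28%, 4%, 3%).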
In this paper, three smoothed empirical log-likelihood ratio functions are proposed for the parameters of nonlinear models with missing responses. Under some regularity conditions, the corresponding Wilks phenomena are established, so that confidence regions for the parameters can be constructed easily.
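To address the loss of vibration signal data caused by uncontrollable factors during real-time monitoring of industrial machinery, a compressed sensing method for reconstructing missing signals is proposed, based on the adaptive quadratic proximity-alternating direction method of multipliers (AQ-ADMM). The AQ-ADMM algorithm adds a quadratic proximal term to the iterations of the classical alternating direction method of multipliers and selects the penalty parameter adaptively. First, a signal reference database is built in the data center and used to construct the initial dictionary. The K-singular value decomposition (K-SVD) dictionary learning algorithm is then combined with AQ-ADMM to reconstruct the missing signal. Simulated signals and two real bearing signal datasets, with Gaussian white noise added, are used as samples. The experimental results show that when the signal compression ratio is between 50% and 70%, the performance metrics of the proposed method are clearly better than those of other traditional methods, and that the method repairs mechanical vibration signals with missing data quickly and accurately while reconstructing the signal.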
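For readers unfamiliar with the baseline that AQ-ADMM modifies, the sketch below runs classical scaled-form ADMM on a toy sparse denoising problem, minimizing 0.5·‖x − b‖² + lam·‖z‖₁ subject to x = z. It is an illustration under stated assumptions only: it contains neither the quadratic proximal term nor the adaptive penalty selection described in the abstract, it uses no dictionary or measurement matrix, and the names `soft_threshold`, `admm_l1_denoise`, `lam`, and `rho` are hypothetical.

```python
def soft_threshold(v, kappa):
    """Proximal operator of kappa * |v| (shrink v toward zero by kappa)."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def admm_l1_denoise(b, lam, rho=1.0, iters=200):
    """Classical ADMM for: min 0.5*||x - b||^2 + lam*||z||_1  s.t.  x = z.

    rho is a fixed penalty parameter (AQ-ADMM would adapt it); u is the
    scaled dual variable.
    """
    n = len(b)
    z = [0.0] * n
    u = [0.0] * n
    for _ in range(iters):
        # x-update: closed-form minimizer of the quadratic subproblem.
        x = [(b[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: soft-thresholding handles the l1 term.
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # Dual update: accumulate the constraint residual x - z.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


# Hypothetical noisy coefficients; small entries are driven to zero.
estimate = admm_l1_denoise([3.0, -0.5, 1.2], lam=1.0)
```

For this toy problem the exact solution is the soft-thresholding of `b` at level `lam`, which makes the iteration easy to check; a reconstruction problem with a learned dictionary would replace the elementwise x-update with a linear solve.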
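Objective: To analyze, using network pharmacology, the potential mechanism by which Schisandra chinensis prevents and treats missed abortion, and to verify it experimentally. Methods: Using a systems pharmacology approach, the active components and targets of Schisandra chinensis were retrieved and screened from the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP, https://old.tcmsp-e.com/tcmsp.php), genes related to missed abortion were searched, and the targets of Schisandra chinensis for preventing and treating missed abortion were determined. Cytoscape was used to construct a "drug component-target" network and to screen key compounds. String was used to build a protein-protein interaction (PPI) network, and core targets were screened by topological analysis with Cytoscape-CytoNCA. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed on the target genes to predict possible signaling pathways and mechanisms. Finally, the mechanism was verified in the human extravillous trophoblast cell line HTR-8/SVneo using real-time fluorescence quantitative polymerase chain reaction (qRT-PCR) and Western blotting. Results: Seven active components of Schisandra chinensis were identified from the TCMSP database. The PPI network indicated that androgen receptor (AR), estrogen receptor (ER), cyclooxygenase-2 (COX-2), nuclear receptor coactivator 2 (NCOA2), and acetylcholine esterase (ACHE) may be the core targets of Schisandra chinensis in preventing and treating missed abortion. GO enrichment analysis yielded 44 cellular biological processes, and KEGG pathway enrichment analysis yielded 2 related signaling pathways, mainly the thyroid hormone signaling pathway and the estrogen signaling pathway. In the cell-model verification, the protein and/or mRNA levels of AR, ER, and COX-2 in cells exposed to benzo(a)pyren-7,8-dihydrodiol-9,10-epoxide (BPDE) were higher than in normal cells, whereas in cells pretreated with Schisandrin B (Sch B) before BPDE exposure these levels were lower than in cells without Sch B pretreatment. The protein and/or mRNA levels of NCOA2 and ACHE did not change appreciably. Conclusion: Schisandra chinensis can prevent and treat missed abortion through anti-inflammatory effects and the regulation of AR and ER, and this protective effect may be mediated by the estrogen signaling pathway.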