Funding: This study was supported by grants from the High-level Preresearch Program of Zhejiang Shuren University 2019 (KXJ121860) and the Zhejiang Provincial Natural Science Foundation of China (LQ20H190004).
Abstract: Background: Hepatocellular carcinoma (HCC) has a poor long-term prognosis. The competition of circular RNAs (circRNAs) with endogenous RNA is a novel tool for predicting HCC prognosis. Based on the alterations of circRNA regulatory networks, the analysis of gene modules related to HCC is feasible. Methods: Multiple expression datasets and RNA element targeting prediction tools were used to construct a circRNA-microRNA-mRNA network in HCC. Gene function, pathway, and protein interaction analyses were performed for the differentially expressed genes (DEGs) in this regulatory network. In the protein-protein interaction network, hub genes were identified and subjected to regression analysis, producing an optimized four-gene signature for prognostic risk stratification in HCC patients. Candidate anti-HCC drugs were identified by assessing the DEGs between the low- and high-risk groups. A circRNA-microRNA-hub gene subnetwork was constructed, in which three hallmark genes, KIF4A, CCNA2, and PBK, were subjected to functional enrichment analysis. Results: A four-gene signature (KIF4A, CCNA2, PBK, and ZWINT) was developed that effectively estimated overall survival and aided prognostic risk assessment in The Cancer Genome Atlas (TCGA) cohort and the International Cancer Genome Consortium (ICGC) cohort. CDK inhibitors, PI3K inhibitors, HDAC inhibitors, and EGFR inhibitors were predicted as four potential mechanisms of drug action (MOA) in high-risk HCC patients. Subsequent analysis revealed that PBK, CCNA2, and KIF4A play a crucial role in regulating the tumor microenvironment by promoting immune cell invasion, regulating microsatellite instability (MSI), and influencing HCC progression. Conclusions: The present study highlights the role of the circRNA-related regulatory network, identifies a four-gene prognostic signature and biomarkers, and further identifies novel therapy for HCC.
Abstract: Triple-negative breast cancer (TNBC) poses a significant challenge due to the lack of reliable prognostic gene signatures and an understanding of its immune behavior. Methods: We analyzed clinical information and mRNA expression data from 162 TNBC patients in TCGA-BRCA and 320 patients in METABRIC-BRCA. Utilizing weighted gene coexpression network analysis, we pinpointed 34 TNBC immune genes linked to survival. The least absolute shrinkage and selection operator Cox regression method identified key TNBC immune candidates for prognosis prediction. We calculated chemotherapy sensitivity scores using the "pRRophetic" package in R software and assessed immunotherapy response using the Tumor Immune Dysfunction and Exclusion algorithm. Results: In this study, 34 survival-related TNBC immune gene expression profiles were identified. A least absolute shrinkage and selection operator-Cox regression model was used and 15 candidates were prioritized, with a concomitant establishment of a robust risk immune classifier. The high-risk TNBC immune groups showed increased sensitivity to therapeutic agents like RO-3306, Tamoxifen, Sunitinib, JNK Inhibitor VIII, XMD11-85h, BX-912, and Tivozanib. An analysis of the Search Tool for Interaction of Chemicals database revealed the associations between the high-risk group and signaling pathways, such as those involving Rap1, Ras, and PI3K-Akt. The low-risk group showed a higher immunotherapy response rate, as observed through the tumor immune dysfunction and exclusion analysis in the TCGA-TNBC and METABRIC-TNBC cohorts. Conclusion: This study provides insights into the immune complexities of TNBC, paving the way for novel diagnostic approaches and precision treatment methods that exploit its immunological intricacies, thus offering hope for improved management and outcomes of this challenging disease.
Abstract: BACKGROUND Single-cell sequencing technology provides the capability to analyze changes in specific cell types during the progression of disease. However, previous single-cell sequencing studies on gastric cancer (GC) have largely focused on immune cells and stromal cells, and further elucidation is required regarding the alterations that occur in gastric epithelial cells during the development of GC. AIM To create a GC prediction model based on single-cell and bulk RNA sequencing (bulk RNA-seq) data. METHODS In this study, we conducted a comprehensive analysis by integrating three single-cell RNA sequencing (scRNA-seq) datasets and ten bulk RNA-seq datasets. Our analysis mainly focused on determining cell proportions and identifying differentially expressed genes (DEGs). Specifically, we performed differential expression analysis among epithelial cells in GC tissues and normal gastric tissues (NAGs) and utilized both single-cell and bulk RNA-seq data to establish a prediction model for GC. We further validated the accuracy of the GC prediction model in bulk RNA-seq data. We also used Kaplan-Meier plots to verify the correlation between genes in the prediction model and the prognosis of GC. RESULTS By analyzing scRNA-seq data from a total of 70707 cells from GC tissue, NAG, and chronic gastric tissue, 10 cell types were identified, and DEGs in GC and normal epithelial cells were screened. After determining the DEGs in GC and normal gastric samples identified by bulk RNA-seq data, a GC predictive classifier was constructed using the least absolute shrinkage and selection operator (LASSO) and random forest methods. The LASSO classifier showed good performance in both validation and model verification using The Cancer Genome Atlas and Genotype-Tissue Expression (GTEx) datasets [area under the curve (AUC)_min = 0.988, AUC_1se = 0.994], and the random forest model also achieved good results with the validation set (AUC = 0.92). Genes TIMP1, PLOD3, CKS2, TYMP, TNFRSF10B, CPNE1, GDF15, BCAP31, and CLDN7 were identified to have high importance values in multiple GC predictive models, and KM-PLOTTER analysis showed their relevance to GC prognosis, suggesting their potential for use in GC diagnosis and treatment. CONCLUSION A predictive classifier was established based on the analysis of RNA-seq data, and the genes in it are expected to serve as auxiliary markers in the clinical diagnosis of GC.
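Abstract: Objective: To analyze the prevalence of myopia among children and adolescents in the Ningxia Hui Autonomous Region, its influencing factors, and the differences between school stages. Methods: Using stratified cluster random sampling, from September to December 2019, students from 8 primary schools, 6 junior high schools, 6 senior high schools, and 4 universities were randomly selected in Yinchuan, Wuzhong, Shizuishan, Guyuan, and Zhongwei in the Ningxia Hui Autonomous Region. Five classes per grade were sampled in primary schools and four classes per grade from junior high school through university, and all students in the sampled classes were enrolled, giving 14211 students in total, who underwent a questionnaire survey, physical examination, and visual acuity measurement. Factors influencing myopia at each school stage were analyzed using the least absolute shrinkage and selection operator (LASSO) combined with logistic regression, and the model with the smallest Bayesian information criterion (BIC) was chosen as the optimal model. Results: The detection rate of myopia among children and adolescents in Ningxia was 70.3%; it was higher in girls than in boys and higher in urban areas than in townships, and both differences were statistically significant (both P<0.001). After stratification by school stage, the detection rate rose with grade, lowest in primary school and highest in university, and the difference between school stages was statistically significant (P<0.001). LASSO-logistic regression showed that urban or rural residence, sex, age, whether glasses are currently worn, the number of daily recess exercise sessions, whether the student actively takes part in physical activity, and whether regular activity was maintained over the past 6 months were influencing factors for myopia in primary school students (all P<0.05); sex and whether glasses are currently worn were influencing factors in junior and senior high school students (all P<0.05); and whether glasses are currently worn was the influencing factor in university students (P<0.05). Conclusion: The detection rate of myopia among children and adolescents in Ningxia is high, and the influencing factors differ markedly between school stages. Wearing glasses is a protective factor for controlling myopia. Targeted health education on vision should be provided according to school stage to strengthen health awareness and improve the vision of children and adolescents.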
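Abstract: Objective: To construct an X-ray radiomics model of talocalcaneal coalition and to test its ability to screen for and diagnose talocalcaneal coalition. Methods: The clinical and radiological data of 200 patients who underwent ankle or foot X-ray examination in the Department of Radiology, China-Japan Union Hospital of Jilin University, from January 2019 to March 2023 were retrospectively analyzed (100 positive and 100 negative for talocalcaneal coalition). The image region containing the talocalcaneal coalition was delineated manually, and radiomics features were initially extracted with the Python pyradiomics library. Dimensionality reduction and feature selection were performed with the Mann-Whitney U test and the least absolute shrinkage and selection operator (LASSO) algorithm, and a support vector machine (SVM) was used to build a classification model on the selected features. Diagnostic performance was evaluated by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, precision, recall, sensitivity, specificity, and F1 score. Results: A total of 105 radiomics features were initially extracted from the X-ray images, and 7 strongly correlated features were selected by the Mann-Whitney U test and the LASSO algorithm. The model built with the SVM classifier had a test-set AUC of 0.93, with precision, recall, sensitivity, specificity, and F1 score of 88%, 85%, 93%, 92%, and 88%, respectively, indicating good screening and diagnostic ability for talocalcaneal coalition. Conclusion: The X-ray-based radiomics model can serve as an accurate, efficient, and noninvasive tool for screening and diagnosing talocalcaneal coalition and can assist clinicians in its diagnosis.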
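The AUC used as the headline metric in the abstract above can be illustrated with a small sketch. It relies on the known equivalence between the ROC AUC and the normalised Mann-Whitney U statistic: the AUC is the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions made for the example, not data from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth classes (1 = positive).
    scores: iterable of classifier scores, higher meaning "more positive".
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute an AUC")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: two positive and two negative cases.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of cases, which is fine for a cohort of a few hundred patients; a rank-based formula would be the usual choice for larger data.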
Abstract: Background: Colon adenocarcinoma (COAD) is a gastrointestinal malignancy with a high mortality rate. Studies have confirmed the role of immunogenic cell death (ICD) in different cancer types. However, there is a lack of research on ICD-related genes (ICD-RGs) in COAD. This study aimed to examine the impact of ICD-RGs on COAD and their interaction with the immune microenvironment. Methods: Using data from The Cancer Genome Atlas and Gene Expression Omnibus databases, we identified 107 ICD-RGs in COAD. Using univariate Cox regression analysis, we examined the relationship between these ICD-RGs and overall survival in COAD. Results: Following the regression analyses, we identified 14 overall survival-related genes. Furthermore, we examined the predictive impact of the ICD-RGs using least absolute shrinkage and selection operator regression analysis and developed a nine-gene prognostic model. The Cancer Genome Atlas and Gene Expression Omnibus datasets were used for training and validation. Kaplan-Meier analysis was used to confirm that the high-risk group had a lower survival rate than the low-risk group. Finally, following a multivariate analysis, we created a prognostic nomogram that integrated clinical data and risk scores. Conclusions: The nine-gene model exhibits robust stability and can provide valuable insights for guiding the development of tumor immunotherapy strategies and personalized drug selection for patients with COAD.
Funding: Project (21465016) supported by the National Natural Science Foundation of China.
Abstract: This study explores the chemical differences between Acori Tatarinowii Rhizoma (ATR) samples collected from two habitats, Sichuan and Anhui provinces, China. Gas chromatography-mass spectrometry (GC-MS) was applied to establish the quantitative chemical fingerprints of ATRs. A total of 104 volatile compounds were identified and quantified using mass spectra and retention index (RI) information. Furthermore, the least absolute shrinkage and selection operator (LASSO), a sparse regularization method, was combined with subsampling to improve the classification ability of partial least squares-discriminant analysis (PLS-DA). After variable selection by LASSO, three chemical markers, β-elemene, α-selinene and α-asarone, were identified for discriminating ATRs from the two habitats, and the overall correct classification rate increased from 82.76% to 96.55%. The proposed LASSO-PLS-DA method can serve as an efficient strategy for screening marker chemical components and for geo-herbalism research on traditional Chinese medicines.
Funding: Supported by the Liaoning S&T Project, No. 20180550971 and No. 20180550999; and the Shenyang Young and Middle-Aged Scientific & Technological Innovation Talents Support Plan, No. 2018416017.
Abstract: BACKGROUND Gastric cancer (GC) is one of the most frequently diagnosed gastrointestinal cancers throughout the world. Novel prognostic biomarkers are required to predict the prognosis of GC. AIM To identify a multi-long noncoding RNA (lncRNA) prognostic model for GC. METHODS Transcriptome data and clinical data were downloaded from The Cancer Genome Atlas. Cox and least absolute shrinkage and selection operator regression analyses were performed to screen for prognosis-associated lncRNAs. Receiver operating characteristic curve and Kaplan-Meier survival analyses were applied to evaluate the effectiveness of the model. RESULTS The prediction model was established based on the expression of AC007991.4, AC079385.3, and AL109615.2. Based on the model, GC patients were divided into "high risk" and "low risk" groups to compare the differences in survival. The model was re-evaluated with the clinical data of our center. CONCLUSION The 3-lncRNA combination model is an independent prognostic factor for GC.
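The grouping step described in the abstract above can be sketched as follows. A common way to turn a fitted Cox/LASSO model into "high risk" and "low risk" groups is to compute a linear risk score per patient (the sum of coefficient times expression) and split at the cohort median. The abstract does not state the coefficients or the cut-off rule, so the median split, the function names, the coefficient values and the patient data below are all assumptions for illustration only.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear risk score per patient: sum over genes of coefficient * expression.

    expression:   {patient_id: {gene: expression_value}}
    coefficients: {gene: regression coefficient}
    """
    return {
        patient: sum(coefficients[gene] * values[gene] for gene in coefficients)
        for patient, values in expression.items()
    }


def stratify_by_median(scores):
    """Label each patient "high risk" if the score exceeds the cohort median."""
    cutoff = median(scores.values())
    return {
        patient: ("high risk" if score > cutoff else "low risk")
        for patient, score in scores.items()
    }


# Hypothetical coefficients for the three lncRNAs named in the abstract;
# the real values are not given in the source.
example_coefficients = {"AC007991.4": 0.5, "AC079385.3": 0.25, "AL109615.2": -0.5}

# Hypothetical expression values for four patients.
example_expression = {
    "patient_1": {"AC007991.4": 4.0, "AC079385.3": 2.0, "AL109615.2": 1.0},
    "patient_2": {"AC007991.4": 1.0, "AC079385.3": 2.0, "AL109615.2": 3.0},
    "patient_3": {"AC007991.4": 6.0, "AC079385.3": 4.0, "AL109615.2": 2.0},
    "patient_4": {"AC007991.4": 2.0, "AC079385.3": 0.0, "AL109615.2": 4.0},
}

example_groups = stratify_by_median(risk_scores(example_expression, example_coefficients))
```

The resulting groups would then be compared with Kaplan-Meier curves, as the abstract describes; that survival analysis is outside the scope of this sketch.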
Abstract: The Least Absolute Shrinkage and Selection Operator (LASSO) is used for variable selection and, at the same time, for handling multicollinearity in the linear regression model. LASSO produces estimates with high variance if the number of predictors exceeds the number of observations and if high multicollinearity exists among the predictor variables. To handle this problem, the Elastic Net (ENet) estimator was introduced by combining LASSO and the Ridge estimator (RE). The solutions of LASSO and ENet have been obtained using the Least Angle Regression (LARS) and LARS-EN algorithms, respectively. In this article, we propose an alternative algorithm to overcome the issues in LASSO that combines LASSO with other existing biased estimators, namely the Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class estimator and r-d class estimator. Further, we examine the performance of the proposed algorithm using a Monte Carlo simulation study and real-world examples. The results showed that the LARS-rk and LARS-rd algorithms, which combine LASSO with the r-k class estimator and the r-d class estimator, outperformed other algorithms under moderate and severe multicollinearity.
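As a rough illustration of the variable selection that the abstract above attributes to LASSO, the following sketch minimises the standard LASSO objective (1/(2n))·||y − Xβ||² + λ·||β||₁ by cyclic coordinate descent with soft-thresholding. This is not the LARS-based algorithm the authors propose; coordinate descent is swapped in because it fits in a few lines. The function names, the toy data and the penalty value are assumptions made for the example.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma, clipping at zero."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p columns. No intercept is fitted, so the
    columns of X and the response y are assumed to be centred beforehand.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Mean squared value of each column, used to scale the coordinate update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Hypothetical toy data: y depends only on the first column (y = 2 * x0),
# and the second column is pure noise that is orthogonal to the first.
X_toy = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
y_toy = [-3.0, -1.0, 1.0, 3.0]

# The coefficient of the irrelevant second column is set exactly to zero,
# while the first coefficient is shrunk below its least-squares value of 2.
beta_toy = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
```

The example shows both effects named in the acronym: shrinkage (the first coefficient is pulled toward zero) and selection (the second is removed entirely). Raising `lam` far enough removes every predictor.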