Abstract
Objective To assess the feasibility of building machine learning models based on chest CT radiomics features to distinguish Mycoplasma pneumoniae (MP) infection from MP co-infection with other pathogens (Co-MP) in children.
Methods We retrospectively analyzed the clinical, laboratory, and imaging data of children who were hospitalized with pneumonia and underwent bronchoalveolar lavage (BAL). Fourteen pathogens in bronchoalveolar lavage fluid (BALF) were tested by fluorescence quantitative PCR. Chest CT was performed on scanners with more than 64 detector rows. The slice showing the largest lesion area was selected for segmentation, and the lesion was segmented manually along its margin. The features extracted from the segmented region fell into three categories: (1) geometric shape features; (2) intensity features; and (3) texture features. Features were screened with the t-test, the Mann-Whitney U test, and Spearman's rank correlation coefficient: only radiomics features with P<0.05 were retained, and when any two features had a correlation coefficient >0.9, only one of them was kept. Least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation was used for dimensionality reduction; features with non-zero coefficients were retained to fit the regression model and construct the radiomics signature. The LASSO-selected features were then fed into different machine learning models, and the final radiomics signature was obtained with 5-fold cross-validation. Data were analyzed with R software [version 4.3.2, The Comprehensive R Archive Network (tsinghua.edu.cn)] and SPSS 26.0. Quantitative data were compared with the t-test or the Mann-Whitney U test, and categorical data with the chi-square test or Fisher's exact test. Correlations were analyzed with Spearman's rank correlation test. The diagnostic performance of the machine learning models was evaluated with receiver operating characteristic (ROC) curves. P<0.05 was considered statistically significant.
Results A total of 134 children with MP infection were included and randomly divided at a ratio of 8:2 into a training set (107 cases) and a validation set (27 cases). In the validation set, the length of hospital stay in the two groups was 11.12 and 13.50 days, respectively, a statistically significant difference (P=0.040). A total of 1834 radiomics features were extracted: 360 first-order intensity features, 14 shape features, and 1460 texture features. After screening by P<0.05 and Spearman correlation coefficient >0.9, followed by LASSO regression with 10-fold cross-validation, 26 features with non-zero coefficients were selected to construct the radiomics score (Radscore) and to train the different machine learning models. In the training set, the area under the curve (AUC) of every machine learning model for distinguishing the two groups exceeded 0.75. The multilayer perceptron (MLP) performed best in distinguishing MP from Co-MP, with an AUC, positive predictive value, and negative predictive value of 0.924, 87.5%, and 84.2%, respectively. Decision curve analysis (DCA) showed a clear clinical benefit from applying the MLP model to distinguish MP from Co-MP. The sample prediction histogram showed that the radiomics signature trained with the MLP model had high predictive accuracy.
Conclusion Machine learning models based on baseline chest CT radiomics features can help distinguish MP from Co-MP in children.
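The redundancy step described in the Methods (keeping only one of any two features whose Spearman correlation coefficient exceeds 0.9) can be illustrated with a minimal sketch in plain Python. This is not the authors' code, which was written in R. The feature names and values are invented for illustration, and two choices are assumptions because the abstract does not specify them: the threshold is applied to the absolute value of the coefficient, and of a correlated pair the feature encountered first is the one kept.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / sqrt(vx * vy)


def drop_redundant(features, threshold=0.9):
    """Greedy filter: keep a feature only if |rho| <= threshold
    against every feature already kept (assumed tie-breaking rule)."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: five patients, three features.
toy = {
    "shape_area": [1.0, 2.0, 3.0, 4.0, 5.0],
    "shape_area_sq": [1.0, 4.0, 9.0, 16.0, 25.0],  # monotone in shape_area
    "texture_contrast": [2.0, 1.0, 4.0, 3.0, 5.0],
}
kept = drop_redundant(toy)
print(kept)
```

In this toy example the squared feature is perfectly rank-correlated with the first feature and is dropped, while the texture feature (rho = 0.8 against the first) is kept.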
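The performance measures reported in the Results (AUC, positive and negative predictive value, and the net benefit underlying decision curve analysis) can be computed as in the following minimal sketch. It is an illustration, not the authors' analysis. The labels and scores are invented, coding Co-MP as the positive class (1) is an assumption, and the 0.5 cutoff and threshold probability are arbitrary example values.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_counts(labels, scores, cutoff):
    """Return (tp, fp, tn, fn) when score >= cutoff is called positive."""
    tp = fp = tn = fn = 0
    for l, s in zip(labels, scores):
        predicted_positive = s >= cutoff
        if predicted_positive and l == 1:
            tp += 1
        elif predicted_positive and l == 0:
            fp += 1
        elif not predicted_positive and l == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ppv_npv(labels, scores, cutoff=0.5):
    """Positive and negative predictive value at a given cutoff."""
    tp, fp, tn, fn = confusion_counts(labels, scores, cutoff)
    return tp / (tp + fp), tn / (tn + fn)


def net_benefit(labels, scores, threshold):
    """Net benefit at threshold probability pt:
    TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(labels, scores, threshold)
    n = len(labels)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical toy data: 1 = Co-MP (assumed positive class), 0 = MP.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
ppv, npv = ppv_npv(labels, scores, cutoff=0.5)
nb = net_benefit(labels, scores, threshold=0.5)
print(auc, ppv, npv, nb)
```

A decision curve repeats the net benefit calculation over a range of threshold probabilities and compares the model against the "treat all" and "treat none" strategies.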
Authors
徐文北
刘晓涵
孟令建
王自豪
张贺
孙潇楠
孟闫凯
康海全
茅一萍
荣玉涛
胡春峰
徐凯
XU Wenbei; LIU Xiaohan; MENG Lingjian (Department of Radiology, the Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu Province 221002, P.R. China)
Source
《临床放射学杂志》
Peking University Core Journal (北大核心)
2024, Issue 11, pp. 1974-1979 (6 pages)
Journal of Clinical Radiology