Comparison of the accuracy of four machine learning models for predicting readmission risk in AECOPD patients and construction of a predictive model

    • Abstract:
      Objective: To construct a model, based on machine learning (ML) and Shapley additive explanations (SHAP), for predicting the risk of readmission within 30 days in patients with acute exacerbation of chronic obstructive pulmonary disease (AECOPD).
      Methods: A total of 225 patients with AECOPD admitted from January 2022 to April 2024 were selected as the test set, and 123 patients with AECOPD admitted from May 2024 to August 2025 were selected as the validation set. In the test set, the LASSO algorithm was used to screen risk variables for readmission within 30 days, and four ML models were constructed and trained: linear discriminant analysis (LDA), mixture discriminant analysis (MDA), flexible discriminant analysis (FDA) and extreme gradient boosting (XGBoost). The predictive performance of the four ML models was evaluated using calibration curves, precision-recall curves (PRC), precision-recall-gain curves (PRGC) and receiver operating characteristic (ROC) curves. The ML model was interpreted using SHAP values, and an online application for predicting the risk of readmission within 30 days in AECOPD patients was developed with R packages such as shiny. In the validation set, the ML model was validated using the calibration curve, decision curve and ROC curve.
      Results: The 30-day readmission rate among the 225 AECOPD patients was 18.7% (42/225). The LASSO algorithm identified age, education level, diabetes mellitus, smoking history, albumin (Alb), high-sensitivity C-reactive protein (hs-CRP), D-dimer (D-D) and forced expiratory volume in the first second as a percentage of the predicted value (FEV1%pred) as risk variables for readmission within 30 days in AECOPD patients. Among the four ML algorithms, the calibration curves, PRC, PRGC and ROC curves indicated that XGBoost had relatively high predictive accuracy. The XGBoost model, interpreted and visualized with SHAP values, predicted the risk of readmission within 30 days in AECOPD patients with very high accuracy; the online application is available at https://per-dynamic.shinyapps.io/COPD_AE/. In the validation set, the calibration curve gave a C-index of 0.812, and the predictions of the XGBoost model agreed closely with the observed outcomes. Decision curve analysis showed that the XGBoost model provided a significant clinical net benefit at risk thresholds between 0.123 and 0.910. The ROC curve showed an AUC of 0.853 for the XGBoost model, indicating excellent performance in predicting the risk of readmission within 30 days in AECOPD patients.
      Conclusions: Age, FEV1%pred, Alb, D-D, hs-CRP, smoking history, education level and diabetes mellitus are risk variables for readmission in AECOPD patients. An online application based on the SHAP-interpreted XGBoost model can accurately and conveniently predict the risk of readmission within 30 days in AECOPD patients.
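To illustrate what the decision curve analysis mentioned in the abstract measures, the following is a minimal Python sketch of the standard net-benefit calculation at a given risk threshold. It is not the authors' code (the study reports using R). The function names `net_benefit` and `net_benefit_treat_all` are hypothetical, and the labels and predicted probabilities are made-up toy values, not study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that intervenes on every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (assumption, for illustration only): 1 = readmitted within 30 days.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.1, 0.7, 0.4, 0.05, 0.3, 0.6, 0.15, 0.25]

for t in (0.2, 0.5):
    model_nb = net_benefit(y_true, y_prob, t)
    all_nb = net_benefit_treat_all(y_true, t)
    print(f"threshold={t:.2f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

At a given threshold, a model is usually regarded as clinically useful when its net benefit exceeds both the treat-all value and zero (the treat-none strategy). A decision curve plots this comparison across a range of thresholds.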

       
