基于机器学习算法的维持性血液透析病人继发缺血性脑卒中预测及影响因素分析

    Prediction of secondary ischemic stroke in maintenance hemodialysis patients based on machine learning algorithms and analysis of influencing factors

    • 摘要:
      目的: 以不同机器学习算法对维持性血液透析(MHD)病人继发缺血性脑卒中预测并对相关因素进行分析。
      方法: 采用回顾性病例对照研究,选取151例MHD病人为研究对象,根据病人是否发生缺血性脑卒中分为缺血性脑卒中组和非缺血性脑卒中组,采用单因素和logistic回归分析风险因素,基于随机森林(RF)、支持向量机(SVM)、K-最近邻(KNN)、逻辑回归(LR)4个机器学习模型,构建MHD病人继发缺血性脑卒中预测模型。采用准确率、敏感度、特异度、F1、ROC曲线下面积(AUC)评价模型稳健性及实用性。
      结果: 23例(15.2%)MHD病人发生缺血性脑卒中,多因素logistic回归分析结果显示,年龄(OR = 1.076,95%CI: 1.004~1.153)、血液透析年限(OR = 1.213,95%CI: 1.008~1.458)、房颤病史(OR = 14.016,95%CI: 1.664~37.994)、吸烟史(OR = 12.628,95%CI: 2.015~79.142)、尿酸(OR = 1.104,95%CI: 1.037~1.175)、血清白蛋白(OR = 0.781,95%CI: 0.643~0.947)是MHD病人继发缺血性脑卒中的独立影响因素。ROC曲线分析显示,在4个模型中RF模型对MHD病人继发缺血性脑卒中的预测效果最好(AUC = 0.932)。4种算法建立的MHD病人预测模型中,RF模型的综合预测效能最佳,准确率、敏感度、特异度、F1得分均最高。RF模型排名前6位特征中尿酸占比为最大(17.40%)、血清白蛋白(12.70%)、年龄(10.70%)、血液透析年限(9.90%)、吸烟史(9.30%)、房颤病史(6.60%)。
      结论: 尿酸、血清白蛋白、年龄、血液透析年限、吸烟史、房颤病史是MHD病人继发缺血性脑卒中的独立影响因素。机器学习模型可以作为预测MHD病人继发缺血性脑卒中的可靠工具,其中RF模型具有最佳的预测性能,有助于临床医务人员识别高危病人并早期实施干预以降低发生率。

       

      Abstract:
      Objective To predict secondary ischemic stroke in maintenance hemodialysis (MHD) patients using different machine learning algorithms and to analyze the related factors.
      Methods A retrospective case-control study was conducted in 151 MHD patients, who were divided into an ischemic stroke group and a non-ischemic stroke group according to whether ischemic stroke occurred. Risk factors were analyzed with univariate analysis and logistic regression. Prediction models for secondary ischemic stroke in MHD patients were built with four machine learning algorithms: Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Logistic Regression (LR). The robustness and practicality of the models were evaluated by accuracy, sensitivity, specificity, F1 score and area under the ROC curve (AUC).
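The evaluation metrics named in the Methods can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' pipeline: the function names, the 0.5 decision threshold and the toy labels and scores are all assumptions introduced here, and the AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = stroke, 0 = no stroke)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), specificity and F1 score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study.
toy_truth = [1, 1, 0, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.2, 0.7, 0.3, 0.1, 0.5]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold

print(classification_metrics(toy_truth, toy_pred))
print(roc_auc(toy_truth, toy_scores))
```

With only 23 events among 151 patients, accuracy alone can look high for a model that rarely predicts stroke, which is one reason to report sensitivity, specificity and F1 alongside it.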
      Results Ischemic stroke occurred in 23 patients (15.2%). Multivariate logistic regression analysis showed that age (OR = 1.076, 95%CI: 1.004–1.153), years of hemodialysis (OR = 1.213, 95%CI: 1.008–1.458), history of atrial fibrillation (OR = 4.016, 95%CI: 1.664–37.994), smoking history (OR = 12.628, 95%CI: 2.015–79.142), uric acid (OR = 1.104, 95%CI: 1.037–1.175) and serum albumin (OR = 0.781, 95%CI: 0.643–0.947) were independent influencing factors of secondary ischemic stroke in MHD patients. ROC curve analysis showed that, among the four models, the RF model predicted secondary ischemic stroke in MHD patients best (AUC = 0.932). The RF model also had the best overall predictive performance of the four, with the highest accuracy, sensitivity, specificity and F1 score. Among the top six features of the RF model, uric acid accounted for the largest share (17.40%), followed by serum albumin (12.70%), age (10.70%), years of hemodialysis (9.90%), smoking history (9.30%) and history of atrial fibrillation (6.60%).
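The odds ratios and intervals reported in the Results relate to logistic regression coefficients in a standard way: OR = exp(β), and a Wald 95% interval is exp(β ± 1.96·SE). The sketch below assumes the reported intervals are Wald intervals, which the abstract does not state; the function names are hypothetical. Under that assumption, the coefficient and its standard error can be recovered from a reported interval.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_with_ci(beta, se, z=Z_95):
    """Convert a logistic coefficient and its standard error into
    (odds ratio, lower bound, upper bound) on the OR scale."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_ci(lower, upper, z=Z_95):
    """Recover (beta, se) from a Wald interval on the OR scale.
    Assumes the interval is symmetric on the log scale."""
    log_lower, log_upper = math.log(lower), math.log(upper)
    beta = (log_lower + log_upper) / 2
    se = (log_upper - log_lower) / (2 * z)
    return beta, se


# Smoking history interval from the Results: 95%CI 2.015 to 79.142.
beta, se = coefficient_from_ci(2.015, 79.142)
odds_ratio, lower, upper = odds_ratio_with_ci(beta, se)
print(f"beta={beta:.3f}, se={se:.3f}, OR={odds_ratio:.3f}")

# Serum albumin interval from the Results: 95%CI 0.643 to 0.947.
# A negative beta (OR below 1) means higher values go with lower odds.
beta_alb, se_alb = coefficient_from_ci(0.643, 0.947)
print(f"beta={beta_alb:.3f}, OR={math.exp(beta_alb):.3f}")
```

The wide smoking-history interval corresponds to a large standard error on the log scale, as expected when an estimate rests on only 23 events.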
      Conclusions Uric acid, serum albumin, age, years of hemodialysis, smoking history and history of atrial fibrillation are independent influencing factors of secondary ischemic stroke in MHD patients. Machine learning models can serve as reliable tools for predicting secondary ischemic stroke in MHD patients. Among them, the RF model has the best predictive performance and can help clinicians identify high-risk patients and intervene early to reduce the incidence of stroke.

       
