A nomogram risk prediction model for ovarian cancer based on machine learning algorithms

    • Abstract:
      Objective To construct a nomogram risk prediction model for the occurrence of ovarian cancer (OV) using machine learning algorithms, and to provide a clinical basis for early screening of OV patients.
      Methods A total of 142 patients with ovarian tumors were included: 71 with OV and 71 with benign ovarian tumors. Age, blood type, pathological diagnosis, and preoperative serological indicators (routine blood tests, routine biochemistry, coagulation function, immunological screening, and a five-item tumor marker panel) were collected for all patients. LASSO regression was used for initial screening of factors influencing OV, and random forest analysis was then used to rank the importance of these factors. The results were entered into a multivariate logistic regression to determine the final influencing factors, which were used to construct the OV nomogram prediction model. Finally, model performance was evaluated using the receiver operating characteristic (ROC) curve, the calibration curve, and decision curve analysis (DCA).
      Results The variables influencing OV occurrence, as screened by LASSO regression and random forest, ranked in descending order of importance were: carbohydrate antigen 125 (CA125), plasma D-dimer, premenopausal ovarian cancer risk prediction model (PREM), postmenopausal ovarian cancer risk prediction model (POSTM), fibrinogen, platelet count, and lymphocyte percentage (LY%). Multivariate logistic regression showed that CA125 (OR=1.037, 95% CI: 1.012-1.063), PREM (OR=1.158, 95% CI: 1.011-1.327), and LY% (OR=0.910, 95% CI: 0.851-0.973) had statistically significant effects on OV. The nomogram risk prediction model for OV achieved an area under the ROC curve (AUC) of 0.966, and both the calibration curve and the DCA curve indicated that the model has good clinical utility.
      Conclusions The machine learning algorithms identified CA125, PREM, and LY% as factors influencing OV. The OV risk nomogram model constructed from these factors has high predictive value for OV patients.
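
The odds ratios reported in the Results can be read back as logistic regression coefficients, which is the scale a nomogram assigns points on. The short Python sketch below is illustrative only: it assumes the 95% confidence intervals are Wald intervals (symmetric on the log-odds scale) and that each OR is per one-unit increase in the predictor, neither of which the abstract states. The names `REPORTED`, `log_odds_summary`, and `odds_multiplier` are hypothetical helpers, not part of the study.

```python
import math

# Odds ratios and 95% CI bounds exactly as reported in the abstract.
REPORTED = {
    "CA125": (1.037, 1.012, 1.063),
    "PREM": (1.158, 1.011, 1.327),
    "LY%": (0.910, 0.851, 0.973),
}

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def log_odds_summary(odds_ratio, ci_low, ci_high):
    """Return (beta, standard error, Wald z) implied by an OR and its 95% CI.

    Assumption: the interval is a Wald interval, exp(beta +/- Z_95 * se).
    The abstract does not say how the intervals were computed.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se, beta / se


def odds_multiplier(odds_ratio, delta):
    """Factor by which the odds change when the predictor moves by delta units."""
    return odds_ratio ** delta


summary = {name: log_odds_summary(*vals) for name, vals in REPORTED.items()}

for name, (beta, se, z) in summary.items():
    print(f"{name}: beta={beta:.4f}, se={se:.4f}, z={z:.2f}")
```

For example, `odds_multiplier(1.037, 10)` gives the factor by which the odds of OV change for a 10-unit increase in CA125 with PREM and LY% held fixed, in whatever units the study recorded. The negative coefficient for LY% corresponds to its OR below 1: a higher lymphocyte percentage is associated with lower odds of OV in this model.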

       
