Clinical study on the mixed discriminant analysis model based on Shapley additive explanations in predicting the risk of postpartum depression

刘林静; 张小伟; 魏文辉; 史晓娟

Abstract:
Objective: To construct a model for predicting the risk of postpartum depression (PPD) in parturients within 6 to 8 weeks after delivery, based on machine learning (ML) models and Shapley additive explanations (SHAP).
Methods: The Edinburgh Postnatal Depression Scale was used to assess the occurrence of PPD in 201 parturients within 6 to 8 weeks after delivery. The Boruta algorithm was used to screen risk factors for PPD. Four ML models, namely support vector machine, naive Bayes, linear discriminant analysis, and mixed discriminant analysis (MDA), were constructed to predict the risk of PPD. The predictive accuracy of the four models was evaluated using calibration curves, precision-recall curves, precision-recall gain curves, and receiver operating characteristic (ROC) curves, and the best-performing model was interpreted and visualized with SHAP.
Results: The incidence of PPD among the 201 parturients was 15.92% (32/201). The Boruta algorithm identified the Pittsburgh Sleep Quality Index (PSQI) score, feeding method, Social Support Rating Scale (SSRS) score, educational attainment, and per capita monthly household income as risk variables for PPD. Among the four ML models, the MDA model outperformed the other three on the concordance index, recall, recall gain, and area under the ROC curve. The MDA model, interpreted and visualized with SHAP, accurately predicted the risk of PPD in parturients.
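The recall gain reported above rescales recall against the baseline of labelling every parturient as positive, so it depends on the prevalence of PPD (32/201 here). The following is a minimal sketch of that rescaling, assuming the standard precision-recall gain definitions. The confusion counts are hypothetical and are not taken from the study, and the function name `precision_recall_gain` is introduced only for illustration.

```python
def precision_recall_gain(tp, fp, fn, prevalence):
    """Return (precision_gain, recall_gain) for one operating point.

    Gains rescale precision and recall so that a value equal to the
    prevalence maps to 0 and a perfect value of 1 maps to 1.
    Assumes tp > 0 so that precision and recall are non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    pi = prevalence
    precision_gain = (precision - pi) / ((1 - pi) * precision)
    recall_gain = (recall - pi) / ((1 - pi) * recall)
    return precision_gain, recall_gain


# Prevalence reported in the abstract: 32 PPD cases among 201 parturients.
PREVALENCE = 32 / 201

# Hypothetical confusion counts at a single threshold (NOT study results).
pg, rg = precision_recall_gain(tp=24, fp=12, fn=8, prevalence=PREVALENCE)

# Always-positive baseline: every one of the 201 cases is labelled positive.
pg_base, rg_base = precision_recall_gain(tp=32, fp=169, fn=0,
                                         prevalence=PREVALENCE)
```

Under these definitions the always-positive baseline has a precision gain of 0, which is what makes gain values comparable across data sets with different prevalence.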
Conclusions: The SSRS score, PSQI score, educational attainment, feeding method, and per capita monthly household income are associated with the risk of PPD. The MDA model, interpreted and visualized with SHAP values, can effectively predict the risk of PPD.
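To illustrate what a SHAP value attributes to each predictor, here is a minimal sketch that computes exact Shapley values by enumerating feature coalitions. It is not the study's implementation: the scoring function `toy_score`, its coefficients, and the input and baseline values are all hypothetical, and practical SHAP tools generally rely on approximations rather than full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value, which
    is one common way to define the value of a coalition.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z):
    """Hypothetical risk score; coefficients are illustrative only."""
    psqi, ssrs, low_income = z
    return 0.05 * psqi - 0.02 * ssrs + 0.1 * low_income + 0.01 * psqi * low_income


x = [12, 30, 1]         # hypothetical parturient: PSQI, SSRS, low-income flag
baseline = [6, 40, 0]   # hypothetical reference values
phi = shapley_values(toy_score, x, baseline)
```

The attributions sum to the difference between the score for `x` and the score for `baseline`, which is the additivity property that gives SHAP its name.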
