
Machine learning based predictive modeling and risk factors for prolonged SARS-CoV-2 shedding

Abstract

Background

The global outbreak of coronavirus disease 2019 (COVID-19) has been enormously damaging, and prolonged shedding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, previously 2019-nCoV) remains a challenge in the prevention and treatment of COVID-19. However, research on the risk factors for delayed shedding of SARS-CoV-2 is still incomplete.

Methods

In a retrospective analysis of 56,878 hospitalized patients in the Fangcang Shelter Hospital (National Convention and Exhibition Center) in Shanghai, China, we compared patients whose SARS-CoV-2 viral shedding lasted > 12 days with those whose shedding lasted < 12 days. The duration of viral shedding, from the first day of SARS-CoV-2 positivity to the day of SARS-CoV-2 negativity, was determined from real-time polymerase chain reaction (RT-PCR) test results. The extreme gradient boosting (XGBoost) machine learning method was employed to establish a prediction model for prolonged SARS-CoV-2 shedding and analyze significant risk factors. Feature filtering with model retraining and Shapley Additive Explanations (SHAP) were then used to demonstrate and further explain the risk factors for long-term SARS-CoV-2 infection.

Results

We conducted an assessment of ten different features, including vaccination, hypertension, diabetes, admission cycle threshold (Ct) value, cardio-cerebrovascular disease, gender, age, occupation, symptom, and family accompaniment, to determine their impact on the prolonged SARS-CoV-2 shedding. This study involved a large cohort of 56,878 hospitalized patients, and we leveraged the XGBoost algorithm to establish a predictive model based on these features. Upon analysis, six of these ten features were significantly associated with the prolonged SARS-CoV-2 shedding, as determined by both the importance order of the model and our results obtained through model reconstruction. Specifically, vaccination, hypertension, admission Ct value, gender, age, and family accompaniment were identified as the key features associated with prolonged viral shedding.

Conclusions

We developed a predictive model and identified six risk factors associated with prolonged SARS-CoV-2 viral shedding. Our study contributes to identifying and screening individuals with potential long-term SARS-CoV-2 infections. Our research also provides a reference for future prevention and control, for optimizing medical resource allocation and guiding epidemiological prevention, and for guidelines on personal protection against SARS-CoV-2.

Introduction

The coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, previously 2019-nCoV) and first appearing in December 2019, has caused worldwide outbreaks that pose critical public health threats. According to the WHO, more than 700 million cases had been confirmed and more than six million deaths reported as of April 19, 2023, and the numbers were still increasing [1, 2]. COVID-19 infection has the potential to cause damage to multiple organs, including the stomach, lungs, liver, and kidneys, to contribute to adverse outcomes in conjunction with cardiovascular comorbidities and hematological complications, and to cause psychiatric symptoms such as cognitive impairment, depression, and anxiety [3,4,5,6,7]. Outbreaks of COVID-19 deal a severe blow to public health security worldwide.

As the etiological agent of COVID-19, SARS-CoV-2 can be detected by real-time polymerase chain reaction (RT-PCR) tests. Long-term RT-PCR positivity is described as prolonged SARS-CoV-2 shedding, which may cause immune dysregulation, especially in cancer patients with impaired immune function at baseline, and may lead to disruption of targeted therapy and increased patient mortality [8]. More seriously, SARS-CoV-2 may evolve immune evasion mutations during prolonged infection, potentially increasing the probability of virus transmission and disease severity and thereby adding to the pressure of combating the epidemic [9]. In addition, the 2023 COVID-19 treatment guidelines suggest that there is currently insufficient evidence to guide treatment management or provide clinical recommendations for these individuals [10]. Although prolonged shedding of SARS-CoV-2 is not uncommon, there is a lack of research on the risk factors associated with the duration of viral positivity. Therefore, exploring risk factors for delayed shedding of SARS-CoV-2 has significant therapeutic and preventive implications from both individual and public health perspectives.

Machine learning is an advanced technology widely employed in biology and medicine [11, 12]. Recently, some studies have utilized classical machine learning methods to analyze COVID-19, for example to predict the severity and prognosis of patients [13,14,15]. However, there is a dearth of research on the prolonged shedding of SARS-CoV-2, and most studies have relied solely on simple machine learning models for COVID-19 analysis. Gradient boosting is an advanced machine learning method that iteratively adds weak predictive models to an ensemble, each correcting the residual errors of the previous models; it offers high accuracy and is widely used for regression and classification problems [16]. Extreme gradient boosting (XGBoost) is a robust implementation of the gradient boosting algorithm, with high accuracy, robustness, scalability, and customizability [17].

In this paper, we performed a retrospective analysis using the XGBoost algorithm on ten relevant features of a cohort of 56,878 COVID-19 patients hospitalized in Shanghai Fangcang Shelter Hospital, both to establish a model for predicting prolonged SARS-CoV-2 shedding and to identify risk factors that influence the duration of SARS-CoV-2 positivity. These ten features are vaccination, hypertension, diabetes, admission cycle threshold (Ct), cardio-cerebrovascular disease, gender, age, occupation, symptom, and family accompaniment. The duration of SARS-CoV-2 positivity was determined by RT-PCR, and patients who remained positive for > 12 days were defined as delayed SARS-CoV-2 shedding patients. The results suggest that delayed shedding of SARS-CoV-2 may be closely associated with six features: vaccination, hypertension, Ct, gender, age, and family accompaniment. These findings provide valuable insights into identifying and characterizing risk factors related to prolonged SARS-CoV-2 infection, and could facilitate screening of potential long-positive patients.

Methods

Clinical data collection

From April 10, 2022, to May 30, 2022, we enrolled 56,878 SARS-CoV-2-positive hospitalized patients from Shanghai Fangcang Shelter Hospital. For each patient we collected ten features: vaccination, hypertension, diabetes, admission Ct, cardio-cerebrovascular disease, gender, age, occupation, symptom, and family accompaniment. We classified age into four categories: adolescents (0–18 years), prime-age (18–40 years), middle-age (40–60 years), and old-age (60 years or older). Vaccination status was classified as unvaccinated, incompletely vaccinated, or fully vaccinated; occupation as unemployed, self-employed, or employed; and symptoms as asymptomatic or mild. The diagnosis of COVID-19 was based on the Chinese New Coronavirus Pneumonia Diagnosis and Treatment Program (Trial Version 8) [11]. Patients with mild symptoms may present with low-grade fever, mild malaise, and olfactory and taste disturbances without pneumonia manifestations. SARS-CoV-2 infection was confirmed by RT-PCR, and sampling was performed once a day during hospitalization to obtain the Ct values of patients. The duration of SARS-CoV-2 shedding was defined as the number of days from the first Ct value below 40 to the first subsequent Ct value above 40, and patients whose shedding lasted more than 12 days were classified as having prolonged SARS-CoV-2 shedding. The Ethics Committee approved the protocol of this study (Approval No: (B) KY2023040).
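The shedding-duration definition above can be illustrated with a minimal sketch. It assumes one Ct value per hospital day, and it assumes that a Ct of exactly 40 counts as negative, which the text does not specify. The function names are hypothetical and are not taken from the study's code. Intermittent shedding (a negative result followed by a later positive one) is deliberately not handled.

```python
from typing import Optional, Sequence

CT_POSITIVE_BELOW = 40.0   # Ct below this counts as positive (from the text)
PROLONGED_DAYS = 12        # shedding > 12 days is "prolonged" (from the text)


def shedding_duration(daily_ct: Sequence[float]) -> Optional[int]:
    """Days from the first positive test to the first later negative test.

    Returns None if the patient is never positive, or never turns negative
    within the recorded series (assumption: such cases stay undefined).
    """
    first_pos = next(
        (day for day, ct in enumerate(daily_ct) if ct < CT_POSITIVE_BELOW), None
    )
    if first_pos is None:
        return None
    for day in range(first_pos + 1, len(daily_ct)):
        # Assumption: Ct >= 40 is treated as negative.
        if daily_ct[day] >= CT_POSITIVE_BELOW:
            return day - first_pos
    return None


def is_prolonged(daily_ct: Sequence[float]) -> bool:
    """True when the shedding duration is defined and exceeds 12 days."""
    duration = shedding_duration(daily_ct)
    return duration is not None and duration > PROLONGED_DAYS
```

For a patient sampled on five consecutive days with Ct values 28, 30, 33, 36, and 41, the sketch reports a duration of four days, which falls in the non-delayed group.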

Data pre-processing

In the collected cohort, we observed a smaller number of patients (1881) with prolonged SARS-CoV-2 shedding compared to those (54,997) with a duration of fewer than 12 days. To ensure a balanced representation of both delayed and nondelayed shedding, we employed an over-sampling strategy. This strategy helped us achieve similar proportions of patients in both categories. In addition, we obtained 339 prolonged SARS-CoV-2 shedding patients and 330 nondelayed SARS-CoV-2 shedding patients as the test set using stratified random sampling.
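The text does not state which over-sampling method was used. As one possible reading, the sketch below shows plain random over-sampling, in which minority-class records are duplicated at random until both classes are the same size. The function name and the toy data are hypothetical.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw extra copies (with replacement) only for classes below the target.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy example: 2 prolonged-shedding records (label 1) versus 8 others (label 0).
rows = list(range(10))
labels = [1, 1] + [0] * 8
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

A common design choice is to set the test set aside before over-sampling, so that duplicated records cannot appear in both the training and the test data; the text does not say in which order the two steps were performed.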

Modeling and statistical analysis

Continuous variables were standardized using the Z-score and are reported as the median and interquartile range (IQR). Categorical variables are expressed as frequencies and percentages. Ct values were treated as continuous variables, and the remaining features were considered categorical variables. Correlation analysis was carried out using Spearman's rank correlation. Differences in the duration of SARS-CoV-2 infection between age groups were analyzed by t-tests, and P < 0.05 was considered significant.

To identify risk factors associated with prolonged SARS-CoV-2 shedding, we constructed an XGBoost model. We employed Bayesian optimization for hyperparameter tuning and performed nine-fold cross-validation. We evaluated the model performance using the area under the receiver operating characteristic curve (AUROC). Feature importance ranking was determined by the frequency of feature usage in sample splitting across all trees. Additionally, we calculated Shapley values using the Shapley Additive Explanations (SHAP) technique to quantify feature importance and enhance the interpretability of the model.
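The nine-fold cross-validation step can be sketched as follows. Only the splitting is shown; the Bayesian hyperparameter search and the XGBoost fit are omitted. Stratifying the folds by class is an assumption made here (the text only mentions stratified sampling for the test set), and the function name is hypothetical.

```python
import random


def stratified_k_fold(labels, k=9, seed=0):
    """Yield (train_idx, val_idx) pairs with similar class proportions per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for members in by_class.values():
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % k].append(idx)  # deal indices round-robin
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, val


# Toy labels: 90 non-delayed (0) and 18 prolonged (1) patients.
toy_labels = [0] * 90 + [1] * 18
splits = list(stratified_k_fold(toy_labels, k=9))
```

Each of the nine folds serves once as the validation set while the model is trained on the other eight, giving nine AUROC values that can then be averaged.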

Results

Study cohort and features

Several studies [18,19,20] have demonstrated that age, gender, vaccination status, admission Ct value, symptom, and comorbidities like hypertension, diabetes, and cardiovascular disease may be associated with the time to viral clearance and could contribute to prolonged shedding of SARS-CoV-2. We included these features in our research. An infectious disease dynamics study [21] revealed that prolonged contact with SARS-CoV-2 infected patients and household contact pose the highest transmission risk and can result in secondary transmission. Therefore, we recorded whether patients had family accompaniment during hospitalization. Additionally, the role of occupation in SARS-CoV-2 infection is complex. Intuitively, the degree and amount of exposure to airborne viruses vary by occupation, and the risk of delayed shedding may also differ [22]. Thus, we also included occupation in the statistical analysis.

A total of 56,878 patients from April 10, 2022, to May 30, 2022, in Shanghai Fangcang Shelter Hospital were included in this study, including 1881 prolonged SARS-CoV-2 shedding patients and 54,997 non-delayed SARS-CoV-2 shedding patients.

Of all patients, 34,982 (61.5%) were male and 21,896 (38.5%) were female. A total of 2641 (4.6%) patients were accompanied by family members during hospitalization. Nearly 12.6% of the patients had at least one comorbidity. Almost half of the patients had received the entire course of vaccinations, while the remaining patients were incompletely vaccinated or unvaccinated. According to the clinical classification guidelines, 40,839 patients were asymptomatic (71.8%), 16,039 were classified as mild (28.2%), and no moderately or severely ill patients were in our cohort. Compared with the overall cohort, the prolonged SARS-CoV-2 shedding group contained a higher proportion of middle-aged and old-age patients and a lower proportion of adolescents and prime-age patients. The median Ct value at admission for prolonged SARS-CoV-2 shedding patients was 28.6 (IQR, 25.1–31.6), which was lower than the 30.77 (IQR, 27.8–33.1) for non-delayed SARS-CoV-2 shedding patients. Baseline clinical characteristics of these patients are listed in Table 1. Occupation was categorized as unemployed, self-employed, or employed. The occupational profile of patients with non-delayed shedding of SARS-CoV-2 was similar to that of all patients, whereas patients with prolonged SARS-CoV-2 shedding included more unemployed and fewer employed individuals.

Table 1 Demographic characteristics of patients with COVID-19

Model construction and evaluation

We examined the association between the ten features with a correlation matrix of Pearson correlation coefficients, in which the correlations between hypertension and the two features of age and diabetes were 0.31 and 0.28, respectively (Fig. 1A). These results suggest weak correlations between hypertension and these features. Moreover, our findings indicate no strongly collinear features, which argues against substantial multicollinearity within the data.

Fig. 1

The evaluation of features and the prediction model associated with delayed shedding of SARS-CoV-2. A Correlation matrix of Pearson’s correlation coefficients for the ten features, showing the correlation between the features. The color indicates the value of the correlation coefficient (r). Darker colors indicate stronger correlations; in this study, the ten features were weakly correlated with each other (r < 0.4). B ROC curves for the XGBoost model, with ninefold cross-validation. AUROC demonstrates model performance

We trained a predictive model utilizing the XGBoost algorithm with the ten features. Early stopping based on the test set accuracy was implemented to mitigate the risk of overfitting. To determine the classification ability of the model, we evaluated it using ROC curves. The average AUROC was 0.758, indicating that the model has acceptable classification performance (Fig. 1B). In addition, the nine AUROC results obtained through nine-fold cross-validation were very similar, suggesting that the model is robust.
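For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen prolonged-shedding patient receives a higher predicted score than a randomly chosen non-delayed patient. The sketch below computes it directly from that definition; the function name and toy scores are illustrative, and the quadratic loop is only suitable for small examples.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as one half.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive-negative pairs are ordered correctly.
toy_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.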

Feature importance ranking and quantitative analysis

This section presents an analysis of feature importance and quantitative impact on prolonged SARS-CoV-2 shedding. First, we rank the features based on their usage frequency (the weight method) in the XGBoost model. We then conduct a quantitative analysis to understand the impact of each feature on the model's predictions using SHAP (SHapley Additive exPlanations) values. This dual approach provides a deeper understanding of each feature’s contribution to the model output and highlights factors associated with prolonged shedding.

Figure 2A illustrates the feature importance ranking of the ten variables used in the XGBoost model. The ranking is determined based on the frequency of feature usage in the sample-splitting process across all trees in the XGBoost model. According to the ranking, six features emerged as having the highest importance, i.e., vaccination status, hypertension, Ct value at admission, gender, age, and family companionship. The prominence of these six features indicates a significant association with prolonged SARS-CoV-2 shedding.
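The split-frequency ranking described here can be illustrated on hand-written toy trees. In the sketch below each tree is a nested dictionary, and importance is simply the number of internal nodes that split on a feature. The tree structure and feature names are hypothetical; in the xgboost Python package this kind of count is what `Booster.get_score(importance_type="weight")` reports.

```python
from collections import Counter


def count_splits(node, counts):
    """Recursively count how often each feature is used as a split in one tree."""
    if "leaf" in node:
        return
    counts[node["feature"]] += 1
    count_splits(node["left"], counts)
    count_splits(node["right"], counts)


def weight_importance(trees):
    """Rank features by the total number of splits across all trees."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    return counts.most_common()


# Two toy trees (hypothetical structure, not taken from the study's model).
tree_a = {
    "feature": "admission_ct",
    "left": {"feature": "age", "left": {"leaf": 0.2}, "right": {"leaf": -0.1}},
    "right": {"leaf": -0.3},
}
tree_b = {
    "feature": "admission_ct",
    "left": {"leaf": 0.1},
    "right": {"feature": "vaccination", "left": {"leaf": 0.05}, "right": {"leaf": -0.05}},
}
ranking = weight_importance([tree_a, tree_b])
```

Split counts say how often a feature is used, not how much each split improves the model, which is one reason to complement the ranking with SHAP values.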

Fig. 2

Feature importance analysis of the model. A XGBoost-based feature importance ranking. Importance is the number of times a feature was used as a splitting attribute across all trees (the weight method). B SHAP beeswarm plot for the ten features, ordered by mean absolute SHAP value, demonstrating the relationship between these features and the risk of delayed shedding of SARS-CoV-2. Each point represents one patient. High feature values are shown in red and low values in blue. C SHAP analysis showing the average impact of each feature on model output

To further explain the contribution of each feature to the model output, we calculated SHAP values based on game theory. Figure 2B–C shows an aggregate SHAP plot for the ten features included in the model, providing valuable information to understand the impact of each variable on the model output. Based on the mean of the absolute SHAP values, six features, vaccination status, hypertension, Ct value at admission, gender, age, and family companionship, had the most significant average impact on the model output, which is consistent with the results of the XGBoost feature importance analysis.
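The game-theoretic idea behind SHAP can be shown with an exact Shapley computation on a toy value function. Everything in the sketch is hypothetical: the three feature names, the scores, and the interaction term are invented for illustration. The brute-force enumeration is exponential in the number of features; SHAP implementations for tree models use much faster algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a coalition value function over a small feature set."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_risk(present):
    """Hypothetical risk score with one interaction term."""
    score = 0.0
    if "low_ct" in present:
        score += 0.3
    if "older_age" in present:
        score += 0.1
    if "low_ct" in present and "unvaccinated" in present:
        score += 0.2  # interaction: shared equally between the two features
    return score


phi = shapley_values(toy_risk, ["low_ct", "older_age", "unvaccinated"])
```

In this toy example the attributions are 0.4 for `low_ct` and 0.1 each for `older_age` and `unvaccinated`. They sum to 0.6, the difference between the score with all features present and the score with none, which is the additivity property that gives SHAP its name.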

In the analysis of SHAP, the Ct value at admission was identified as the most significant factor influencing the model. Specifically, lower Ct values were found to be strongly associated with higher SHAP values, indicating a high correlation with the likelihood of prolonged shedding of SARS-CoV-2. Conversely, higher Ct values were associated with a reduced risk of prolonged shedding. Furthermore, increasing age was found to be associated with an increased risk of prolonged shedding of SARS-CoV-2, while decreasing age was associated with a reduced risk of delayed shedding. Additionally, being unvaccinated or incompletely vaccinated and being male were identified as factors that increase the risk of prolonged SARS-CoV-2 shedding, whereas being fully vaccinated or female reduced this risk. In contrast, hypertension and family companionship during hospitalization increased the risk of prolonged SARS-CoV-2 shedding, while the absence of hypertension and lack of family companionship did not reduce this risk.

Features selection and model reconstruction

The XGBoost-based feature importance analysis primarily considers the global importance of features, while the SHAP analysis, based on Shapley value theory and game theory, quantifies the contribution of each feature to the model's prediction outcomes. In our investigation, both analyses consistently identified vaccination status, hypertension, Ct value at admission, gender, age, and family companionship as the top six important features. Consequently, we concentrated on these six features to retrain the model to predict prolonged SARS-CoV-2 shedding.

The retraining process was also based on the XGBoost algorithm. After removing the features with lower contributions, the retrained model showed slightly improved classification performance compared to the original model. The average AUROC of the new model was 0.76, marginally higher than that of the initial model (0.758), while remaining robust across folds (Fig. 3). These results suggest that the six selected features are the main determinants of the model's classification accuracy for long-term SARS-CoV-2 infection.

Fig. 3

Evaluation of the new model. ROC curve of the new model, an XGBoost model employing ninefold cross-validation. AUROC demonstrates the performance of the model

Analysis of key risk factors

In both the XGBoost-based feature importance analysis and the Shapley-based SHAP analysis, the Ct value at admission and age emerged as the top two influential features. Consequently, we conducted a detailed investigation into the relationship between these features and the duration of SARS-CoV-2 infection.

Ct values below 35 were considered indicative of higher viral loads. The scatterplot in Fig. 4A illustrates the relationship between admission Ct values and the duration of SARS-CoV-2 infection. The majority of patients had admission Ct values below 35. Among those with Ct values above 35, only five patients had SARS-CoV-2 infections lasting more than 12 days, considerably fewer than the number whose infections were of shorter duration. Patients with high Ct values tended to have shorter durations of SARS-CoV-2 positivity.

Fig. 4

Association of admission Ct values and age with duration of SARS-CoV-2. A Scatterplot of admission Ct values versus SARS-CoV-2 duration. Thresholds on the horizontal axis are 35, and on the vertical axis are 12. B Relationship between age and duration of SARS-CoV-2. Differences were analyzed using t-tests. *P < 0.05, **P < 0.01, ***P < 0.001

We conducted an analysis of the relationship between age and duration of SARS-CoV-2 positivity. Interestingly, prime-aged patients had the shortest period of SARS-CoV-2 infection compared to other age groups, followed by adolescents and middle-aged patients. Significantly, old-aged patients had the most prolonged duration of SARS-CoV-2 positivity among all groups, potentially attributed to their immunocompromised status and higher prevalence of comorbidities such as hypertension, diabetes mellitus, and cardiovascular disease.

Discussion

This retrospective study analyzed the clinical characteristics of 56,878 patients diagnosed with COVID-19, developed predictive models, and identified several risk factors influencing the duration of SARS-CoV-2 infection. Age, admission Ct value, hypertension, gender, family accompaniment, and vaccination status were the factors most strongly associated with prolonged SARS-CoV-2 shedding. These findings can assist clinicians in the early identification of patients likely to have prolonged SARS-CoV-2 shedding, informing antiviral interventions and patient isolation strategies.

In this study, age emerged as a significant risk factor, with advanced age being strongly associated with a longer duration of SARS-CoV-2 infection. Consistent with previous research, advanced age has been identified as a risk factor for mortality and increased disease severity in COVID-19 patients [23]. This could be attributed to the higher prevalence of comorbidities such as hypertension, diabetes, and cardiovascular disease among elderly individuals compared to younger patients. Furthermore, the aging process is accompanied by a weakening of the humoral immune response, reduced secretion of cytokines such as gamma interferon in vitro, and an increase in the frequency of late-differentiated effector T cells, effector memory T cells, and T regulatory cells [24]. These factors contribute to a diminished vaccine response in the elderly population and prolonged SARS-CoV-2 viral shedding [25, 26]. Moreover, studies involving aged nonhuman primates have demonstrated that aging is linked to altered adaptive immune responses and oxidative stress resulting from cumulative oxidative damage and weakened antioxidant defenses [27, 28]. This oxidative stress mediates factors such as NF-κB and type I interferon β to modulate the inflammatory state in the host, leading to prolonged viral shedding and a more severe disease course [29].

The initial Ct value at admission was identified as a significant risk characteristic in our study. This finding aligns with a study conducted by Zhang et al. [30] in Shanghai, which indicated that prolonged shedding of the virus was associated with lower Ct values at admission. It is important to note that a positive RT-PCR result does not necessarily indicate the presence of a live virus capable of causing infection. However, the Ct value remains a valuable indicator of viral infectivity, with an increase of about 3.3 in Ct value corresponding to a tenfold decrease in viral RNA [31]. Additionally, there is a strong correlation between the Ct value and viral transmissibility. Singanayagam et al. [32] attempted viral culture on 324 clinical specimens and observed that the success of viral culture decreased with each unit increase in Ct value, with an estimated odds ratio of 0.67.
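The relationship between a Ct shift and a fold change in RNA follows from the doubling of product in each PCR cycle. The short sketch below makes the arithmetic explicit; it assumes ideal amplification (exact doubling per cycle) unless another efficiency is passed in, and the function names are illustrative.

```python
import math


def rna_fold_decrease(delta_ct, efficiency=1.0):
    """Fold decrease in template RNA implied by a Ct increase of `delta_ct`.

    `efficiency` = 1.0 means the product doubles every cycle (an idealization).
    """
    return (1.0 + efficiency) ** delta_ct


def ct_shift_for_fold(fold, efficiency=1.0):
    """Ct increase that corresponds to a given fold decrease in RNA."""
    return math.log(fold) / math.log(1.0 + efficiency)


fold_for_3_3_cycles = rna_fold_decrease(3.3)   # about 9.85
cycles_for_tenfold = ct_shift_for_fold(10.0)   # about 3.32
```

Under ideal doubling, a tenfold difference in RNA corresponds to log2(10) ≈ 3.32 cycles, which is the figure of roughly 3.3 quoted above.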

In our study, male patients infected with SARS-CoV-2 may have a longer duration of viral shedding, which is also supported by the research of Zheng et al. [33]. This may be due to differences in specific gene expression and hormone levels between males and females [34, 35]. Abnormal activation of the innate immune system and subsequent release of pro-inflammatory cytokines in hypertensive patients exacerbate airway hyperinflammation, which may lead to delayed clearance of the SARS-CoV-2 virus [36, 37]. Family companionship during hospitalization may alter the psychological state of patients, thereby interfering with their immune response, and the age structure of patients who require family companionship may also be a contributing factor [38].

Previous studies have explored factors associated with prolonged shedding of SARS-CoV-2. For instance, Lin et al. [39] surveyed 137 COVID-19 patients and found a significant correlation between delayed viral shedding and indicators of disease severity, elevated levels of early lymphocytes, eosinophils, and CD8+ T cells, as well as reduced levels of IL-6 and IL-10. Similarly, Xu et al. [40] investigated 113 patients and identified male gender, advanced age, hypertension, delayed admission, invasive mechanical ventilation, and corticosteroid therapy as being associated with delayed SARS-CoV-2 shedding. In addition, impaired immune status [41], diabetes mellitus [42], vaccination status [43], intermittent viral RNA shedding [44], chronic rhinosinusitis and atopy [45], and initial Ct value at admission [46] were all associated with delayed SARS-CoV-2 shedding. Our study agrees with several of these findings, including advanced age, male gender, hypertension, vaccination status, and Ct value at admission. Notably, most of the features we collected can be easily obtained through patient interviews and require no diagnostic testing other than nucleic acid testing. This facilitates rapid determination of the likelihood of delayed SARS-CoV-2 shedding and helps patients assess their condition. Moreover, employing the XGBoost machine learning algorithm for feature selection offers advantages over traditional algorithms: it automatically selects the best combination of features, incorporates regularization techniques to mitigate overfitting risks, and demonstrates high model robustness. Furthermore, our large sample size of 56,878 COVID-19 patients enables effective screening of the most relevant features associated with delayed SARS-CoV-2 shedding.

Vaccines play a crucial role in controlling COVID-19 epidemics, providing an effective and cost-efficient solution. Huang et al. [47] found that severely ill patients had prolonged viral shedding in lower respiratory tract samples compared to those with mild SARS-CoV-2 infections, indicating a potential correlation between vaccination, COVID-19 disease severity, and duration of SARS-CoV-2 shedding. In our study, being incompletely vaccinated or unvaccinated was identified as a risk factor for prolonged SARS-CoV-2 infection. However, our XGBoost-based analysis did not assign high importance to vaccination as a feature. This may be explained by previous findings that, while vaccination facilitates the clearance of SARS-CoV-2 viral RNA, it has a limited impact on shortening the shedding duration in mildly ill patients, and our study cohort consisted entirely of asymptomatic and mildly ill patients [43].

There are several limitations in this study. Firstly, the RT-PCR assay used in this study only measured the viral RNA load, which makes it challenging to determine whether the detected viral material represents a viable virus capable of transmission or just residual RNA remnants [48]. Therefore, virus isolation experiments are necessary to confirm this. Secondly, our cohort contained only asymptomatic and mild cases, so future research should include a broader range of more severe cases to understand SARS-CoV-2 viral shedding comprehensively. Thirdly, our study did not account for intermittent viral RNA shedding; future investigations should incorporate the analysis of intermittent viral RNA shedding to enhance our understanding of SARS-CoV-2 shedding.

Conclusion

In this paper, we developed an XGBoost machine learning model for predicting patients with prolonged SARS-CoV-2 shedding and describing associated risk factors. According to our research, patients with a lower admission Ct value, older age, male gender, unvaccinated or incompletely vaccinated status, hypertension, and family accompaniment during hospitalization were more likely to have a longer duration of SARS-CoV-2 infection. This model supports the prediction of patients with prolonged SARS-CoV-2 shedding, which may contribute to a precise response and guidance for protection during outbreaks, as well as to patient assessment and appropriate individual care.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

COVID-19:

Coronavirus disease 2019

SARS-CoV-2:

Severe acute respiratory syndrome coronavirus 2 (previously 2019-nCoV)

RT-PCR:

Real-time polymerase chain reaction

XGBoost:

Extreme gradient boosting

SHAP:

Shapley additive explanations

Ct:

Cycle threshold

AUROC:

Area under the receiver operating characteristic curve

References

  1. Umakanthan S, Sahu P, Ranade AV, Bukelo MM, Rao JS, Abrahao-Machado LF, et al. Origin, transmission, diagnosis and management of coronavirus disease COVID-19. Postgraduate Med J. 2019. https://doiorg.publicaciones.saludcastillayleon.es/10.1136/postgradmedj-2020-138234.


  2. World Health Organization. WHO COVID-19 Dashboard. World Health Organization. 2020. https://covid19.who.int/ Accessed 19 April 2023.

  3. Puelles VG, Lütgehetmann M, Lindenmeyer MT, Sperhake JP, Wong MN, Allweiss L, et al. Multiorgan and Renal Tropism of SARS-CoV-2. N Engl J Med. 2020;383(6):590–2. https://doiorg.publicaciones.saludcastillayleon.es/10.1056/NEJMc2011400.


  4. McGonagle D, O’Donnell JS, Sharif K, Emery P, Bridgewood C. Immune mechanisms of pulmonary intravascular coagulopathy in COVID-19 pneumonia. Lancet Rheumatol. 2020;2(7):e437–45. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/s2665-9913(20)30121-1.


  5. Guan WJ, Liang WH, Zhao Y, Liang HR, Chen ZS, Li YM, et al. Comorbidity and its impact on 1590 patients with COVID-19 in China: a nationwide analysis. Eur Respir J. 2020. https://doiorg.publicaciones.saludcastillayleon.es/10.1183/13993003.00547-2020.


  6. Gavriatopoulou M, Korompoki E, Fotiou D, Ntanasis-Stathopoulos I, Psaltopoulou T, Kastritis E, et al. Organ-specific manifestations of COVID-19 infection. Clin Exp Med. 2020;20(4):493–506. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s10238-020-00648-x.


  7. Nakamura ZM, Nash RP, Laughon SL, Rosenstein DL. Neuropsychiatric complications of COVID-19. Curr Psychiatry Rep. 2021;23(5):25. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s11920-021-01237-9.


  8. Lyudovyk O, Kim JY, Qualls D, Hwee MA, Lin YH, Boutemine SR, et al. Impaired humoral immunity is associated with prolonged COVID-19 despite robust CD8 T cell responses. Cancer Cell. 2022;40(7):738-53.e5. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.ccell.2022.05.013.


  9. Kemp SA, Collier DA, Datir RP, Ferreira I, Gayed S, Jahun A, et al. SARS-CoV-2 evolution during treatment of chronic infection. Nature. 2021;592(7853):277–82. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41586-021-03291-y.


  10. COVID-19 Treatment Guidelines Panel. Coronavirus Disease 2019 (COVID-19) Treatment Guidelines. National Institutes of Health. 2023. https://www.covid19treatmentguidelines.nih.gov/. Accessed 19 April 2023.

  11. Ma C, Wang L, Song D, Gao C, Jing L, Lu Y, et al. Multimodal-based machine learning strategy for accurate and non-invasive prediction of intramedullary glioma grade and mutation status of molecular markers: a retrospective study. BMC Med. 2023;21(1):198. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12916-023-02898-4.

  12. Yu Y, Tran H. An XGBoost-based fitted q iteration for finding the optimal STI strategies for HIV patients. IEEE Trans Neural Netw Learn Syst. 2022. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/tnnls.2022.3176204.

  13. Xiong Y, Ma Y, Ruan L, Li D, Lu C, Huang L. Comparing different machine learning techniques for predicting COVID-19 severity. Infect Dis Poverty. 2022;11(1):19. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s40249-022-00946-4.

  14. Domínguez-Olmedo JL, Gragera-Martínez Á, Mata J, Pachón ÁV. Machine learning applied to clinical laboratory data in Spain for COVID-19 outcome prediction: model development and validation. J Med Internet Res. 2021;23(4): e26211. https://doiorg.publicaciones.saludcastillayleon.es/10.2196/26211.

  15. Kim HJ, Han D, Kim JH, Kim D, Ha B, Seog W, et al. An easy-to-use machine learning model to predict the prognosis of patients with COVID-19: retrospective cohort study. J Med Internet Res. 2020;22(11): e24225. https://doiorg.publicaciones.saludcastillayleon.es/10.2196/24225.

  16. Friedman JH. Stochastic gradient boosting. Comput Stat Data Anal. 2002;38(4):367–78.

  17. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

  18. Lin Y, Wu P, Tsang TK, Wong JY, Lau EHY, Yang B, et al. Viral kinetics of SARS-CoV-2 following onset of COVID-19 in symptomatic patients infected with the ancestral strain and omicron BA2 in Hong Kong: a retrospective observational study. Lancet Microbe. 2023. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/s2666-5247(23)00146-5.

  19. Liu W, Gong F, Zheng X, Pei L, Wang X, Yang S, et al. Factors associated with prolonged viral shedding of SARS-CoV-2 Omicron variant infection in Shanghai: a multicenter, retrospective, observational study. J Med Virol. 2023;95(12): e29342. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/jmv.29342.

  20. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395(10229):1054–62. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/s0140-6736(20)30566-3.

  21. Sun K, Wang W, Gao L, Wang Y, Luo K, Ren L, et al. Transmission heterogeneities, kinetics, and controllability of SARS-CoV-2. Science. 2021. https://doiorg.publicaciones.saludcastillayleon.es/10.1126/science.abe2424.

  22. Rhodes S, Beale S, Daniels S, Gittins M, Mueller W, McElvenny D, et al. Occupation and SARS-CoV-2 in Europe: a review. Eur Respir Rev. 2024. https://doiorg.publicaciones.saludcastillayleon.es/10.1183/16000617.0044-2024.

  23. Zhang JJ, Dong X, Cao YY, Yuan YD, Yang YB, Yan YQ, et al. Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan, China. Allergy. 2020;75(7):1730–41. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/all.14238.

  24. Wagner A, Garner-Spitzer E, Jasinska J, Kollaritsch H, Stiasny K, Kundi M, et al. Age-related differences in humoral and cellular immune responses after primary immunisation: indications for stratified vaccination schedules. Sci Rep. 2018;8(1):9825. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41598-018-28111-8.

  25. Zimmermann P, Curtis N. Factors that influence the immune response to vaccination. Clin Microbiol Rev. 2019. https://doiorg.publicaciones.saludcastillayleon.es/10.1128/cmr.00084-18.

  26. Lee PH, Tay WC, Sutjipto S, Fong SW, Ong SWX, Wei WE, et al. Associations of viral ribonucleic acid (RNA) shedding patterns with clinical illness and immune responses in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Clin Transl Immunol. 2020;9(7): e1160. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/cti2.1160.

  27. Smits SL, de Lang A, van den Brand JM, Leijten LM, Eijkemans IWF, et al. Exacerbated innate host response to SARS-CoV in aged non-human primates. PLoS Pathogens. 2010. https://doiorg.publicaciones.saludcastillayleon.es/10.1371/journal.ppat.1000756.

  28. Merad M, Blish CA, Sallusto F, Iwasaki A. The immunology and immunopathology of COVID-19. Science. 2022;375(6585):1122–7. https://doiorg.publicaciones.saludcastillayleon.es/10.1126/science.abm8108.

  29. Gao C, Zhu L, Jin CC, Tong YX, Xiao AT, Zhang S. Proinflammatory cytokines are associated with prolonged viral RNA shedding in COVID-19 patients. Clin Immunol. 2020;221: 108611. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.clim.2020.108611.

  30. Zhang W, Zhou S, Wang G, Cao M, Sun D, Lu W, et al. Clinical predictors and RT-PCR profile of prolonged viral shedding in patients with SARS-CoV-2 Omicron variant in Shanghai: a retrospective observational study. Front Public Health. 2022;10:1015811. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpubh.2022.1015811.

  31. Tom MR, Mina MJ. To interpret the SARS-CoV-2 test, consider the cycle threshold value. Clin Infect Dis Off Publ Infect Dis Soc Am. 2020;71(16):2252–4. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/cid/ciaa619.

  32. Singanayagam A, Patel M, Charlett A, Lopez Bernal J, Saliba V, Ellis J, et al. Duration of infectiousness and correlation with RT-PCR cycle threshold values in cases of COVID-19, England, January to May 2020. Euro Surveill. 2020. https://doiorg.publicaciones.saludcastillayleon.es/10.2807/1560-7917.Es.2020.25.32.2001483.

  33. Zheng S, Fan J, Yu F, Feng B, Lou B, Zou Q, et al. Viral load dynamics and disease severity in patients infected with SARS-CoV-2 in Zhejiang province, China, January-March 2020: retrospective cohort study. BMJ. 2020;369: m1443. https://doiorg.publicaciones.saludcastillayleon.es/10.1136/bmj.m1443.

  34. Baldari CT, Onnis A, Andreano E, Del Giudice G, Rappuoli R. Emerging roles of SARS-CoV-2 Spike-ACE2 in immune evasion and pathogenesis. Trends Immunol. 2023;44(6):424–34. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.it.2023.04.001.

  35. Channappanavar R, Fett C, Mack M, Ten Eyck PP, Meyerholz DK, Perlman S. Sex-based differences in susceptibility to severe acute respiratory syndrome coronavirus infection. J Immunol. 2017;198(10):4046–53. https://doiorg.publicaciones.saludcastillayleon.es/10.4049/jimmunol.1601896.

  36. Bartoloni E, Perricone C, Cafaro G, Gerli R. Hypertension and SARS-CoV-2 infection: is inflammation the missing link? Cardiovasc Res. 2020;116(13):e193–4. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/cvr/cvaa273.

  37. Chu C, Schönbrunn A, Klemm K, von Baehr V, Krämer BK, Elitok S, et al. Impact of hypertension on long-term humoral and cellular response to SARS-CoV-2 infection. Front Immunol. 2022;13: 915001. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fimmu.2022.915001.

  38. Segerstrom SC, Miller GE. Psychological stress and the human immune system: a meta-analytic study of 30 years of inquiry. Psychol Bull. 2004;130(4):601–30. https://doiorg.publicaciones.saludcastillayleon.es/10.1037/0033-2909.130.4.601.

  39. Lin A, He ZB, Zhang S, Zhang JG, Zhang X, Yan WH. Early risk factors for the duration of severe acute respiratory syndrome coronavirus 2 viral positivity in patients with coronavirus disease 2019. Clin Infect Dis Off Publ Infect Dis Soc Am. 2020;71(16):2061–5. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/cid/ciaa490.

  40. Xu K, Chen Y, Yuan J, Yi P, Ding C, Wu W, et al. Factors associated with prolonged viral RNA shedding in patients with coronavirus disease 2019 (COVID-19). Clin Infect Dis Off Publ Infect Dis Soc Am. 2020;71(15):799–806. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/cid/ciaa351.

  41. Zhu L, Gong N, Liu B, Lu X, Chen D, Chen S, et al. Coronavirus disease 2019 pneumonia in immunosuppressed renal transplant recipients: a summary of 10 confirmed cases in Wuhan, China. Eur Urol. 2020;77(6):748–54. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.eururo.2020.03.039.

  42. Buetti N, Trimboli P, Mazzuchelli T, Lo Priore E, Balmelli C, Trkola A, et al. Diabetes mellitus is a risk factor for prolonged SARS-CoV-2 viral shedding in lower respiratory tract samples of critically ill patients. Endocrine. 2020;70(3):454–60. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s12020-020-02465-4.

  43. Tian X, Zhang Y, Wang W, Fang F, Zhang W, Zhu Z, et al. The impacts of vaccination status and host factors during early infection on SARS-CoV-2 persistence: a retrospective single-center cohort study. Int Immunopharmacol. 2023;114: 109534. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.intimp.2022.109534.

  44. Lan L, Xu D, Ye G, Xia C, Wang S, Li Y, et al. Positive RT-PCR test results in patients recovered from COVID-19. JAMA. 2020;323(15):1502–3. https://doiorg.publicaciones.saludcastillayleon.es/10.1001/jama.2020.2783.

  45. Recalde-Zamacona B, Tomás-Velázquez A, Campo A, Satrústegui-Alzugaray B, Fernández-Alonso M, Iñigo M, et al. Chronic rhinosinusitis is associated with prolonged SARS-CoV-2 RNA shedding in upper respiratory tract samples: a case-control study. J Intern Med. 2021;289(6):921–5. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/joim.13237.

  46. Qi L, Yang Y, Jiang D, Tu C, Wan L, Chen X, et al. Factors associated with the duration of viral shedding in adults with COVID-19 outside of Wuhan, China: a retrospective cohort study. Int J Infect Dis Off Publ Int Soc Infect Dis. 2020;96:531–7. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.ijid.2020.05.045.

  47. Huang Y, Chen S, Yang Z, Guan W, Liu D, Lin Z, et al. SARS-CoV-2 viral load in clinical samples from critically ill patients. Am J Respir Crit Care Med. 2020;201(11):1435–8. https://doiorg.publicaciones.saludcastillayleon.es/10.1164/rccm.202003-0572LE.

  48. Puhach O, Meyer B, Eckerle I. SARS-CoV-2 viral load and shedding kinetics. Nat Rev Microbiol. 2023;21(3):147–61. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41579-022-00822-w.

Acknowledgements

This work was funded by the National Natural Science Foundation of China (No. 62171114) and the Fundamental Research Funds for the Central Universities (No. DUT22RC(3)099).

Funding

This work was funded by the National Natural Science Foundation of China (No. 62171114) and the Fundamental Research Funds for the Central Universities (No. DUT22RC(3)099).

Author information

Contributions

YZ and QL designed the study and contributed consistently to this study. YZ, QL and JC analyzed the data and interpreted the results. YZ wrote the first draft of the manuscript. JC, HD, LT, and YC organized patient data and revised the manuscript. All authors reviewed the final manuscript.

Corresponding author

Correspondence to Junxin Chen.

Ethics declarations

Ethics approval and consent to participate

The Ethics Committee of Southwest Hospital of Army Medicine University approved the protocol of this study (Approval No: (B) KY2023040).

Consent for publication

All patients agreed to the publication of results that do not contain personally identifiable information.

Competing interests

The authors have no financial/commercial conflicts of interest regarding the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhang, Y., Li, Q., Duan, H. et al. Machine learning based predictive modeling and risk factors for prolonged SARS-CoV-2 shedding. J Transl Med 22, 1054 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12967-024-05872-7

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12967-024-05872-7

Keywords