Fig. 1

Microbiome-metabolome machine learning for cross-disease predictions in GC. a Fecal microbiome and metabolome data from GC patients (orange) and healthy individuals (green) obtained from Erawijantari et al. b Data preprocessing workflow highlighting the key microbes, metabolites, and samples selected for machine learning, alongside a principal coordinates analysis (PCoA) plot used for outlier removal. c Area under the receiver operating characteristic curve (ROC-AUC) for microbiome and metabolome data across models: XGBoost (blue), Random Forest (green), and LASSO (red). The bar graph shows the best-performing model for each data type (Random Forest for microbiome, LASSO for metabolome), selected by the highest ROC-AUC score, and highlights the optimal number of features. The selection includes 6 microbial and 8 metabolite features identified through Spearman cluster map analysis. d Bar plots of validation performance metrics for the optimal features, evaluated using the microbiome dataset from Jaeyun Sung et al. and the metabolome dataset from the UKBB. e Alpha diversity for microbes was visualised with violin plots comparing healthy individuals and GC patients using the Shannon and Gini-Simpson indices. FDR-corrected p-values (p < 0.05) showed significant differences within both groups. Beta diversity was evaluated using non-metric multidimensional scaling (NMDS) based on Jaccard distances, with the stress value confirming statistical significance between healthy and diseased patients. f Circular bar plots illustrate the performance scores of the three models trained using combined microbiome and metabolome data from GC patients. Key biomarkers from the GC dataset were identified in the IBD and CRC datasets. GC-trained models were applied to predict IBD and CRC outcomes, respectively.