Cecilia Noecker, Hsuan-Chao Chiu, Colin McNally, and Elhanan Borenstein
Correlation-based analysis of paired microbiome-metabolome data sets is becoming a widespread research approach, aiming to comprehensively identify microbial drivers of metabolic variation. To date, however, the limitations of this approach and other microbiome-metabolome analysis methods have not been comprehensively evaluated. To address this challenge, we introduced a mathematical framework to quantify the contribution of each taxon to metabolite variation based on uptake and secretion fluxes. We additionally used a multispecies metabolic model to simulate simplified gut communities, generating idealized microbiome-metabolome data sets. We then compared observed taxon-metabolite correlations in these data sets to calculated ground truth taxonomic contribution values. We found that in simulations of both a representative simple 10-species community and complex human gut microbiota, correlation-based analysis poorly identified key contributors, with an extremely low predictive value despite the idealized setting. We further demonstrate that the predictive value of correlation analysis is strongly influenced by both metabolite and taxon properties, as well as by exogenous environmental variation. Finally, we discuss the practical implications of our findings for interpreting microbiome-metabolome studies.

IMPORTANCE Identifying the key microbial taxa responsible for metabolic differences between microbiomes is an important step toward understanding and manipulating microbiome metabolism. To achieve this goal, researchers commonly conduct microbiome-metabolome association studies, comprehensively measuring both the composition of species and the concentration of metabolites across a set of microbial community samples and then testing for correlations between microbes and metabolites. Here, we evaluated the utility of this general approach by first developing a rigorous mathematical definition of the contribution of each microbial taxon to metabolite variation and then examining these contributions in simulated data sets of microbial community metabolism. We found that standard correlation-based analysis of our simulated microbiome-metabolome data sets has very low predictive value for identifying true contributions and that its performance depends strongly on specific properties of both metabolites and microbes, as well as on those of the surrounding environment. Combined, our findings can guide future interpretation and validation of microbiome-metabolome studies.
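The gap between correlation and contribution described above can be illustrated with a toy calculation. The sketch below is not the framework or the metabolic model used in this work. It assumes that a metabolite's level in each sample is simply the sum of per-taxon net fluxes (secretion positive, uptake negative), and it uses one plausible definition of contribution: the covariance of a taxon's flux with the total, divided by the variance of the total. The function name `variance_contributions`, the per-abundance rates, and the simulated abundances are all hypothetical.

```python
import numpy as np

def variance_contributions(fluxes):
    """Share of metabolite variance attributed to each taxon.

    fluxes: array of shape (n_samples, n_taxa) holding each taxon's net
    flux of one metabolite (secretion > 0, uptake < 0). Assumed definition:
    contribution_i = Cov(flux_i, total) / Var(total), which sums to 1.
    """
    fluxes = np.asarray(fluxes, dtype=float)
    total = fluxes.sum(axis=1)
    centered = fluxes - fluxes.mean(axis=0)
    total_centered = total - total.mean()
    cov_with_total = centered.T @ total_centered / (len(total) - 1)
    return cov_with_total / total_centered.var(ddof=1)

# Hypothetical three-taxon community across 200 samples.
rng = np.random.default_rng(0)
n_samples = 200
taxon0 = rng.lognormal(size=n_samples)                      # secretes the metabolite
taxon1 = taxon0 * rng.uniform(0.8, 1.2, size=n_samples)     # co-varies with taxon0, no flux
taxon2 = rng.lognormal(size=n_samples)                      # consumes the metabolite
abundances = np.column_stack([taxon0, taxon1, taxon2])

rates = np.array([2.0, 0.0, -0.5])   # assumed flux per unit abundance
fluxes = abundances * rates
metabolite = fluxes.sum(axis=1)

contributions = variance_contributions(fluxes)
correlations = np.array(
    [np.corrcoef(abundances[:, i], metabolite)[0, 1] for i in range(3)]
)

for i in range(3):
    print(f"taxon{i}: correlation={correlations[i]:.2f}, "
          f"contribution={contributions[i]:.2f}")
```

In this toy setting, the second taxon has no flux and therefore contributes nothing to metabolite variance, yet its abundance correlates strongly with the metabolite because it tracks the true producer. A correlation screen would flag it as a driver, which is the kind of false positive the simulations above are designed to quantify.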