Efficient Bayesian Additive Regression Models For Microbiome Studies
Research Poster, Health & Life Sciences, 2025 Graduate Exhibition
Presentation by Tinghua Chen
Exhibition Number 2
Abstract
Analyzing sequence count data, such as those from microbiome or gene expression studies, is challenging because it requires estimating both the linear and non-linear effects of covariates on the composition of microbial taxa or on gene expression. Bayesian multinomial logistic-normal (MLN) models have gained popularity for analyzing these data because they account for the compositional nature of the counts. Yet inference for MLN models can be computationally intractable. Recently, we developed an accurate and computationally efficient particle filter with a marginal Laplace approximation for inferring MLN models that have a marginally latent matrix-t process (MLTP) form. In this work, we introduce a flexible family of Bayesian MLN additive Gaussian process regression models and demonstrate that they possess the MLTP form. In addition to developing the sampler for this family of models, we extend our previous work by developing efficient maximum marginal likelihood estimation techniques for the models' hyperparameters. We demonstrate the efficiency and utility of these models in estimating linear and non-linear effects through analyses of both real and simulated sequence count data.
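To make the model family described above more concrete, the following is a minimal generative sketch in Python of a multinomial logistic-normal model whose latent log-ratios are the sum of a linear covariate effect and a Gaussian process effect. It is an illustration under stated assumptions, not the poster's implementation: the function names (`rbf_kernel`, `simulate_mln_additive`), the squared-exponential kernel, the additive log-ratio parameterization, and all parameter values are hypothetical choices made here. It only simulates data and does not implement the particle filter, the marginal Laplace approximation, or the hyperparameter estimation mentioned in the abstract.

```python
import numpy as np


def rbf_kernel(t, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel matrix over 1-D inputs t (assumed kernel)."""
    diff = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def simulate_mln_additive(n_samples=50, n_taxa=5, depth=1000, seed=0):
    """Simulate counts from an MLN model with linear + GP latent effects."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 10.0, n_samples)   # covariate with a non-linear effect (e.g. time)
    x = rng.normal(size=n_samples)          # covariate with a linear effect
    d = n_taxa - 1                          # dimension of the additive log-ratio space

    # Linear component: one coefficient per latent coordinate.
    beta = rng.normal(scale=0.5, size=d)
    linear = np.outer(beta, x)              # shape (d, n_samples)

    # Non-linear component: an independent GP draw per latent coordinate.
    K = rbf_kernel(t, length_scale=2.0) + 1e-6 * np.eye(n_samples)  # jitter for stability
    chol = np.linalg.cholesky(K)
    nonlinear = (chol @ rng.normal(size=(n_samples, d))).T          # shape (d, n_samples)

    # Latent log-ratios: additive effects plus small Gaussian noise.
    eta = linear + nonlinear + rng.normal(scale=0.1, size=(d, n_samples))

    # Inverse additive log-ratio transform: append a zero reference row, then softmax.
    eta_full = np.vstack([eta, np.zeros((1, n_samples))])
    eta_full = eta_full - eta_full.max(axis=0, keepdims=True)
    pi = np.exp(eta_full)
    pi = pi / pi.sum(axis=0, keepdims=True)

    # Multinomial counts for each sample at a fixed sequencing depth.
    counts = np.stack(
        [rng.multinomial(depth, pi[:, j]) for j in range(n_samples)], axis=1
    )
    return counts, pi, eta


counts, pi, eta = simulate_mln_additive()
```

Each column of `counts` is one sample's taxon counts, and each column of `pi` is the corresponding composition. Because the counts in a column always sum to the fixed depth, only relative abundances carry information, which is the compositional constraint the MLN model is designed to respect.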
Importance
Scientists studying microbiomes or gene expression rely on data that track different types of microbes or genes over time. Understanding how they change under different conditions, such as diet, disease, or treatment, is crucial for advancing medicine and biology. However, determining how various factors influence microbial communities or gene activity is challenging, as these effects can be complex and interconnected over time. Previous approaches have struggled to capture these relationships, lacked the ability to analyze time-dependent data, or were not computationally efficient. Our study develops an efficient, flexible, and interpretable method for analyzing how various factors shape these patterns, helping scientists gain insights that support better treatments and biomarker discovery.