A Zero-Inflated Weighted Distribution Mixture Model for Spatial Compositional Data in Ecology

Research Poster | Physical Sciences & Mathematics | 2025 Graduate Exhibition

Presentation by Gerald Brown

Exhibition Number 130

Abstract

The Dirichlet distribution is commonly used to model compositional data, primarily because it respects the constraint that the components sum to one. However, the distribution has two limitations. First, it does not account for spatial dependence. Second, it does not allow zero values, because its density is defined only for strictly positive components. We propose a spatial, zero-inflated Dirichlet model that overcomes these limitations. Our method sets the Dirichlet shape parameter as a function of a Gaussian Mixture Model (GMM) and treats the zero values and the remaining data as separate terms in the likelihood function. We also incorporate covariates of interest into the model. To estimate model parameters, we conduct Bayesian inference via segmented Markov Chain Monte Carlo (MCMC), incorporating the Log Adaptive Proposal and a Dirichlet Process prior on the GMM weight parameters. We then apply our model to an eBird dataset of the spatial distribution of mallards across North America over a number of weeks. This dataset exhibits strong spatial correlation, with some patches of land containing high proportions of birds and others containing small proportions, and it includes zero values. Our model accurately infers parameters that fit the given zero-inflated data, and it can simulate data that closely resemble the observed dataset. Lastly, by incorporating land features as covariates, our inference on the mallard dataset identifies features indicative of higher mallard populations.
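The abstract describes the likelihood only in words. The sketch below shows one way such a model could be written down, under assumptions that are ours and not necessarily the authors' formulation: each observation is a composition over spatial grid cells; the shape parameter for cell i is alpha_i = c · exp(x_i·β) · Σ_m w_m N(s_i; μ_m, σ_m² I), that is, an isotropic GMM density at the cell's location scaled by a log-linear covariate term; and every cell is zero with a single shared probability p_zero. The nonzero proportions then follow a Dirichlet distribution over the occupied cells only. All function names, `ALPHA_FLOOR`, `p_zero`, and `scale` are hypothetical, and the toy inputs are made up.

```python
import math

# Hypothetical floor so that cells far from every mixture component still get a
# strictly positive shape parameter (lgamma is undefined at 0).
ALPHA_FLOOR = 1e-10


def gaussian_2d(s, mean, sd):
    """Isotropic bivariate normal density at location s (a simplifying assumption)."""
    dx, dy = s[0] - mean[0], s[1] - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sd * sd)) / (2.0 * math.pi * sd * sd)


def shape_params(locations, covariates, weights, means, sds, beta, scale):
    """alpha_i = scale * exp(x_i . beta) * sum_m w_m * N(s_i; mu_m, sd_m^2 I)."""
    alphas = []
    for s, x in zip(locations, covariates):
        mix = sum(w * gaussian_2d(s, mu, sd) for w, mu, sd in zip(weights, means, sds))
        lin = sum(b * xi for b, xi in zip(beta, x))
        alphas.append(max(scale * math.exp(lin) * mix, ALPHA_FLOOR))
    return alphas


def dirichlet_logpdf(y, alpha):
    """Log density of a Dirichlet(alpha) at a point y with strictly positive entries."""
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, y)))


def zi_dirichlet_loglik(y, alpha, p_zero):
    """Zero-inflated log-likelihood with separate terms for zero and nonzero cells.

    Each cell is zero with probability p_zero (one shared value, an assumption);
    the nonzero proportions, which still sum to one, follow a Dirichlet whose
    shape parameters are those of the occupied cells only.
    """
    nonzero = [i for i, v in enumerate(y) if v > 0.0]
    n_zero = len(y) - len(nonzero)
    # Bernoulli term: which cells are empty.
    ll = n_zero * math.log(p_zero) + len(nonzero) * math.log1p(-p_zero)
    # Dirichlet term: how the mass is shared among the occupied cells.
    if nonzero:
        ll += dirichlet_logpdf([y[i] for i in nonzero], [alpha[i] for i in nonzero])
    return ll


# Toy example with made-up values: four grid cells, one covariate, two components.
locs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
covs = [[0.5], [0.1], [-0.2], [0.0]]
alpha = shape_params(locs, covs, weights=[0.7, 0.3], means=[(0.0, 0.0), (3.0, 3.0)],
                     sds=[1.0, 1.0], beta=[0.8], scale=50.0)
y = [0.6, 0.3, 0.0, 0.1]  # observed proportions; the third cell is zero
print(zi_dirichlet_loglik(y, alpha, p_zero=0.25))
```

Splitting the likelihood this way mirrors the idea of treating zero values and the rest of the data as separate terms: the Bernoulli part explains which cells are empty, while the Dirichlet part explains how the mass is shared among the occupied cells. In an MCMC sampler, a function like `zi_dirichlet_loglik` would be evaluated at each proposed parameter set; the Log Adaptive Proposal and the Dirichlet Process prior are not reproduced in this sketch.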

Importance

Modeling spatial compositional data with zero values is important for the conservation of animal species, as it can reveal relationships between the landscape and the observed spatial distribution of a species. To the best of our knowledge, no existing methods can model such data. This work introduces a model for zero-inflated, spatial compositional data with an explicit model for the selection of preferred landscape features. We apply the model to AdaSTEM output of mallard relative abundance across North America. Results suggest that the model is reliable and can be extended to data for other species.
