Data from sequencing bacterial communities are formalized as contingency tables whose columns correspond to different
biological specimens (samples).
The rows (features) are a random collection of Amplicon Sequence Variants (ASVs, in the case of 16S rRNA amplicon sequencing) or
gene fragments (in the case of metagenomics). In both cases, these entities are defined only after the data are collected, which calls for a nonparametric
framework. There are usually more rows (features) than columns, which makes regularization necessary, here through the use of Bayesian priors.
However, the classical Dirichlet-multinomial models are insufficient to account for the strong associations (or exclusions) between certain bacteria.
Recent hierarchical models such as latent Dirichlet topic models provide a more flexible framework that allows for mixed membership,
which is more appropriate for these non-Gaussian data.
We will show how these hierarchical topic models can enhance our understanding of both longitudinal dependencies between samples and
biological dependencies between taxa, regardless of the differences in sampling depth and sources of variability.
This talk contains joint work with Kris Sankaran, Pratheepa Jeganathan and David Relman's group at Stanford.
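
To make the mixed membership idea above concrete, here is a minimal sketch of the generative side of a latent Dirichlet topic model for a feature-by-sample count table. It is an illustration under stated assumptions, not the models presented in the talk: the function name simulate_topic_counts, the hyperparameters alpha and gamma, and all dimensions are hypothetical, and only NumPy is used. Each sample (column) is a mixture of a few topics (latent sub-communities), and each column is drawn with its own sampling depth.

```python
import numpy as np

def simulate_topic_counts(n_features=50, n_topics=3, depths=(500, 2000, 8000, 1000),
                          alpha=0.5, gamma=0.5, seed=0):
    """Simulate a features-by-samples count table from a topic model.

    All names and default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    depths = np.asarray(depths, dtype=int)
    n_samples = depths.size

    # Topics: each column of beta is a probability vector over features.
    beta = rng.dirichlet(np.full(n_features, gamma), size=n_topics).T   # (features, topics)

    # Mixed memberships: each column of theta gives one sample's topic proportions.
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_samples).T   # (topics, samples)

    # Per-sample feature probabilities are mixtures of the topics.
    probs = beta @ theta                                                # (features, samples)
    probs = probs / probs.sum(axis=0, keepdims=True)  # guard against rounding drift

    # Each column is a multinomial draw with its own sampling depth.
    counts = np.column_stack(
        [rng.multinomial(depths[j], probs[:, j]) for j in range(n_samples)]
    )
    return counts, beta, theta

counts, beta, theta = simulate_topic_counts()
print(counts.shape)        # features x samples
print(counts.sum(axis=0))  # column totals equal the requested depths
```

Because the topic proportions in theta are separate from the depths, two samples with very different read totals can still share the same underlying community composition, which is the sense in which such models are insensitive to sampling depth.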