High-throughput sequencing is used to measure a variety of human physiological characteristics, including an individual’s genome, gene expression, and microbiome composition. Sequencing data is commonly analyzed and shared as a matrix of measurement values for a set of samples. Researchers have reported that microbiome data could be used as a unique identifier in a large population study. We approach the privacy concerns of sharing microbiome data from the perspective of computing on count matrices. We present work using a secure multi-party computation library to perform metagenomic association studies without directly sharing the entries of human microbiome community count matrices. We report benchmarking results for run time and computing resource usage.
We also propose solutions that cover other aspects of sequencing-based data analysis pipelines, including building visualizations without sharing individual counts.