Advancing causal inference in medicine using biobank data

Abstract

Causal inference from observational medical record data is critical for advancing precision and personalization in healthcare. Recently, biobanks, collections of biological samples linked with genetic, lifestyle, environmental, and health-related data, have emerged as valuable resources for large-scale population studies. By integrating these data sources, biobanks offer a harmonized repository of diverse data for each individual, capturing real-world medical events, including procedures, treatments, and diagnoses. However, these resources are often affected by confounding factors, selection biases, and missing information, posing significant challenges to drawing valid causal conclusions. While randomized controlled trials (RCTs) remain the gold standard for drug development and medical decision-making, the growing availability of observational data highlights the need for robust causal inference methodologies. This study provides an overview of methods, applicable to biobank data, for inferring the effect of a treatment on an outcome from observational data, focusing on the unique challenges each method addresses. Our objective is to introduce current methods used for causal discovery in observational medical data. We discuss classic and modern methodologies, the significant opportunities they offer, and the difficulty of establishing causality. We cover statistical methods designed for large-scale biobanks that have the potential to improve clinical decision-making, guide public health policies, and drive further research.

Publication
In J. Biomed. Informatics 171