Causality in Digital Medicine

Ben Glocker

Ben Glocker (an expert in machine learning for medical imaging, Imperial College London), Mirco Musolesi (a data science and digital health expert, University College London), Jonathan Richens (an expert in diagnostic machine learning models, Babylon Health) and Caroline Uhler (a computational biology expert, MIT) talked to Nature Communications about their research interests in causal inference and how it can provide a robust framework for digital medicine studies and their implementation across different fields of application.

What is causality and how do causality and digital medicine interact in your field?

Ben Glocker: Causality is concerned with modelling the underlying cause-effect relationships in the data that we wish to analyze. The language of causal reasoning allows us to formalize our knowledge about these relationships and any assumptions that we make regarding the so-called data generating process, which includes aspects of data acquisition, data collection, and data annotation. A detailed causal description of the data generating process can be used to illustrate how the data have been generated and what factors influence the specific characteristics of a study sample. For example, using causal diagrams we can explicitly communicate which factors of variation affect the study population, the acquisition procedures, the annotation policies, or the inclusion/exclusion criteria. It is important to model and communicate the data generating process, as this allows us to identify potential shortcomings, limitations and biases in our data that may affect the generalizability or even the validity of the conclusions we draw from statistical data analyses. In the field of machine learning for medical imaging, we often aim to build statistical models that take medical scans (and possibly other information) as inputs in order to make predictions about a patient’s disease status, the presence of pathology, or the effectiveness of treatment. Here, the underlying causal relationships between the inputs and outputs can have profound implications for the types of machine learning strategies we may want to employ. Further, we may be interested in identifying previously unknown causal relationships, for example, between imaging biomarkers and the efficacy of therapeutic interventions.

Mirco Musolesi: Providing a definition is very hard, since in my opinion the concept of causality per se is deeply philosophical. Given my own work and background, I would define causality in very practical terms and say that causal analysis allows us to answer cause-effect questions starting from real-world data. As far as digital medicine is concerned, causal analysis allows us to operationalise our analytical findings, because it enables us to use data to make informed choices. A possible example is the choice of the right therapies and interventions given the existing conditions and the external context. Causal analysis allows us to understand the endogenous and exogenous factors that might have an impact, for instance, on a certain behaviour or medical outcome. It underpins our reasoning and is of fundamental importance for evidence-based decision making. It is not sufficient to collect data, possibly in real time and from a large population using digital technologies; interpreting the data from a causal point of view is essential to take informed action. The actual "feedback loop" might be implemented through the same digital technologies. This reasoning holds for situations involving individuals, but also for public health policies and interventions, like those that have been adopted by governments and local authorities during the current COVID-19 pandemic.

I have been working in the area of real-time monitoring of physical and mental health using mobile sensing and real-time data collection (e.g., from social media). I am interested in applying causal methods to this class of datasets, in part to understand and plan effective feedback systems. Most of the existing work is based on correlation; as I said, deriving causal relationships from these datasets is fundamental for obtaining actionable insights.

Jonathan Richens: Many of the routine questions that arise in clinical practice, such as “what treatment should I recommend?” or “why is the patient experiencing these symptoms?”, are fundamentally questions about cause and effect. Causality is a field of research that tells us how to answer these types of questions, and what assumptions and resources are required to do so. For example, one of the key tasks in digital healthcare is generating individualised care plans. This involves tailoring a sequence of decisions to a single patient, steering them towards the desired health outcomes, which in turn requires estimating the causal effect that each decision will ultimately have on the patient. Randomised controlled trials are the gold standard for establishing these cause-effect relations, but there are many situations where randomising these decisions would be unethical, unscalable or overly disruptive to the patient. So instead we typically have to work with observational data sets such as electronic health records, which only capture associations rather than bona fide causation. We use causal inference methods to bridge this gap and answer these causal questions, using observational data along with modelling assumptions. While treatment effect estimation is the most studied application, causality has deep roots in clinical decision making that go beyond it. For example, diagnostic reasoning involves generating and testing hypotheses for the most likely underlying cause of a patient’s symptoms. So even this textbook clinical decision problem is in fact a causal inference problem in disguise.
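The gap between association and causation in observational data can be illustrated with a small simulation. The sketch below is not taken from the interview: the patient records, the probabilities, and the function names (`simulate_records`, `naive_difference`, `adjusted_difference`) are all hypothetical. It assumes a single observed confounder, disease severity, and uses stratification (a simple form of backdoor adjustment) as one possible causal inference method among many.

```python
import random


def simulate_records(n=20000, seed=0):
    """Simulate (severe, treated, recovered) tuples for n hypothetical patients."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        # Confounder: disease severity (assumed to be observed).
        severe = rng.random() < 0.5
        # Sicker patients are more likely to receive the treatment.
        treated = rng.random() < (0.8 if severe else 0.2)
        # Assumed ground truth: treatment adds 0.2 to the recovery probability.
        p_recover = (0.3 if severe else 0.7) + (0.2 if treated else 0.0)
        recovered = rng.random() < p_recover
        records.append((severe, treated, recovered))
    return records


def recovery_rate(records, treated, severe=None):
    """Recovery rate among patients with the given treatment (and severity)."""
    outcomes = [r for s, t, r in records
                if t == treated and (severe is None or s == severe)]
    return sum(outcomes) / len(outcomes)


def naive_difference(records):
    """Association only: treated vs. untreated, ignoring severity."""
    return recovery_rate(records, True) - recovery_rate(records, False)


def adjusted_difference(records):
    """Stratify by severity, then average the per-stratum differences,
    weighting each stratum by its share of the population."""
    total = len(records)
    effect = 0.0
    for severe in (False, True):
        weight = sum(1 for s, _, _ in records if s == severe) / total
        effect += weight * (recovery_rate(records, True, severe)
                            - recovery_rate(records, False, severe))
    return effect


records = simulate_records()
print(f"naive difference:    {naive_difference(records):+.3f}")
print(f"adjusted difference: {adjusted_difference(records):+.3f}")
```

Under these assumptions the naive comparison should make the treatment look useless or slightly harmful, because treated patients are sicker on average, while the adjusted estimate should land close to the built-in effect of +0.2. The adjustment is only valid because severity is the sole confounder and is fully observed, which is exactly the kind of modelling assumption that real electronic health records rarely guarantee.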

Caroline Uhler: Important questions in the biomedical sciences are inherently causal: which genes regulate one another? How does an intervention/perturbation (e.g. drug, over-expression, or knockout) affect the expression of all genes? And which intervention could move the system from a diseased state back to the normal state? Causal relationships between nodes, such as genes, can be represented by a directed network, where a directed edge from node 1 to node 2 means that node 1 directly regulates node 2 and thus perturbing node 1 changes the value of node 2. The biomedical sciences have genetic and chemical tools that allow perturbational screens on a scale that is unmatched by other fields. These features make the biomedical sciences uniquely suited to being not only one of the greatest beneficiaries of methods in causality, but also one of the greatest sources of inspiration for the field of causality.
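The directed-network picture can be made concrete with a toy model. The following sketch is purely illustrative: the three genes, the edge weights, the baseline values, and the linear update rule are assumptions chosen for simplicity, not a description of any real regulatory network or of the methods used in Uhler's work. An intervention is modelled by fixing a node's value and ignoring its incoming edges.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical network: (regulator, target) -> strength of regulation.
EDGES = {("g1", "g2"): 2.0, ("g2", "g3"): -1.0, ("g1", "g3"): 0.5}
# Hypothetical expression level of each gene in the absence of regulation.
BASELINE = {"g1": 1.0, "g2": 0.0, "g3": 3.0}


def simulate(edges, baseline, interventions=None):
    """Compute node values in a linear model over a directed acyclic network.

    `interventions` maps a node to a fixed value; an intervened node
    ignores its regulators, as in a knockout or over-expression.
    """
    interventions = interventions or {}
    parents = {node: {} for node in baseline}
    for (src, dst), weight in edges.items():
        parents[dst][src] = weight
    # Visit regulators before their targets.
    order = TopologicalSorter({n: set(p) for n, p in parents.items()}).static_order()
    values = {}
    for node in order:
        if node in interventions:
            values[node] = interventions[node]
        else:
            values[node] = baseline[node] + sum(
                w * values[p] for p, w in parents[node].items())
    return values


observed = simulate(EDGES, BASELINE)
knockout_g1 = simulate(EDGES, BASELINE, {"g1": 0.0})   # perturb an upstream gene
clamp_g3 = simulate(EDGES, BASELINE, {"g3": 10.0})     # perturb a downstream gene

print("observed:   ", observed)
print("do(g1 = 0): ", knockout_g1)
print("do(g3 = 10):", clamp_g3)
```

In this toy model, knocking out `g1` changes both `g2` and `g3`, whereas clamping `g3` leaves `g1` and `g2` untouched. That asymmetry is what the direction of an edge encodes, and it is why perturbational screens carry information about the network that observational measurements alone do not.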


LIDS professor Caroline Uhler