Getting Started Learning Target Trial Emulation
1. Learn Causal Diagrams / Directed Acyclic Graphs (DAGs)
At the core of target trial emulation (TTE) are the research question and the assumptions required to answer it. Directed acyclic graphs (DAGs) are a key tool here: they help you identify confounding and the adjustment sets needed to achieve exchangeability (that is, to emulate randomization).
I suggest starting with the free course Causal Diagrams: Draw Your Assumptions Before Your Conclusions, by Miguel Hernán and Harvard.
2. Learn to use DAGitty
Drawing DAGs and identifying open backdoor paths (confounding) and colliders can be challenging. Luckily, free tools exist that can help you do this automatically. One such tool is DAGitty.
There are several DAGitty tutorials on YouTube; one suggestion is this one.
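To build intuition for what a tool like DAGitty automates, here is a minimal Python sketch that lists the open backdoor paths between a treatment and an outcome. It is not DAGitty's algorithm or API. The DAG, the node names (L, U1, U2, M) and all function names are made up for illustration, and the sketch does not check the other half of the backdoor criterion (that the adjustment set contains no descendants of the treatment).

```python
# Hypothetical DAG (an assumption for illustration): L confounds A -> Y,
# and M is a collider on the path A <- U1 -> M <- U2 -> Y.
EDGES = [("L", "A"), ("L", "Y"), ("A", "Y"),
         ("U1", "A"), ("U1", "M"), ("U2", "M"), ("U2", "Y")]


def descendants(node, edges):
    """All nodes reachable from `node` by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def simple_paths(start, end, edges):
    """All paths from start to end without repeated nodes, ignoring arrow direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt in path:
                continue
            if nxt == end:
                paths.append(path + [nxt])
            else:
                stack.append(path + [nxt])
    return paths


def is_open(path, edges, adjusted):
    """True if no node on the path blocks it, given the adjustment set."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if is_collider:
            # A collider blocks the path unless it (or a descendant) is adjusted for.
            if not (({mid} | descendants(mid, edges)) & adjusted):
                return False
        elif mid in adjusted:
            # Adjusting for a non-collider blocks the path.
            return False
    return True


def open_backdoor_paths(treatment, outcome, edges, adjusted=()):
    """Open paths that start with an arrow pointing INTO the treatment."""
    edge_set, adjusted = set(edges), set(adjusted)
    return [p for p in simple_paths(treatment, outcome, edges)
            if (p[1], p[0]) in edge_set and is_open(p, edges, adjusted)]


print(open_backdoor_paths("A", "Y", EDGES))              # unadjusted
print(open_backdoor_paths("A", "Y", EDGES, {"L"}))       # adjust for the confounder
print(open_backdoor_paths("A", "Y", EDGES, {"L", "M"}))  # also adjust for the collider
```

In this toy graph, adjusting for L closes the only open backdoor path, while additionally adjusting for the collider M opens a new one. That is exactly the kind of mistake a DAG helps you avoid.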
3. Read the “What If” Book
Causal Inference: What If is written by Miguel Hernán and James Robins. It is a somewhat long read, but I really encourage you to take up the challenge! This book is a must-read if you want to master the art of target trial emulation. Read one chapter for breakfast each morning and you will soon be a TTE master. The book is freely available here.
4. Read PRINCIPLED
The good people behind the RCT-DUPLICATE initiative have written a great guide on how to use healthcare registry and claims data to generate decision-grade evidence using TTE. It is great to have at hand when designing your next study. Their paper can be found here: PRINCIPLED.
5. Trial Design
Active comparator new user (ACNU) design or sequential nested trial design?
- If a clear time zero exists for both the treatment and the comparator arm, use the ACNU design. It helps reduce confounding by indication (Yoshida, Solomon, and Kim (2015)).
- If no clear time zero exists for one of the arms, use the sequential nested trial design (Hernán et al. (2008), Hernán et al. (2016)).
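The idea behind the sequential nested trial design can be sketched in a few lines of Python: every month is treated as time zero of a new emulated trial, and everyone who is eligible in that month enters it as an initiator or a non-initiator. This is a simplified illustration, not the implementation from the cited papers. The record layout (`PersonMonth`, `TrialEntry`), the monthly grid and the rule that people stop being eligible once they have initiated treatment are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonMonth:
    """One row of hypothetical person-month data."""
    person_id: str
    month: int
    eligible: bool   # meets the eligibility criteria in this month
    initiates: bool  # starts treatment in this month


@dataclass(frozen=True)
class TrialEntry:
    """One person's enrolment in one emulated trial."""
    trial: int       # month used as time zero for this emulated trial
    person_id: str
    arm: str


def expand_to_sequential_trials(records):
    """Enrol each eligible person-month in the trial starting that month."""
    entries = []
    already_initiated = set()
    for r in sorted(records, key=lambda r: (r.month, r.person_id)):
        # Assumption: prior initiators are not eligible for later trials.
        if r.eligible and r.person_id not in already_initiated:
            arm = "initiator" if r.initiates else "non-initiator"
            entries.append(TrialEntry(r.month, r.person_id, arm))
        if r.initiates:
            already_initiated.add(r.person_id)
    return entries


records = [
    PersonMonth("p1", 0, True, False),
    PersonMonth("p1", 1, True, True),
    PersonMonth("p1", 2, True, False),
    PersonMonth("p2", 0, True, False),
    PersonMonth("p2", 1, False, False),
    PersonMonth("p2", 2, True, False),
]
for entry in expand_to_sequential_trials(records):
    print(entry)
```

Note that the same person can appear in several emulated trials, each with follow-up starting at that trial's own time zero, so the analysis has to account for repeated individuals (for example with robust variance estimates or bootstrapping).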
6. Learn the Statistical Methods Required
I cannot stress enough that the statistical methods should be your last focus point. They cannot give you causal estimates if your TTE design is flawed.
The g-methods are commonly used in TTE and causal inference, preferably in combination via targeted learning. Don’t worry about which of these methods to choose; just use targeted learning. It combines an outcome model with a treatment model and is doubly robust, meaning the estimate remains consistent if either of the two models is correctly specified.
This stuff can feel overwhelming and complicated, but rest assured that you only need an intuitive understanding of how these methods work. Great implementations are already available in R.
First, get to understand inverse probability weighting and the g-formula. Then read about targeted learning and use that!
- Introduction to g-methods
- Introduction to targeted learning / TMLE
- “Machine Learning in Causal Inference for Epidemiology”: Paper
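To see what inverse probability weighting and the g-formula actually compute, here is a small Python sketch for a single binary treatment A, a single binary confounder L and a binary outcome Y. The counts are toy data and the function names are my own for this example. Real analyses would use fitted models (and the R packages listed in this section) rather than raw stratum proportions.

```python
from fractions import Fraction

# Toy data (an assumption for illustration): (L, A, Y) -> number of individuals.
COUNTS = {
    (0, 0, 0): 3, (0, 0, 1): 1,
    (0, 1, 0): 3, (0, 1, 1): 1,
    (1, 0, 0): 1, (1, 0, 1): 2,
    (1, 1, 0): 3, (1, 1, 1): 6,
}


def total(l=None, a=None, y=None):
    """Number of individuals matching the given values (None = any value)."""
    return sum(count for (li, ai, yi), count in COUNTS.items()
               if l in (None, li) and a in (None, ai) and y in (None, yi))


def crude_risk(a):
    """Unadjusted risk P(Y=1 | A=a)."""
    return Fraction(total(a=a, y=1), total(a=a))


def g_formula_risk(a):
    """Standardization: sum over l of P(Y=1 | A=a, L=l) * P(L=l)."""
    return sum(Fraction(total(l=l, a=a, y=1), total(l=l, a=a))
               * Fraction(total(l=l), total())
               for l in (0, 1))


def ipw_risk(a):
    """Weight each individual with A=a by 1 / P(A=a | L), then average over everyone."""
    weighted_events = sum(
        total(l=l, a=a, y=1) / Fraction(total(l=l, a=a), total(l=l))
        for l in (0, 1))
    return weighted_events / total()


for a in (0, 1):
    print(a, crude_risk(a), g_formula_risk(a), ipw_risk(a))
```

In this toy data the crude risks differ (3/7 untreated versus 7/13 treated), while both adjusted estimators give 1/2 in each arm. With saturated (fully stratified) estimates like these, the two methods always agree; they start to differ once you replace the stratum proportions with parametric models.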
Software in R:
- polle package
- lmtp package
- ltmle package
And please don’t use the Cox proportional hazards model! Hazard ratios have a built-in selection bias and are hard to interpret causally (Hernán (2010)).
Good luck! Feel free to reach out if you would like assistance; I would be happy to collaborate!