[BH 5/n] Argh... Just because we repeat “correlation does not imply causation” does not mean there isn’t causation!
This is a bit of a rant.
I’ve memories of a senior manager shooting ideas down saying
“correlation is not causation, I’ve done Stats at Uni and can prove anything is related to baked beans”
It grated sooooo much.
Firstly, because it was thoughtless rhetoric that, purposefully or accidentally, steamrolled ideas and immediately dismissed any attempt at constructive, data-driven decisions.
Secondly, it grated because I didn’t have the tools to show causation. I knew regression is not proof, but what other tools were there??
In studying Data Science and AI I’m getting into the scientific tools that can help with that. Unfortunately the module I’m doing now (at least the first half of it) is focused on regression and identifying correlation. I was frustrated and, whilst using Claude to arrange my notes, I asked it to give me some links for the “next level” of ML that looks at causation.
Here’s what it gave me, not yet checked, but here for reference. Now I’ll move back into the world of regression (though thankfully probability is on the cards for the next few weeks).
🚀 Next Level: Causal Inference & Advanced Data Science
Your roadmap to understanding causation, not just correlation!
📚 Foundation Books (Start Here!)
🎯 For Beginners
- Causal Inference in Statistics: A Primer by Judea Pearl, Madelyn Glymour, Nicholas P. Jewell
  - Perfect first book - only 125 pages!
  - Uses simple examples and plain language
  - Available on Amazon
- Causal Inference: The Mixtape by Scott Cunningham
  - FREE online version! 📖
  - Hands-on with R and Stata code
  - Real-world examples and engaging writing
  - Perfect bridge between theory and practice
🧠 For Deep Dive
- Causality: Models, Reasoning, and Inference by Judea Pearl (2009)
  - The foundational text - more technical
  - Introduced Structural Causal Models (SCM)
  - Check Judea Pearl’s homepage: bayes.cs.ucla.edu
🎓 Online Courses
Coursera Offerings
- A Crash Course in Causality (University of Pennsylvania) ⭐ Most Popular
  - 5 weeks, ~18 hours total
  - Hands-on R programming
  - Covers matching, instrumental variables, propensity scores
- Causal Inference (Columbia University)
  - Master’s level rigor
  - Mathematical survey approach
  - Follow-up: Causal Inference 2
- Essential Causal Inference Techniques for Data Science
  - 2-hour guided project
  - Practical R implementation
  - Perfect for busy professionals
Free Resources
- Brady Neal’s Causal Inference Course 🆓
  - Free online course with draft textbook
  - Machine learning perspective
  - Active community Slack workspace
- Stanford’s Machine Learning & Causal Inference Short Course
  - R Markdown tutorials you can download
  - Causal trees and forests
  - Advanced ML applications
💻 Python Libraries & Tools
Microsoft’s DoWhy ⭐ Industry Standard
- GitHub: github.com/py-why/dowhy
- Documentation: py-why.github.io/dowhy
- Tutorial: Microsoft Research Blog
- DataCamp Tutorial: Intro to Causal AI Using DoWhy
PyWhy Ecosystem (Microsoft + Amazon collaboration)
- Main Hub: github.com/py-why
- EconML: github.com/py-why/EconML - Heterogeneous treatment effects
- CausalML: Uplift modeling and causal ML algorithms (Uber’s library at github.com/uber/causalml, not part of PyWhy)
- causal-learn: Causal discovery algorithms
Getting Started Code
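```python
# Install DoWhy first (run in a shell): pip install dowhy
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Toy data (illustrative only): z confounds both treatment t and outcome y
rng = np.random.default_rng(0)
z = rng.normal(size=1000)
t = (z + rng.normal(size=1000) > 0).astype(int)
data = pd.DataFrame({"z": z, "t": t, "y": 2.0 * t + z + rng.normal(size=1000)})

# Basic usage: model -> identify -> estimate -> refute
model = CausalModel(data=data, treatment="t", outcome="y", common_causes=["z"])
estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
refutation = model.refute_estimate(estimand, estimate, method_name="random_common_cause")  # Test assumptions!
print(estimate.value, refutation)
```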
🔬 Key Techniques You’ll Learn
The Big 4 Methods
- Randomized Controlled Trials (RCTs) - Gold standard
- Instrumental Variables - Find natural randomization
- Difference-in-Differences - Before/after + treatment/control
- Regression Discontinuity - Exploit arbitrary cutoffs
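To make one of these concrete, here is a minimal difference-in-differences sketch in plain Python. The numbers are made up for illustration (think of them as hypothetical weekly sales for a treated group and a control group), and the function name `diff_in_diff` is my own. The estimate only means something under the parallel-trends assumption, i.e. that both groups would have moved the same way without the treatment.

```python
from statistics import mean

# Hypothetical outcomes (e.g. weekly sales); illustrative numbers only
treated_before = [10.0, 12.0, 11.0]
treated_after = [15.0, 17.0, 16.0]
control_before = [9.0, 10.0, 11.0]
control_after = [11.0, 12.0, 13.0]


def diff_in_diff(t_before, t_after, c_before, c_after):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(t_after) - mean(t_before)
    control_change = mean(c_after) - mean(c_before)
    return treated_change - control_change


effect = diff_in_diff(treated_before, treated_after, control_before, control_after)
print(f"Estimated treatment effect: {effect:.1f}")
```

The control group's change stands in for what would have happened to the treated group anyway, so subtracting it removes the shared trend rather than crediting it to the treatment.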
Advanced Methods
- Propensity Score Matching - Balance treatment groups
- Synthetic Control - Create artificial control groups
- Causal Trees/Forests - ML meets causation
- Double Machine Learning - Use ML to eliminate confounders
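As a rough illustration of the propensity score idea, here is a sketch in plain Python on simulated data. Everything in it is an assumption for demonstration: a single binary confounder `z`, a true treatment effect of 2.0, and a propensity score estimated as a simple frequency rather than with a fitted model. Real problems have many confounders and need a proper model plus balance checks.

```python
import random
from statistics import mean

random.seed(0)

# Simulated data (assumption): z is a binary confounder that raises both the
# chance of being treated and the outcome. The true treatment effect is 2.0.
rows = []
for _ in range(20000):
    z = random.random() < 0.5
    t = random.random() < (0.8 if z else 0.2)
    y = 2.0 * t + 3.0 * z + random.gauss(0, 1)
    rows.append((z, t, y))

# Naive comparison ignores z, so it mixes the treatment effect with z's effect
naive = mean(y for _, t, y in rows if t) - mean(y for _, t, y in rows if not t)


def propensity(z_value):
    """Estimated P(treated | z), here just the treated share for that z value."""
    group = [t for z, t, _ in rows if z == z_value]
    return sum(group) / len(group)


score = {z_value: propensity(z_value) for z_value in (False, True)}

# Group control outcomes by their propensity score
controls_by_score = {}
for z, t, y in rows:
    if not t:
        controls_by_score.setdefault(score[z], []).append(y)
control_mean = {s: mean(ys) for s, ys in controls_by_score.items()}

# Match each treated unit to controls with the same score, then average the
# differences (this estimates the effect on the treated)
matched = mean(y - control_mean[score[z]] for z, t, y in rows if t)

print(f"Naive difference: {naive:.2f}")
print(f"Matched estimate: {matched:.2f}")
```

The naive difference lands well above 2.0 because treated units are more likely to have `z` set, while the matched estimate should come out close to the true effect built into the simulation.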
Modern Causal ML
- Heterogeneous Treatment Effects - Who benefits most?
- Causal Discovery - Find causal structure from data
- Counterfactual Reasoning - “What if?” analysis
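The “What if?” idea can be sketched with Pearl's three steps for counterfactuals (abduction, action, prediction). The structural model below, Y = 2X + U, and the observed values are assumptions chosen purely to keep the arithmetic visible, not anything taken from the resources above.

```python
# Toy structural causal model (assumed for illustration): Y = 2*X + U,
# where U is unobserved background noise specific to each unit.
def outcome(x, u):
    return 2.0 * x + u


# Observed for one unit: it received x = 1 and we saw y = 5
x_observed, y_observed = 1.0, 5.0

# Step 1, abduction: infer this unit's noise term from what we observed
u = y_observed - 2.0 * x_observed

# Step 2, action: set X to the counterfactual value, overriding what happened
x_counterfactual = 0.0

# Step 3, prediction: recompute Y with the same noise term
y_counterfactual = outcome(x_counterfactual, u)

print(f"Observed y = {y_observed}, counterfactual y = {y_counterfactual}")
```

The point is that a counterfactual is about the same unit with the same background conditions, which is why the noise term is recovered first and then held fixed.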
📖 Academic Papers & Tutorials
Essential Reads
- Pearl’s Introduction to Causal Inference - Free PMC article
- The Foundations of Causal Inference - Judea Pearl (2010)
🏢 Industry Applications
Companies Using Causal AI
- Microsoft - DoWhy, recommendation systems
- Amazon - PyWhy ecosystem
- Uber - CausalML, pricing optimization
- Spotify - Content recommendations
- McKinsey - Business strategy consulting
Use Cases
- A/B Testing - Beyond simple experiments
- Marketing Attribution - Which ads actually work?
- Policy Evaluation - Does this program help?
- Root Cause Analysis - Why did sales drop?
- Personalization - Who should get which treatment?
🎯 Your Learning Path
Phase 1: Foundation (2-4 weeks)
- Read Pearl’s “Causal Inference in Statistics: A Primer”
- Take Penn’s “Crash Course in Causality” on Coursera
- Install and try DoWhy with their tutorials
Phase 2: Practice (4-6 weeks)
- Work through “Causal Inference: The Mixtape”
- Complete the DataCamp DoWhy tutorial
- Try Stanford’s R tutorials
Phase 3: Advanced (2-3 months)
- Deep dive into Pearl’s “Causality” book
- Explore PyWhy ecosystem (EconML, causal-learn)
- Read recent papers on causal ML
- Apply to your own data problems!
🤝 Community & Support
- DoWhy GitHub Discussions: Ask questions, share projects
- Brady Neal’s Causal Course Slack: Active community
- PyWhy Discord: Regular office hours and community calls
- Twitter/X: Follow @causalinf for latest research
🚀 Ready to move beyond correlation and discover the secrets of causation? Start with the Penn Coursera course and DoWhy - you’ll be amazed at what’s possible!