[BH 5/n] Argh... Just because we repeat “correlation does not imply causation” does not mean there isn't causation!

This is a bit of a rant.

I have memories of a senior manager shooting ideas down by saying:

“correlation is not causation, I’ve done Stats at Uni and can prove anything is related to baked beans”

It grated sooooo much.

Firstly, because it was thoughtless rhetoric that, purposefully or accidentally, steamrolled ideas and immediately dismissed any attempt at constructive, data-driven decisions.

Secondly, it grated because I didn’t have the tools to show causation. I knew regression is not proof, but what other tools are there?

In studying Data Science and AI I’m getting into the scientific tools that can help with that. Unfortunately the module I’m doing now (at least the first half of it) is focused on regression and identifying correlation. I’m frustrated and, whilst using Claude to arrange my notes, I asked it to give me some links for the “next level” of ML that looks at causation.

Here’s what it gave me. I haven’t checked it yet, but it’s here for reference, and I’ll move back into the world of regression (though thankfully probability is on the cards in the next few weeks).

🚀 Next Level: Causal Inference & Advanced Data Science

Your roadmap to understanding causation, not just correlation!


📚 Foundation Books (Start Here!)

🎯 For Beginners

  • Causal Inference in Statistics: A Primer by Judea Pearl, Madelyn Glymour, Nicholas P. Jewell

    • Perfect first book - only 125 pages!
    • Uses simple examples and plain language
    • Available on Amazon
  • Causal Inference: The Mixtape by Scott Cunningham

    • FREE online version! 📖
    • Hands-on with R and Stata code
    • Real-world examples and engaging writing
    • Perfect bridge between theory and practice

🧠 For Deep Dive


🎓 Online Courses

Coursera Offerings

Free Resources


💻 Python Libraries & Tools

Microsoft’s DoWhy (Industry Standard)

PyWhy Ecosystem (Microsoft + Amazon collaboration)

Getting Started Code

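```python
# Install DoWhy first (run in a shell):
#   pip install dowhy

# Basic usage. Placeholders you supply: `data` (a pandas DataFrame),
# `treatment` and `outcome` (column names), `graph` (the assumed causal graph).
from dowhy import CausalModel

model = CausalModel(data=data, treatment=treatment, outcome=outcome, graph=graph)
estimand = model.identify_effect()
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
refutation = model.refute_estimate(estimand, estimate, method_name="random_common_cause")  # Test assumptions!
```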


🔬 Key Techniques You’ll Learn

The Big 4 Methods

  1. Randomized Controlled Trials (RCTs) - Gold standard
  2. Instrumental Variables - Find natural randomization
  3. Difference-in-Differences - Before/after + treatment/control
  4. Regression Discontinuity - Exploit arbitrary cutoffs
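
As a rough illustration of the third method, difference-in-differences compares the before/after change in a treated group with the before/after change in a control group. The sketch below is not from any library: the function name `diff_in_diff` and the sales figures are made up for illustration, and it assumes both groups would have followed parallel trends had nobody been treated.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences estimate from four group averages.

    Assumes parallel trends: without treatment, the treated group would
    have changed by the same amount as the control group did.
    """
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Made-up average weekly sales for two stores (illustrative numbers only).
effect = diff_in_diff(
    treated_before=100.0, treated_after=130.0,
    control_before=90.0, control_after=105.0,
)
print(effect)  # 15.0: treated rose by 30, control rose by 15
```

The control group's change stands in for what would have happened to the treated group anyway, so only the extra change is credited to the treatment.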

Advanced Methods

  • Propensity Score Matching - Balance treatment groups
  • Synthetic Control - Create artificial control groups
  • Causal Trees/Forests - ML meets causation
  • Double Machine Learning - Use ML to adjust for observed confounders
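
To make "balance treatment groups" a little more concrete, here is a small simulation using only the Python standard library. It is a sketch under loud assumptions: the data is synthetic, the true effect is fixed at 2.0 by construction, and the technique shown is plain stratification on a single binary confounder (the simplest relative of propensity score matching, not propensity score matching itself). All names (`rows`, `mean_y`, `naive`, `adjusted`) are hypothetical.

```python
import random
from statistics import mean

random.seed(0)

TRUE_EFFECT = 2.0  # assumption: built into the synthetic data below

# Each row is (z, t, y): z is a binary confounder that raises both the
# chance of treatment t and the outcome y.
rows = []
for _ in range(20000):
    z = 1 if random.random() < 0.5 else 0
    t = 1 if random.random() < (0.8 if z else 0.2) else 0
    y = 5.0 * z + TRUE_EFFECT * t + random.gauss(0, 1)
    rows.append((z, t, y))


def mean_y(t, z=None):
    """Average outcome for treatment value t, optionally within stratum z."""
    return mean(y for (zi, ti, y) in rows if ti == t and (z is None or zi == z))


# Naive comparison: treated vs untreated, ignoring the confounder.
naive = mean_y(t=1) - mean_y(t=0)

# Stratified comparison: compare within each level of z, then average the
# per-stratum differences weighted by how common each stratum is.
adjusted = 0.0
for z in (0, 1):
    share = sum(1 for row in rows if row[0] == z) / len(rows)
    adjusted += share * (mean_y(t=1, z=z) - mean_y(t=0, z=z))

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

With these parameters the naive estimate should land well above the true effect, because treated rows are mostly high-z rows, while the stratified estimate should sit close to 2.0. This only works because the confounder is observed; an unmeasured confounder would bias both numbers.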

Modern Causal ML

  • Heterogeneous Treatment Effects - Who benefits most?
  • Causal Discovery - Find causal structure from data
  • Counterfactual Reasoning - “What if?” analysis
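
A “what if?” question can be sketched with a toy structural model. The example below assumes a made-up structural equation, Y = 2*X + U, where U is an unobserved background factor; the equation, the observed values, and the function name `outcome` are all hypothetical. It follows the usual three steps: infer the background factor from what was observed, change the input, then recompute the outcome.

```python
def outcome(x, u):
    """Hypothetical structural equation: Y = 2*X + U."""
    return 2.0 * x + u


# What we actually observed for one individual (made-up values).
x_observed, y_observed = 1.0, 3.0

# Step 1 (abduction): infer the background factor U consistent with the observation.
u = y_observed - 2.0 * x_observed

# Step 2 (action): imagine X had been set to a different value.
x_counterfactual = 2.0

# Step 3 (prediction): recompute Y with the same U but the new X.
y_counterfactual = outcome(x_counterfactual, u)

print(u)                 # 1.0
print(y_counterfactual)  # 5.0
```

The key point is that U is held fixed: the counterfactual is about the same individual in a world where only X differed, which is why it needs a model of how Y is generated and not just a fitted correlation.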

📖 Academic Papers & Tutorials

Essential Reads

Tutorial Collections


🏢 Industry Applications

Companies Using Causal AI

  • Microsoft - DoWhy, recommendation systems
  • Amazon - PyWhy ecosystem
  • Uber - CausalML, pricing optimization
  • Spotify - Content recommendations
  • McKinsey - Business strategy consulting

Use Cases

  • A/B Testing - Beyond simple experiments
  • Marketing Attribution - Which ads actually work?
  • Policy Evaluation - Does this program help?
  • Root Cause Analysis - Why did sales drop?
  • Personalization - Who should get which treatment?

🎯 Your Learning Path

Phase 1: Foundation (2-4 weeks)

  1. Read Pearl’s “Causal Inference in Statistics: A Primer”
  2. Take Penn’s “Crash Course in Causality” on Coursera
  3. Install and try DoWhy with their tutorials

Phase 2: Practice (4-6 weeks)

  1. Work through “Causal Inference: The Mixtape”
  2. Complete the DataCamp DoWhy tutorial
  3. Try Stanford’s R tutorials

Phase 3: Advanced (2-3 months)

  1. Deep dive into Pearl’s “Causality” book
  2. Explore PyWhy ecosystem (EconML, causal-learn)
  3. Read recent papers on causal ML
  4. Apply to your own data problems!

🤝 Community & Support

  • DoWhy GitHub Discussions: Ask questions, share projects
  • Brady Neal’s Causal Course Slack: Active community
  • PyWhy Discord: Regular office hours and community calls
  • Twitter/X: Follow @causalinf for latest research

🚀 Ready to move beyond correlation and discover the secrets of causation? Start with the Penn Coursera course and DoWhy - you’ll be amazed at what’s possible!
