Statistics Term Sheet

Core Measures

Mean (μ or x̄)

Definition: Average value of a dataset
Formula:

  • Population: μ = Σxᵢ / N
  • Sample: x̄ = Σxᵢ / n

Purpose: Central tendency measure
Example: For data [4, 8, 13, 7], mean = 32/4 = 8
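
A minimal Python check of the example above (assuming numpy is available):

import numpy as np

data = np.array([4, 8, 13, 7])
mean = data.sum() / data.size    # x̄ = Σxᵢ / n
print(mean)                      # 8.0, same as np.mean(data)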

Variance (σ² or s²)

Definition: Average squared distance from the mean
Formula:

  • Population: σ² = Σ(xᵢ-μ)² / N
  • Sample: s² = Σ(xᵢ-x̄)² / (n-1)

Units: Original units squared
Purpose: Measures spread/dispersion of data
Example: If values typically deviate about ±3 from the mean, the variance is about 9
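
A short numpy sketch of both variance formulas, reusing the [4, 8, 13, 7] example (the specific values are only illustrative):

import numpy as np

data = np.array([4, 8, 13, 7])
mu = data.mean()

pop_var = np.mean((data - mu) ** 2)                     # σ² = Σ(xᵢ-μ)² / N, same as np.var(data)
samp_var = np.sum((data - mu) ** 2) / (data.size - 1)   # s² = Σ(xᵢ-x̄)² / (n-1), same as np.var(data, ddof=1)
print(pop_var, samp_var)                                # 10.5 14.0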

Standard Deviation (σ or s)

Definition: Square root of variance
Formula:

  • Population: σ = √(σ²)
  • Sample: s = √(s²)

Units: Same as original data
Purpose: Interpretable measure of spread
Relationship: σ = √(variance)
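
The same data, showing that the standard deviation is just the square root of the variance (numpy's ddof argument switches between the population and sample versions):

import numpy as np

data = np.array([4, 8, 13, 7])
print(np.std(data))              # population σ
print(np.std(data, ddof=1))      # sample s (n-1 in the denominator)
print(np.sqrt(np.var(data)))     # identical to the first line: σ = √(σ²)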

Relationships Between Variables

Covariance (Cov(X,Y))

Definition: Measures how two variables change together
Formula: Cov(X,Y) = Σ(xᵢ-x̄)(yᵢ-ȳ) / (n-1)
Units: Units of X × Units of Y
Range: -∞ to +∞
Interpretation:

  • Positive: Variables increase together
  • Negative: One increases as other decreases
  • Zero: No linear relationship
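
A quick numpy check of the covariance formula (the x and y values below are made up purely for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])

cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (x.size - 1)   # Σ(xᵢ-x̄)(yᵢ-ȳ) / (n-1)
cov_numpy = np.cov(x, y)[0, 1]                                        # off-diagonal of the 2×2 covariance matrix
print(cov_manual, cov_numpy)                                          # positive: x and y increase together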

Correlation (r or ρ)

Definition: Standardized measure of linear relationship
Formula: r = Cov(X,Y) / (σₓ × σᵧ)
Units: Dimensionless
Range: -1 to +1
Interpretation:

  • +1: Perfect positive linear relationship
  • -1: Perfect negative linear relationship
  • 0: No linear relationship
  • |r| around 0.7 or higher: Strong relationship (rule of thumb)
  • |r| around 0.3 or lower: Weak relationship (rule of thumb)
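
The same illustrative x and y, showing that correlation is just covariance rescaled by the two standard deviations:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])

r_manual = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))   # Cov(X,Y) / (σₓ × σᵧ)
r_numpy = np.corrcoef(x, y)[0, 1]
print(r_manual, r_numpy)    # both ≈ 0.98: strong positive linear relationship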

Standardization

Z-Scores

Definition: Number of standard deviations from the mean
Formula: Z = (x - μ) / σ
Units: Standard deviations
Range: Typically -3 to +3
Purpose:

  • Standardize different scales
  • Compare across datasets
  • Identify outliers (|Z| > 2 or 3)

Interpretation: Z = 2 means “2 standard deviations above mean”
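
A small numpy sketch of z-scoring the earlier [4, 8, 13, 7] data (the population σ is used here; that choice is an assumption):

import numpy as np

data = np.array([4, 8, 13, 7])
z = (data - data.mean()) / data.std()    # Z = (x - μ) / σ
print(z)                                 # 13 sits ≈ 1.54 standard deviations above the mean
print(np.abs(z) > 2)                     # crude outlier flag using the |Z| > 2 rule of thumb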

Covariance Matrices

Definition: Square matrix containing variances and covariances of multiple variables
Structure:

  • Size: n×n matrix for n variables
  • Symmetric: Cov(Xᵢ,Xⱼ) = Cov(Xⱼ,Xᵢ)
  • Positive semi-definite: All eigenvalues ≥ 0

Formula: For variables X₁, X₂, …, Xₙ
     [Var(X₁)   Cov(X₁,X₂) ... Cov(X₁,Xₙ)]
S =  [Cov(X₂,X₁)  Var(X₂)  ... Cov(X₂,Xₙ)]
     [    ...        ...    ...     ...   ]
     [Cov(Xₙ,X₁) Cov(Xₙ,X₂) ...  Var(Xₙ) ]

Uses:

  • PCA analysis
  • Multivariate statistics
  • Portfolio risk analysis
  • Machine learning feature relationships
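
To make the matrix structure above concrete, here is a small numpy sketch on made-up data (five observations of three variables; the numbers are illustrative only):

import numpy as np

X = np.array([[2.0, 1.0, 4.0],    # rows = observations, columns = variables X₁, X₂, X₃
              [3.0, 2.0, 6.0],
              [5.0, 2.5, 7.0],
              [6.0, 4.0, 9.0],
              [8.0, 5.0, 12.0]])

S = np.cov(X, rowvar=False)                      # 3×3 sample covariance matrix
print(np.allclose(S, S.T))                       # True: symmetric
print(np.all(np.linalg.eigvalsh(S) >= -1e-10))   # True: eigenvalues ≥ 0 (positive semi-definite)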

Matrix Elements

Diagonal Elements

Definition: Elements where row index = column index
In Covariance Matrix: Cov(Xᵢ,Xᵢ) = Var(Xᵢ)
In Correlation Matrix: Corr(Xᵢ,Xᵢ) = 1
Purpose: Self-relationships (variances or perfect correlation)
Example: In 2×2 matrix, positions (1,1) and (2,2)

Off-Diagonal Elements

Definition: Elements where row index ≠ column index
In Covariance Matrix: Cov(Xᵢ,Xⱼ) where i≠j
In Correlation Matrix: Corr(Xᵢ,Xⱼ) where i≠j
Purpose: Cross-relationships between different variables
Example: In 2×2 matrix, positions (1,2) and (2,1)
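
A brief numpy illustration of the diagonal/off-diagonal split, reusing the illustrative x and y from above:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])

S = np.cov(x, y)                 # 2×2 covariance matrix
print(np.diag(S))                # diagonal: Var(X), Var(Y)
print(S[0, 1], S[1, 0])          # off-diagonal: Cov(X,Y) = Cov(Y,X)

R = np.corrcoef(x, y)
print(np.diag(R))                # correlation matrix diagonal: always [1. 1.]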

Linear Algebra Concepts

Eigenvectors (v)

Definition: “Special directions” where the matrix only stretches/shrinks the vector, never rotates it
Mathematical: Solutions to Av = λv
Properties:

  • Direction vectors that remain unchanged under matrix transformation
  • Only magnitude changes, not direction
  • Conventionally normalized to unit length (||v|| = 1)

In PCA: Point in directions of maximum/minimum variance

Eigenvalues (λ)

Definition: “How much stretching/shrinking” happens in each special direction
Mathematical: Scalar values that satisfy Av = λv
Interpretation:

  • λ > 1: Vector gets stretched (amplified)
  • 0 < λ < 1: Vector gets shrunk
  • λ < 0: Vector gets flipped and scaled
  • λ = 0: Vector collapses to zero

In PCA: Measure amount of variance in each principal direction

Eigenvalue-Eigenvector Relationship

Fundamental Equation: Av = λv
Process:

  1. Find eigenvalues by solving: det(A - λI) = 0
  2. For each λ, find eigenvector by solving: (A - λI)v = 0

Result: Each eigenvalue has corresponding eigenvector
In PCA: Largest eigenvalue gives first principal component direction
Key Insight: The “natural coordinate system” is simply the data’s preferred viewing angle - the orientation that captures maximum variance with minimum dimensions.
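
A minimal numpy sketch of this process on a small symmetric matrix (the matrix entries are invented; np.linalg.eigh is used because it is designed for symmetric matrices):

import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])                    # symmetric, covariance-like matrix

eigvals, eigvecs = np.linalg.eigh(A)          # eigenvalues in ascending order, eigenvectors as columns
order = np.argsort(eigvals)[::-1]             # reorder: largest λ first, as PCA does
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

v1 = eigvecs[:, 0]                            # direction of the largest eigenvalue
print(np.allclose(A @ v1, eigvals[0] * v1))   # True: Av = λv
print(np.linalg.norm(v1))                     # 1.0: returned eigenvectors have unit length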

Mathematical Foundation

THEOREM (provable): “Eigenvectors of symmetric matrices corresponding to distinct eigenvalues are orthogonal”

This theorem is part of the Spectral Theorem for Symmetric Matrices, which states:

  • Every symmetric matrix can be diagonalized by orthogonal eigenvectors
  • This is a fundamental result in linear algebra

Implication for PCA: Since covariance matrices are always symmetric, orthogonal principal components are mathematically guaranteed.
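
A short numpy check of this implication on a randomly generated covariance matrix (random data is used only to produce a symmetric matrix):

import numpy as np

rng = np.random.default_rng(0)
S = np.cov(rng.normal(size=(100, 3)), rowvar=False)   # symmetric 3×3 covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)
print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))             # True: eigenvectors are orthonormal
print(np.allclose(eigvecs @ np.diag(eigvals) @ eigvecs.T, S))  # True: S = VΛVᵀ (spectral theorem)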

Matrix Examples

2×2 Covariance Matrix

[Var(X)    Cov(X,Y)]  ← Diagonal: variances
[Cov(X,Y)  Var(Y)  ]  ← Off-diagonal: covariances

2×2 Correlation Matrix

[1      Corr(X,Y)]  ← Diagonal: always 1
[Corr(X,Y)    1   ]  ← Off-diagonal: correlations

Key Relationships

  1. Variance to Standard Deviation: σ = √(σ²)
  2. Covariance to Correlation: r = Cov(X,Y) / (σₓ × σᵧ)
  3. Raw to Standardized: Z = (x - μ) / σ
  4. Matrix Symmetry: Cov(X,Y) = Cov(Y,X)
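
These four relationships can be verified numerically in a few lines (again with the illustrative x and y used earlier):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])

print(np.isclose(np.sqrt(np.var(x, ddof=1)), np.std(x, ddof=1)))            # 1. σ = √(σ²)
r = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))                               # 2. covariance → correlation
print((x - x.mean()) / x.std())                                             # 3. raw → standardized z-scores
print(np.isclose(np.cov(x, y)[0, 1], np.cov(y, x)[0, 1]))                   # 4. Cov(X,Y) = Cov(Y,X)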

Quick Reference

Measure             Range         Units            Purpose
Mean                Any           Original         Central tendency
Variance            0 to ∞        Original²        Spread
Std Dev             0 to ∞        Original         Interpretable spread
Covariance          -∞ to ∞       X×Y units        Raw relationship
Correlation         -1 to +1      None             Standardized relationship
Z-Score             -∞ to ∞       Std devs         Standardized position
Eigenvalues         0 to ∞*       Variance units   Variance in PC direction
Eigenvectors        Unit length   Dimensionless    Principal directions
Covariance Matrix   Symmetric     Mixed units      All variable relationships

*For covariance matrices (positive semi-definite)

Learning