Probability & Expectation Cheat Sheet

1. Probability Axioms and Basic Rules

Axioms (for events)

Nonnegativity:

\[ P(A) \ge 0 \]

Total probability of the sample space:

\[ P(\Omega) = 1 \]

Countable additivity (disjoint events):

\[ \text{If } A_i \cap A_j = \emptyset \text{ for } i \ne j, \quad P\left(\bigcup_{i} A_i\right) = \sum_i P(A_i) \]

Basic Identities

Complement:

\[ P(A^c) = 1 - P(A) \]

Union (two events):

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

Union (three events):

\[ P(A \cup B \cup C) = P(A)+P(B)+P(C) - P(A\cap B)-P(A\cap C)-P(B\cap C) + P(A\cap B\cap C) \]

Monotonicity:

\[ A \subseteq B \;\Rightarrow\; P(A) \le P(B) \]

Difference:

\[ P(A \setminus B) = P(A) - P(A \cap B) \]

Upper bound (union bound):

\[ P(A \cup B) \le P(A) + P(B) \]

General union bound:

\[ P\left(\bigcup_i A_i\right) \le \sum_i P(A_i) \]

Properties of PMFs and PDFs

PMF nonnegativity:

\[ p_X(x)=P(X=x) \ge 0 \]

PMF normalization:

\[ \sum_x p_X(x)=1 \]

PDF nonnegativity:

\[ f_X(x) \ge 0 \]

PDF normalization:

\[ \int_{-\infty}^{\infty} f_X(x)\,dx=1 \]

Axioms (for events)

Nonnegativity:

\[ P(A) \ge 0 \]

Total probability of the sample space:

\[ P(\Omega) = 1 \]

Properties of CDFs

CDF definition:

\[ F_X(x)=P(X\le x) \]

Discrete case:

\[ F_X(x)=\sum_{k\le x}P(X=k) \]

Continuous case:

\[ F_X(x)=\int_{-\infty}^{x} f_X(t)\,dt \]

CDF properties:

\(F_X(x)\) is increasing/nondecreasing.
\(\lim_{x\to -\infty}F_X(x)=0\)
\(\lim_{x\to \infty}F_X(x)=1\)

2. Combinatorics / Counting

Multiplication Rule

If a process has \(k\) steps, with \(n_1,n_2,\dots,n_k\) choices, then:

\[ \text{total outcomes} = n_1 n_2 \cdots n_k \]

Factorials

\[ n! = n(n-1)(n-2)\cdots 1 \] \[ 0! = 1 \]

Permutations (Order Matters)

Choose and arrange \(k\) objects from \(n\):

\[ P(n,k) = \frac{n!}{(n-k)!} \]

Arrange all \(n\) objects:

\[ n! \]

With repeated objects:

\[ \frac{n!}{n_1! n_2! \cdots n_r!} \]

Combinations (Order Does NOT Matter)

\[ \binom{n}{k} = \frac{n!}{k!(n-k)!} \]

Key identities:

\[ \binom{n}{k} = \binom{n}{n-k} \] \[ \binom{n}{0} = \binom{n}{n} = 1 \]

Stars and Bars (Distributing Identical Objects)

Nonnegative solutions:

Number of solutions to

\[ x_1 + x_2 + \cdots + x_k = n, \quad x_i \ge 0 \] \[ = \binom{n + k - 1}{k - 1} \]

Positive solutions:

Number of solutions to

\[ x_1 + x_2 + \cdots + x_k = n, \quad x_i \ge 1 \] \[ = \binom{n - 1}{k - 1} \]

Choosing With / Without Replacement

Order matters, with replacement:

\[ n^k \]

Order matters, without replacement:

\[ \frac{n!}{(n-k)!} \]

Order does NOT matter, without replacement:

\[ \binom{n}{k} \]

Order does NOT matter, with replacement:

Equivalent to solving

\[ x_1 + \cdots + x_k = n \] \[ \Rightarrow \binom{n + k - 1}{k - 1} \]

Binomial Theorem

\[ (a+b)^n = \sum_{k=0}^n \binom{n}{k} a^k b^{n-k} \]

Inclusion-Exclusion

Two sets:

\[ |A \cup B| = |A| + |B| - |A \cap B| \]

Three sets:

\[ |A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C| \]

Quick Recognition Guide

Order matters → permutations
Order does not matter → combinations
Repetition allowed → stars and bars / powers
Distributing identical items → stars and bars
"At least one" → convert to positive solutions

3. Law of Total Probability / Marginalization

Events: if \(B_i\) partition the sample space,

\[ P(A)=\sum_i P(A\mid B_i)P(B_i) \]

Discrete marginalization:

\[ P(X=x)=\sum_y P(X=x,Y=y) \] \[ P(Y=y)=\sum_x P(X=x,Y=y) \]

Continuous marginalization:

\[ f_X(x)=\int f_{X,Y}(x,y)\,dy \] \[ f_Y(y)=\int f_{X,Y}(x,y)\,dx \]

Total probability with random variables:

\[ P(A)=\sum_x P(A\mid X=x)P(X=x) \] \[ P(A)=\int P(A\mid X=x)f_X(x)\,dx \]

4. Bayes' Rule

Events:

\[ P(A_i\mid B)= \frac{P(B\mid A_i)P(A_i)} {\sum_j P(B\mid A_j)P(A_j)} \]

Discrete random variables:

\[ P(X=x\mid Y=y)= \frac{P(Y=y\mid X=x)P(X=x)} {P(Y=y)} \]

Continuous random variables:

\[ f_{X\mid Y}(x\mid y)= \frac{f_{Y\mid X}(y\mid x)f_X(x)} {f_Y(y)} \]

Continuous Bayes: event \(A\), continuous random variable \(Y\)

\[ P(A\mid Y\in C)= \frac{P(A)\int_C f_{Y\mid A}(y)\,dy} {\int_C f_Y(y)\,dy} \] \[ P(A\mid Y=y)= \frac{f_{Y\mid A}(y)P(A)} {f_Y(y)} \]

5. Expectation and Variance

Discrete:

\[ E[X]=\sum_x xP(X=x) \]

Continuous:

\[ E[X]=\int x f_X(x)\,dx \]

Variance:

\[ \mathrm{Var}(X)=E[X^2]-(E[X])^2 \]

Properties of Expectation

Linearity of expectation:

\[ E[aX+bY+c]=aE[X]+bE[Y]+c \]

Important example:

\[ E[aX^2+bX+c]=aE[X^2]+bE[X]+c \]

Do NOT write:

\[ \color{red}{E[aX^2+bX+c]=aE[X]^2+bE[X]+c} \]

6. Conditional Probability and Conditional Expectation

Events:

\[ P(A\mid B)=\frac{P(A\cap B)}{P(B)} \]

Discrete:

\[ P(X=x\mid Y=y)= \frac{P(X=x,Y=y)}{P(Y=y)} \]

Continuous:

\[ f_{X\mid Y}(x\mid y)= \frac{f_{X,Y}(x,y)}{f_Y(y)} \]

Conditional expectation:

\[ E[X\mid Y=y]= \sum_x xP(X=x\mid Y=y) \] \[ E[X\mid Y=y]= \int x f_{X\mid Y}(x\mid y)\,dx \]

7. Law of Total Expectation

Discrete version:

\[ E[X]=\sum_y E[X\mid Y=y]P(Y=y) \]

where

\[ E[X\mid Y=y]=\sum_x xP(X=x\mid Y=y) \]

So:

\[ E[X]=\sum_y\left(\sum_x xP(X=x\mid Y=y)\right)P(Y=y) \]

Continuous version:

\[ E[X]=\int E[X\mid Y=y]f_Y(y)\,dy \]

where

\[ E[X\mid Y=y]=\int x f_{X\mid Y}(x\mid y)\,dx \]

So:

\[ E[X]=\int\left(\int x f_{X\mid Y}(x\mid y)\,dx\right)f_Y(y)\,dy \]

8. Independence

Events:

\[ A\perp B \Longleftrightarrow P(A\cap B)=P(A)P(B) \]

Discrete random variables, for all \(x,y\):

\[ X\perp Y \Longleftrightarrow P(X=x,Y=y)=P(X=x)P(Y=y) \quad \forall x,y \]

Continuous random variables, for all \(x,y\):

\[ X\perp Y \Longleftrightarrow f_{X,Y}(x,y)=f_X(x)f_Y(y) \quad \forall x,y \]

Mutual independence:

\[ A_1,\dots,A_n \text{ are independent} \Longleftrightarrow P\left(\bigcap_{i\in S}A_i\right) = \prod_{i\in S}P(A_i) \]

for every \(S\subseteq \{1,\dots,n\}\) with \(|S|\ge 2\).

Consequences of independence:

\[ E[XY]=E[X]E[Y] \] \[ E[f(X)g(Y)]=E[f(X)]E[g(Y)] \]

Variance of a sum of independent random variables:

\[ \mathrm{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \mathrm{Var}(X_i) \]

9. Covariance and Correlation

\[ \mathrm{Cov}(X,Y)=E[XY]-E[X]E[Y] \] \[ \mathrm{Corr}(X,Y)= \frac{\mathrm{Cov}(X,Y)} {\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}} \]

\[ \mathrm{Corr}(X,Y)=0 \not\Rightarrow X\perp Y \]

Zero correlation does not imply independence.

10. Discrete Distributions

Bernoulli \((p)\)

When: one success/failure trial.

\[ P(X=1)=p,\quad P(X=0)=1-p \] \[ E[X]=p,\quad \mathrm{Var}(X)=p(1-p) \]

Binomial \((n,p)\)

When: number of successes in \(n\) independent Bernoulli trials.

\[ X=\sum_{i=1}^n X_i,\quad X_i\sim \mathrm{Bernoulli}(p) \] \[ P(X=k)=\binom{n}{k}p^k(1-p)^{n-k} \] \[ E[X]=np,\quad \mathrm{Var}(X)=np(1-p) \]

Geometric \((p)\)

When: number of trials until first success.

\[ P(X=k)=(1-p)^{k-1}p,\quad k=1,2,\dots \] \[ E[X]=\frac{1}{p},\quad \mathrm{Var}(X)=\frac{1-p}{p^2} \]

\[ F_X(k)=P(X\leq k)=1-(1-p)^{k},\quad k=1,2,\dots \]

Memoryless property \[ P(X> t+s\mid X > t)=P(X> s)\]

Discrete Uniform \(\{1,\dots,n\}\)

When: all outcomes are equally likely.

\[ P(X=k)=\frac{1}{n} \] \[ E[X]=\frac{n+1}{2},\quad \mathrm{Var}(X)=\frac{n^2-1}{12} \]

Poisson \((\lambda)\)

When: counts of rare independent events.

\[ P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!} \] \[ E[X]=\lambda,\quad \mathrm{Var}(X)=\lambda \]

Binomial limit:

\[ n\to\infty,\quad p\to 0,\quad np\to \lambda \]

<br/>

11. Continuous Distributions

Normal \((\mu,\sigma^2)\)

When: sums/averages of many independent effects.

\[ f(x)= \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \] \[ E[X]=\mu,\quad \mathrm{Var}(X)=\sigma^2 \]

Properties:

The normal distribution is symmetric around the mean \(\mu\).
The highest point of the curve is at \(x=\mu\).
About 68%, 95%, and 99.7% of values fall within \(1\sigma\), \(2\sigma\), and \(3\sigma\) of the mean.

How to use a normal table / Z-table:

First standardize: \(Z=\frac{X-\mu}{\sigma}\).
Most Z-tables give left-tail probabilities: \(P(Z\le z)\).
For right-tail probabilities, use \(P(Z>z)=1-P(Z\le z)\).
For negative values, use symmetry: \(P(Z\le -z)=1-P(Z\le z)\).
For intervals, subtract: \(P(a\le Z\le b)=P(Z\le b)-P(Z\le a)\)

Exponential \((\lambda)\)

When: waiting time between Poisson events.

\[ f_X(x)=\lambda e^{-\lambda x},\quad x\ge 0 \] \[ E[X]=\frac{1}{\lambda},\quad \mathrm{Var}(X)=\frac{1}{\lambda^2} \]

\[ F_X(x)=P(X\leq x) =1-\exp(-\lambda x),\quad x\ge 0, 0 \mbox{ otherwise.} \] \[ E[X]=\frac{1}{\lambda},\quad \mathrm{Var}(X)=\frac{1}{\lambda^2} \]

Memoryless property \[ P(X> t+s\mid X > t)=P(X> s)\]

Continuous Uniform \((a,b)\)

When: all values in an interval are equally likely.

\[ f(x)=\frac{1}{b-a},\quad a\le x\le b \] \[ E[X]=\frac{a+b}{2},\quad \mathrm{Var}(X)=\frac{(b-a)^2}{12} \]

\[ F_X(x)=P(X\leq x) =\frac{x-a}{b-a},\quad a\le x\le b, \quad 0 \text{ when }x < a, \quad 1 \text{ when }x\ge b \]

12. Markov's Inequality, Chebyshev's Inequality, and WLLN

Markov's Inequality

If \(X\ge 0\) and \(a>0\), then:

\[ P(X\ge a)\le \frac{E[X]}{a} \]

Chebyshev's Inequality

If \(E[X]=\mu\) and \(\mathrm{Var}(X)=\sigma^2\), then for \(a>0\):

\[ P(|X-\mu|\ge a)\le \frac{\sigma^2}{a^2} \]

Equivalently, for \(k>0\):

\[ P(|X-\mu|\ge k\sigma)\le \frac{1}{k^2} \]

Weak Law of Large Numbers

If \(X_1,\dots,X_n\) are i.i.d. with \(E[X_i]=\mu\) and \(\mathrm{Var}(X_i)=\sigma^2\), then:

\[ \bar X_n=\frac{1}{n}\sum_{i=1}^n X_i \] \[ E[\bar X_n]=\mu,\quad \mathrm{Var}(\bar X_n)=\frac{\sigma^2}{n} \]

Using Chebyshev's inequality, for every \(\varepsilon>0\):

\[ P(|\bar X_n-\mu|>\varepsilon) \le \frac{\sigma^2}{n\varepsilon^2} \]

13. Central Limit Theorem

If \(X_1,\dots,X_n\) are i.i.d. with \(E[X_i]=\mu\) and \(\mathrm{Var}(X_i)=\sigma^2\), then:

Sample mean:

\[ \bar X_n \approx \mathcal{N}\left(\mu,\frac{\sigma^2}{n}\right) \]

Standardized form:

\[ \frac{\bar X_n-\mu}{\sigma/\sqrt n} \approx \mathcal{N}(0,1) \]

Remember

Total probability = sum/integrate over cases.
Marginalization = remove one variable by summing/integrating.
Bayes = flip conditioning.
Mutual independence of RVs= factorization for all values.
Mutual independence of events= factorization works for every subset.
Continuous distribution = \(P(X=x)=0\).
Binomial = sum of independent Bernoulli variables.
Poisson = Binomial limit with \(n\to\infty\), \(p\to0\), \(np\to\lambda\).
Exponential = continuous analogue of the geometric distribution..
Correlation \(0\) does not imply independence.

14. Functions of One Random Variable

Discrete case:

If \(Y = g(X)\), then

\[ P(Y=y) = \sum_{x: g(x)=y} P(X=x) \]

Continuous case (CDF method):

If \(Y = g(X)\), then

\[ F_Y(y) = P(Y \le y) = P(g(X) \le y) \] Differentiate to get density: \[ f_Y(y) = \frac{d}{dy} F_Y(y) \]

Continuous case (monotone \(g\)):

If \(g\) is one-to-one and differentiable:

\[ f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| \quad \text{where } x = g^{-1}(y) \]

Expectation shortcut:

\[ E[g(X)] = \sum_x g(x)P(X=x) \quad \text{(discrete)} \] \[ E[g(X)] = \int g(x) f_X(x)\,dx \quad \text{(continuous)} \]

15. Multiple Random Variables

Chain Rule

Events:

\[ P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1,A_2)\cdots P(A_n \mid A_1,\dots,A_{n-1}) \]

Discrete random variables:

\[ P(X_1=x_1,\dots,X_n=x_n) = P(X_1=x_1)\,P(X_2=x_2 \mid X_1=x_1)\cdots P(X_n=x_n \mid X_1,\dots,X_{n-1}) \]

Continuous random variables:

\[ f_{X_1,\dots,X_n}(x_1,\dots,x_n) = f_{X_1}(x_1)\,f_{X_2 \mid X_1}(x_2 \mid x_1)\cdots f_{X_n \mid X_1,\dots,X_{n-1}}(x_n \mid x_1,\dots,x_{n-1}) \]

Marginalization

Discrete:

\[ P(X=x) = \sum_y P(X=x,Y=y) \] \[ P(X_1=x_1) = \sum_{x_2,\dots,x_n} P(X_1=x_1,\dots,X_n=x_n) \]

Continuous:

\[ f_X(x) = \int f_{X,Y}(x,y)\,dy \] \[ f_{X_1}(x_1) = \int \cdots \int f_{X_1,\dots,X_n}(x_1,\dots,x_n) \,dx_2 \cdots dx_n \]