Nonnegativity:
\[ P(A) \ge 0 \]
Total probability of the sample space:
\[ P(\Omega) = 1 \]
Countable additivity (disjoint events):
\[ \text{If } A_i \cap A_j = \emptyset \text{ for } i \ne j, \quad P\left(\bigcup_{i} A_i\right) = \sum_i P(A_i) \]
Complement:
\[ P(A^c) = 1 - P(A) \]
Union (two events):
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
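For example, with illustrative values \(P(A)=0.5\), \(P(B)=0.4\), \(P(A\cap B)=0.2\):
\[ P(A\cup B)=0.5+0.4-0.2=0.7 \]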
Union (three events):
\[ P(A \cup B \cup C) = P(A)+P(B)+P(C) - P(A\cap B)-P(A\cap C)-P(B\cap C) + P(A\cap B\cap C) \]
Monotonicity:
\[ A \subseteq B \;\Rightarrow\; P(A) \le P(B) \]
Difference:
\[ P(A \setminus B) = P(A) - P(A \cap B) \]
Upper bound (union bound):
\[ P(A \cup B) \le P(A) + P(B) \]
General union bound:
\[ P\left(\bigcup_i A_i\right) \le \sum_i P(A_i) \]
Multiplication rule for counting: if a process has \(k\) steps, with \(n_1,n_2,\dots,n_k\) choices at each step, then:
\[ \text{total outcomes} = n_1 n_2 \cdots n_k \]
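For example, a process with \(3\) choices at step one, \(2\) at step two, and \(4\) at step three has \(3\cdot 2\cdot 4=24\) total outcomes.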
Choose and arrange \(k\) objects from \(n\):
\[ P(n,k) = \frac{n!}{(n-k)!} \]
Arrange all \(n\) objects:
\[ n! \]
With repeated objects:
\[ \frac{n!}{n_1! n_2! \cdots n_r!} \]
Binomial coefficient (choose \(k\) objects from \(n\), order irrelevant):
\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \]
Key identities:
\[ \binom{n}{k} = \binom{n}{n-k} \] \[ \binom{n}{0} = \binom{n}{n} = 1 \]
Nonnegative solutions:
Number of solutions to
\[ x_1 + x_2 + \cdots + x_k = n, \quad x_i \ge 0 \] \[ = \binom{n + k - 1}{k - 1} \]
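For example, \(x_1+x_2+x_3=4\) with \(x_i\ge 0\) has
\[ \binom{4+3-1}{3-1}=\binom{6}{2}=15 \]
solutions.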
Positive solutions:
Number of solutions to
\[ x_1 + x_2 + \cdots + x_k = n, \quad x_i \ge 1 \] \[ = \binom{n - 1}{k - 1} \]
Order matters, with replacement:
\[ n^k \]
Order matters, without replacement:
\[ \frac{n!}{(n-k)!} \]
Order does NOT matter, without replacement:
\[ \binom{n}{k} \]
Order does NOT matter, with replacement:
Equivalent to counting nonnegative solutions of
\[ x_1 + \cdots + x_n = k, \quad x_i = \text{number of copies of object } i \] \[ \Rightarrow \binom{n + k - 1}{k} = \binom{n + k - 1}{n - 1} \]
Inclusion-exclusion for counting. Two sets:
\[ |A \cup B| = |A| + |B| - |A \cap B| \]
Three sets:
\[ |A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C| \]
Law of total probability. Events: if \(B_i\) partition the sample space,
\[ P(A)=\sum_i P(A\mid B_i)P(B_i) \]
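For example, with illustrative values \(P(B_1)=P(B_2)=\tfrac12\), \(P(A\mid B_1)=0.9\), \(P(A\mid B_2)=0.5\):
\[ P(A)=0.9\cdot\tfrac12+0.5\cdot\tfrac12=0.7 \]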
Discrete marginalization:
\[ P(X=x)=\sum_y P(X=x,Y=y) \] \[ P(Y=y)=\sum_x P(X=x,Y=y) \]
Continuous marginalization:
\[ f_X(x)=\int f_{X,Y}(x,y)\,dy \] \[ f_Y(y)=\int f_{X,Y}(x,y)\,dx \]
Total probability with random variables:
\[ P(A)=\sum_x P(A\mid X=x)P(X=x) \] \[ P(A)=\int P(A\mid X=x)f_X(x)\,dx \]
Bayes' rule. Events:
\[ P(A_i\mid B)= \frac{P(B\mid A_i)P(A_i)} {\sum_j P(B\mid A_j)P(A_j)} \]
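For example, a screening test with illustrative values \(P(A_1)=0.01\) (prevalence), \(P(B\mid A_1)=0.9\) (true positive rate), and \(P(B\mid A_2)=0.1\) (false positive rate), where \(A_2=A_1^c\):
\[ P(A_1\mid B)=\frac{0.9\cdot 0.01}{0.9\cdot 0.01+0.1\cdot 0.99}=\frac{0.009}{0.108}\approx 0.083 \]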
Discrete random variables:
\[ P(X=x\mid Y=y)= \frac{P(Y=y\mid X=x)P(X=x)} {P(Y=y)} \]
Continuous random variables:
\[ f_{X\mid Y}(x\mid y)= \frac{f_{Y\mid X}(y\mid x)f_X(x)} {f_Y(y)} \]
Continuous Bayes: event \(A\), continuous random variable \(Y\)
\[ P(A\mid Y\in C)= \frac{P(A)\int_C f_{Y\mid A}(y)\,dy} {\int_C f_Y(y)\,dy} \] \[ P(A\mid Y=y)= \frac{f_{Y\mid A}(y)P(A)} {f_Y(y)} \]
Expectation (discrete):
\[ E[X]=\sum_x xP(X=x) \]
Expectation (continuous):
\[ E[X]=\int x f_X(x)\,dx \]
Variance:
\[ \mathrm{Var}(X)=E[X^2]-(E[X])^2 \]
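For example, for one roll of a fair six-sided die:
\[ E[X]=\frac{7}{2},\quad E[X^2]=\frac{1+4+9+16+25+36}{6}=\frac{91}{6},\quad \mathrm{Var}(X)=\frac{91}{6}-\frac{49}{4}=\frac{35}{12} \]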
Linearity of expectation:
\[ E[aX+bY+c]=aE[X]+bE[Y]+c \]
Important example:
\[ E[aX^2+bX+c]=aE[X^2]+bE[X]+c \]
Do NOT write:
\[ \color{red}{E[aX^2+bX+c]=aE[X]^2+bE[X]+c} \]
Conditional probability. Events:
\[ P(A\mid B)=\frac{P(A\cap B)}{P(B)} \]
Discrete:
\[ P(X=x\mid Y=y)= \frac{P(X=x,Y=y)}{P(Y=y)} \]
Continuous:
\[ f_{X\mid Y}(x\mid y)= \frac{f_{X,Y}(x,y)}{f_Y(y)} \]
Conditional expectation:
\[ E[X\mid Y=y]= \sum_x xP(X=x\mid Y=y) \] \[ E[X\mid Y=y]= \int x f_{X\mid Y}(x\mid y)\,dx \]
Law of total expectation. Discrete version:
\[ E[X]=\sum_y E[X\mid Y=y]P(Y=y) \]
where
\[ E[X\mid Y=y]=\sum_x xP(X=x\mid Y=y) \]
So:
\[ E[X]=\sum_y\left(\sum_x xP(X=x\mid Y=y)\right)P(Y=y) \]
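For example, with illustrative values \(E[X\mid Y=0]=1\), \(E[X\mid Y=1]=3\), and \(P(Y=0)=P(Y=1)=\tfrac12\):
\[ E[X]=1\cdot\tfrac12+3\cdot\tfrac12=2 \]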
Continuous version:
\[ E[X]=\int E[X\mid Y=y]f_Y(y)\,dy \]
where
\[ E[X\mid Y=y]=\int x f_{X\mid Y}(x\mid y)\,dx \]
So:
\[ E[X]=\int\left(\int x f_{X\mid Y}(x\mid y)\,dx\right)f_Y(y)\,dy \]
Independence. Events:
\[ A\perp B \Longleftrightarrow P(A\cap B)=P(A)P(B) \]
Discrete random variables, for all \(x,y\):
\[ X\perp Y \Longleftrightarrow P(X=x,Y=y)=P(X=x)P(Y=y) \quad \forall x,y \]
Continuous random variables, for all \(x,y\):
\[ X\perp Y \Longleftrightarrow f_{X,Y}(x,y)=f_X(x)f_Y(y) \quad \forall x,y \]
Mutual independence:
\[ A_1,\dots,A_n \text{ are independent} \Longleftrightarrow P\left(\bigcap_{i\in S}A_i\right) = \prod_{i\in S}P(A_i) \]
for every \(S\subseteq \{1,\dots,n\}\) with \(|S|\ge 2\).
Consequences of independence:
\[ E[XY]=E[X]E[Y] \] \[ E[f(X)g(Y)]=E[f(X)]E[g(Y)] \]
Variance of a sum of independent random variables:
\[ \mathrm{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \mathrm{Var}(X_i) \]
\[ \mathrm{Cov}(X,Y)=E[XY]-E[X]E[Y] \] \[ \mathrm{Corr}(X,Y)= \frac{\mathrm{Cov}(X,Y)} {\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}} \]
\[ \mathrm{Corr}(X,Y)=0 \not\Rightarrow X\perp Y \]
Zero correlation does not imply independence.
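A standard counterexample: let \(X\) be uniform on \(\{-1,0,1\}\) and \(Y=X^2\). Then
\[ \mathrm{Cov}(X,Y)=E[X^3]-E[X]\,E[X^2]=0-0=0, \]
yet \(Y\) is a function of \(X\), so \(X\) and \(Y\) are not independent.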
Bernoulli(\(p\)). When: one success/failure trial.
\[ P(X=1)=p,\quad P(X=0)=1-p \] \[ E[X]=p,\quad \mathrm{Var}(X)=p(1-p) \]
Binomial(\(n,p\)). When: number of successes in \(n\) independent Bernoulli trials.
\[ X=\sum_{i=1}^n X_i,\quad X_i\sim \mathrm{Bernoulli}(p) \] \[ P(X=k)=\binom{n}{k}p^k(1-p)^{n-k} \] \[ E[X]=np,\quad \mathrm{Var}(X)=np(1-p) \]
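For example, with \(n=10\) and \(p=\tfrac12\):
\[ P(X=3)=\binom{10}{3}\left(\tfrac12\right)^{10}=\frac{120}{1024}\approx 0.117 \]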
Geometric(\(p\)). When: number of trials until the first success.
\[ P(X=k)=(1-p)^{k-1}p,\quad k=1,2,\dots \] \[ E[X]=\frac{1}{p},\quad \mathrm{Var}(X)=\frac{1-p}{p^2} \]
Discrete uniform on \(\{1,\dots,n\}\). When: all outcomes are equally likely.
\[ P(X=k)=\frac{1}{n},\quad k=1,\dots,n \] \[ E[X]=\frac{n+1}{2},\quad \mathrm{Var}(X)=\frac{n^2-1}{12} \]
Poisson(\(\lambda\)). When: counts of rare independent events.
\[ P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!},\quad k=0,1,2,\dots \] \[ E[X]=\lambda,\quad \mathrm{Var}(X)=\lambda \]
Binomial limit:
\[ n\to\infty,\quad p\to 0,\quad np\to \lambda \]
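For example, with an illustrative rate \(\lambda=2\):
\[ P(X=0)=e^{-2}\approx 0.135,\qquad P(X=1)=2e^{-2}\approx 0.271 \]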
Normal \(\mathcal{N}(\mu,\sigma^2)\). When: sums/averages of many independent effects.
\[ f(x)= \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \] \[ E[X]=\mu,\quad \mathrm{Var}(X)=\sigma^2 \]
Properties: a linear transformation of a normal is normal, and standardizing gives a standard normal:
\[ X\sim\mathcal{N}(\mu,\sigma^2)\ \Rightarrow\ aX+b\sim\mathcal{N}(a\mu+b,\,a^2\sigma^2) \] \[ Z=\frac{X-\mu}{\sigma}\sim\mathcal{N}(0,1) \]
How to use a normal table / Z-table: standardize, then look up the standard normal CDF \(\Phi\):
\[ P(X\le x)=\Phi\!\left(\frac{x-\mu}{\sigma}\right),\qquad \Phi(-z)=1-\Phi(z) \]
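For example, with illustrative values \(X\sim\mathcal{N}(100,15^2)\):
\[ P(X\le 115)=\Phi\!\left(\frac{115-100}{15}\right)=\Phi(1)\approx 0.8413 \]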
Exponential(\(\lambda\)). When: waiting time between Poisson events.
\[ f(x)=\lambda e^{-\lambda x},\quad x\ge 0 \] \[ E[X]=\frac{1}{\lambda},\quad \mathrm{Var}(X)=\frac{1}{\lambda^2} \]
Continuous uniform on \([a,b]\). When: all values in an interval are equally likely.
\[ f(x)=\frac{1}{b-a},\quad a\le x\le b \] \[ E[X]=\frac{a+b}{2},\quad \mathrm{Var}(X)=\frac{(b-a)^2}{12} \]
Markov's inequality: if \(X\ge 0\) and \(a>0\), then:
\[ P(X\ge a)\le \frac{E[X]}{a} \]
Chebyshev's inequality: if \(E[X]=\mu\) and \(\mathrm{Var}(X)=\sigma^2\), then for \(a>0\):
\[ P(|X-\mu|\ge a)\le \frac{\sigma^2}{a^2} \]
Equivalently, for \(k>0\):
\[ P(|X-\mu|\ge k\sigma)\le \frac{1}{k^2} \]
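For example, \(k=2\) gives \(P(|X-\mu|\ge 2\sigma)\le \tfrac14\) for any distribution with finite variance.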
Weak law of large numbers: if \(X_1,\dots,X_n\) are i.i.d. with \(E[X_i]=\mu\) and \(\mathrm{Var}(X_i)=\sigma^2\), then:
\[ \bar X_n=\frac{1}{n}\sum_{i=1}^n X_i \] \[ E[\bar X_n]=\mu,\quad \mathrm{Var}(\bar X_n)=\frac{\sigma^2}{n} \]
Using Chebyshev's inequality, for every \(\varepsilon>0\):
\[ P(|\bar X_n-\mu|>\varepsilon) \le \frac{\sigma^2}{n\varepsilon^2} \]
Central limit theorem: if \(X_1,\dots,X_n\) are i.i.d. with \(E[X_i]=\mu\) and \(\mathrm{Var}(X_i)=\sigma^2<\infty\), then for large \(n\):
Sample mean:
\[ \bar X_n \approx \mathcal{N}\left(\mu,\frac{\sigma^2}{n}\right) \]
Standardized form:
\[ \frac{\bar X_n-\mu}{\sigma/\sqrt n} \approx \mathcal{N}(0,1) \]
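For example, for \(n=100\) fair coin flips (\(\mu=\tfrac12\), \(\sigma=\tfrac12\)):
\[ P(\bar X_n>0.55)\approx P\!\left(Z>\frac{0.55-0.5}{0.5/\sqrt{100}}\right)=P(Z>1)\approx 0.159 \]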
Functions of a random variable. Discrete case:
If \(Y = g(X)\), then
\[ P(Y=y) = \sum_{x: g(x)=y} P(X=x) \]
Continuous case (CDF method):
If \(Y = g(X)\), then
\[ F_Y(y) = P(Y \le y) = P(g(X) \le y) \] Differentiate to get the density: \[ f_Y(y) = \frac{d}{dy} F_Y(y) \]
Continuous case (monotone \(g\)):
If \(g\) is one-to-one and differentiable:
\[ f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| \quad \text{where } x = g^{-1}(y) \]
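For example, if \(X\sim\mathrm{Uniform}(0,1)\) and \(Y=-\ln X\), then \(x=e^{-y}\) and \(\left|\frac{dx}{dy}\right|=e^{-y}\), so
\[ f_Y(y)=1\cdot e^{-y}=e^{-y},\quad y\ge 0, \]
i.e. \(Y\sim\mathrm{Exponential}(1)\).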
Expectation shortcut:
\[ E[g(X)] = \sum_x g(x)P(X=x) \quad \text{(discrete)} \] \[ E[g(X)] = \int g(x) f_X(x)\,dx \quad \text{(continuous)} \]
Chain rule (multiplication rule). Events:
\[ P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1,A_2)\cdots P(A_n \mid A_1,\dots,A_{n-1}) \]
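For example, drawing two cards without replacement from a standard 52-card deck:
\[ P(\text{both aces})=P(A_1)\,P(A_2\mid A_1)=\frac{4}{52}\cdot\frac{3}{51}=\frac{1}{221} \]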
Discrete random variables:
\[ P(X_1=x_1,\dots,X_n=x_n) = P(X_1=x_1)\,P(X_2=x_2 \mid X_1=x_1)\cdots P(X_n=x_n \mid X_1=x_1,\dots,X_{n-1}=x_{n-1}) \]
Continuous random variables:
\[ f_{X_1,\dots,X_n}(x_1,\dots,x_n) = f_{X_1}(x_1)\,f_{X_2 \mid X_1}(x_2 \mid x_1)\cdots f_{X_n \mid X_1,\dots,X_{n-1}}(x_n \mid x_1,\dots,x_{n-1}) \]
Marginalization. Discrete:
\[ P(X=x) = \sum_y P(X=x,Y=y) \] \[ P(X_1=x_1) = \sum_{x_2,\dots,x_n} P(X_1=x_1,\dots,X_n=x_n) \]
Continuous:
\[ f_X(x) = \int f_{X,Y}(x,y)\,dy \] \[ f_{X_1}(x_1) = \int \cdots \int f_{X_1,\dots,X_n}(x_1,\dots,x_n) \,dx_2 \cdots dx_n \]