Probability Theory & Bayes Theorem
Random Variables
Discrete Random Variables
$X$ takes on one of a finite set of values $\{x_1,x_2,\ldots,x_n\}$.
$P(X=x_i)$ (or $P(x_i)$) is the probability that the random variable $X$ takes on the value $x_i$.
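As a minimal sketch (the values and probabilities below are assumed for illustration, not taken from the notes), a discrete random variable can be represented as a mapping from each value $x_i$ to $P(X=x_i)$:

```python
# A discrete random variable X represented as a mapping from each
# value x_i to its probability P(X = x_i). (Example values assumed.)
P_X = {1: 0.2, 2: 0.5, 3: 0.3}

# The probabilities over all values must sum to 1.
assert abs(sum(P_X.values()) - 1.0) < 1e-9

# P(X = x_i) is just a lookup.
print(P_X[2])  # 0.5
```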
Continuous Random Variables
$X$ takes on values from a continuous range, e.g. distributed along a bell curve.
$p(X=x)$ (or $p(x)$) is a probability density function.
Probabilities are obtained from the density by integrating over an interval:
\[P(x\in(a,b)) = \int^b_a p(x)\,dx\]
This will be approximated by splitting the range into discrete values.
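A minimal sketch of that discretisation, using the standard normal density as an assumed example of $p(x)$: the integral over $(a,b)$ is approximated by summing $p(x)\,\Delta x$ over small bins.

```python
import math

def p(x):
    # Assumed example density: standard normal pdf.
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

def prob_in_interval(a, b, n_bins=10_000):
    # Approximate P(x in (a, b)) = integral of p(x) dx
    # by splitting the range into discrete bins of width dx.
    dx = (b - a) / n_bins
    return sum(p(a + (i + 0.5) * dx) * dx for i in range(n_bins))

print(prob_in_interval(-1.0, 1.0))  # ~0.6827 for the standard normal
```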
Joint and Conditional Probability
Joint Probability
If $X$ and $Y$ are independent:
\[P(x,y) = P(x)P(y)\]
Conditional Probability
Probability of $x$ given $y$:
\[P(x\mid y)=\frac{P(x,y)}{P(y)}\]
Rearranging gives the product rule:
\[P(x,y) = P(x\mid y)P(y)\]
If $x$ and $y$ are independent:
\[\begin{aligned} P(x\mid y)&=\frac{P(x,y)}{P(y)}\\ &=\frac{P(x)P(y)}{P(y)}\\ &=P(x) \end{aligned}\]
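A small sketch of these identities on an assumed joint table $P(x,y)$ built from independent marginals: the conditional is recovered as $P(x\mid y)=P(x,y)/P(y)$ and, because of independence, it reduces to $P(x)$.

```python
# Assumed joint distribution P(x, y) built from independent marginals,
# so P(x | y) should reduce to P(x).
P_x = {0: 0.3, 1: 0.7}
P_y = {'a': 0.6, 'b': 0.4}
P_xy = {(x, y): P_x[x] * P_y[y] for x in P_x for y in P_y}

def conditional(x, y):
    # P(x | y) = P(x, y) / P(y)
    return P_xy[(x, y)] / P_y[y]

print(conditional(0, 'a'))  # 0.3 == P(x=0), since x and y are independent here
```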
Theorem of Total Probability & Marginals
Marginalisation
- Discrete Case
  \[P(x)=\sum_y P(x,y)\]
- Continuous Case
  \[p(x)=\int p(x,y)\,dy\]
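A sketch of the discrete case on an assumed joint table: $P(x)$ is obtained by summing $P(x,y)$ over every value of $y$.

```python
# Assumed joint distribution P(x, y) over two small discrete variables.
P_xy = {
    ('sun', 'hot'): 0.4, ('sun', 'cold'): 0.1,
    ('rain', 'hot'): 0.1, ('rain', 'cold'): 0.4,
}

# Marginalise out y: P(x) = sum_y P(x, y).
P_x = {}
for (x, y), p in P_xy.items():
    P_x[x] = P_x.get(x, 0.0) + p

print(P_x)  # {'sun': 0.5, 'rain': 0.5}
```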
Theorem of Total Probability
This follows from marginalisation as $P(x,y)=P(x\mid y)P(y)$:
- Discrete Case
  \[P(x)=\sum_y P(x\mid y)P(y)\]
- Continuous Case
  \[p(x)=\int p(x\mid y)p(y)\,dy\]
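A sketch of the discrete case with assumed numbers: given $P(y)$ and the conditionals $P(x\mid y)$, $P(x)$ is recovered by weighting each conditional by its $P(y)$.

```python
# Assumed prior over y and conditional distributions P(x | y).
P_y = {'y1': 0.3, 'y2': 0.7}
P_x_given_y = {
    'y1': {'x1': 0.9, 'x2': 0.1},
    'y2': {'x1': 0.2, 'x2': 0.8},
}

# Theorem of total probability: P(x) = sum_y P(x | y) P(y).
P_x = {
    x: sum(P_x_given_y[y][x] * P_y[y] for y in P_y)
    for x in ['x1', 'x2']
}
print(P_x)  # {'x1': 0.41, 'x2': 0.59}
```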
Bayes Formula
The joint probability of $x$ and $y$ is symmetric, so it can be factorised either way:
\[P(x,y) = P(x\mid y)P(y) = P(y\mid x)P(x)\]
From this we can derive the Bayes Formula:
\[P(x\mid y) = \frac{P(y\mid x)P(x)}{P(y)}\]
\[\text{posterior} = \frac{\text{likelihood}\times\text{prior}}{\text{evidence}}\]
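A minimal numerical sketch with assumed values: the posterior is the likelihood times the prior, divided by the evidence.

```python
# Assumed values for a single binary x and an observed y.
prior = 0.1                          # P(x)
likelihood = 0.8                     # P(y | x)
evidence = 0.8 * 0.1 + 0.2 * 0.9     # P(y) = sum_x P(y | x) P(x)

posterior = likelihood * prior / evidence   # P(x | y)
print(posterior)  # ~0.308
```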
Normalisation
As $P(y)$ doesn't depend on $x$, we can replace it with a normaliser ($\eta=P(y)^{-1}$):
\[\begin{aligned} P(x\mid y) &= \frac{P(y\mid x)P(x)}{P(y)}\\ &=\eta P(y\mid x)P(x) \end{aligned}\]
We can then calculate $\eta$ at a later stage:
\[\eta=\frac 1{\sum_x P(y\mid x)P(x)}\]
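A sketch of Bayes' formula with the normaliser computed afterwards, on an assumed two-state example: the unnormalised posteriors $P(y\mid x)P(x)$ are computed first, then scaled by $\eta$ so they sum to one.

```python
# Assumed prior P(x) and likelihood P(y | x) for one observed y.
P_x = {'open': 0.5, 'closed': 0.5}           # prior
P_y_given_x = {'open': 0.6, 'closed': 0.3}   # likelihood of the observation y

# Unnormalised posterior: P(y | x) P(x) for each x.
unnormalised = {x: P_y_given_x[x] * P_x[x] for x in P_x}

# Normaliser eta = 1 / sum_x P(y | x) P(x), computed at a later stage.
eta = 1.0 / sum(unnormalised.values())

posterior = {x: eta * p for x, p in unnormalised.items()}
print(posterior)  # {'open': ~0.667, 'closed': ~0.333}
```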
Bayes Rule with Background Knowledge
Suppose we want to calculate the probability of $x$ given $y$ and $z$. We can write this as:
\[P(x\mid y,z) = \frac{P(y\mid x,z)P(x\mid z)}{P(y\mid z)}\]
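A sketch of this conditioned form on an assumed joint table over binary $(x,y,z)$: every term is conditioned on $z$, and $P(x\mid y,z)$ computed directly from the joint agrees with the right-hand side above.

```python
from itertools import product

# Assumed joint distribution P(x, y, z) over three binary variables.
P_xyz = {
    (x, y, z): p
    for (x, y, z), p in zip(
        product([0, 1], repeat=3),
        [0.02, 0.08, 0.10, 0.05, 0.20, 0.15, 0.25, 0.15],
    )
}

def marginal(**fixed):
    # Sum P(x, y, z) over all entries consistent with the fixed values.
    return sum(
        p for (x, y, z), p in P_xyz.items()
        if all(dict(x=x, y=y, z=z)[k] == v for k, v in fixed.items())
    )

# Left-hand side: P(x=1 | y=1, z=1) taken directly from the joint.
lhs = marginal(x=1, y=1, z=1) / marginal(y=1, z=1)

# Right-hand side: P(y | x, z) P(x | z) / P(y | z).
rhs = (
    (marginal(x=1, y=1, z=1) / marginal(x=1, z=1))
    * (marginal(x=1, z=1) / marginal(z=1))
    / (marginal(y=1, z=1) / marginal(z=1))
)

print(lhs, rhs)  # the two agree (0.75 here)
```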