Probability Theory & Bayes Theroem

Random Variables

A set $X$ which contains a finite number of values ${x_1,x_2,\ldots,x_n}$

$P(X=x_i)$ (or $P(x_i)$) is the probability that the random variable $X$ takes on the value $x_i$.

$X$ takes on values along a bell curve.

$p(X=x)$ (or $p(x)$) is a probability density function

A probability density function would look like so:

\[P(x\in(a,b)) = \int^b_ap(x)dx\]

This will be approximated by splitting the range into discrete values.

If $X$ and $Y$ are independent:

\[P(x,y) = P(x)P(y)\]

Probability of $x$ given $y$:

\[P(x\mid y)=\frac{P(x,y)}{P(y)}\]

given that $x$ and $y$ are conditional:

\[P(x,y) = P(x\mid y)P(y)\]

If $x$ are $y$ are independent:

\[\begin{aligned} P(x\mid y)&=\frac{P(x,y)}{P(y)}\\ &=\frac{P(x)P(y)}{P(y)}\\ &=P(x) \end{aligned}\]

This follows from marginalisation as $P(x,y)=P(x\mid y)P(y)$:

The joint probability of $x$ and $y$ is commutative:

\[P(x.y) = P(x\mid y)P(y) = P(y\mid x)P(x)\]

From this we can derive the Bayes Formula:

\[P(x\mid y) = \frac{P(y\mid x)P(x)}{P(y)}\] \[\text{posterior} = \frac{\text{likelihood}\times\text{prior}}{\text{evidence}}\]

As $P(y)$ doesn’t depend on $x$ then we can replace it with a normaliser ($\eta=P(y)^{-1}$):

\[\begin{aligned} P(x\mid y) &= \frac{P(y\mid x)P(x)}{P(y)}\\ &=\eta P(y\mid x)P(x) \end{aligned}\]

We can then calculate $\eta$ at a later stage:

\[\eta=\frac 1{\sum_xP(y\mid x)p(x)}\]

Consider we want to calculate the probability of $x$ given $y$ and $z$. We can write this as:

\[P(x\mid y,z) = \frac{P(y\mid x,z)P(x\mid z)}{P(y\mid z)}\]