
Probabilistic Inference

COMP111 Lectures

Marginalisation

Given a joint distribution $\mathbf{P}(F_1,\ldots,F_k)$, one can compute the marginal probabilities of the random variables $F_i$ by summing out the remaining variables.

For example:

\[\begin{aligned} P(\text{Cavity}=1)&=0.108+0.012+0.072+0.008\\ &=0.2 \end{aligned}\]

is the sum of the entries in the first row of:

|                   | $\text{Toothache}=1$, $\text{Catch}=1$ | $\text{Toothache}=1$, $\text{Catch}=0$ | $\text{Toothache}=0$, $\text{Catch}=1$ | $\text{Toothache}=0$, $\text{Catch}=0$ |
|-------------------|------|------|------|------|
| $\text{Cavity}=1$ | 0.108 | 0.012 | 0.072 | 0.008 |
| $\text{Cavity}=0$ | 0.016 | 0.064 | 0.144 | 0.576 |
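As a quick sanity check, here is a minimal Python sketch (the dictionary encoding of the table is my own choice, not part of the lecture) that stores the full joint distribution and recovers $P(\text{Cavity}=1)$ by summing out $\text{Toothache}$ and $\text{Catch}$:

```python
# Full joint distribution P(Cavity, Toothache, Catch) from the table above,
# keyed by (cavity, toothache, catch) value triples.
joint = {
    (1, 1, 1): 0.108, (1, 1, 0): 0.012, (1, 0, 1): 0.072, (1, 0, 0): 0.008,
    (0, 1, 1): 0.016, (0, 1, 0): 0.064, (0, 0, 1): 0.144, (0, 0, 0): 0.576,
}

def marginal_cavity(cavity_value):
    """Sum out Toothache and Catch to get P(Cavity = cavity_value)."""
    return sum(p for (c, t, k), p in joint.items() if c == cavity_value)

print(marginal_cavity(1))  # 0.108 + 0.012 + 0.072 + 0.008 = 0.2
print(marginal_cavity(0))  # 0.8
```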

Conditional Distributions

We can also compute conditional distributions from the full joint distribution.

  • We also use the $\mathbf{P}$ notation for conditional distributions.
  • $\mathbf{P}(F\ \vert\ G)$ denotes the conditional distribution of $F$ given $G$, i.e. the probabilities $P(F=r\ \vert\ G=s)$ for all values $r$ and $s$.
  • Using $\mathbf{P}$ notation, the general version of the product rule is as follows:
\[\mathbf{P}(F,G)=\mathbf{P}(F\ \vert\ G)\mathbf{P}(G)\]

This stands for the following list of equations, one for each combination of values of $F$ and $G$:

\[\begin{aligned} P(F=r_1,G=s_1)=&P(F=r_1\ \vert\ G=s_1)\\ &\times P(G=s_1)\\ P(F=r_2,G=s_2)=&P(F=r_2\ \vert\ G=s_2)\\ &\times P(G=s_2)\\ \ldots=&\ldots \end{aligned}\]
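As a concrete instance, taking $F=\text{Cavity}$ and $G=\text{Toothache}$ with the values from the table above (the conditional probability 0.6 is derived in the example below), one of these equations reads:

\[\begin{aligned} P(\text{Cavity}=1,\text{Toothache}=1)&=P(\text{Cavity}=1\ \vert\ \text{Toothache}=1)\times P(\text{Toothache}=1)\\ &=0.6\times 0.2=0.12 \end{aligned}\]

which agrees with the table entries $0.108+0.012=0.12$.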

Probabilistic Inference

Probabilistic inference can be characterised as the computation of posterior probabilities:

\[\mathbf{P}(Q\ \vert\ E_1=e_1,\ldots,E_n=e_n)\]

for a query variable $Q$, given the observed values $e_1,\ldots,e_n$ of the evidence variables $E_1,\ldots,E_n$.

In principle, we can use the full joint distribution to do this.
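Concretely, if $Y_1,\ldots,Y_m$ are the remaining (unobserved) variables, this amounts to marginalising them out of the full joint distribution and then normalising, using the constant $\alpha$ introduced in the example below:

\[\mathbf{P}(Q\ \vert\ E_1=e_1,\ldots,E_n=e_n)=\alpha\sum_{y_1,\ldots,y_m}\mathbf{P}(Q,E_1=e_1,\ldots,E_n=e_n,Y_1=y_1,\ldots,Y_m=y_m)\]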

Example - $\mathbf{P}(\text{Cavity}\ \vert\ \text{Toothache}=1)$

We want to compute the conditional probability distribution for $\text{Cavity}$ given the observation or evidence of $\text{Toothache}=1$.

Thus we want to compute:

  • $P(\text{Cavity}=1\ \vert\ \text{Toothache}=1)$
  • $P(\text{Cavity}=0\ \vert\ \text{Toothache}=1)$

We can easily obtain this using the table:

\[\begin{aligned} P(C=1\ \vert\ T=1)&=\frac{P(C=1, T=1)}{P(T=1)}\\ &=\frac{0.12}{0.2}=0.6\\ \end{aligned}\] \[\begin{aligned} P(C=0\ \vert\ T=1)&=\frac{P(C=0, T=1)}{P(T=1)}\\ &=\frac{0.08}{0.2}=0.4 \end{aligned}\]

In this example $C$ is $\text{Cavity}$ and $T$ is $\text{Toothache}$.

The denominator 0.2 can be viewed as the reciprocal $\frac{1}{\alpha}$ of a normalisation constant $\alpha=5$ for the distribution $\mathbf{P}(\text{Cavity}\ \vert\ \text{Toothache}=1)$, ensuring that it adds up to 1.

This means that, instead of computing the denominator explicitly as above, we can write:

\[\begin{aligned} \mathbf{P}(C\vert T=1)&=\alpha\mathbf{P}(C,T=1)\\ &= \alpha(0.12,0.08)\\ &=5(0.12,0.08)\\ &= (0.6,0.4) \end{aligned}\]

Here $\alpha$ is defined as whatever value makes the entries of the vector sum to 1; in this example $\alpha=5$.

Additionally, $C$ is $\text{Cavity}$ and $T$ is $\text{Toothache}$.
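The same normalisation trick is easy to express in code. The sketch below reuses the `joint` dictionary from the marginalisation example above (again, the encoding is my own assumption):

```python
def cavity_given_toothache():
    """Compute P(Cavity | Toothache = 1) by normalising P(Cavity, Toothache = 1)."""
    # Unnormalised values P(Cavity = c, Toothache = 1): sum out Catch.
    unnormalised = [
        sum(p for (c, t, k), p in joint.items() if c == cavity and t == 1)
        for cavity in (1, 0)
    ]                                 # [0.12, 0.08]
    alpha = 1 / sum(unnormalised)     # 1 / 0.2 = 5
    return [alpha * p for p in unnormalised]

print(cavity_given_toothache())       # [0.6, 0.4] (up to floating-point rounding)
```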

Combinatorial Explosion

This approach does not scale well: for a domain described by $n$ random variables taking $k$ distinct values each, we face two problems:

  • Writing down the full joint distribution requires $k^n-1$ entries (see the example after this list).
    • The $-1$ arises because the full joint distribution sums to 1, so the last entry need not be stored: it is simply 1 minus the sum of the others.
    • This is exponentially many entries, so there are far too many values to store and compute with.
  • How do we find the numbers (probabilities) for the entries?
    • We don’t have enough raw data to estimate so many probabilities.
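To get a feeling for the scale, consider (as an illustrative example) a modest domain of $n=30$ Boolean variables, so $k=2$:

\[k^n-1=2^{30}-1=1\,073\,741\,823\]

entries — already over a billion probabilities to specify.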

For these reasons, the full joint distribution in tabular form is not a practical tool for building reasoning systems.