Bayesian Networks - 2
Joint Probability Distribution
We want to compute the joint probability distribution from the conditional probabilities
\[\mathbf{P}(F\vert \text{parents}(F))\]
Given a belief network, we can always choose an ordering
\[F_1,\ldots,F_n\]
of its random variables such that for all $i,j$:
\[F_i\rightarrow F_j \text{ implies } i<j\]
that is, every arc points from an earlier variable to a later one. Using the examples, we can order the random variables as follows:
- The student exam domain:
  graph TD
  Background --> Answers
  w[Works_hard] --> Answers
  Answers --> Grade
  $\text{Background, Works\_hard, Answers, Grade}$
- The fire alarm domain:
  graph TD
  Tampering --> Alarm
  Fire --> Alarm
  Fire --> Smoke
  Alarm --> Leaving
  Leaving --> Report
  $\text{Tampering, Fire, Alarm, Smoke, Leaving, Report}$
In both orderings there are no backward dependencies: no arc points from a variable later in the list back to a variable earlier in the list.
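To make this concrete, here is a minimal Python sketch (the helper and variable names are our own, not from the notes) that checks whether a proposed ordering is topological, i.e. that every arc points forward:

```python
def is_topological(order, arcs):
    """Check that every arc (u, v) points from an earlier variable
    to a later one in the given ordering."""
    index = {v: i for i, v in enumerate(order)}
    return all(index[u] < index[v] for u, v in arcs)

# Arcs of the fire alarm network from the notes.
fire_alarm_arcs = [
    ("Tampering", "Alarm"), ("Fire", "Alarm"), ("Fire", "Smoke"),
    ("Alarm", "Leaving"), ("Leaving", "Report"),
]

order = ["Tampering", "Fire", "Alarm", "Smoke", "Leaving", "Report"]
print(is_topological(order, fire_alarm_arcs))                  # True
print(is_topological(list(reversed(order)), fire_alarm_arcs))  # False
```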
According to the chain rule, given $F_1,\ldots,F_n$ we have, for all $r_1,\ldots,r_n$:
\[\begin{aligned} & P(F_1=r_1,\ldots,F_n=r_n)=\\ & P(F_1=r_1)\times\\ & P(F_2=r_2\vert F_1=r_1)\times\\ & P(F_3=r_3\vert F_1=r_1,F_2=r_2)\times\\ & \cdots\\ & P(F_n=r_n\vert F_1=r_1,\ldots,F_{n-1}=r_{n-1}) \end{aligned}\]
Using the bold $\mathbf{P}$ notation this reads:
\[\begin{aligned} \mathbf{P}(F_1,\ldots,F_n)=&\mathbf{P}(F_1)\times\\ & \mathbf{P}(F_2\vert F_1)\times\\ & \mathbf{P}(F_3\vert F_1,F_2)\times\\ & \cdots\\ & \mathbf{P}(F_n\vert F_1,\ldots,F_{n-1}) \end{aligned}\]
Note that $F_1$ must have no parents, since it comes first in the ordering. Such an ordering always exists precisely because a belief network contains no cycles.
As $\text{parents}(F_i)\subseteq\{F_1,\ldots,F_{i-1}\}$, conditional independence implies:
\[\mathbf{P}(F_i\vert F_1,\ldots,F_{i-1})=\mathbf{P}(F_i\vert\text{parents}(F_i))\]
That is, only the values of $F_i$'s parents are needed to determine the probability of $F_i$ given all the preceding variables.
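For instance, in the fire alarm ordering the variables preceding Smoke are Tampering, Fire and Alarm, but $\text{parents}(\text{Smoke})=\{\text{Fire}\}$, so:
\[\mathbf{P}(\text{Smoke}\vert\text{Tampering, Fire, Alarm})=\mathbf{P}(\text{Smoke}\vert\text{Fire})\]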
This means that we can rewrite the previous equation as:
\[\begin{aligned} \mathbf{P}(F_1,\ldots,F_n)=&\mathbf{P}(F_1)\times\\ & \mathbf{P}(F_2\vert \text{parents}(F_2))\times\\ & \mathbf{P}(F_3\vert \text{parents}(F_3))\times\\ & \cdots\\ & \mathbf{P}(F_n\vert \text{parents}(F_n)) \end{aligned}\]
Example - Student Exam Domain
graph TD
Background --> Answers
w[Works_hard] --> Answers
Answers --> Grade
This graph gives the full joint probability distribution of:
\[\mathbf{P}(\text{Background, Works\_hard, Answers, Grade})\]
This can then be computed with:
\[\begin{aligned} &\mathbf{P}(\text{Background})\times\\ & \mathbf{P}(\text{Works\_hard})\times\\ & \mathbf{P}(\text{Answers}\vert \text{Background, Works\_hard})\times\\ & \mathbf{P}(\text{Grade}\vert \text{Answers}) \end{aligned}\]
This method requires significantly fewer entries than a full joint probability table. E.g. in the fire alarm example we go from $2^6-1$ entries to 12 by using conditional probabilities, as the following count shows.
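Each Boolean variable needs one table entry per assignment of its parents, so the fire alarm network needs
\[\underbrace{1}_{\text{Tampering}}+\underbrace{1}_{\text{Fire}}+\underbrace{4}_{\text{Alarm}}+\underbrace{2}_{\text{Smoke}}+\underbrace{2}_{\text{Leaving}}+\underbrace{2}_{\text{Report}}=12\]
entries, whereas a full joint table over six Boolean variables needs $2^6-1=63$ independent entries.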
Example - Fire Alarm Domain
graph TD
Tampering --> Alarm
Fire --> Alarm
Fire --> Smoke
Alarm --> Leaving
Leaving --> Report
In this example we use the following abbreviations:
- $P(A\vert B)=P(A=1\vert B=1)$
- $P(\neg A \vert B ) = P(A=0\vert B=1)$
- and so on.
Assume the following (conditional) probabilities:
- $P(\text{Tampering})=0.02$
- $P(\text{Fire})=0.01$
- $P(\text{Smoke} \vert \text{Fire})=0.9$
- $P(\text{Smoke} \vert \neg\text{Fire})=0.01$
- $P(\text{Alarm} \vert \text{Fire}\wedge\text{Tampering})=0.5$
- $P(\text{Alarm} \vert \text{Fire}\wedge\neg\text{Tampering})=0.99$
- $P(\text{Alarm} \vert \neg\text{Fire}\wedge\text{Tampering})=0.85$
- $P(\text{Alarm} \vert \neg\text{Fire}\wedge\neg\text{Tampering})=0.0001$
- $P(\text{Leaving} \vert \text{Alarm})=0.88$
- $P(\text{Leaving} \vert \neg\text{Alarm})=0.001$
- $P(\text{Report} \vert \text{Leaving})=0.75$
- $P(\text{Report} \vert \neg\text{Leaving})=0.01$
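Given these tables, any query can be answered by summing out the full joint distribution. The following is a minimal Python sketch (the function and variable names are our own, not from the notes) that encodes the tables above and reproduces the query results worked through in the next section:

```python
from itertools import product

# P(X = True) for the root variables.
p_tampering = 0.02
p_fire = 0.01

# Conditional tables, keyed by the parents' values; each entry is
# the probability that the child is True.
p_smoke = {True: 0.9, False: 0.01}                  # key: Fire
p_alarm = {(True, True): 0.5, (True, False): 0.99,  # key: (Fire, Tampering)
           (False, True): 0.85, (False, False): 0.0001}
p_leaving = {True: 0.88, False: 0.001}              # key: Alarm
p_report = {True: 0.75, False: 0.01}                # key: Leaving

def bern(p, value):
    """P(X = value) for a Boolean X with P(X = True) = p."""
    return p if value else 1.0 - p

def joint(t, f, a, s, l, r):
    """Full joint probability, factored along the topological ordering
    Tampering, Fire, Alarm, Smoke, Leaving, Report."""
    return (bern(p_tampering, t) * bern(p_fire, f)
            * bern(p_alarm[(f, t)], a) * bern(p_smoke[f], s)
            * bern(p_leaving[a], l) * bern(p_report[l], r))

def query(target, evidence):
    """P(target = True | evidence) by enumerating all 2^6 worlds."""
    numerator = denominator = 0.0
    for world in product([True, False], repeat=6):
        w = dict(zip("tfaslr", world))
        if any(w[var] != val for var, val in evidence.items()):
            continue  # world inconsistent with the evidence
        p = joint(**w)
        denominator += p
        if w[target]:
            numerator += p
    return numerator / denominator

print(query("f", {"r": True}))             # P(Fire | Report)        ~ 0.2305
print(query("t", {"r": True}))             # P(Tampering | Report)   ~ 0.399
print(query("f", {"r": True, "s": True}))  # P(Fire | Report, Smoke) ~ 0.964
```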
Querying
If a report is observed, then the probabilities of Fire and Tampering both go up:
- $P(\text{Fire})=0.01$ and $P(\text{Fire}\vert\text{Report})=0.2305$
- $P(\text{Tampering})=0.02$ and $P(\text{Tampering}\vert\text{Report})=0.399$
If, in addition, Smoke is observed, then the probability of Fire goes up further but that of Tampering goes down:
- $P(\text{Fire}\vert \text{Report}\wedge\text{Smoke})=0.964$. Two independent sources of evidence pointing to the same cause reinforce each other, so the probability rises sharply.
- $P(\text{Tampering}\vert \text{Report}\wedge\text{Smoke})=0.0284$
If, however, $\neg$Smoke is observed, then the probability of Fire goes down while that of Tampering goes up:
- $P(\text{Fire}\vert \text{Report}\wedge\neg\text{Smoke})=0.0294$
- $P(\text{Tampering}\vert \text{Report}\wedge\neg\text{Smoke})=0.501$
We can see from this that independent sources of evidence are crucial for making decisions.
For worked examples on querying see these videos.
Summary
- Belief networks are representations of conditional independence in probabilistic models.
- Querying can often be done using exact inference.
- Sometimes exact inference is too hard; for those cases there are also approximate algorithms.
- There is a lot of research on learning belief networks from data, either learning the conditional probabilities or even the structure of the network itself.