UoL CS Notes

Naive Bayes Classifier - 2

COMP111 Lectures

Suppose we want to predict the class of the instance with features $(e_1,\ldots,e_n)$.

We assume $E_1,\ldots,E_n$ are independent given $Y$ and estimate:

\[V_y=P(Y=y)\prod^n_{i=1}P(E_i=e_i\vert Y=y)\]

for all $y\in \mathcal Y$, where $\prod$ denotes the product over $i=1,\ldots,n$. The individual probabilities are estimated as follows:

  • $P(Y=y)$ is estimated by the number of instances labelled with $y$ in the training data divided by the number of all instances in the training data.
  • $P(E_i=e_i\vert Y=y)$ is estimated by the number of instances with feature value $e_i$ that are labelled with $y$ in the training data divided by the number of all instances labelled with $y$ in the training data.
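
The two counting estimates above can be sketched in Python. This is a minimal illustration, not the lecture's own code: the function name `estimate`, the toy training data, and the use of `Fraction` for exact arithmetic are all assumptions made for the example.

```python
from collections import Counter
from fractions import Fraction


def estimate(training_data):
    """Estimate P(Y=y) and P(E_i=e_i | Y=y) by counting.

    training_data is a list of (features, label) pairs, where features
    is a tuple (e_1, ..., e_n). Hypothetical helper for illustration.
    """
    n_total = len(training_data)
    # Number of instances labelled with each y.
    label_counts = Counter(label for _, label in training_data)
    # Number of instances labelled y whose i-th feature equals value.
    feature_counts = Counter()
    for features, label in training_data:
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1

    # P(Y=y): instances labelled y divided by all instances.
    priors = {y: Fraction(c, n_total) for y, c in label_counts.items()}

    def likelihood(i, value, y):
        # P(E_i=value | Y=y): matching instances divided by instances labelled y.
        return Fraction(feature_counts[(i, value, y)], label_counts[y])

    return priors, likelihood


# Made-up toy data: (contains "offer", contains "meeting") -> label.
training_data = [
    ((True, False), "spam"),
    ((True, False), "spam"),
    ((False, False), "spam"),
    ((False, True), "ham"),
    ((True, True), "ham"),
]

priors, likelihood = estimate(training_data)
print(priors["spam"])               # 3/5
print(likelihood(0, True, "spam"))  # 2/3
```

Note that a feature value never seen with a given label gets an estimate of exactly 0, which forces $V_y$ to 0 for that label.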

We then classify the instance $(e_1,\ldots,e_n)$ as the $y$ for which $V_y$ is maximal (that is, we take the maximum-probability hypothesis).
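
As a rough sketch of this decision rule, the snippet below computes $V_y$ for every class and returns the class with the largest value. The function name `classify`, the table layout, and the hand-specified probabilities are hypothetical and chosen only to keep the example small.

```python
from fractions import Fraction
from math import prod


def classify(features, priors, likelihoods):
    """Return the label y maximising V_y, together with all V_y values.

    priors[y] is an estimate of P(Y=y); likelihoods[y][i][e] is an
    estimate of P(E_i=e | Y=y). Both layouts are assumptions for this sketch.
    """
    scores = {}
    for y, prior in priors.items():
        # V_y = P(Y=y) * product over i of P(E_i=e_i | Y=y)
        scores[y] = prior * prod(
            likelihoods[y][i][e] for i, e in enumerate(features)
        )
    best = max(scores, key=scores.get)
    return best, scores


# Hand-specified, made-up estimates for two binary features.
priors = {"spam": Fraction(3, 5), "ham": Fraction(2, 5)}
likelihoods = {
    "spam": [
        {True: Fraction(2, 3), False: Fraction(1, 3)},
        {True: Fraction(0), False: Fraction(1)},
    ],
    "ham": [
        {True: Fraction(1, 2), False: Fraction(1, 2)},
        {True: Fraction(1), False: Fraction(0)},
    ],
}

label, scores = classify((True, False), priors, likelihoods)
print(label, scores["spam"], scores["ham"])  # spam 2/5 0
```

The values $V_y$ are not normalised probabilities; only their relative order matters for the prediction.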

View slide 27 onwards for an additional example about filtering spam emails.


  • A supervised learning algorithm based on Bayes' theorem.
  • It is called naive because it is assumed that the features are independent of each other, given the classification.
  • The naive Bayes classifier works surprisingly well in practice, even if the features are obviously not independent given the classification.
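
The role of the independence assumption can be made explicit with a short derivation, sketched here in the notation used above. By Bayes' theorem,

\[P(Y=y\vert E_1=e_1,\ldots,E_n=e_n)=\frac{P(Y=y)\,P(E_1=e_1,\ldots,E_n=e_n\vert Y=y)}{P(E_1=e_1,\ldots,E_n=e_n)}\]

and, if the features are independent given $Y$,

\[P(E_1=e_1,\ldots,E_n=e_n\vert Y=y)=\prod^n_{i=1}P(E_i=e_i\vert Y=y)\]

The denominator does not depend on $y$, so the posterior probability of each class is proportional to $V_y$, and choosing the $y$ with maximal $V_y$ picks the most probable class under this assumption.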