Skip to content
UoL CS Notes

Chomsky Normal Form

COMP218 Lectures

To parse quicker we can use the Cocke-Younger-Kasami algorithm. To do this we must prepare the CFG and convert it to Chomsky normal form:

  1. Eliminate $\epsilon$ productions.
  2. Eliminate unit productions.
  3. Add rules for terminals and split longer sequences of non-terminals.

Chomsky Normal Form

A CFG is in Chomsky normal form if every production has the form:

\[A\rightarrow BC\text{ or }A\rightarrow a\]

We allow $S\rightarrow\epsilon$ for the start variable only.

The third step of conversion goes like so. We start with the following production:

\[A\rightarrow BcDE\]
  1. Replace Terminals with new variables:

    \[\begin{aligned} A&\rightarrow BCDE\\ C&\rightarrow c \end{aligned}\]
  2. Break up sequences with new variables:

    \[\begin{aligned} A&\rightarrow BX\\ X&\rightarrow CY\\ Y&\rightarrow DE\\ \hline C&\rightarrow c \end{aligned}\]

If ORs are used |, then treat each as a separate production when simplifying. ORs are allowed as they are just a shorthand.