Chomsky Normal Form
To parse quicker we can use the Cocke-Younger-Kasami algorithm. To do this we must prepare the CFG and convert it to Chomsky normal form:
- Eliminate $\epsilon$ productions.
- Eliminate unit productions.
- Add rules for terminals and split longer sequences of non-terminals.
Chomsky Normal Form
A CFG is in Chomsky normal form if every production has the form:
\[A\rightarrow BC\text{ or }A\rightarrow a\]We allow $S\rightarrow\epsilon$ for the start variable only.
The third step of conversion goes like so. We start with the following production:
\[A\rightarrow BcDE\]-
Replace Terminals with new variables:
\[\begin{aligned} A&\rightarrow BCDE\\ C&\rightarrow c \end{aligned}\] -
Break up sequences with new variables:
\[\begin{aligned} A&\rightarrow BX\\ X&\rightarrow CY\\ Y&\rightarrow DE\\ \hline C&\rightarrow c \end{aligned}\]
If ORs are used |
, then treat each as a separate production when simplifying. ORs are allowed as they are just a shorthand.