Skip to content
UoL CS Notes

Parsing Context Free Grammars

COMP218 Lectures

Parsing

  • We can use a brute force method like BFS to search if an input is in the language.
    • This is not ideal as it produces an exponentially large tree.
    • If input is not in the language parsing will never stop.

When to Stop

We could try to aid the previous method by implementing the following:

\[\lvert\text{derived string}\rvert>\lvert\text{input}\rvert\]

This can be problematic as:

  • Strings can shrink due to $\epsilon$ productions
  • There can be loops in the productions.

Removal of $\epsilon$-Productions

A variable $N$ is nullable if it can derive the empty string:

\[N\overset*\Rightarrow\epsilon\]
  1. Identify all nullable variables $N$.

    This includes all productions that point to nullable productions.

  2. Remove all $\epsilon$-Productions carefully (while adding extra productions to compensate for this).

    For every nullable $N$:

    • If you see $X\rightarrow\alpha N\beta$, add $X\rightarrow\alpha\beta$
    • If you see $N\rightarrow\epsilon$, remove it.
  3. If start variable $S$ is nullable:

    1. Add a net start variable $S’$.
    2. Add special productions $S’\rightarrow S\vert\epsilon$.

Removal of Unit Productions

A unit production is a production of the form:

\[A\rightarrow B\]

For example:

\[\begin{aligned} S&\rightarrow0S1\vert1S01\vert T\\ T&\rightarrow S\vert R\vert\epsilon\\ R&\rightarrow 0SR \end{aligned}\]
graph LR
S --> T
T --> S & R
  1. If there is a cycle of unit productions:

    \(A\rightarrow B\rightarrow C \rightarrow A\) delete it and replace everything with $A$.

    This would give the following graph and productions:

     graph LR 
     S --> R
    
    \[\begin{aligned} S&\rightarrow0S1\vert1S01\vert R\vert\epsilon\\ R&\rightarrow 0SR \end{aligned}\]
  2. Replace every chain:

    \[A\rightarrow B\rightarrow C\rightarrow \alpha\]

    by $A\rightarrow \alpha, B\rightarrow \alpha,C\rightarrow \alpha$.

    This would give the following productions:

    \[\begin{aligned} S&\rightarrow0S1\vert1S01\vert 0SR\vert\epsilon\\ R&\rightarrow 0SR \end{aligned}\]

Complete Method

  1. Eliminate $\epsilon$ productions.
  2. Eliminate unit productions.
  3. Try all possible derivations via BFS but stop parsing when:

    \[\lvert\text{derived string}\rvert>\lvert\text{input}\rvert\]

The order of these steps is important as eliminating $\epsilon$ productions can introduce unit productions.