Parsing Context Free Grammars
Parsing
- We can use a brute force method like BFS to search if an input is in the language.
- This is not ideal as it produces an exponentially large tree.
- If input is not in the language parsing will never stop.
When to Stop
We could try to aid the previous method by implementing the following:
\[\lvert\text{derived string}\rvert>\lvert\text{input}\rvert\]This can be problematic as:
- Strings can shrink due to $\epsilon$ productions
- There can be loops in the productions.
Removal of $\epsilon$-Productions
A variable $N$ is nullable if it can derive the empty string:
\[N\overset*\Rightarrow\epsilon\]-
Identify all nullable variables $N$.
This includes all productions that point to nullable productions.
-
Remove all $\epsilon$-Productions carefully (while adding extra productions to compensate for this).
For every nullable $N$:
- If you see $X\rightarrow\alpha N\beta$, add $X\rightarrow\alpha\beta$
- If you see $N\rightarrow\epsilon$, remove it.
-
If start variable $S$ is nullable:
- Add a net start variable $S’$.
- Add special productions $S’\rightarrow S\vert\epsilon$.
Removal of Unit Productions
A unit production is a production of the form:
\[A\rightarrow B\]For example:
\[\begin{aligned} S&\rightarrow0S1\vert1S01\vert T\\ T&\rightarrow S\vert R\vert\epsilon\\ R&\rightarrow 0SR \end{aligned}\]graph LR
S --> T
T --> S & R
-
If there is a cycle of unit productions:
\(A\rightarrow B\rightarrow C \rightarrow A\) delete it and replace everything with $A$.
This would give the following graph and productions:
\[\begin{aligned} S&\rightarrow0S1\vert1S01\vert R\vert\epsilon\\ R&\rightarrow 0SR \end{aligned}\]graph LR S --> R
-
Replace every chain:
\[A\rightarrow B\rightarrow C\rightarrow \alpha\]by $A\rightarrow \alpha, B\rightarrow \alpha,C\rightarrow \alpha$.
This would give the following productions:
\[\begin{aligned} S&\rightarrow0S1\vert1S01\vert 0SR\vert\epsilon\\ R&\rightarrow 0SR \end{aligned}\]
Complete Method
- Eliminate $\epsilon$ productions.
- Eliminate unit productions.
-
Try all possible derivations via BFS but stop parsing when:
\[\lvert\text{derived string}\rvert>\lvert\text{input}\rvert\]
The order of these steps is important as eliminating $\epsilon$ productions can introduce unit productions.