Skip to content
UoL CS Notes

The Chomsky Hierarchy

COMP218 Lectures

This is the order of how powerful various languages are in comparison to each-other.

Unrestricted Grammars

These grammars are made of a string of variables and terminals that lead to a string of variables and terminals like so:

\[\begin{aligned} S&\rightarrow aBc\\ aB&\rightarrow cA\\ Ac&\rightarrow d \end{aligned}\]

This is very similar to context free grammars.

Natural Language Processing

A language $L$ is Turing-acceptable if and only if $L$ is generated by an unrestricted grammar:

  • Context-free grammars are not sufficient to capture all natural languages.
  • Unrestricted grammars are sufficient, but too complex to be used in natural language processing (parsing is undecidable).

Context-Sensitive Grammars

These are made from a string of variables and terminals that lead to a string of variables and terminals, however for:

\[u\rightarrow v\]

The size of $u$ must the less than, or equal to, $v$:

\[\lvert u\rvert\leq\lvert v\rvert\]

For example, the language $\{a^nb^nc^n\}$ is context-sensitive:

\[\begin{aligned} S&\rightarrow abc\vert aAbc\\ Ab&\rightarrow bA\\ Ac &\rightarrow Bbcc\\ bB &\rightarrow Bb\\ aB &\rightarrow aa\vert aaA \end{aligned}\]

Linear-Bounded Automaton (LBA)

The computational version of context-sensitive grammars are LBAs.

These are Turing machines that can only write to the tape space reserved for the input string:

  • There are left and right-hand markers that the input string lie between.

An example of languages accepted by LBAs are:

\[L=\{a^nb^nc^n\}\]

and

\[L=\{a^{n!}\}\]
  • LBAs are more powerful than PDAs
  • LBAs are less powerful than Turing machines.

LBAs allow us to find loops as there is a limit on the number of states it can be in.

Context-Sensitive Languages

A language $L$ is context sensitive if an only if $L$ is accepted by a linear-bounded automaton.

There are languages which are recursive but not context sensitive.

The following language is decidable but not context-sensitive:

\[L_d=\{\langle B\rangle\vert B \text{ is an LBA and }\langle B\rangle \notin L(B)\}\]

This means that $B$ is a LBA that isn’t in the language of $B$.

It is decidable by the following program:

  1. Check if input $x=\langle B\rangle$ for some LBA $B$, if not reject.
  2. Run $B$ on input $\langle B\rangle$, if accepts then reject, otherwise accept.

Proof by contradiction:

Suppose LBA \(B^*\) recognises \(L_d\) (\(B^*\) is in the language \(L(B^*)\)):

  • If \(\langle B^*\rangle\notin L(M^*)\) then \(\langle B^*\rangle\in L(B^*)\).
  • If \(\langle B^*\rangle\in L(B^*)\) then \(\langle B^*\rangle\notin L(B^*)\).

The Chomsky Hierarchy

The languages we have seen so far are organised in the following hierarchy (with the most expressive and complex at the top):

  • Recursively Enumerable
    • Recursive
      • Context-Sensitive
        • Context-Free
          • Deterministic Context-Free
            • Regular

The number of all possible languages is $2^{\Sigma^*}$ and the complement of this are unrecognisable languages.