The Chomsky Hierarchy

COMP218 Lectures 2021-12-09

This is the order of how powerful various languages are in comparison to each-other.

Unrestricted Grammars

These grammars are made of a string of variables and terminals that lead to a string of variables and terminals like so:

\[\begin{aligned} S&\rightarrow aBc\\ aB&\rightarrow cA\\ Ac&\rightarrow d \end{aligned}\]

This is very similar to context free grammars.

Natural Language Processing

A language $L$ is Turing-acceptable if and only if $L$ is generated by an unrestricted grammar:

Context-free grammars are not sufficient to capture all natural languages.
Unrestricted grammars are sufficient, but too complex to be used in natural language processing (parsing is undecidable).

Context-Sensitive Grammars

These are made from a string of variables and terminals that lead to a string of variables and terminals, however for:

\[u\rightarrow v\]

The size of $u$ must the less than, or equal to, $v$:

\[\lvert u\rvert\leq\lvert v\rvert\]

For example, the language $\{a^nb^nc^n\}$ is context-sensitive:

\[\begin{aligned} S&\rightarrow abc\vert aAbc\\ Ab&\rightarrow bA\\ Ac &\rightarrow Bbcc\\ bB &\rightarrow Bb\\ aB &\rightarrow aa\vert aaA \end{aligned}\]

Linear-Bounded Automaton (LBA)

The computational version of context-sensitive grammars are LBAs.

These are Turing machines that can only write to the tape space reserved for the input string:

There are left and right-hand markers that the input string lie between.

An example of languages accepted by LBAs are:

\[L=\{a^nb^nc^n\}\]

and

\[L=\{a^{n!}\}\]

LBAs are more powerful than PDAs
LBAs are less powerful than Turing machines.

LBAs allow us to find loops as there is a limit on the number of states it can be in.

Context-Sensitive Languages

A language $L$ is context sensitive if an only if $L$ is accepted by a linear-bounded automaton.

There are languages which are recursive but not context sensitive.

The following language is decidable but not context-sensitive:

\[L_d=\{\langle B\rangle\vert B \text{ is an LBA and }\langle B\rangle \notin L(B)\}\]

This means that $B$ is a LBA that isn’t in the language of $B$.

It is decidable by the following program:

Check if input $x=\langle B\rangle$ for some LBA $B$, if not reject.
Run $B$ on input $\langle B\rangle$, if accepts then reject, otherwise accept.

Proof by contradiction:

Suppose LBA $B^*$ recognises $L_d$ ($B^*$ is in the language $L(B^*)$):

If $\langle B^*\rangle\notin L(M^*)$ then $\langle B^*\rangle\in L(B^*)$.
If $\langle B^*\rangle\in L(B^*)$ then $\langle B^*\rangle\notin L(B^*)$.

The Chomsky Hierarchy

The languages we have seen so far are organised in the following hierarchy (with the most expressive and complex at the top):

Recursively Enumerable
- Recursive
  - Context-Sensitive
    - Context-Free
      - Deterministic Context-Free
        
        Regular

The number of all possible languages is $2^{\Sigma^*}$ and the complement of this are unrecognisable languages.