The Chomsky Hierarchy
This is the order of how powerful various languages are in comparison to each-other.
Unrestricted Grammars
These grammars are made of a string of variables and terminals that lead to a string of variables and terminals like so:
\[\begin{aligned} S&\rightarrow aBc\\ aB&\rightarrow cA\\ Ac&\rightarrow d \end{aligned}\]This is very similar to context free grammars.
Natural Language Processing
A language $L$ is Turing-acceptable if and only if $L$ is generated by an unrestricted grammar:
- Context-free grammars are not sufficient to capture all natural languages.
- Unrestricted grammars are sufficient, but too complex to be used in natural language processing (parsing is undecidable).
Context-Sensitive Grammars
These are made from a string of variables and terminals that lead to a string of variables and terminals, however for:
\[u\rightarrow v\]The size of $u$ must the less than, or equal to, $v$:
\[\lvert u\rvert\leq\lvert v\rvert\]For example, the language $\{a^nb^nc^n\}$ is context-sensitive:
\[\begin{aligned} S&\rightarrow abc\vert aAbc\\ Ab&\rightarrow bA\\ Ac &\rightarrow Bbcc\\ bB &\rightarrow Bb\\ aB &\rightarrow aa\vert aaA \end{aligned}\]Linear-Bounded Automaton (LBA)
The computational version of context-sensitive grammars are LBAs.
These are Turing machines that can only write to the tape space reserved for the input string:
- There are left and right-hand markers that the input string lie between.
An example of languages accepted by LBAs are:
\[L=\{a^nb^nc^n\}\]and
\[L=\{a^{n!}\}\]- LBAs are more powerful than PDAs
- LBAs are less powerful than Turing machines.
LBAs allow us to find loops as there is a limit on the number of states it can be in.
Context-Sensitive Languages
A language $L$ is context sensitive if an only if $L$ is accepted by a linear-bounded automaton.
There are languages which are recursive but not context sensitive.
The following language is decidable but not context-sensitive:
\[L_d=\{\langle B\rangle\vert B \text{ is an LBA and }\langle B\rangle \notin L(B)\}\]This means that $B$ is a LBA that isn’t in the language of $B$.
It is decidable by the following program:
- Check if input $x=\langle B\rangle$ for some LBA $B$, if not reject.
- Run $B$ on input $\langle B\rangle$, if accepts then reject, otherwise accept.
Proof by contradiction:
Suppose LBA \(B^*\) recognises \(L_d\) (\(B^*\) is in the language \(L(B^*)\)):
- If \(\langle B^*\rangle\notin L(M^*)\) then \(\langle B^*\rangle\in L(B^*)\).
- If \(\langle B^*\rangle\in L(B^*)\) then \(\langle B^*\rangle\notin L(B^*)\).
The Chomsky Hierarchy
The languages we have seen so far are organised in the following hierarchy (with the most expressive and complex at the top):
- Recursively Enumerable
- Recursive
- Context-Sensitive
- Context-Free
- Deterministic Context-Free
- Regular
- Deterministic Context-Free
- Context-Free
- Context-Sensitive
- Recursive
The number of all possible languages is $2^{\Sigma^*}$ and the complement of this are unrecognisable languages.