Definition of Regular Expressions

COMP218 Lectures 2021-10-06

String Concatenation

This is the product of two words. It is obtained by appending them together to form one long word.

s = abb
t = bab
st = abbbab
ts = bababb

It is non-commutative.

These strings can be defined formally:

\[s=x_1\ldots x_n , t=y_1\ldots y_m\] \[st=x_1\ldots x_ny_1\ldots y_m\]

There is also some notation:

$s^k$ - The string $s$ repeated $k$ times.
$\vert s\vert$ - The length of the string.

Concatenation of Languages

This operation can be extended to languages:

The concatenation of languages $L_1$ and $L_2$ is:
\[L_1L_2=\{st\vert s\in L_1,t\in L_2\}\]
The $n^{\text{th}}$ power of $L^n$ is:
\[L^n=\{s_1s_2\ldots s_n\vert s_1,s_2,\ldots,s_n\in L\}\]
The union of $L_1$ and $L_2$ is:
\[L_1\cup L_2=\{s\vert s\in L_1 \text{ or } s\in L_2\}\]

Example

For the following languages:

$L_1\{0,01\}$
$L_2\{\epsilon,1,11,111,\ldots\}$

you would get the following results:

$L_1L_2=\{0,01,011,0111,\ldots\}$
$L_1^2=\{00,001,010,0101\}$

This is all the combinations of the strings 0 and 01.
$L_1\cup L_2=\{0,01,\epsilon,1,11,111,\ldots\}$

Operations on Languages

See Set Theory, Kleene Star & DNF for additional examples.

$L^*=L^0\cup L^1 \cup L^2\cup\ldots$
- This is all the strings made up of all combinations of $L$, including the empty string.
  
  The number of words in each length of expansion follows the Fibonacci sequence.

Combining Languages

We can construct languages by starting with simple ones, like $\{0\},\{1\}$ and combining them. Here are two languages and their equivalent regular expressions:

$\{0\}(\{0\}\cup\{1\})^*$ - All strings that start with 0.
```
  0(0 + 1)*
```
$(\{0\}\{1\}^*)\cup(\{1\}\{0\}^*)$ - 0 followed by any number of 1s or 1 followed by any number of 0s.
```
  01* + 10*
```

The order of operations is like so: *,.,+.

Regular Expressions

A regular expression over $\Sigma$ is an expression formed using the following rules:

The symbols $\emptyset$ and $\epsilon$ are regular expressions.
Ever $a$ in $\Sigma$ is a regular expression.
If $R$ and $S$ are regular expressions, so are $R+S$, $RS$ and $R^*$.

Any language corresponding to a regular expression is called regular. The set of all of them define the class of regular languages.