Skip to content
UoL CS Notes

Definition of Regular Expressions

COMP218 Lectures

String Concatenation

This is the product of two words. It is obtained by appending them together to form one long word.

s = abb
t = bab
st = abbbab
ts = bababb

It is non-commutative.

These strings can be defined formally:

\[s=x_1\ldots x_n , t=y_1\ldots y_m\] \[st=x_1\ldots x_ny_1\ldots y_m\]

There is also some notation:

  • $s^k$ - The string $s$ repeated $k$ times.
  • $\vert s\vert$ - The length of the string.

Concatenation of Languages

This operation can be extended to languages:

  • The concatenation of languages $L_1$ and $L_2$ is:

    \[L_1L_2=\{st\vert s\in L_1,t\in L_2\}\]
  • The $n^{\text{th}}$ power of $L^n$ is:

    \[L^n=\{s_1s_2\ldots s_n\vert s_1,s_2,\ldots,s_n\in L\}\]
  • The union of $L_1$ and $L_2$ is:

    \[L_1\cup L_2=\{s\vert s\in L_1 \text{ or } s\in L_2\}\]

Example

For the following languages:

  • $L_1\{0,01\}$
  • $L_2\{\epsilon,1,11,111,\ldots\}$

you would get the following results:

  • $L_1L_2=\{0,01,011,0111,\ldots\}$
  • $L_1^2=\{00,001,010,0101\}$

    This is all the combinations of the strings 0 and 01.

  • $L_1\cup L_2=\{0,01,\epsilon,1,11,111,\ldots\}$

Operations on Languages

See Set Theory, Kleene Star & DNF for additional examples.

  • $L^*=L^0\cup L^1 \cup L^2\cup\ldots$
    • This is all the strings made up of all combinations of $L$, including the empty string.

      The number of words in each length of expansion follows the Fibonacci sequence.

Combining Languages

We can construct languages by starting with simple ones, like $\{0\},\{1\}$ and combining them. Here are two languages and their equivalent regular expressions:

  • $\{0\}(\{0\}\cup\{1\})^*$ - All strings that start with 0.

      0(0 + 1)*
    
  • $(\{0\}\{1\}^*)\cup(\{1\}\{0\}^*)$ - 0 followed by any number of 1s or 1 followed by any number of 0s.

      01* + 10*
    

The order of operations is like so: *,.,+.

Regular Expressions

A regular expression over $\Sigma$ is an expression formed using the following rules:

  • The symbols $\emptyset$ and $\epsilon$ are regular expressions.
  • Ever $a$ in $\Sigma$ is a regular expression.
  • If $R$ and $S$ are regular expressions, so are $R+S$, $RS$ and $R^*$.

Any language corresponding to a regular expression is called regular. The set of all of them define the class of regular languages.