An image can be represented as a matrix where each cell uses 24 bits to encode the colour as RGB.
To compress it we can use the following method:
Given an $H\times W$ image $P$, let $M$ be the corresponding matrix.
$M$ will not typically be a square matrix.
Exploiting Spectra
Any $H\times W$ matrix $M$ can be described as the product of three matrices $U,S,V$:
$U$ has the same dimensions as the original matrix ($H\times W$).
$S$ and $V$ are square $W\times W$ matrices.
The original matrix can be recovered like so:
\[M=U\cdot S\cdot V\]
The Three Matrices
This method appears to use more space as shown below:
\[HW+W^2+W^2=W\cdot(H+2W)\]
The compression comes from removing rows and columns in $S$. As a result we have to remove the same:
Columns in $U$.
Rows in $V$.
This keeps the matrix dimensions consistent so that the product is still defined.
Suppose we keep only $k$ rows and columns of $S$, giving $S_k$. The space used is now:
\[k\cdot(H+k+W)\]
For small enough $k$ this is much smaller than $HW$.
Example
Consider an image with dimensions $2500\times2500$, requiring 18.75 MB of space (using the RGB scheme). If it is to be encoded with SVD using a maximum of 3 MB, what are the maximum dimensions for $S$?
Currently the SVD approach uses this much space:
\[3(2500^2+2500^2+2500^2)=56.25\text{ MB}\]
We can use the storage formula from earlier with the new size $k$ (the centre term, $k^2$, corresponds to the matrix $S$):
\[3(2500k+k^2+2500k)\]
Setting this equal to the required size and solving for $k$:
\[3(2500k+k^2+2500k)=3\times10^6\]
Solving the quadratic gives $k\approx192.6$, so we take $k=192$, rounding down to satisfy the size requirement.
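As a sanity check of this arithmetic, here is a minimal numpy sketch; it is only an illustration (the "image" is a single channel of random data and the variable names are my own, not part of the notes):

```python
import numpy as np

H = W = 2500          # image dimensions from the example
k = 192               # rank chosen to meet the 3 MB budget

# A stand-in "image": one channel of random values (a real photo compresses far better).
M = np.random.randint(0, 256, size=(H, W)).astype(float)

# numpy gives M = U @ diag(s) @ Vt (this call takes a few seconds at this size).
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Keep only the k largest singular values and the matching columns of U / rows of Vt.
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]
M_k = U_k @ S_k @ Vt_k                                 # rank-k approximation of M
print(np.linalg.norm(M - M_k) / np.linalg.norm(M))     # relative error of the approximation

# Storage for the truncated factors at 3 bytes (RGB) per entry:
entries = H * k + k * k + k * W                        # k * (H + k + W)
print(entries * 3 / 1e6, "MB")                         # ~2.99 MB, within the 3 MB budget
```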
Calculating $U,S$ and $V$
Given $M$, form the matrices:
\[\begin{aligned}
M&\cdot M^T\\
M^T&\cdot M
\end{aligned}\]
$M^T$ is the transpose of $M$.
These two matrices are square and symmetric. This means that their eigenvalues are real and their eigenvectors can be chosen to be orthogonal.
It is often the case in applications that we don’t need all the eigenvalues for a given matrix.
We can check if a real matrix has a positive, real, largest eigenvalue using the Perron-Frobenius theorem.
These conditions also ensure the existence of an eigenvector with positive real components for this eigenvalue.
This alone does not establish that the eigenvalue is dominant or unique.
The Perron-Frobenius Conditions
If $A$ has only non-negative real elements and is irreducible:
There is a unique positive real eigenvalue, $\lambda_{pf}$.
Unique doesn’t imply dominant.
This eigenvalue has an eigenvector, $\vec x_{pf}$, that contains only positive real components.
Every other eigenvalue (including complex values) is such that $\vert\lambda\vert\leq\lambda_{pf}$.
The adjacency matrix of a strongly-connected directed graph is irreducible.
A directed graph is strongly-connected if there is a directed path from any node to any other node in the graph.
The Power Method
If $A$ has a dominant eigenvalue the power method allows an eigenvector associated with it to be found.
The method uses the fact that if $\vec x$ is an eigenvector of $A$ for the eigenvalue $\lambda$, then for any non-zero constant $\alpha \in \Bbb R$, $\alpha\vec x$ is also an eigenvector for $\lambda$.
Although the convergence to a dominant eigenvector is quick, the numbers involved in computing successive $\vec x_i$ become large, so each iteration becomes increasingly expensive.
This problem can be overcome using scaling. This adds the following step: after each multiplication, divide $\vec x_i$ by its largest component.
The transpose allows the eigenvalue estimate to be calculated as a matrix product, by flipping the vector onto its side to give $\vec x^T$.
This uses the dot product and reports the eigenvalue for $\vec x$. The dot product is calculated as follows:
\[\vec x\cdot \vec y=\sum^n_{k=1}x_k\cdot y_k\]
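Below is a minimal sketch of the power method with scaling, assuming the scaling step divides by the largest component and the eigenvalue is reported via the dot-product ratio $\frac{\vec x\cdot A\vec x}{\vec x\cdot\vec x}$; the example matrix is my own:

```python
import numpy as np

def power_method(A, iterations=100):
    """Estimate the dominant eigenvalue and an associated eigenvector of A."""
    x = np.ones(A.shape[0])
    for _ in range(iterations):
        x = A @ x
        x = x / np.max(np.abs(x))        # scaling step: keeps the entries small
    # Eigenvalue estimate via the dot product: (x . Ax) / (x . x)
    eigenvalue = (x @ (A @ x)) / (x @ x)
    return eigenvalue, x

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])               # eigenvalues 5 and 2
lam, vec = power_method(A)
print(lam)                               # ~5.0, the dominant eigenvalue
```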
Smallest Eigenvalue
We can find the smallest eigenvalue of $A$ if it is non-singular.
The smallest eigenvalue of $A$ is:
\[\frac 1 \lambda\]
where $\lambda$ is the largest eigenvalue of $A^{-1}$.
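A quick numerical check of this claim; for brevity this sketch uses numpy's eigvals on $A^{-1}$ rather than running the power method on it, and the matrix is my own example with eigenvalues 5 and 2:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])               # eigenvalues 5 and 2

A_inv = np.linalg.inv(A)
largest_of_inverse = max(abs(np.linalg.eigvals(A_inv)))   # could also use the power method
print(1 / largest_of_inverse)            # ~2.0: the smallest eigenvalue of A
```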
Spectral Graph Theory
The relationship between the spectra of graphs (the eigenvalues of their adjacency matrices) and properties of the graphs themselves has long been studied in algorithmics and computer science.
Usage Examples
The number of colours needed to properly colour the nodes of a graph, so that no two linked nodes have the same colour, is at most $1+\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the graph's adjacency matrix.
This relies on the fact that the sum of the $k^{th}$ powers of the eigenvalues of an adjacency matrix reports the number of closed walks of length $k$ in the graph.
For $k\geq4$ we include degenerate cases, as you are able to go back and forth between already visited edges.
These scalar values are called the eigenvalues of $A$.
The $n$-vectors $\vec x_\lambda$ are called the eigenvectors of $A$ (with respect to $\lambda$).
The case $\vec x = \vec 0$ always produces a solution. Therefore, in finding eigenvectors, this case is not considered.
The case $\lambda = 0$ is of interest.
In general, finding eigenvalues involves finding the values $\lambda$ for which the following has a non-zero solution $\vec x_\lambda$:
\[(A-\lambda I)\vec x_\lambda = \vec 0\]
This is the same as solving:
\[\det(A-\lambda I)=0\]
Notice that $(A-\lambda I)$ is the matrix $A$ with $\lambda$ subtracted from each diagonal entry.
Solving via Polynomial Roots
The form $\det(A-\lambda I)=0$ has an interpretation involving roots of a polynomial.
Since $\det(A-\lambda I)$ may be described as a polynomial of degree $n$ in $\lambda$ (with $A$ an $n\times n$ matrix), the solutions to $\det(A-\lambda I)=0$ are exactly the roots of this polynomial, which is called the characteristic polynomial of $A$.
The characteristic polynomial of $A$ is denoted by: $\chi(A)$.
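For a concrete (made-up) matrix this correspondence can be checked numerically; numpy's poly function returns the characteristic polynomial coefficients of a square matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Coefficients of det(A - lambda*I), highest degree first: lambda^2 - 4*lambda + 3
coeffs = np.poly(A)
print(coeffs)                    # [ 1. -4.  3.]

# Its roots are exactly the eigenvalues of A
print(np.roots(coeffs))          # [3. 1.]
print(np.linalg.eigvals(A))      # [3. 1.]
```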
Consequences of Polynomial Similarity
An $n\times n$ matrix has $n$ eigenvalues (although these are not necessarily distinct).
This mirrors the fact that any polynomial of degree $n$ has $n$ roots, counted with multiplicity.
Eigenvalues may be complex numbers.
We won’t be studying these.
Matrices with all real eigenvalues include symmetric matrices.
Plotting data and fitting a line to it is the method that we use to draw trends from experimental data. This is better than a table as it makes the trend clearer.
Example
How do we best fit a line to this data?
$x$: 1.5, 2.4, 3.6, 4.2, 4.8, 5.7, 6.3, 7.9
$y$: 1.2, 1.5, 1.9, 2.0, 2.2, 2.4, 2.5, 2.8
It seems that a curve would be best suited.
Least Squares
The standard interpretation of best fit is least squares.
We have data observation pairs:
\[\{\langle x_k,y_k\rangle:1\leq k\leq n\}\]
We want a function $F:\Bbb R\rightarrow \Bbb R$ that best describes these (minimises):
\[\sum^n_{k=1}(y_k-F(x_k))^2\]
This gives smaller values if the function fits the data better.
Linear Functions & Regression
A linear function is in the following form:
\[F(x)=ax+b\]
This means that finding the best line is finding the combination of $a$ and $b$ which results in the least squares sum.
Applying Least Squares
To minimise, we treat the sum as a function of $a$ and $b$:
\[F(a,b)=\sum^n_{k=1}(y_k-ax_k-b)^2\]
Notice that the values $\{\langle x_k,y_k\rangle:1\leq k\leq n\}$ are constants.
To solve this we use partial differentiation, finding $\left(\frac{\partial F}{\partial a},\frac{\partial F}{\partial b}\right)$, and find the critical points for these.
Best Line Fit Function
From the critical points we find that the best line fit is given by:
\[a=\frac{n\sum^n_{k=1}x_ky_k-\sum^n_{k=1}x_k\sum^n_{k=1}y_k}{n\sum^n_{k=1}x_k^2-\left(\sum^n_{k=1}x_k\right)^2},\qquad b=\bar y-a\bar x\]
where $\bar x$ and $\bar y$ are the means of the $x_k$ and $y_k$.
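Applying these formulas to the data in the example above gives roughly $a\approx0.25$ and $b\approx0.92$; a short numpy check (np.polyfit performs the same least-squares minimisation):

```python
import numpy as np

x = np.array([1.5, 2.4, 3.6, 4.2, 4.8, 5.7, 6.3, 7.9])
y = np.array([1.2, 1.5, 1.9, 2.0, 2.2, 2.4, 2.5, 2.8])
n = len(x)

# Slope and intercept from the least-squares formulas
a = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
b = np.mean(y) - a * np.mean(x)
print(a, b)                      # ~0.25, ~0.92

# np.polyfit performs the same minimisation
print(np.polyfit(x, y, 1))       # [slope, intercept]
```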
Let $f:\Bbb C\rightarrow \Bbb C$ be some complex valued function.
The $n^{\text{th}}$ iterate of $f$, denoted by $f^{(n)}$, when applied to $z\in\Bbb C$ is the complex number, denoted by $z_n$, that results from applying $f$ repeatedly $n$ times to the initial value $z_0=z$. Thus:
\[z_n=f(z_{n-1})=\underbrace{f(f(\cdots f}_{n\text{ times}}(z_0)\cdots))\]
Pay attention to the dot product and the cross product.
The quaternion product is not commutative (in general $s\cdot t\neq t\cdot s$).
3D Graphics
Matrix-vector products don’t work well for rotation in 3D. Additionally they are computationally expensive.
Gimbal Lock
This phenomenon occurs when using matrix-vector rotations for 3D graphics, because you may ask for undefined values of trigonometric functions. This results in errors.
Quaternions do not suffer from these errors.
Rotation with Quaternions
Suppose we are looking to rotate a point $(x,y,z)$ about some axis of rotation (an infinite line in 3D) $\vec v=\langle \alpha,\beta,\gamma\rangle$.
We wish to rotate through some angle $\theta^\circ$.
Choose the quaternion $q_\theta=[\cos(\frac{\theta}{2}),\sin(\frac{\theta}{2})\vec v]$
Since we are dealing with a line, we can always choose $\vec v$ to be a unit vector, so that $q_\theta$ is a unit quaternion.
The computation “rotate $\vec w=\langle x,y,z\rangle$ by $\theta^\circ$ about $\vec v$” is just the quaternion product:
\[q_\theta\cdot[0,\vec w]\cdot q_\theta^{-1}\]
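A minimal sketch of this rotation in Python; the quaternion product is written out by hand as a helper, and the names and worked example are my own:

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rotate(point, axis, theta_deg):
    """Rotate a 3D point about the given axis by theta degrees."""
    axis = np.asarray(axis, float)
    axis = axis / np.linalg.norm(axis)          # make q_theta a unit quaternion
    half = np.radians(theta_deg) / 2
    q = np.concatenate(([np.cos(half)], np.sin(half) * axis))
    q_inv = q * np.array([1, -1, -1, -1])       # conjugate = inverse for a unit quaternion
    p = np.concatenate(([0.0], point))          # embed the point as [0, w]
    return qmul(qmul(q, p), q_inv)[1:]          # vector part of q [0, w] q^{-1}

print(rotate([1, 0, 0], [0, 0, 1], 90))         # ~[0, 1, 0]
```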
Two examples are given on slide 8 of the lecture slides; they are also run through in the recording from 10:29-16:24.
The first is for an open curve and the second is for a closed curve.
Counting Objects
In computer science we might want to find exact or asymptotic estimates for the number of objects of a particular type having size $n$.
Two tools that are often used are the notion of a generating function and a result called the (generalised) Cauchy integral formula.
Generating Function
Suppose we write:
\[G(z)=\sum^\infty_{n=0}a_nz^n\]
The coefficient $a_n$ of $z^n$ gives the count of interest (the number of objects of size $n$).
We can often manipulate this sum to find a closed form.
Example
Consider counting the number of binary trees that have a certain number of leaves. We will use the above function as our generating function.
A binary tree is either a single root node ($n=0$) or a root node and two sub-trees, whose numbers of leaves sum to $n$.
After some manipulation this allows $B(z)$ to be written as:
\[B(z)=\frac{1-\sqrt{1-4z}}{2z}\]
If we can find the coefficient of $z^n$ easily then we are done.
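In this case the coefficients can also be generated directly from the two-sub-tree decomposition, which gives a convolution recurrence (these are the Catalan numbers); a small sketch:

```python
def coefficients(limit):
    """Series coefficients b_0..b_{limit-1} of B(z) via the two-sub-tree decomposition."""
    b = [1]                                    # b_0 = 1: the single root node
    for n in range(1, limit):
        # splitting a tree of size n into two sub-trees gives a convolution
        b.append(sum(b[k] * b[n - 1 - k] for k in range(n)))
    return b

print(coefficients(8))   # [1, 1, 2, 5, 14, 42, 132, 429] -- the Catalan numbers
```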
The Generalised Cauchy Integral Formula
If we want to find $a_n$ from a closed form (say $g(z)$ for its generating function $G(z)$) we could differentiate $g(z)$ $n$ times and extract the value we need; letting $g^{(n)}(z)$ denote this derivative:
\[a_n=\frac{g^{(n)}(0)}{n!}\]
Alternatively, we can use the generalised Cauchy integral formula. Choose a closed contour $C$ around the origin, then:
\[a_n=\frac{1}{2\pi i}\oint_C\frac{g(z)}{z^{n+1}}\,dz\]
So we have a function of two real variables mapping to two real-valued functions of two real variables.
This suggests applying partial derivatives.
Cauchy-Riemann Conditions
In order for $f'(z)$ to be sensibly defined via partial derivatives with respect to $\text{Re}(z)$ and $\text{Im}(z)$ of $\text{Re}\left(f(z)\right)$ and $\text{Im}\left(f(z)\right)$ (the two-variable functions $u(p,q)$ and $v(p,q)$ of the previous slide), there are some important preconditions that $u(p,q)$ and $v(p,q)$ have to satisfy.
This was not explained in detail. It is what underlies the fast multiplication methods below.
Fast Multiplication
Traditional methods to multiply two $n$-digit numbers, $X$ and $Y$, take each of the digits in $X$ and multiply it by each of the digits in $Y$.
This involves about $n^2$ basic operations.
Karatsuba’s Algorithm
graph TD
subgraph x
ab[A, B] --> ab2[A+B]
ab --> ac[A x C]
end
ab2 --> abcd["(A+B)x(C+D)"]
subgraph y
cd[C, D] --> cd2[C+D]
cd --> bd[B x D]
end
cd2 --> abcd
ac --> abcd2["(A+B)x(C+D)-(AxC)-(BxD)=(AxD)+(BxC)"]
bd --> abcd2
abcd --> abcd2
$A$ and $B$ are two equal-length parts of the number $X$; the same is done for $Y$, giving $C$ and $D$.
This method only uses three recursive calls and computes:
\[A\times C;B\times D;(A+B)(C+D)\]
The term $A\times D+B\times C$ needed is then just:
\[(A+B)(C+D)-A\times C-B\times D\]
The extra additions save one recursive call, and the algorithm only needs about $n^{1.59}$ (that is, $n^{\log_2 3}$) basic operations. This is much less than $n^2$.
This method splits each number into two parts and exchanges multiplications for additions. If we split into more parts, the extra additions add up and the effectiveness is reduced.
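A minimal recursive sketch of Karatsuba's method on Python integers (splitting in base 10; not optimised; the test reuses the numbers 1567 and 3791 from the tutorial question later in these notes):

```python
def karatsuba(x, y):
    """Multiply non-negative integers with three recursive multiplications."""
    if x < 10 or y < 10:
        return x * y
    m = max(len(str(x)), len(str(y))) // 2     # split point, in decimal digits
    a, b = divmod(x, 10 ** m)                  # x = a*10^m + b
    c, d = divmod(y, 10 ** m)                  # y = c*10^m + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    ad_bc = karatsuba(a + b, c + d) - ac - bd  # (A+B)(C+D) - AC - BD = AD + BC
    return ac * 10 ** (2 * m) + ad_bc * 10 ** m + bd

print(karatsuba(1567, 3791), 1567 * 3791)      # both 5940497
```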
Schönhage-Strassen Algorithm
This method divides an $n$-digit number into roughly $\sqrt n$ parts, each having $\sqrt n$ digits.
The convolution property allows the multiplication to be done quickly via the Fourier transform in roughly $n\log n$ steps.
Summary
The Fourier transform provides a powerful range of methods of importance in electronics and signal processing.
Its basis is in the complex analysis notion of a primitive root of unity.
In computer science one of the most significant applications was in its use to develop fast algorithms for multiplication.
Other uses include image compression.
Fast algorithms to compute the Fourier transform have also been developed.
You will generally be expected to find the Euclidean distance.
This example will give:
\[\sqrt{25}=5\]
Spaces
Suppose you are given the following collection:
\[S=\{\langle p,q,r\rangle\in\Bbb Z ^3_8:p^q=q^r\}\]
That is, 3-vectors of integers modulo 8 such that raising the first component to the power of the second (mod 8) gives the same result as raising the second component to the power of the third.
\[H=\Bbb Z_8=\{0,1,2,3,4,5,6,7\}\]
$S$ is closed under scalar multiplication - False
$S$ is not closed under addition. - True
$S$ contains $\langle0,1,1\rangle$. - False
$S$ is a vector space. - False
$S$ does not contain $\langle1,1,0\rangle$. - False
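These answers can be checked by brute force over $\Bbb Z_8^3$; a small sketch (note that Python's pow treats $0^0$ as $1$):

```python
# S = { (p, q, r) in Z_8^3 : p^q = q^r (mod 8) }
Z8 = range(8)
S = {(p, q, r) for p in Z8 for q in Z8 for r in Z8
     if pow(p, q, 8) == pow(q, r, 8)}

print((0, 1, 1) in S)                          # False: S does not contain <0,1,1>
print((1, 1, 0) in S)                          # True: so "does not contain <1,1,0>" is False
print(all(tuple((a + b) % 8 for a, b in zip(u, v)) in S
          for u in S for v in S))              # False: S is not closed under addition
```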
Matrices
You can't implement translation using a $2\times2$ matrix; scaling and rotation you can.
This tutorial focuses on the answers for the first tutorial on numbers and polynomials.
Question 5
We are given the following numbers which can be expressed by the way of the following polynomials:
\[\begin{aligned}
1567\rightarrow p(x)&=x^3+5x^2+6x+7\\
3791\rightarrow q(x)&=3x^3+7x^2+9x+1
\end{aligned}\]
By multiplying them together we get the following polynomial:
\[\begin{aligned}
p\cdot q(x)=&\;3x^6+22x^5+62x^4+109x^3\\
&+108x^2+69x+7
\end{aligned}\]
The link between the set of digits and the set of numbers is such that each index relates to its place value.
Multiplying integers can be expressed in terms of finding the coefficients from the product of two polynomials.
Since finding coefficients can be done quickly by exploiting properties of polynomial multiplication, this results in multiplication methods that are significantly faster than classical methods (such as the grid method).
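A small sketch of this idea: multiply the digit (coefficient) lists with a convolution, then propagate carries to recover the integer. numpy's convolve does the coefficient product; the helper names are my own:

```python
import numpy as np

def digits(n):
    """Decimal digits of n, lowest place value first (the polynomial coefficients)."""
    return [int(d) for d in str(n)][::-1]

def multiply(x, y):
    coeffs = np.convolve(digits(x), digits(y))  # coefficient-wise polynomial product
    out, carry = [], 0
    for c in coeffs:                            # propagate carries: coefficient i sits at 10^i
        carry += int(c)
        out.append(carry % 10)
        carry //= 10
    while carry:
        out.append(carry % 10)
        carry //= 10
    return int("".join(map(str, out[::-1])))

print(multiply(1567, 3791), 1567 * 3791)        # both 5940497
```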
Question 6
This question is to prove that we can apply question 5 in any number base.
Converting between bases changes the degree of the polynomial accordingly: for example, converting from base 8 to binary multiplies the degree by 3 (since $8=2^3$).
Summary
The purpose of these two questions is to show that multiplication can be carried out via polynomials. This is built upon later, where an optimisation of multiplication for very large numbers is shown.
Linear transformations map between vector spaces: we use some method $T$ to produce a vector $\vec v$ in some vector space $V$ from a vector $\vec u$ belonging to a vector space $U$.
Formally:
\[T:U\rightarrow V\]
$U$ and $V$ can be the same vector space:
\[U=V=\Bbb R^2\]
These may also be different vector spaces:
\[U=\Bbb R^3,\quad V=\Bbb R^2\]
An example of this is projecting 3D objects onto 2D displays.
Why Are These Interesting?
Graphical Effects
Scaling or rotating an object may be implemented by matrix-vector multiplication.
If an effect is a linear transformation then it can be expressed as a matrix product.
If an effect is a matrix product then it is a linear transformation.
Matrices
A quick review of matrices was given. This included their anatomy and methods for multiplication.
Linear Transformation - Definition
Vector spaces $U$ and $V$ where $U$ may equal $V$.
$T$ a mapping from $U$ to $V$.
$T:U\rightarrow V$
$T$ is a linear transformation if:
$\forall p,q\in U:T(p+q)=T(p)+T(q)$
This means that if we take any two vectors ($p,q$) in the vector space $U$, if we add those two vectors and then apply the mapping $T$ then that should be the same as adding together the mapping on each individual vector.
The second condition is $\forall \alpha\in H,\forall p\in U: T(\alpha p)=\alpha T(p)$. This means that if we take any constant value in the underlying set of numbers used to define the vectors, and any vector in the vector space, then applying the transformation to the vector multiplied by the constant is the same as applying the transformation to the vector and multiplying the result by the constant.
Examples of 2D Effects
Scaling by a factor $s\in \Bbb R$
\[\begin{pmatrix}s&0\\0&s\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}sx\\sy\end{pmatrix}=s\begin{pmatrix}x\\y\end{pmatrix}\]
Rotating anti-clockwise around the origin by $\theta^\circ$:
\[\begin{pmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}x\cos\theta-y\sin\theta\\x\sin\theta+y\cos\theta\end{pmatrix}\]
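A quick numpy check of both effects, scaling $(1,0)$ by $2$ and rotating it anti-clockwise by $90^\circ$:

```python
import numpy as np

v = np.array([1.0, 0.0])

s = 2.0
scale = np.array([[s, 0.0],
                  [0.0, s]])
print(scale @ v)                 # [2. 0.]  -- scaling by s

theta = np.radians(90)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
print(rotation @ v)              # ~[0. 1.] -- anti-clockwise rotation by 90 degrees
```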
Vector spaces allow us to collect together vectors with similar characteristics.
Basics
We have $U$ - a set of $n$-vectors using $H$:
e.g. $U$ is the set of all 3-vectors from $\Bbb R$.
The set $U$ defines a vector space if it has two closure properties:
Adding any two vectors in $U$ produces a vector in $U$.
Multiplying any vector in $U$ by a constant in $H$ produces a vector in $U$.
Formal Definition
Given $U$ - a set of $n$-vectors using $H$. The set $U$ defines a vector space if:
\[\begin{aligned}
\forall v\in U, \forall w\in U &:v+w\in U &\text{(C1)}\\
\forall v\in U, \forall \alpha\in H &: \alpha v\in U &\text{(C2)}
\end{aligned}\]
Suppose $U$ is a vector space of $n$-vectors from $\Bbb R$. Important notions are a basis set and its dimension.
The idea of basis allows us to reduce $U$, even if it is infinite, to at most $n$ distinct vectors. We do this by expressing members of $U$ as linear combinations of other vectors in $U$.
Linear Combinations
Take a set $B$ of $m$ vectors from $U$:
\[B=\{t_1,t_2,\ldots,t_m\}\]
Take a collection $\vec\alpha$ of $m$ values $\alpha_1,\ldots,\alpha_m$ from $H$, at least one of which is not 0.
$\vec w$ is a linear combination of $B$ using $\vec\alpha$ if $\vec w$ is:
\[\sum^m_{k=1}\alpha_kt_k\]
Linear Dependence & Independence
Now suppose that $U$ is a vector space and $B$ is a subset of $k$ $n$-vectors from $U$ such that any $\vec w$ in $U$ can be obtained as a linear combination of $B$.