Regression & Data Analysis
This is the method that we use to draw trends from experimental data. This is better than a table as it makes the data clearer.
Example
How to we best fit a line to this graph:
$x$ | 1.5 | 2.4 | 3.6 | 4.2 | 4.8 | 5.7 | 6.3 | 7.9 |
---|---|---|---|---|---|---|---|---|
$y$ | 1.2 | 1.5 | 1.9 | 2.0 | 2.2 | 2.4 | 2.5 | 2.8 |
It seems that a curve would be best suited.
Least Squares
The standard interpretation of best fit is least squares.
We have data observation pairs:
\[\{\langle x_k,y_k\rangle:1\leq k\leq n\}\]We want a function $F:\Bbb R\rightarrow \Bbb R$ that best describes these (minimises):
\[\sum^n_{k=1}(y_k-F(x_k))^2\]This gives smaller values if the function fits the data better.
Linear Functions & Regression
This function is in the following form:
\[F(x)=ax+b\]This means that finding the best line is finding the combination of $a$ and $b$ which results in the least squares sum.
Applying Least Squares
To minimise we would use the following function:
\[F(a,b)=\sum^n_{k=1}(y_k-ax_k-b)^2\]Notice that the values $\{\langle x_k,y_k\rangle:1\leq k\leq n\}$ are constants.
To solve this we use partial differentiation, finding $\left(\frac{\partial F}{\partial a},\frac{\partial F}{\partial b}\right)$, and find the critical points for these.
Best Line Fit Function
From that we find that the best line fit function is:
\[\left(\frac{nW_{xy}-W_xW_y}{nW_{xx}-(W_x)^2}\right)x+\left(\frac{W_{xx}W_y-W_{xy}W_x}{nW_{xx}-(W_x)^2}\right)\]Where:
\[\begin{aligned} W_x&=\sum^n_{k=1}x_k\\ W_y&=\sum^n_{k=1}y_k\\ W_{xx}&=\sum^n_{k=1}x_k^2\\ W_xy&=\sum^n_{k=1}x_ky_k\\ \end{aligned}\]Other Functions
Other functions can have the following forms:
- Powers - $ax^b$
- Exponentials - $ae^{bx}$
- Also can be written as $a\exp(bx)$.
- Logarithms - $a\ln x+b$
- Linear Rational Functions - $\frac1{ax+b}$
We may also have more than one parameter to optimise:
- Quadratic Functions - $ax^2+bx+c$
Data Analysis Issues
The functions shown so far require manual selection and thus don’t always exactly match the pairs of data that have been observed.
- Lagrange interpolation finds the best polynomial to fit a set of data.
- Trigonometric interpolation finds the best trigonometric curve to fit a set of data.
- This is useful in signal processing.