Statistics & Data Analysis
Experimental evidence makes justifications clearer and more accessible.
Methodology
Suppose we have some algorithm that we suspect runs efficiently most of the time. We can test this using the following experimental method (sketched in code below):
- Run the algorithm a number of times on different data.
- Determine the average time taken.
This raises the following questions:
- How do we select the data?
- What data sizes should we use?
- How many tests should we perform?
- How should we present the results?
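As a concrete illustration of this method, here is a minimal sketch in Python. The algorithm under test (a simple sort), the way the data is generated, the data sizes, and the number of tests are all assumptions chosen for illustration.

```python
import random
import time

def algorithm_under_test(data):
    # Hypothetical algorithm whose running time we want to measure;
    # here it simply sorts a copy of the input list.
    return sorted(data)

def average_runtime(size, trials):
    """Run the algorithm `trials` times on fresh random data of the
    given size and return the average time taken in seconds."""
    total = 0.0
    for _ in range(trials):
        data = [random.random() for _ in range(size)]  # how we select the data
        start = time.perf_counter()
        algorithm_under_test(data)
        total += time.perf_counter() - start
    return total / trials

# Illustrative choices of data sizes and number of tests.
for size in (1_000, 10_000, 100_000):
    print(f"n = {size:>7}: {average_runtime(size, trials=20):.6f} s on average")
```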
Population
This is the set of items from which the objects to test are chosen; it is usually denoted by $\Omega$.
The population should be of finite size.
- The set forming a population may vary over time:
  - For example, the students taking a year 1 module.
- The population may be fixed:
  - For example, the permutations of a set of numbers (see the sketch after this list).
- You can focus on a subset of the population:
  - This lets us find how the characteristics of a specific subset affect the population as a whole.
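As a small illustration of a fixed population, the sketch below builds $\Omega$ as the permutations of a few numbers and then focuses on a subset of it; the particular numbers and the subset condition are assumptions chosen for illustration.

```python
from itertools import permutations

# A fixed population: all permutations of a small set of numbers.
omega = list(permutations([1, 2, 3]))
print(len(omega))  # |Ω| = 3! = 6

# A subset of the population: the permutations that start with 1.
subset = [p for p in omega if p[0] == 1]
print(subset)      # [(1, 2, 3), (1, 3, 2)]
```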
Sampling
You can’t consider every member of a population, for the following reasons:
- The population is too large.
- There may be an incomplete set of answers.
- The responses may not be reliable.
This means that you need to sample a population with a random selection method.
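A minimal sketch of such a random selection method, assuming the population is held in a Python list: `random.sample` draws members uniformly at random without replacement.

```python
import random

population = list(range(100))           # a population of 100 items (assumption)
sample = random.sample(population, 10)  # 10 members chosen uniformly at random
print(sample)
```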
Unbiased vs Biased
Each element $X\in\Omega$ has a probability $P[X]$ of being chosen:
\[\sum_{X\in\Omega}P[X]=1,\qquad 0\leq P[X]\leq 1\]
- Summed over all members of the population, the probabilities total 1.
- Each member's probability must be between 0 and 1 inclusive.
In an unbiased / uniform selection method:
\[P[X]=\frac{1}{\vert\Omega\vert}\]
where $\vert\Omega\vert$ is the size of the population.
Every member has the same chance of being selected.
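The sketch below checks these properties for a uniform selection method over a small example population (the population itself is an assumption): every member gets probability $1/\vert\Omega\vert$, each probability lies in $[0,1]$, and they sum to 1.

```python
omega = ["a", "b", "c", "d"]

# Uniform / unbiased selection: P[X] = 1 / |Ω| for every X.
p = {x: 1 / len(omega) for x in omega}

assert all(0 <= p[x] <= 1 for x in omega)  # each probability is in [0, 1]
assert abs(sum(p.values()) - 1.0) < 1e-12  # the probabilities sum to 1
print(p)  # {'a': 0.25, 'b': 0.25, 'c': 0.25, 'd': 0.25}
```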
Why Use Biased?
Giving every member an equal chance can be misleading in practice: if some members occur far more often than others in real use, a uniform sample misrepresents the population and introduces errors into our analysis.
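By contrast, a biased selection method assigns unequal probabilities. The sketch below uses `random.choices` with weights; the population and the weights are assumptions chosen for illustration.

```python
import random

omega = ["common case", "rare case"]
weights = [0.9, 0.1]  # biased probabilities reflecting how often each case
                      # is assumed to occur in practice

# Draw 10 members using the biased probabilities.
picks = random.choices(omega, weights=weights, k=10)
print(picks)
```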
Random Variables
A random variable is a way of associating a numerical value with each member of a population:
For example, each student from the 1$^{\text{st}}$ year has a particular grade.
\[r:\Omega\rightarrow\mathbb{R}\]
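A random variable can be modelled as an ordinary function from population members to real numbers; the students and grades below are hypothetical.

```python
# Hypothetical population of 1st-year students and their grades.
grades = {"Alice": 72.0, "Bob": 58.5, "Carol": 64.0}

def r(student: str) -> float:
    """Random variable r: Ω → ℝ mapping each student to their grade."""
    return grades[student]

print(r("Alice"))  # 72.0
```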