Support Vector Machines
Optimal Hyperplane
Support vector machines find a hyperplane that linearly separates a set of data with the largest possible margin. This is called the optimal hyperplane.
The equation of the optimal hyperplane takes the form:
\[g(\mathbf x)=\mathbf w^T\mathbf x+b=0\]
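As a minimal sketch of what this looks like in practice (assuming scikit-learn and some toy, linearly separable data), we can fit a linear SVM and read off the learned $\mathbf w$ and $b$:

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data: two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# A large C approximates a hard margin on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]       # weight vector w in g(x) = w^T x + b
b = clf.intercept_[0]  # bias b
print("w =", w, "b =", b)
```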
Support Vectors
Support vectors are the closest data points in the dataset to the optimal hyperplane. They are used to determine the location of the hyperplane that separates the set.
Using the function for the optimal hyperplane, we can say that for any support vector:
\[|g(\mathbf x_\text{sv})|=1\]
The Euclidean distance of a support vector from the optimal hyperplane can be calculated by:
\[\begin{aligned} \text{distance}\ r &= \frac{|g(\mathbf x_\text{sv})|}{\parallel \mathbf w\parallel_2}\\ &=\frac1{\parallel \mathbf w\parallel_2} \end{aligned}\]
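Continuing the same sketch (still assuming scikit-learn and the same toy data), we can check that every support vector of the fitted model lies at distance $1/\parallel\mathbf w\parallel_2$ from the hyperplane:

```python
import numpy as np
from sklearn.svm import SVC

# Same toy separable data as in the earlier sketch.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
g_sv = clf.support_vectors_ @ w + b     # g(x_sv), roughly +/- 1
r = np.abs(g_sv) / np.linalg.norm(w)    # r = |g(x_sv)| / ||w||_2
print("1/||w||_2 =", 1.0 / np.linalg.norm(w))
print("support vector distances:", r)   # each approximately 1/||w||_2
```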
Non-Separable Patterns
When the data is not linearly separable, we can either use slack variables ($\xi$) if the data is close to linearly separable, or we can transform the data into a space where it becomes linearly separable.
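As a sketch of the slack-variable route (again assuming scikit-learn), the soft-margin penalty on $\xi$ is exposed through the C parameter of SVC: a small C tolerates more margin violations, while a large C penalizes them heavily:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Overlapping blobs: not perfectly separable, so slack variables are needed.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=3.0, random_state=0)

# C trades margin width against slack: small C -> soft margin with many
# violations allowed, large C -> few violations but a narrower margin.
soft = SVC(kernel="linear", C=0.01).fit(X, y)
strict = SVC(kernel="linear", C=100.0).fit(X, y)

print("support vectors with C=0.01:", soft.n_support_.sum())
print("support vectors with C=100: ", strict.n_support_.sum())
```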
Kernels
We can project each pattern $\mathbf x_i$ to a new feature space of higher dimensionality (compared to the original input space) using a projection function $\phi$:
\[\phi:\mathbb R^d\rightarrow\mathbb R^b\]
We can use various types of kernels, but the general form of the kernel machine is:
\[\begin{aligned} K(\mathbf x_i, \mathbf x_j)&=\phi(\mathbf x_i)^T\phi(\mathbf x_j)\\ &=\sum^b_{k=1}\phi_k(\mathbf x_i)\phi_k(\mathbf x_j) \end{aligned}\]
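As a small numerical check of this identity (a NumPy sketch with a hypothetical explicit map $\phi$), the homogeneous degree-2 polynomial kernel $K(\mathbf x_i,\mathbf x_j)=(\mathbf x_i^T\mathbf x_j)^2$ on $\mathbb R^2$ gives the same value as the dot product of its explicit feature maps in $\mathbb R^3$:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on R^2:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so phi: R^2 -> R^3."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def K(xi, xj):
    """Kernel computed directly in the input space: K(xi, xj) = (xi^T xj)^2."""
    return float(xi @ xj) ** 2

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

print(K(xi, xj))          # kernel value computed in the input space
print(phi(xi) @ phi(xj))  # same value via the projection phi
```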