This week’s machine learning course is about SVM, which is a very important machine learning algorithm.
What is SVM (Support Vector Machine)?
SVM is like logistic regression. It computes z the same way, as z = θ’x.
The difference is the cost function. For y = 1, SVM’s cost function is made of two straight line segments: a sloped line for z < 1 and a flat line at zero for z >= 1. Symmetrically, for y = 0 it uses two other line segments: flat at zero for z <= -1 and sloped for z > -1.
This is computationally more efficient than the logistic cost. It also pushes θ’x >= 1 when y = 1 (not merely θ’x >= 0), and θ’x <= -1 when y = 0 (not merely θ’x < 0).
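The two piecewise-linear costs described above can be sketched in a few lines. This is only an illustration: the names cost1 and cost0 and the slope of 1 on the sloped segments are assumptions, since only the positions of the kinks (z = 1 and z = -1) are stated here.

```python
def cost1(z):
    """Cost for a positive example (y = 1): flat at zero once z >= 1.

    The slope of 1 on the sloped segment is an assumption.
    """
    return max(0.0, 1.0 - z)


def cost0(z):
    """Cost for a negative example (y = 0): flat at zero once z <= -1.

    The slope of 1 on the sloped segment is an assumption.
    """
    return max(0.0, 1.0 + z)


# z = 0.5 would be classified as positive (z >= 0), but it is still
# penalised because it does not reach the z >= 1 target.
penalty_inside_margin = cost1(0.5)
penalty_outside_margin = cost1(1.5)
```

A positive example with z between 0 and 1 still pays a cost, which is what pushes θ’x beyond 1 rather than just beyond 0.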
By minimizing this modified cost function, we get the final θ. Thanks to the Large Margin technique, that θ defines a straight decision boundary (if the features are not polynomial) that separates the positives (h = 1) from the negatives (h = 0) in a classification problem.
What is Large Margin?
With Large Margin, we can draw a line that separates the positive points from the negative points. Among all the lines that separate the two classes, Large Margin aims for the one whose smallest distance to any training point is as large as possible.
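To make the idea concrete, here is a small sketch that measures the margin of two candidate boundary lines on a toy data set. The points, the two candidate lines, and the function name margin are all made up for illustration; the sketch only compares margins, it does not train an SVM.

```python
import math


def margin(theta0, theta, points):
    """Smallest distance from any point to the line theta0 + theta . x = 0."""
    norm = math.sqrt(sum(t * t for t in theta))
    return min(
        abs(theta0 + sum(t * x for t, x in zip(theta, p))) / norm
        for p in points
    )


# Hypothetical toy data: the first two points are positives,
# the last two are negatives.
points = [(3, 3), (4, 3), (1, 1), (0, 1)]

# Both candidate lines separate the two classes.
margin_a = margin(-4.0, (1.0, 1.0), points)  # line x1 + x2 = 4
margin_b = margin(-1.5, (1.0, 0.0), points)  # line x1 = 1.5
```

Both lines classify the toy points correctly, but the first one leaves more room around the closest points, so it is the kind of boundary Large Margin prefers.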
What is Vector Inner Product?
If we have two vectors, u = [u1 u2] and v = [v1 v2], one way to calculate the inner product is u’v, which is u1*v1 + u2*v2.
Another way to calculate the inner product is based on geometry. The norm (or Euclidean length) of vector u, ||u||, is sqrt(u1^2 + u2^2). It is like plotting u as the point (u1, u2), with u1 along the x axis and u2 along the y axis, and then applying Pythagoras' theorem. Draw vector v on the same axes, project v orthogonally onto u, and call p the length from the origin (0,0) to the foot of the projection. p is signed and can be negative (when the angle between u and v is more than 90 degrees). Finally, the inner product = p*||u||.
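Both ways of calculating the inner product can be checked against each other. In this sketch the function names and the example vectors are made up for illustration, and the signed projection p is computed from the angle between the two vectors so that the geometric route does not secretly reuse the algebraic formula.

```python
import math


def inner_algebraic(u, v):
    """u'v computed component by component."""
    return u[0] * v[0] + u[1] * v[1]


def norm(u):
    """Euclidean length ||u||."""
    return math.sqrt(u[0] ** 2 + u[1] ** 2)


def signed_projection(v, u):
    """Signed length p of the orthogonal projection of v onto u."""
    angle = math.atan2(v[1], v[0]) - math.atan2(u[1], u[0])
    return norm(v) * math.cos(angle)


def inner_geometric(u, v):
    """p * ||u||, the geometric form of the inner product."""
    return signed_projection(v, u) * norm(u)


u = (4.0, 2.0)
v = (1.0, 3.0)   # less than 90 degrees from u, so p is positive
w = (-2.0, 1.0)  # more than 90 degrees from u, so p is negative
```

For these vectors, inner_geometric(u, v) agrees with inner_algebraic(u, v) up to floating-point rounding, and signed_projection(w, u) comes out negative.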
Apply Vector Inner Product to minimise the cost function, to get the Large Margin Decision Boundary
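We can rewrite the cost function of SVM in a way that uses Vector Inner Product: θ’x is equal to p*||θ||, where p is the signed projection of x onto θ. The targets θ’x >= 1 and θ’x <= -1 then become p*||θ|| >= 1 and p*||θ|| <= -1. Since minimising the cost function also keeps ||θ|| small, these targets can only be met if the projections p are large, and that corresponds to the decision boundary with the largest margin.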
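Written out, the rewritten problem looks roughly like this. It is a sketch under several assumptions not spelled out above: the two classes are linearly separable, the training errors are driven to zero so only the regularization term remains, θ0 = 0 so the boundary passes through the origin, and p^(i) denotes the signed projection of the i-th training example x^(i) onto θ.

```latex
\min_{\theta} \; \frac{1}{2} \sum_{j=1}^{n} \theta_j^2
  = \frac{1}{2} \lVert \theta \rVert^2
\quad \text{subject to} \quad
\begin{cases}
p^{(i)} \, \lVert \theta \rVert \ge 1  & \text{if } y^{(i)} = 1 \\
p^{(i)} \, \lVert \theta \rVert \le -1 & \text{if } y^{(i)} = 0
\end{cases}
```

If the projections p^(i) are small, ||θ|| has to be large to satisfy the constraints, which conflicts with the minimization. The minimum is therefore reached with a θ that makes the projections, and so the margin, large.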
What are Kernels?
We use a kernel in order to develop complex nonlinear classifiers. Without a kernel (sometimes we call it a linear kernel), we can only develop linear classifiers. SVM is about the cost function, whereas Kernel is about the hypothesis function.
How to use Kernels in a hypothesis function?
Without a kernel, we would write a polynomial hypothesis function. With a kernel, we replace the x’s of the polynomial hypothesis function by f’s, where each f is the result of a kernel. The Gaussian Kernel is the most popular one. It measures the similarity of two vectors: it returns a value close to 1 when the two vectors are very similar, and close to 0 when they are very different. Each training example is an n-dimensional vector, and we have m such vectors in the training set. We can learn parameters θ so that when a vector is similar to certain other vectors, the hypothesis function is > 0.
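Here is a minimal sketch of the Gaussian Kernel and of how the f’s can be built from it. The function names, the sigma parameter (the width of the kernel), and the use of the training examples themselves as the reference vectors ("landmarks") are assumptions made for illustration.

```python
import math


def gaussian_kernel(x, landmark, sigma=1.0):
    """Similarity of x and landmark: 1 when identical, near 0 when far apart."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_features(x, landmarks, sigma=1.0):
    """Replace x by f = [f1, ..., fm], one similarity per landmark."""
    return [gaussian_kernel(x, l, sigma) for l in landmarks]


# Hypothetical training set with m = 3 examples of dimension n = 2,
# reused here as the landmarks.
training_set = [(1.0, 2.0), (1.5, 2.5), (8.0, 9.0)]
f = kernel_features((1.0, 2.0), training_set)
```

In this example the first entry of f is exactly 1 because the input equals the first landmark, and the last entry is practically 0 because the third landmark is far away. A smaller sigma makes the similarity drop off faster with distance.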
SVM, Logistic Regression, or Neural Network?
It’s not always obvious which learning algorithm to choose when solving a classification problem. Here’s a recommended best practice guide:
Questions
Does the SVM cost function change the hypothesis function of SVM compared to the hypothesis function of Logistic Regression? Why?
It seems (to be verified) that the hypothesis function of SVM becomes: h = 1 when θ’x >= 0, and h = 0 when θ’x < 0.
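If that reading is right, the SVM hypothesis is a hard threshold on θ’x and not a probability as in Logistic Regression. A sketch of that rule, with a made-up function name, made-up parameter values, and the assumption that x already includes the intercept feature x0 = 1:

```python
def svm_predict(theta, x):
    """Return 1 when theta'x >= 0 and 0 otherwise (hard threshold)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0


theta = (-3.0, 1.0, 1.0)  # hypothetical learned parameters
label_pos = svm_predict(theta, (1.0, 2.0, 2.0))  # theta'x = 1
label_neg = svm_predict(theta, (1.0, 1.0, 1.0))  # theta'x = -1
```

The targets θ’x >= 1 and θ’x <= -1 only shape the cost during training; at prediction time the threshold in this sketch stays at 0.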