### Support Vector Machine (SVM) — An Optimisation Mammoth

### Table of Contents

1. **Motivation**
2. **Solution**
   - 2.a. *Mathematical Modelling of 1st Condition*
   - 2.b. *Mathematical Modelling of 2nd Condition*
   - 2.c. *Combining the Optimization Problems*
   - 2.d. *The Optimization Trick — which no book explains*
   - 2.e. *Primal-to-Dual Optimization Conversion*
   - 2.f. *Solving the Dual Optimization Problem — SMO Algorithm*
3. **KeyNote**
4. **Code Implementation**

**Motivation**

All the lines in the diagram above make 0 misclassifications, i.e. the least possible, so by that criterion every one of them is a "best" line. But how should we choose one out of all these lines?

In Linear Classification we unknowingly pick any one of these at random, but now we will devise a way to choose the best among all these 0-misclassification lines !!!

**Solution**

**Mathematical Modelling of the 1st Condition**

First we need to represent mathematically the condition that the line makes the fewest misclassifications. This can be done via the following optimisation problem:
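The equation itself appeared as an image in the original post; a standard way to write this condition (using labels $y_i \in \{-1, +1\}$, a weight vector $w$, and a bias $b$, all assumed names) is:

```latex
\text{Find } w,\, b \quad \text{such that} \quad
y_i \left( w^{T} x_i + b \right) > 0 \quad \forall\, i = 1, \dots, m
```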

**Mathematical Modelling of the 2nd Condition**

As visualized above, the main idea is to find, out of all the possible 0-misclassification lines, the middle line, i.e. the one that sits as far as possible from both classes. We will model this condition as an optimisation problem too.
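Again the formula was an image; the standard formulation of "maximize the distance to the closest point" (same assumed symbols $w$, $b$ as above) is:

```latex
\max_{w,\, b} \; \min_{i} \; \frac{\left| w^{T} x_i + b \right|}{\lVert w \rVert}
```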

**Combining the two optimization problems**

Optimization Equation 1
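The combined equation was shown as an image; merging the two conditions above into one statement (maximize the margin over lines that classify everything correctly) gives, in the same notation:

```latex
\max_{w,\, b} \; \min_{i} \; \frac{y_i \left( w^{T} x_i + b \right)}{\lVert w \rVert}
\quad \text{subject to} \quad
y_i \left( w^{T} x_i + b \right) > 0 \;\; \forall\, i
```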

Hence we need to find a way to solve this mammoth, deadly-looking optimization equation !!!

**THE OPTIMIZATION TRICK — which no book explains**

Researchers found the above optimization problem **intractable to solve directly**, and hence applied some tricks. Using these tricks they showed that the above optimization equation is equivalent to the following one:

Optimization Equation 2
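This equation was also an image in the original; the standard hard-margin SVM formulation it refers to is:

```latex
\min_{w,\, b} \; \frac{1}{2} \lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{T} x_i + b \right) \ge 1, \;\; i = 1, \dots, m
```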

Now we will see the proof of the above statement, i.e. how to convert optimization equation 1 into optimization equation 2, a derivation rarely mentioned in any book… !!

We already know from class 10th mathematics that multiplying a line's equation by a non-zero scalar does not change the line. For instance:

2x + 4y + 5 = 0,    5 · (2x + 4y + 5) = 0,    17 · (2x + 4y + 5) = 0

All these equations describe the same line ….. !!! Hence, in the representation above, let's multiply the line equation by 1 / 84 and see what happens.
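As a quick numerical check of this rescaling fact (a throwaway sketch; the test point is arbitrary, and 1/84 is the factor used in the text above):

```python
import numpy as np

# The line 2x + 4y + 5 = 0, written as w·x + b = 0
w = np.array([2.0, 4.0])
b = 5.0

point = np.array([3.0, -1.0])  # an arbitrary test point

# Multiplying the whole equation by any non-zero scalar c leaves both
# the line and the side each point falls on unchanged.
for c in (5.0, 17.0, 1.0 / 84.0):
    assert np.sign(c * (w @ point + b)) == np.sign(w @ point + b)

print("all scalings keep the point on the same side of the line")
```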

Researchers used this same rescaling freedom to transform the 'mathematical formulation of the 2nd condition' as follows !!!!

Hence the optimization problem statement changes from

#### Solving the Optimization Equation by converting it from PRIMAL to its DUAL form

Above, we established an equivalence between the following two optimization problems:

Optimization Problem 1 | Optimization Problem 2

Now let’s see how to solve the optimization problem 2 !!

This is a constrained non-linear optimization problem, and more specifically a **convex quadratic programming problem (QPP)**, because all the constraints are linear and the objective function is a convex quadratic. There are several methods to solve this kind of optimization problem directly, but they are computationally inefficient, so instead we will derive the **DUAL** of this optimization problem and solve that.

Please refer to my optimization theory posts to understand how to write the dual of a constrained non-linear primal optimization problem. The dual of the above primal problem goes as follows ….
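The dual itself was an image in the original; the standard Lagrangian dual of the hard-margin primal, with one multiplier $\lambda_i$ per training point, is:

```latex
\max_{\lambda} \; \sum_{i=1}^{m} \lambda_i
\;-\; \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
\lambda_i \lambda_j \, y_i y_j \, x_i^{T} x_j
\quad \text{subject to} \quad
\sum_{i=1}^{m} \lambda_i y_i = 0, \quad \lambda_i \ge 0
```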

#### Solving Dual Optimization Problem

Now the above dual optimization problem is also a constrained non-linear optimization problem and more specifically it is a convex quadratic programming problem (QPP).

Here, a question should arise in your mind: the primal was a convex QPP and so is the dual, so what was the benefit of converting primal to dual? The dual has one resource constraint while the primal had m resource constraints, thus making the primal problem computationally more difficult to solve than the dual !!

There are many ways to solve this kind of convex-QPP dual problem; one of the most famous is the SMO (Sequential Minimal Optimization) algorithm. Using this algorithm you will find the optimal values of the lambdas….
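Once SMO (not implemented here) returns the optimal lambdas, $w$ and $b$ can be recovered from the support vectors. A minimal sketch on a hand-solved two-point problem — the lambda values below were worked out by hand for this toy case, not produced by SMO:

```python
import numpy as np

# Tiny separable problem: one point per class
X = np.array([[1.0, 0.0],
              [-1.0, 0.0]])
y = np.array([1.0, -1.0])

# Optimal multipliers for this toy problem
# (they satisfy the dual constraint sum_i lam_i * y_i = 0)
lam = np.array([0.5, 0.5])

# w = sum_i lam_i * y_i * x_i  (only support vectors, lam_i > 0, contribute)
w = (lam * y) @ X

# b from any support vector s:  y_s (w·x_s + b) = 1  =>  b = y_s - w·x_s
s = 0
b = y[s] - w @ X[s]

print(w, b)  # expected: [1. 0.] 0.0
```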

### Advantages over other Classification Algorithms

- Works very well when we have a large number of features (columns), because the algorithm's time complexity does not depend on the number of features in your dataset.
- Works well when we have a small number of data points (rows), because ultimately we only care about the support vectors, which are few. Say you have 1000 data points; perhaps only 10 of them turn out to be support vectors.
- Works well when the dataset contains outliers, because such outliers end up with lambda = 0, i.e. they are not support vectors and hence do not affect the decision boundary.

### Disadvantages over other Classification Algorithms

- SVM doesn't provide a *posterior probability*, which is required for many applications.
- Inefficient on large datasets.

### Code Implementation

I am still working on the custom implementation; for now, here is a simple library-based implementation.
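A minimal library-based sketch using scikit-learn's `SVC` (the dataset and hyperparameters here are illustrative choices, not from the original post):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load a small benchmark dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Linear-kernel SVM; C controls how soft the margin is
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)
print(f"test accuracy: {acc:.2f}")

# Only the support vectors determine the decision boundary
print(f"support vectors: {len(clf.support_)} of {len(X_train)} training points")
```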

Support Vector Machine (SVM) — An Optimisation Mammoth was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
