Support Vector Machine (SVM) — An Optimisation Mammoth

Table of Contents :

1. Motivation
2. Solution
2.a. Mathematical Modelling of 1st Condition
2.b.
Mathematical Modelling of 2nd Condition
2.c.
Combining the Optimization Problems
2.d.
The Optimization Trick — which no book explains
2.e.
Primal2Dual Optimization Conversion
2.f.
Solving Dual Optimization Problem — SMO Algorithm
3. KeyNote
4. Code Implementation

Motivation

Now all the lines in the above diagram do 0 misclassifications i.e. least misclassifications, meaning all these lines are the best lines. But now how should we choose one out of all these lines?

In Linear Classification we unknowingly randomly choose any one of these but now we will devise a way to choose the best out of all these 0 misclassification lines !!!

Solution

Mathematical Modelling the 1st Condition

First we need to mathematically represent the condition that the line does least misclassifications…. this can be done via following optimisation problem

Mathematical Modelling the 2nd Condition

As visualized above the main solution to the problem is to find a middle line that does 0 misclassification out of all the possible 0 misclassification lines, we will try modelling this condition also as an optimisation problem

Combining the two optimization problem

Optimization Equation 1

Hence we need to find a way to solve this mammoth deadly-looking optimization equation !!!

THE OPTIMIZATION TRICK — which no book explains

Researchers concluded that the above optimization problem is UNsolvable and hence they need to apply some tricks to solve it. Using some tricks they concluded that the above optimization equation is equivalent to solving the following optimization equation

Optimization Equation 2

Now we will see the proof for the above statement i.e. we will see how to convert the above optimization equation 1 to this optimization equation 2, which is rarely mentioned in any book… !!

Now we already know a fact from class 10th mathematics that if we multiply a line equation by a scalar, it does not change. For instance

2x + 4y + 5 = 0 || 5 * (2x + 4y + 5) = 0 || 17 * (2x + 4y + 5) = 0

All these equations are of the same line ….. !!! Hence in the above representation let’s multiply the line equation by 1 / 84 and see what happens

Now researchers thought of using this powerful strategy to change the above ‘mathematical formulation of 2nd condition’ as follows !!!!

Hence the optimization problem statement changes from

Solving the Optimization Equation by converting it from PRIMAL to its DUAL form

Above we have ended up setting an equivalence between the following two optimization problems

Optimization Problem 1Optimization Problem 2

Now let’s see how to solve the optimization problem 2 !!

Now this is a constrained non-linear optimization problem and more specifically it is a convex quadratic programming problem (QPP)cause all constraints are linear and the objective function is a quadratic convex function. There are several methods to solve this kind of optimization problem but they are computationally inefficient hence we will calculate the DUAL of this optimization problem and try to solve the dual.

Please refer to my optimization theory posts to understand how to write dual of a constrained non-linear primal optimization problem. Hence the dual of above primal problem goes as follows ….

Solving Dual Optimization Problem

Now the above dual optimization problem is also a constrained non-linear optimization problem and more specifically it is a convex quadratic programming problem (QPP).

Here, a question should arise in your mind, i.e. primal was also a convex QPP and so is dual, then what was the benefit of converting primal to dual? Dual has one resource constraint while the primal had m resource constraints thus making pirmal problem computationally more difficult to solve compared to dual !!

There are many was to solve this kind of convex QPP based dual problem, one of the most famous algorithm to solve is using SMO algorithm. Using this algorithm you will find optimal values of lambdas….

Advantages over other Classification Algorithms

Works very well in cases where we have a large number of features(columns) because here time complexity of the algorithm does not depend on the no of features that you have in your datasetWorks well if we have less no of datapoints (rows) because we ultimately care only about the support vectors which are very small. Say you have only 1000 data points then only 10 out of 1000 data points turns out to be support vectors.Works very well if you have outliers in your dataset because such outliers will have lambda = 0 i.e. they would have been support vectors and hence would not affect the decision boundary.

Disadvantages over other Classification Algorithms

SVM doesn’t provide a posterior probability which is required for many applicationsInefficient on a large dataset

Code Implementation

I am still working on the custom implementation, you can find here a simple library-based implementation

Tabular-Cross-Sectional-Modelling/modelling/classification/SVM.ipynb at main · khetansarvesh/Tabular-Cross-Sectional-Modelling

Support Vector Machine (SVM) — An Optimisation Mammoth was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.

​ Level Up Coding – Medium

about Infinite Loop Digital

We support businesses by identifying requirements and helping clients integrate AI seamlessly into their operations.

Gartner
Gartner Digital Workplace Summit Generative Al

GenAI sessions:

  • 4 Use Cases for Generative AI and ChatGPT in the Digital Workplace
  • How the Power of Generative AI Will Transform Knowledge Management
  • The Perils and Promises of Microsoft 365 Copilot
  • How to Be the Generative AI Champion Your CIO and Organization Need
  • How to Shift Organizational Culture Today to Embrace Generative AI Tomorrow
  • Mitigate the Risks of Generative AI by Enhancing Your Information Governance
  • Cultivate Essential Skills for Collaborating With Artificial Intelligence
  • Ask the Expert: Microsoft 365 Copilot
  • Generative AI Across Digital Workplace Markets
10 – 11 June 2024

London, U.K.