What Is a Support Vector Machine?

Support Vector Machine using Python
In the previous article, we studied K-Means Clustering. One thing I believe is that if we can relate a concept to ourselves or our everyday lives, there is a much better chance of understanding it. So I will try to explain everything by relating it to humans.

What is a Support Vector Machine?

Support vector machines (SVMs, also called support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two classes, an SVM training algorithm builds a model that assigns new examples to one class or the other, making it a non-probabilistic binary linear classifier.

An SVM model represents the examples as points in space, mapped so that the examples of the separate classes are divided by a gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a class based on the side of the gap on which they fall.

More formally, a support vector machine constructs a hyperplane or a set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks such as outlier detection. Intuitively, a good separation is achieved by the hyperplane with the largest distance to the closest training data point of any class (the so-called functional margin), since the wider the margin, the lower the generalization error of the classifier.
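
To make this concrete, here is a minimal sketch using scikit-learn's SVC on a few made-up 2D points; the data and parameters are purely illustrative.

from sklearn.svm import SVC
import numpy as np

# Two tiny, linearly separable clusters (made-up data for illustration)
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A linear-kernel SVM searches for the maximum-margin separating hyperplane
clf = SVC(kernel='linear')
clf.fit(X, y)

print(clf.support_vectors_)            # the points that define the margin
print(clf.predict([[3, 2], [7, 6]]))   # new points mapped to one side or the other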

When data are unlabelled, supervised learning is not possible, and an unsupervised learning approach is required instead; it attempts to find a natural clustering of the data into groups and then maps new data to these formed groups.

Key Terms
1. Kernel
* A kernel is a function that transforms the input data into the form required for solving the problem, typically by mapping it into a higher-dimensional space.
* A linear or a non-linear kernel function may be used. Kernel methods are a class of algorithms for pattern analysis.
* The kernel's primary role is to accept data as input and transform it into the required form of output.
* In effect, the kernel is the mapping function that takes data that are not separable in, say, two dimensions and represents them in a three-dimensional space where a separation becomes possible.

2. Regularization
* The regularization parameter (called C in Python's sklearn library) tells the support vector machine how strongly it should try to avoid misclassifying each training example.
* For large values of C, the optimizer will choose a smaller-margin hyperplane if that hyperplane gets all the training points classified correctly.
* For very small values of C, the optimizer will look for a larger-margin separating hyperplane, even if that hyperplane misclassifies some points.
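
As a quick sketch of how this looks with sklearn's SVC (the C values are arbitrary, chosen only to illustrate the two extremes):

from sklearn.svm import SVC

# Large C: misclassification is penalized heavily, so a smaller-margin hyperplane is accepted
strict_svm = SVC(kernel='linear', C=100.0)

# Small C: some misclassified points are tolerated in exchange for a wider margin
soft_svm = SVC(kernel='linear', C=0.01)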

3. Gamma
* The gamma parameter defines how far the influence of a single training example reaches: low values mean ‘far’ and high values mean ‘near’.
* With a low gamma, even points far away from the plausible separation line are taken into account when computing that line.
* With a high gamma, only the points close to the plausible separation line are considered in the calculation.
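
Again as an illustrative sketch with sklearn (the gamma values are arbitrary):

from sklearn.svm import SVC

# Low gamma: each training point influences a wide region around it
far_reaching = SVC(kernel='rbf', gamma=0.01)

# High gamma: each training point influences only its close neighbourhood
near_reaching = SVC(kernel='rbf', gamma=10.0)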

4. Margin
* Last but not least, the margin. It is another important tuning concept and an integral part of a support vector classifier.
* The margin is the gap between the separating line and the training points closest to it. In a support vector machine we aim for a good margin, i.e. the separation between the two classes is as wide as possible.
* With a good margin, the data points of each class stay on their own side and do not cross over to the other class.
* Ideally, the closest data points of each class should lie at roughly equal distances on either side of the separating line.

5. Hyperplane
* In an n-dimensional Euclidean space, a hyperplane is a flat, (n-1)-dimensional subset of that space which splits it into two separated parts.
* For two dimensions the hyperplane is a separating line.
* For three dimensions, a plane of two dimensions divides the 3D space into two parts and thus acts as the hyperplane.
* In general, for a space of n dimensions we have a separating hyperplane of n-1 dimensions.
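
As a small illustration of how a hyperplane splits the space, the sketch below classifies points by the sign of w.x + b; the weights and bias are made up:

import numpy as np

# Hypothetical hyperplane w.x + b = 0 in two dimensions
w = np.array([1.0, -2.0])
b = 0.5

def side_of_hyperplane(x):
    # positive -> one half-space, negative -> the other
    return np.sign(np.dot(w, x) + b)

print(side_of_hyperplane([3.0, 1.0]))   # 1.0
print(side_of_hyperplane([0.0, 2.0]))   # -1.0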

Types of SVM
1. Classification SVM type 1 (also known as C-SVM classification)
2. Classification SVM type 2 (also known as nu-SVM classification)
3. Regression SVM type 1 (also known as epsilon-SVM regression)
4. Regression SVM type 2 (also known as nu-SVM regression)
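
In scikit-learn these four variants roughly map onto the following estimators (a sketch, assuming the sklearn.svm module):

from sklearn import svm

c_svc = svm.SVC()      # Classification SVM type 1 (C-SVM classification)
nu_svc = svm.NuSVC()   # Classification SVM type 2 (nu-SVM classification)
eps_svr = svm.SVR()    # Regression SVM type 1 (epsilon-SVM regression)
nu_svr = svm.NuSVR()   # Regression SVM type 2 (nu-SVM regression)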

Types of kernels
1. Linear kernel
2. Polynomial kernel
3. Radial basis function kernel (RBF)/ Gaussian Kernel
4. Sigmoid Kernel
5. Nonlinear Kernel
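
In scikit-learn the kernel is selected through the kernel argument of SVC; a minimal sketch (the parameter values are illustrative):

from sklearn.svm import SVC

linear_svm = SVC(kernel='linear')
poly_svm = SVC(kernel='poly', degree=3)   # polynomial kernel of degree 3
rbf_svm = SVC(kernel='rbf')               # radial basis function / Gaussian kernel
sigmoid_svm = SVC(kernel='sigmoid')
# any other non-linear kernel can be supplied as a custom callable via kernel=<function>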

Advantages/Features of SVM
1. It is very effective in high-dimensional spaces.
2. It remains effective when the number of features is greater than the number of training examples.
3. It is among the best algorithms when the classes are separable.
4. The hyperplane is determined only by the support vectors, so outliers have less impact.
5. SVM is well suited to extreme-case binary classification.

Disadvantages/Shortcomings of SVM
1. For larger datasets, it requires a large amount of training time.
2. It does not perform well when classes overlap.
3. Selecting hyperparameters that give sufficient generalization performance is not straightforward.
4. Selecting the appropriate kernel function can be tricky.

Real-World Applications of SVM
1. Face detection: SVMs classify parts of an image as face or non-face and draw a square boundary around the face.

2. Text and hypertext categorization: SVMs support text and hypertext categorization for both inductive and transductive models. They use training data to classify documents into different categories, scoring each document and comparing the score against a threshold value.

3. Classification of images: SVMs provide better search accuracy for image classification than traditional query-based searching techniques.

4. Bioinformatics: this includes protein classification and cancer classification. SVMs are used for classifying genes, classifying patients on the basis of their genes, and other biological problems.

5. Protein fold and remote homology detection: SVM algorithms are applied to protein remote homology detection.

6. Handwriting recognition: SVMs are widely used to recognize handwritten characters.

7. Generalized predictive control (GPC): SVM-based GPC is used to control chaotic dynamics with useful parameters.

Just as in other ML algorithms we look for the best fit, in SVM we look for the hyperplane with the maximum margin (distance to the nearest points), which is why SVM is also called a “maximum-margin” classifier.

Let us try to understand SVM with the help of a mathematical example. For this example we will use the following data:

Data Point    Class
(-2, 4)       -1
(4, 1)        -1
(1, 6)        +1
(2, 4)        +1
(6, 2)        +1

Now, before I go any further, I want you to first know:

In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for “maximum-margin” classification, most notably for support vector machines (SVMs).

The equation is given as:

c(x, y, f(x)) = (1 - y * f(x))+

where c is the loss function, x the sample, y the true label, and f(x) the predicted label.

You see the plus at the end: it means that the hinge loss can never be negative. Mathematically, it can be expressed as:

c(x, y, f(x)) = max(0, 1 - y * f(x))
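
As a quick sanity check, here is a minimal hinge-loss helper in NumPy (the sample values are made up):

import numpy as np

def hinge_loss(y_true, y_pred):
    # max(0, 1 - y * f(x)): zero when the point is classified correctly
    # with a margin of at least 1, positive otherwise
    return np.maximum(0, 1 - y_true * y_pred)

print(hinge_loss(1, 2.5))    # 0.0 -> correct, comfortably outside the margin
print(hinge_loss(1, 0.3))    # 0.7 -> correct side, but inside the margin
print(hinge_loss(-1, 0.3))   # 1.3 -> misclassified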

The regularizer balances margin maximization against the loss. It controls the trade-off between achieving a low training error and a low testing error, that is, the ability of the classifier to generalize to unseen data. As the regularization parameter we choose 1/epochs, so this parameter decreases as the number of epochs increases.

The SVM algorithm chooses the particular weight vector that gives rise to the “maximum margin” of separation.

Now, coming back to the math of SVM and the generation of the hyperplane, we can have two scenarios:

1. Misclassification, i.e. the point is not classified correctly

2. Correct Classification

In the case of misclassification, i.e. when yi * (w . xi) < 1, we use the following rule to update the weights:

w = w + η * (yi * xi - 2 * λ * w)

and in the case of correct classification, we use the following rule to update the weights:

w = w + η * (-2 * λ * w)

* η is the learning rate
* λ is the regularizer
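
Putting the two rules together, here is a minimal from-scratch training sketch on the five points from the table above. The learning rate, the number of epochs, and the trick of appending -1 as a bias feature are my own illustrative choices:

import numpy as np

# training data from the table above, with -1 appended as a bias feature
X = np.array([[-2, 4, -1],
              [4, 1, -1],
              [1, 6, -1],
              [2, 4, -1],
              [6, 2, -1]])
y = np.array([-1, -1, 1, 1, 1])

def svm_sgd(X, y, eta=1.0, epochs=100000):
    w = np.zeros(X.shape[1])
    for epoch in range(1, epochs + 1):
        lam = 1.0 / epoch                      # regularizer 1/epochs, shrinking over time
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) < 1:         # misclassified or inside the margin
                w = w + eta * (yi * xi - 2 * lam * w)
            else:                              # correctly classified
                w = w + eta * (-2 * lam * w)
    return w

w = svm_sgd(X, y)
print(w)   # should come out close to (1.56, 3.17, 11.12), matching the prediction function below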

After doing the calculations, I came up with the following prediction function:

f(x) = ⟨x, (1.56, 3.17)⟩ - 11.12

i.e. f(x) = 1.56*x + 3.17*y - 11.12

* (1.56, 3.17) is the weight vector
* 11.12 is the bias term

Note: I will not be going into depth on how I got this; you can do the calculations yourself, or you can use sklearn to do it for you.
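
For example, a rough sklearn equivalent on the same five points could look like the sketch below; sklearn optimizes a slightly different objective, so its coefficients will not match 1.56, 3.17 and 11.12 exactly:

import numpy as np
from sklearn.svm import SVC

X = np.array([[-2, 4], [4, 1], [1, 6], [2, 4], [6, 2]])
y = np.array([-1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1000)       # large C -> behaves almost like a hard-margin SVM
clf.fit(X, y)

print(clf.coef_, clf.intercept_)         # weight vector and bias of the separating line
print(clf.predict([[3, 5], [-2, 3]]))    # the test points used later in this article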

Now let us check the accuracy of the prediction function calculated above.

1. -2*1.56 + 4*3.17 - 11.12 = -1.56

Taking the sign, we get -1, which is the correct class.

2. 4*1.56 + 1*3.17 - 11.12 = -1.71

Taking the sign, we get -1, which is the correct class.

3. 1*1.56 + 6*3.17 - 11.12 = 9.46

Taking the sign, we get +1, which is the correct class.

4. 2*1.56 + 4*3.17 - 11.12 = 4.68

Taking the sign, we get +1, which is the correct class.

5. 6*1.56 + 2*3.17 - 11.12 = 4.58

Taking the sign, we get +1, which is the correct class.

So far we have only tested the hyperplane equation on the training data. Now it is time to give the model some never-before-seen data.

Test Data = (3, 5), (-2, 3)

1. 3*1.56 + 5*3.17 - 11.12 = 9.41

Taking the sign, we get +1, which is the correct class.

2. -2*1.56 + 3*3.17 - 11.12 = -4.73

Taking the sign, we get -1, which is the correct class.
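
The same checks can be done in a couple of lines of NumPy, using the weight vector and bias derived above:

import numpy as np

w = np.array([1.56, 3.17])
b = -11.12

points = np.array([[-2, 4], [4, 1], [1, 6], [2, 4], [6, 2], [3, 5], [-2, 3]])
print(np.sign(points @ w + b))   # [-1. -1.  1.  1.  1.  1. -1.]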

Python Implementation of SVM
1. Using Functions
Let us now take a look at how we can implement an SVM from scratch. In the following example we will use dummy data. I have taken the code reference from the repository.

%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np
style.use('ggplot')

class SVM(object):
    def __init__(self, visualization=True):
        self.visualization = visualization
        self.colors = {1: 'r', -1: 'b'}
        if self.visualization:
            self.fig = plt.figure()
            self.ax = self.fig.add_subplot(1, 1, 1)

    def fit(self, data):
        self.data = data
        opt_dict = {}                                   # { ||w|| : [w, b] }
        transforms = [[1, 1], [-1, 1], [-1, -1], [1, -1]]
        all_data = np.array([])
        for yi in self.data:
            all_data = np.append(all_data, self.data[yi])
        self.max_feature_value = max(all_data)
        self.min_feature_value = min(all_data)
        all_data = None
        # coarse-to-fine step sizes for the search over w
        step_sizes = [self.max_feature_value * 0.1,
                      self.max_feature_value * 0.01,
                      self.max_feature_value * 0.001]
        b_range_multiple = 5
        b_multiple = 5
        latest_optimum = self.max_feature_value * 10
        for step in step_sizes:
            w = np.array([latest_optimum, latest_optimum])
            optimized = False
            while not optimized:
                for b in np.arange(-1 * self.max_feature_value * b_range_multiple,
                                   self.max_feature_value * b_range_multiple,
                                   step * b_multiple):
                    for transformation in transforms:
                        w_t = w * transformation
                        found_option = True
                        # constraint: yi * (w . xi + b) >= 1 for every training point
                        for i in self.data:
                            for xi in self.data[i]:
                                yi = i
                                if not yi * (np.dot(w_t, xi) + b) >= 1:
                                    found_option = False
                        if found_option:
                            opt_dict[np.linalg.norm(w_t)] = [w_t, b]
                if w[0] < 0:
                    optimized = True
                    print('optimized a step')
                else:
                    w = w - step
            # the smallest ||w|| that satisfies all constraints gives the widest margin
            norms = sorted([n for n in opt_dict])
            opt_choice = opt_dict[norms[0]]
            self.w = opt_choice[0]
            self.b = opt_choice[1]
            latest_optimum = opt_choice[0][0] + step * 2

    def predict(self, features):
        # sign(w . x + b) gives the predicted class
        classification = np.sign(np.dot(np.array(features), self.w) + self.b)
        if classification != 0 and self.visualization:
            self.ax.scatter(features[0], features[1], s=200, marker='*',
                            c=self.colors[classification])
        return (classification, np.dot(np.array(features), self.w) + self.b)

    def visualize(self):
        # plot the training points of both classes
        [[self.ax.scatter(x[0], x[1], s=100, c=self.colors[i]) for x in data_dict[i]]
         for i in data_dict]

        def hyperplane(x, w, b, v):
            # solve w . x + b = v for the second coordinate
            return (-w[0] * x - b + v) / w[1]

        hyp_x_min = self.min_feature_value * 0.9
        hyp_x_max = self.max_feature_value * 1.1
        # positive support vector boundary: w . x + b = 1
        pav1 = hyperplane(hyp_x_min, self.w, self.b, 1)
        pav2 = hyperplane(hyp_x_max, self.w, self.b, 1)
        self.ax.plot([hyp_x_min, hyp_x_max], [pav1, pav2], 'k')
        # negative support vector boundary: w . x + b = -1
        nav1 = hyperplane(hyp_x_min, self.w, self.b, -1)
        nav2 = hyperplane(hyp_x_max, self.w, self.b, -1)
        self.ax.plot([hyp_x_min, hyp_x_max], [nav1, nav2], 'k')
        # decision boundary: w . x + b = 0
        db1 = hyperplane(hyp_x_min, self.w, self.b, 0)
        db2 = hyperplane(hyp_x_max, self.w, self.b, 0)
        self.ax.plot([hyp_x_min, hyp_x_max], [db1, db2], 'y--')
        plt.show()

data_dict = {-1: np.array([[1, 7], [2, 8], [3, 8]]),
             1: np.array([[5, 1], [6, -1], [7, 3]])}

svm = SVM()
svm.fit(data=data_dict)
svm.visualize()

Output: a plot of both classes together with the decision boundary (dashed) and the two margin lines.
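
Once the model is fitted, the predict method defined above (but never called in the listing) can be used on new points. A quick usage sketch, assuming the svm object fitted in the code above and some made-up test points:

# predict returns (class, w.x + b) and also plots each point as a star
for point in [[0, 10], [1, 3], [3, 4], [5, 5], [6, -5]]:
    print(point, svm.predict(point))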