A Support Vector Machine (SVM) is a linear classifier. We can consider SVM for linearly separable binary sets. The goal is to design a hyperplane (a subspace whose dimension is one less than that of its ambient space; if a space is 3-dimensional, its hyperplanes are the 2-dimensional planes) that classifies all the training vectors into two classes. Many hyperplanes may classify all the elements in the feature set correctly, but the best choice is the hyperplane that leaves the maximum margin from both classes. By margin we mean the distance between the hyperplane and the elements closest to it.
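To make this concrete, here is a brief sketch in standard notation (labels y_i in {-1, +1}, normal vector w and offset b; these symbols are not used elsewhere in this document). The separating hyperplane is w^T x + b = 0, and the hard-margin problem can be written as
\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w^{\top} x_i + b) \ge 1, \qquad i = 1, \dots, n
The margin equals 2/||w||, so minimizing ||w|| is the same as maximizing the margin.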
data(iris)
summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
# Scatter plot of the petal dimensions, colored by species (the graph referenced below)
library(ggplot2)
qplot(Petal.Length, Petal.Width, data = iris, color = Species)
We are using the iris dataset, which has 4 numerical variables and 1 factor with 3 levels, as described above. We can also see that the numerical variables have different ranges, so it is good practice to normalize the data. We build a classification model that helps us predict the correct species. From the graph above, we can see there is a separation based on Species: for example, setosa is very far from the other two groups, while versicolor and virginica overlap slightly.
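As a side note on normalization, a minimal sketch of one way to standardize the four numeric columns is shown below (the name iris_scaled is only illustrative); in practice e1071's svm() scales variables by default (scale = TRUE), so an explicit step is optional here.
# Illustrative only: center and scale the numeric columns to z-scores
iris_scaled <- data.frame(scale(iris[, 1:4]), Species = iris$Species)
summary(iris_scaled)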
With a Support Vector Machine (SVM) we are looking for the optimal separating hyperplane between two classes, and to do that the SVM maximizes the margin around the hyperplane. The points that lie on the margin boundaries are called support vectors, and the middle line is the separating hyperplane. In situations where we cannot obtain a linear separator, the data are projected into a higher-dimensional space so that the data points become linearly separable. In this case we use the kernel trick, here with the Gaussian radial basis function.
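For reference, the Gaussian radial basis function in the gamma parameterization used by e1071 is K(u, v) = exp(-gamma * ||u - v||^2). The small helper below is only a sketch of this formula, not part of the fitting code; the default gamma of 1 / (number of features) = 0.25 matches the model output further down.
# Sketch of the RBF kernel used by svm(kernel = "radial")
rbf_kernel <- function(u, v, gamma = 0.25) {
  exp(-gamma * sum((u - v)^2))
}
rbf_kernel(c(5.1, 3.5, 1.4, 0.2), c(4.9, 3.0, 1.4, 0.2))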
library(e1071)
mymodel <- svm(Species~., data=iris)
summary(mymodel)
Call:
svm(formula = Species ~ ., data = iris)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
gamma: 0.25
Number of Support Vectors: 51
( 8 22 21 )
Number of Classes: 3
Levels:
setosa versicolor virginica
# Plot a two-dimensional projection of the data, highlighting classes and support vectors
# The Species classes are shown in different shadings
plot(mymodel, data=iris,
Petal.Width~Petal.Length,
slice = list(Sepal.Width=3, Sepal.Length=4)) # specify a list of named values for the dimensions held constant
# Confusion Matrix and Misclassification Error
pred <- predict(mymodel, iris)
tab <- table(Predicted = pred, Actual = iris$Species)
tab
Actual
Predicted setosa versicolor virginica
setosa 50 0 0
versicolor 0 48 2
virginica 0 2 48
# Misclassification Rate
1-sum(diag(tab))/sum(tab)
[1] 0.02666667
As we can see from the output above, the model uses the Gaussian radial basis function kernel, and cost is the penalty for constraint violation. The two-dimensional plot above is a projection of the data highlighting classes and support vectors; the Species classes are shown in different shadings. Inside the blue setosa region we have 8 points depicted with a cross: these are the support vectors for setosa. Similarly, the red crosses are the support vectors for versicolor and the green crosses those for virginica.
From the confusion matrix above, only 2 versicolor observations and 2 virginica observations are misclassified, giving a misclassification rate of about 2.7%. If we fit an SVM with a linear kernel instead of the radial kernel (see the sketch below), the misclassification rate is a bit higher.
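A minimal sketch of that linear-kernel comparison could look as follows (the object names linear_model and tab_lin are only illustrative):
# Fit the same model with a linear kernel and recompute the error rate
linear_model <- svm(Species ~ ., data = iris, kernel = "linear")
tab_lin <- table(Predicted = predict(linear_model, iris), Actual = iris$Species)
1 - sum(diag(tab_lin)) / sum(tab_lin)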
mymodel <- svm(Species~., data=iris,
kernel = "polynomial")
plot(mymodel, data=iris,
Petal.Width~Petal.Length,
slice = list(Sepal.Width=3, Sepal.Length=4))
pred <- predict(mymodel, iris)
tab <- table(Predicted = pred, Actual = iris$Species)
1-sum(diag(tab))/sum(tab)
[1] 0.04666667
If we instead use an SVM with a polynomial kernel, as we can see from the graph above, the misclassification rate increases to about 4.7%.
We can try to tune the model in order to get a better classification rate. Tuning, also called hyperparameter optimization, helps us select the best model.
# Tuning
set.seed(123)
tmodel <- tune(svm, Species~., data=iris,
               ranges = list(epsilon = seq(0, 1, 0.1), # sequence from 0 to 1 with an increment of 0.1
                             cost = 2^(2:7)))          # cost is the penalty for constraint violation
# if cost is too high, margin violations are penalized heavily and the model may overfit the training data
plot(tmodel)
We use epsilon and cost as tuning parameters. The cost parameter captures the penalty for constraint violation. If cost is too high, margin violations are heavily penalized, so the model fits the training data very closely and may overfit. On the contrary, if cost is too small, many violations are tolerated and we may end up with underfitting.
The value of epsilon defines a margin of tolerance where no penalty is given to errors. In fact, in SVM we can have hard or soft margins, where a soft margin allows observations to fall inside the margin. A soft margin is used when the two classes are not linearly separable.
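For reference, the soft-margin problem that the cost parameter C controls can be written, in the standard formulation with slack variables xi_i (notation not used elsewhere in this document), as
\min_{w,\,b,\,\xi} \; \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \,(w^{\top} x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0
A larger C penalizes margin violations more heavily, which is why it governs the trade-off between overfitting and underfitting described above.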
The plot above gives the performance evaluation of the SVM over the epsilon and cost grid. Darker regions mean better results, that is, lower misclassification error. By interpreting this graph we can choose the best model parameters.
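Besides reading the plot, the tuning result can also be inspected numerically; the components below are standard parts of the object returned by tune():
# Best parameter combination and its cross-validated error
tmodel$best.parameters
tmodel$best.performance
summary(tmodel)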
mymodel <- tmodel$best.model
summary(mymodel)
Call:
best.tune(method = svm, train.x = Species ~ ., data = iris, ranges = list(epsilon = seq(0,
1, 0.1), cost = 2^(2:7)))
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 8
gamma: 0.25
Number of Support Vectors: 35
( 6 15 14 )
Number of Classes: 3
Levels:
setosa versicolor virginica
plot(mymodel, data=iris,
Petal.Width~Petal.Length,
slice = list(Sepal.Width=3, Sepal.Length=4))
pred <- predict(mymodel, iris)
tab <- table(Predicted = pred, Actual = iris$Species)
1-sum(diag(tab))/sum(tab)
[1] 0.01333333
From the summary above, we now have 35 support vectors: 6 for setosa, 15 for versicolor, and 14 for virginica. The graph above shows the result obtained with the best model. Looking at the confusion matrix and misclassification error, only 2 observations are misclassified and the misclassification rate is about 1.3%, which is significantly lower than what we got earlier.
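One caveat: all the rates above are computed on the same data used for fitting. A minimal sketch of checking the error on a held-out split instead (the 80/20 split and the names train and test are only illustrative):
set.seed(123)
idx <- sample(nrow(iris), 0.8 * nrow(iris))   # illustrative 80/20 split
train <- iris[idx, ]
test  <- iris[-idx, ]
m <- svm(Species ~ ., data = train, cost = 8) # cost taken from the tuning step
tab_test <- table(Predicted = predict(m, test), Actual = test$Species)
1 - sum(diag(tab_test)) / sum(tab_test)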