To evaluate our model more thoroughly, we will compute several metrics: precision, recall, f1-score, and support. These metrics are reported for each class, together with an average (or, for support, a total) over all classes.

Precision is given by the formula

\begin{equation} p = \frac{t_p}{t_p + f_p} \end{equation}

and recall by

\begin{equation} r = \frac{t_p}{t_p + f_n} \end{equation}

with $t_p$, $f_p$, and $f_n$ being the number of true positives, false positives, and false negatives, respectively.

The f1-score is the harmonic mean of precision and recall, with both weighted equally; 1 is the best score and 0 is the worst.
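Written with the symbols $p$ and $r$ used for precision and recall, and assuming the usual equally weighted definition, the f1-score is

\begin{equation} F_1 = \frac{2 p r}{p + r} \end{equation}

As a quick check against the classification report in this section: for class 1, $p = 0.56$ and $r = 0.78$ give $F_1 = 2 \cdot 0.56 \cdot 0.78 / (0.56 + 0.78) = 0.8736 / 1.34 \approx 0.65$, which matches the reported value.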

Support is the number of occurrences of each class in y_true.
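To make the four definitions concrete, here is a minimal sketch that computes them from raw counts in plain Python. The helper name per_class_report and the toy labels are made up for illustration; they are not part of scikit-learn or of the dataset used in this tutorial.

```python
def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} computed from raw counts."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
        fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
        fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        support = tp + fn  # occurrences of the label in y_true
        report[label] = (precision, recall, f1, support)
    return report


# Hypothetical toy labels, chosen only to keep the counts easy to follow
y_true = [1, 1, 1, 2, 2, 3, 3, 3]
y_pred = [1, 1, 2, 2, 3, 3, 3, 1]

report = per_class_report(y_true, y_pred)
for label, (p, r, f1, support) in report.items():
    print(label, round(p, 2), round(r, 2), round(f1, 2), support)
```

For class 2 in this toy data there is one true positive, one false positive, and one false negative, so precision, recall, and f1-score are all 0.5 and the support is 2.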

For example,

# Import necessary library and obtain classification report
from sklearn.metrics import classification_report
print(classification_report(result['contraceptive'], result['prediction']))

             precision    recall  f1-score   support

          1       0.56      0.78      0.65       169
          2       0.50      0.20      0.28        76
          3       0.49      0.40      0.44       124

avg / total       0.52      0.53      0.50       369

As you can see, the SVM's scores are not very high: the average f1-score is only 0.50. So, we should tune the model's hyperparameters.

Tuning the parameters¶

# Import necessary library
from sklearn.model_selection import GridSearchCV

# Get the best parameters for the model
parameters = {
    'kernel': ['linear', 'rbf'],
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1]
}
gridforest = GridSearchCV(model, parameters, cv=3, n_jobs=-1, verbose=1)
gridforest.fit(X_train, y_train)
gridforest.best_params_

Fitting 3 folds for each of 32 candidates, totalling 96 fits

[Parallel(n_jobs=-1)]: Done  96 out of  96 | elapsed:   14.4s finished

{'C': 10, 'gamma': 0.01, 'kernel': 'rbf'}
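The "32 candidates, 96 fits" in the log above follow directly from the size of the grid. The sketch below counts the combinations with ParameterGrid from scikit-learn. It also shows one possible alternative, a list of two grids, which is a suggestion rather than what this tutorial ran: since gamma is not used by the linear kernel, the linear candidates that differ only in gamma repeat the same model.

```python
from sklearn.model_selection import ParameterGrid

# The grid used in this tutorial
parameters = {
    'kernel': ['linear', 'rbf'],
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1],
}
n_candidates = len(ParameterGrid(parameters))  # 2 * 4 * 4
n_fits = n_candidates * 3                      # one fit per candidate per fold (cv=3)

# Alternative (assumption, not from the tutorial): search gamma only for 'rbf'
split_parameters = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]},
]
n_split_candidates = len(ParameterGrid(split_parameters))  # 4 + 16

print(n_candidates, n_fits, n_split_candidates)  # 32 96 20
```

GridSearchCV accepts such a list of dictionaries as its parameter grid, so the smaller search could be run with the same call.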

The obtained parameters can be passed to the algorithm; after retraining, we can make a new prediction and compare its results with the previous ones. If you do this, you will see that the average precision, recall, and f1-score all increased, although not every per-class value did (recall for class 1, for example, dropped from 0.78 to 0.58). Overall, the algorithm with the tuned hyperparameters predicts more accurately.

In the table below, the suffix _b marks a metric before tuning, with {'C': 1, 'gamma': 1, 'kernel': 'rbf'}, and the suffix _a marks the same metric after tuning, with {'C': 10, 'gamma': 0.01, 'kernel': 'rbf'}.

          precision_b  precision_a  recall_b   recall_a  f1-score_b   f1-score_a   support_b   support_a

      1       0.56        0.70        0.78       0.58       0.65         0.63         169        169
      2       0.50        0.43        0.20       0.30       0.28         0.36          76         76
      3       0.49        0.48        0.40       0.68       0.44         0.56         124        124

avg / total   0.52        0.57        0.53       0.56       0.50         0.55         369        369
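The retraining step that produces the "after tuning" columns can be sketched as follows. This is only an illustration: the data comes from make_classification as a synthetic stand-in for the contraceptive dataset (nine features, three classes), and the name tuned_model is made up, so the printed numbers will not match the table above.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 9 features, 3 classes
X, y = make_classification(n_samples=600, n_features=9, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=101)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=101)

# Best parameters as reported by the grid search in this tutorial
best_params = {'C': 10, 'gamma': 0.01, 'kernel': 'rbf'}

# Pass them to a fresh SVC, retrain, and predict on the held-out data
tuned_model = svm.SVC(**best_params)
tuned_model.fit(X_train, y_train)
y_pred = tuned_model.predict(X_test)

print(classification_report(y_test, y_pred))
```

With the default refit=True, GridSearchCV also keeps a model already retrained on the full training set in best_estimator_, so gridforest.best_estimator_.predict(X_test) would be an equivalent shortcut.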

Now, let's visualize some results.

Visualization¶

Note that the target depends on nine features. We can't visualize a 10-dimensional plot, so we will plot only two dimensions: the contraceptive method as a function of (1) the woman's age and (2) the number of children.

Visualization help functions¶

# Create a mesh of points to plot in
def make_meshgrid(x, y, h = .02):
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy

# Plot the decision boundaries for a classifier
def plot_contours(ax, model, xx, yy, **params):
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out
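To see what these helpers feed to model.predict, the sketch below runs make_meshgrid on a tiny, made-up input with a coarse step and stacks the grid into one row per point, exactly as plot_contours does with np.c_. The input arrays and the step h=0.5 are assumptions chosen to keep the shapes small.

```python
import numpy as np


def make_meshgrid(x, y, h=.02):
    # Same helper as above: pad the data range by 1 and sample it with step h
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy


# Hypothetical feature values
x = np.array([0.0, 1.0])
y = np.array([0.0, 2.0])

xx, yy = make_meshgrid(x, y, h=0.5)

# One (x, y) pair per row: this is the array passed to model.predict
grid_points = np.c_[xx.ravel(), yy.ravel()]

print(xx.shape, grid_points.shape)  # (8, 6) (48, 2)
```

The predictions for these rows are then reshaped back to xx.shape so that contourf can draw one colored region per predicted class.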

Visualization of the result¶

# Import library for visualization
import matplotlib.pyplot as plt

# Take two defined features
X = contraceptive[['w_age', 'children']]

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=101)
   
# Train the model
model = svm.SVC(C=10, kernel='rbf', gamma=0.01)
model.fit(X_train, y_train)

SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

# Visualize
X0, X1 = X_test['w_age'], X_test['children']
xx, yy = make_meshgrid(X0, X1)
plot_contours(plt, model, xx, yy, cmap=plt.cm.RdYlGn, alpha=0.8)
plt.scatter(X0, X1, c = y_test['contraceptive_method'], cmap=plt.cm.RdYlGn, s=20, edgecolors='k')

# Highlight support vectors (model.support_ holds positions in the training set)
sv = X_train.iloc[model.support_]
plt.scatter(sv['w_age'], sv['children'], color='white', alpha=0.15, s=100)

plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xlabel('Age')
plt.ylabel('Children')
plt.title('SVM with RBF kernel')

plt.show()

As you can see, there are a lot of misclassifications, which is consistent with the classification report.

Margin vs. misclassification trade-off¶

In SVM, the points of different classes are separated by a hyperplane, and this hyperplane is chosen so that the margin between the classes is as wide as possible. But a wider margin lets more training points fall inside it or on the wrong side, so we obtain more misclassifications. This is called the "margin vs. misclassification trade-off", and it is regulated by the 'C' parameter. If 'C' is high, we get a thin margin and fewer misclassified training points; if 'C' is low, we get a wider margin and more of them. To explore the trade-off, we will plot the classifications for different 'C' values.

# Take two defined features
X = contraceptive[['w_age', 'children']]

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=101)

# Regularisation parameter
Cs = [0.1, 1, 10, 100]

# Visualize results for the different values of regularisation parameter
i = 1
plt.subplots_adjust(left=0.1, right=2, bottom=0.1, top=1, wspace=0.4, hspace=0.6)
for c in Cs:
    model = svm.SVC(C=c, kernel='rbf', gamma=0.01)
    model.fit(X_train, y_train)
    title = 'SVM with RBF kernel C=' + str(c)
    plt.subplot(2, 2, i)
    X0, X1 = X_test['w_age'], X_test['children']
    xx, yy = make_meshgrid(X0, X1)
    plot_contours(plt.subplot(2, 2, i), model, xx, yy, cmap=plt.cm.RdYlGn, alpha=0.8)
    plt.subplot(2, 2, i).scatter(X0, X1, c=y_test['contraceptive_method'], cmap=plt.cm.RdYlGn,
                                 s=20, edgecolors='k')
    
    # Highlight support vectors (model.support_ holds positions in the training set)
    sv = X_train.iloc[model.support_]
    plt.subplot(2, 2, i).scatter(sv['w_age'], sv['children'], color='white', alpha=0.15, s=100)
    
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.xlabel('Age')
    plt.ylabel('Children')
    plt.title(title)
    i = i + 1

plt.show()

As we can see from these figures, the regularisation parameter has an optimal value, which is 10 in our case.
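The plots can be complemented with a few numbers per value of 'C'. The following is a minimal sketch on synthetic two-dimensional data from make_blobs, used here as a stand-in for the two features above; the dictionary keys are made up, and the results on the real dataset may look different.

```python
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 2 features, 3 overlapping classes
X, y = make_blobs(n_samples=400, centers=3, cluster_std=3.0, random_state=101)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=101)

results = []
for c in [0.1, 1, 10, 100]:
    model = svm.SVC(C=c, kernel='rbf', gamma=0.01)
    model.fit(X_train, y_train)
    results.append({
        'C': c,
        # n_support_ holds the number of support vectors per class
        'n_support_vectors': int(model.n_support_.sum()),
        'train_accuracy': model.score(X_train, y_train),
        'test_accuracy': model.score(X_test, y_test),
    })

for row in results:
    print(row)
```

A low 'C' often leaves many training points inside the margin, which shows up as a large number of support vectors, while comparing train and test accuracy hints at whether a high 'C' has started to overfit.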

Conclusion¶

SVM is a great choice when you have many features. Ideally, though, you should test several algorithms and pick the best-performing one.

	w_age	w_education	h_education	children	w_religion	w_working	h_occupation	standart_of_living	contraceptive_method
0	24	2	3	3	1	1	2	3	1
1	45	1	3	10	1	1	3	4	1
2	43	2	3	7	1	1	3	4	1
3	42	3	2	9	1	1	3	3	1
4	36	3	3	8	1	1	3	2	1

	w_age	w_education	h_education	children	w_religion	w_working	h_occupation	standart_of_living	contraceptive	prediction
212	26	3	4	3	1	1	3	2	1	3
545	44	4	4	7	1	1	2	4	2	3
236	37	2	3	5	1	0	3	2	1	1
115	37	4	4	1	1	1	1	4	1	1
1163	41	3	4	3	0	1	2	4	1	2

Support Vector Machines (SVM) in Python
