
Using a custom rbf kernel function for sklearn's SVC is way faster than the built-in method #21410

Open
@NilsWinter

Description

Describe the bug

I've noticed some rather strange behavior in scikit-learn's SVC implementation: using the built-in rbf kernel with SVC is more than an order of magnitude slower than passing a custom rbf function to SVC() (about 18 times slower in the run reported below).
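For reference, both variants are meant to compute the standard RBF kernel. This is how sklearn's `rbf_kernel` and libsvm define it. The reproduction below sets γ to 1/n_features, which is 1/1000 there:

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \, \lVert \mathbf{x} - \mathbf{x}' \rVert^2\right),
\qquad \gamma = \frac{1}{n_\text{features}}
```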

From what I can see and understand so far, the only difference between the two versions is where the kernel is computed. With the built-in rbf kernel, libsvm computes it. Passing a dedicated kernel function as the kernel hyperparameter to SVC() makes sklearn compute the kernel instead of libsvm. The results are identical, but the latter takes only a fraction of the computation time.
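The difference described above can be made explicit. The sketch below assumes that SVC, when given a callable kernel, evaluates it on the training data up front and then runs libsvm on the resulting Gram matrix as a precomputed kernel. That matches my reading of the sklearn source, but treat it as an assumption. Under that assumption, the callable path is equivalent to building the matrix by hand and passing `kernel="precomputed"`. The dataset sizes are illustrative and not taken from the report.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Small toy problem so the sketch runs quickly (sizes are arbitrary).
X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           random_state=0)
gamma = 1 / X.shape[1]

# Path 1: a callable kernel; sklearn evaluates it before handing data to libsvm.
svc_callable = SVC(kernel=partial(rbf_kernel, gamma=gamma)).fit(X, y)

# Path 2: the same Gram matrix computed by hand, passed as "precomputed".
K_train = rbf_kernel(X, X, gamma=gamma)  # shape (n_samples, n_samples)
svc_precomputed = SVC(kernel="precomputed").fit(K_train, y)

# Predicting with a precomputed kernel needs K(new points, training points).
K_test = rbf_kernel(X, X, gamma=gamma)
pred_callable = svc_callable.predict(X)
pred_precomputed = svc_precomputed.predict(K_test)

print(np.array_equal(pred_callable, pred_precomputed))
```

For the 10,000-sample toy data in the reproduction, this Gram matrix has 10^8 float64 entries, roughly 800 MB. It is computed in one vectorized call rather than entry by entry.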

I've created a toy dataset that mimics the data I'm currently working on. The effect probably only becomes relevant with larger datasets, where I expect the discrepancy in computation time to grow.
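One way to check whether the gap really grows with dataset size is to time only `fit()` for both variants over a few sample sizes. This is a minimal sketch. The sizes, `n_features=100`, and the helper `time_fit` are illustrative choices and do not come from the original report. Absolute numbers will depend on the hardware and on the BLAS library in use.

```python
from functools import partial
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def time_fit(clf, X, y):
    """Return the wall-clock seconds spent in clf.fit(X, y) (hypothetical helper)."""
    start = perf_counter()
    clf.fit(X, y)
    return perf_counter() - start


n_features = 100
gamma = 1 / n_features
timings = {}
# Sample sizes are arbitrary; much larger ones are needed to see a clear trend.
for n_samples in (250, 500, 1000):
    X, y = make_classification(n_samples, n_features, n_informative=10,
                               random_state=0)
    built_in = time_fit(SVC(kernel="rbf", gamma=gamma), X, y)
    custom = time_fit(SVC(kernel=partial(rbf_kernel, gamma=gamma)), X, y)
    timings[n_samples] = (built_in, custom)
    print(f"n_samples={n_samples}: built-in {built_in:.3f}s, custom {custom:.3f}s")
```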

Does anyone know why this is happening? Is this the expected behavior?

Steps/Code to Reproduce

from functools import partial
from time import time
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.metrics import accuracy_score

# create toy data (random_state makes the dataset reproducible across runs)
n_features = 1000
n_samples = 10000
n_informative = 10
X, y = make_classification(n_samples, n_features, n_informative=n_informative,
                           random_state=13)
gamma = 1 / n_features

# fit SVC with built-in rbf kernel (libsvm computes the kernel)
svc_built_in = SVC(kernel='rbf', gamma=gamma)
t1 = time()
svc_built_in.fit(X, y)
acc = accuracy_score(y, svc_built_in.predict(X))
# note: the timing covers both fit() and predict()
print("Fitting SVC with built-in kernel took {:.1f} seconds".format(time()-t1))
print("Accuracy: {}".format(acc))

# fit SVC with custom rbf kernel (sklearn computes the kernel matrix)
# SVC does not forward its own gamma to a callable kernel, so bind it explicitly
svc_custom = SVC(kernel=partial(rbf_kernel, gamma=gamma))
t1 = time()
svc_custom.fit(X, y)
acc = accuracy_score(y, svc_custom.predict(X))
print("Fitting SVC with a custom kernel took {:.1f} seconds".format(time()-t1))
print("Accuracy: {}".format(acc))

Expected Results

I expected both versions to take roughly the same time.

Actual Results

Fitting SVC with built-in kernel took 58.6 seconds
Accuracy: 0.9846
Fitting SVC with a custom kernel took 3.2 seconds
Accuracy: 0.9846

Versions

System:
python: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
executable: /home/nwinter/anaconda3/envs/mmll_gists/bin/python
machine: Linux-5.11.0-37-generic-x86_64-with-glibc2.17

Python dependencies:
pip: 21.2.4
setuptools: 58.0.4
sklearn: 1.0
numpy: 1.21.3
scipy: 1.7.1
Cython: None
pandas: 1.3.4
matplotlib: 3.4.3
joblib: 1.1.0
threadpoolctl: 3.0.0

Built with OpenMP: True
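The System / Python dependencies block above has the layout printed by scikit-learn's `sklearn.show_versions()`. The sketch below regenerates it, which can help when comparing setups between machines, since BLAS and OpenMP configuration can affect timings like these. Capturing stdout into a string is an illustrative choice.

```python
import contextlib
import io

import sklearn

# sklearn.show_versions() prints the System / Python dependencies report to stdout.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    sklearn.show_versions()
report = buffer.getvalue()
print(report)
```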
