Using a custom rbf kernel function for sklearn's SVC is way faster than built-in method #21410
Description
Describe the bug
I've noticed some rather strange behavior when using scikit-learn's SVC implementation. Using the built-in rbf kernel with SVC is more than an order of magnitude slower than passing a custom rbf function to SVC() (about 18x in the run reported under Actual Results).
As far as I can tell, the only difference between the two versions is where the kernel is computed. With the built-in rbf kernel, libsvm computes it. When a kernel function is passed to SVC(), sklearn computes the kernel itself and libsvm does not. The results are identical, but the latter case takes only a fraction of the computation time.
I've created a toy dataset that mimics the data I am currently working on. The effect probably only becomes noticeable with larger datasets, where the discrepancy in computation time likely grows.
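To make the callable-kernel path concrete, here is a minimal sketch, assuming that passing a callable is equivalent to building the Gram matrix yourself and fitting with kernel="precomputed". The small seeded dataset and the variable names are illustrative and not part of the original report.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Small, seeded toy data so the comparison runs in well under a second.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
gamma = 1 / X.shape[1]

# Path 1: hand the kernel function to SVC; sklearn evaluates it on the data.
svc_callable = SVC(kernel=rbf_kernel).fit(X, y)

# Path 2: build the training Gram matrix explicitly and pass it as precomputed.
K_train = rbf_kernel(X, X, gamma=gamma)
svc_precomputed = SVC(kernel="precomputed").fit(K_train, y)

# Both paths feed libsvm the same kernel values, so they should agree.
same_predictions = np.array_equal(
    svc_callable.predict(X), svc_precomputed.predict(K_train)
)
print("identical predictions:", same_predictions)
```

In both paths libsvm only sees kernel values that were already computed in NumPy, which is the difference from kernel='rbf' described above.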
Does anyone know why this is happening? Is this the expected behavior?
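One thing that does scale with dataset size on the callable path is the dense training Gram matrix, which has n_samples x n_samples entries. The sketch below is plain arithmetic, assuming float64 entries (8 bytes each); the helper name gram_matrix_bytes is hypothetical. For the 10000 samples used in this report it comes to 800 MB.

```python
def gram_matrix_bytes(n_samples, bytes_per_value=8):
    """Size of a dense n_samples x n_samples kernel matrix (float64 assumed)."""
    return n_samples * n_samples * bytes_per_value


# Illustrative sizes only; 10_000 matches the toy data in this report.
for n in (1_000, 10_000, 50_000):
    print(f"n_samples={n}: {gram_matrix_bytes(n) / 1e6:.0f} MB")
```

The quadratic growth means this approach trades memory for time, so the speed advantage may not carry over to much larger sample counts.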
Steps/Code to Reproduce
import numpy as np
from time import time
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.metrics import accuracy_score
# create toy data
n_features = 1000
n_samples = 10000
n_informative = 10
X, y = make_classification(n_samples, n_features, n_informative=n_informative)
gamma = 1 / n_features
# fit SVC with built-in rbf kernel (the timing below covers fit and predict)
svc_built_in = SVC(kernel='rbf', gamma=gamma)
np.random.seed(13)
t1 = time()
svc_built_in.fit(X, y)
acc = accuracy_score(y, svc_built_in.predict(X))
print("Fitting SVC with built-in kernel took {:.1f} seconds".format(time()-t1))
print("Accuracy: {}".format(acc))
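# fit SVC with custom rbf kernel
# note: SVC does not forward gamma to a callable kernel; rbf_kernel then uses
# its default gamma = 1 / n_features, which equals the value set above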
svc_custom = SVC(kernel=rbf_kernel, gamma=gamma)
np.random.seed(13)
t1 = time()
svc_custom.fit(X, y)
acc = accuracy_score(y, svc_custom.predict(X))
print("Fitting SVC with a custom kernel took {:.1f} seconds".format(time()-t1))
print("Accuracy: {}".format(acc))
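The script above measures fit and predict together. To see which step accounts for the gap, a sketch along these lines could time them separately. The helper time_fit_and_predict is hypothetical, and the data is deliberately small and seeded, so its numbers will not match the ones reported here.

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def time_fit_and_predict(model, X, y):
    """Return (fit_seconds, predict_seconds) for one estimator."""
    start = perf_counter()
    model.fit(X, y)
    fit_seconds = perf_counter() - start

    start = perf_counter()
    model.predict(X)
    predict_seconds = perf_counter() - start
    return fit_seconds, predict_seconds


# Small, seeded data so the sketch runs quickly; sizes are illustrative only.
X, y = make_classification(
    n_samples=500, n_features=50, n_informative=10, random_state=0
)
gamma = 1 / X.shape[1]

timings = {
    "built-in": time_fit_and_predict(SVC(kernel="rbf", gamma=gamma), X, y),
    # rbf_kernel's default gamma is 1 / n_features, matching gamma above.
    "callable": time_fit_and_predict(SVC(kernel=rbf_kernel), X, y),
}
for name, (fit_s, predict_s) in timings.items():
    print(f"{name}: fit {fit_s:.3f}s, predict {predict_s:.3f}s")
```

Raising n_samples and n_features toward the values in the report should show whether the difference comes mainly from fitting or from predicting.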
Expected Results
I expected both versions to take roughly the same time.
Actual Results
Fitting SVC with built-in kernel took 58.6 seconds
Accuracy: 0.9846
Fitting SVC with a custom kernel took 3.2 seconds
Accuracy: 0.9846
Versions
System:
python: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
executable: /home/nwinter/anaconda3/envs/mmll_gists/bin/python
machine: Linux-5.11.0-37-generic-x86_64-with-glibc2.17
Python dependencies:
pip: 21.2.4
setuptools: 58.0.4
sklearn: 1.0
numpy: 1.21.3
scipy: 1.7.1
Cython: None
pandas: 1.3.4
matplotlib: 3.4.3
joblib: 1.1.0
threadpoolctl: 3.0.0
Built with OpenMP: True