[RFC] remove 'categorical_feature' and 'feature_name' parameters in cv() and train() #6435
Description
Proposal
I'm requesting comment on the following proposal:
- remove keyword argument `categorical_feature` from `cv()` and `train()` in the R and Python packages
- remove keyword argument `feature_name` from `cv()` and `train()` in the Python package
- remove keyword argument `colnames` from `cv()` and `train()` in the R package
And doing all of this only after the packages have issued deprecation warnings for 2-3 releases.
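A minimal sketch of what such a deprecation period could look like on the Python side. Everything here is an assumption made for illustration: the `Dataset` class is a stand-in rather than LightGBM's real class, this `train()` does no training, and the warning category and message are made up, not what the library actually emits.

```python
import warnings


class Dataset:
    """Hypothetical stand-in for a LightGBM Dataset (illustration only)."""

    def __init__(self, data, categorical_feature="auto", feature_name="auto"):
        self.data = data
        self.categorical_feature = categorical_feature
        self.feature_name = feature_name


# Sentinel so that "argument not passed" can be told apart from any real value.
_UNSET = object()


def train(params, train_set, categorical_feature=_UNSET, feature_name=_UNSET):
    """Hypothetical train() that still accepts the deprecated arguments."""
    deprecated = {
        "categorical_feature": categorical_feature,
        "feature_name": feature_name,
    }
    for name, value in deprecated.items():
        if value is _UNSET:
            continue
        warnings.warn(
            f"Argument '{name}' to train() is deprecated and will be removed "
            f"in a future release. Set '{name}' on the Dataset instead.",
            FutureWarning,  # assumed category, chosen for this sketch
            stacklevel=2,
        )
        # Keep the old behavior during the deprecation period:
        # forward the value to the Dataset.
        setattr(train_set, name, value)
    return train_set
```

Callers that already configure the `Dataset` see no warning; only code that still passes the arguments to `train()` is nudged to move them.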
Summary
Both the R and Python packages expose functions `cv()` (for cross-validation) and `train()` (for regular entire-dataset training). These functions require a LightGBM `Dataset` object.
The `Dataset` object holds attributes `categorical_feature` and `feature_name`, and allows setting those via constructor keyword arguments and `set_{attr}()` methods.
Despite that, these `cv()` and `train()` functions also take `categorical_feature` and `feature_name` (`colnames` in the R package) as keyword arguments.
- Python `cv()`: LightGBM/python-package/lightgbm/engine.py, lines 569 to 570 in 92a8741
- Python `train()`: LightGBM/python-package/lightgbm/engine.py, lines 62 to 63 in 92a8741
- R-package `cv()`: lines 90 to 91 in 92a8741
- R-package `train()`: LightGBM/R-package/R/lgb.train.R, lines 57 to 58 in 92a8741
These keyword arguments aren't providing any value, in my opinion. Their values are just forwarded along to calls like this:
LightGBM/python-package/lightgbm/engine.py, lines 738 to 740 in 92a8741
At best, that forwarding is redundant with the `Dataset` class, and at worst it could lead to runtime exceptions (if the `Dataset` has already been constructed).
Motivation
Would simplify the library's interface without any loss of functionality.
If this proposal is accepted, the `Dataset` class would be the only place that this information is provided to `train()` and `cv()`.
References
Inspired by this post I noticed on Stack Overflow: https://stackoverflow.com/questions/78383840/in-lightgbm-why-do-the-train-and-the-cv-apis-accept-categorical-feature-argument/78405996#78405996
`xgboost` does not expose such arguments in `train()` (code link) or `cv()` (code link).
These arguments have been part of the API since September 2017: ef77806#diff-9bd633ead0bdfe9540c42a618efd9e559cca16c522ad844a09fcf4ffc7d6e84c.