Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature importance type in saved model file #3220

Merged
merged 11 commits into from
Jul 15, 2020

Conversation

guolinke
Copy link
Collaborator

No description provided.

@StrikerRUS
Copy link
Collaborator

@guolinke Don't you want to expose this parameter to C API? With 3.0 I guess it's good time to break some code (if any). Then wrappers will be able to provide this param directly in function like start/end iterations

def dump_model(self, num_iteration=None, start_iteration=0):
"""Dump Booster to JSON format.
Parameters
----------
num_iteration : int or None, optional (default=None)
Index of the iteration that should be dumped.
If None, if the best iteration exists, it is dumped; otherwise, all iterations are dumped.
If <= 0, all iterations are dumped.
start_iteration : int, optional (default=0)
Start index of the iteration that should be dumped.
Returns
-------
json_repr : dict
JSON format of Booster.
"""

@guolinke
Copy link
Collaborator Author

Thanks @StrikerRUS , I change all related functions.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the R unit tests are failing. To get better logs, you can do this:

Rscript build_r.R
cd R-package/tests/
Rscript testthat.R

If you don't have an R environment available easily let me know and I can run that for you.

R-package/R/lgb.Booster.R Outdated Show resolved Hide resolved
R-package/R/lgb.Booster.R Outdated Show resolved Hide resolved
R-package/R/lgb.Booster.R Outdated Show resolved Hide resolved
src/c_api.cpp Outdated Show resolved Hide resolved
@guolinke
Copy link
Collaborator Author

ping @imatiach-msft for swig fix

@imatiach-msft
Copy link
Contributor

@guolinke this looks great!

Copy link
Contributor

@imatiach-msft imatiach-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@StrikerRUS StrikerRUS force-pushed the feature-importance-type-cli branch from b0bea13 to a275533 Compare July 13, 2020 18:59
@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Jul 13, 2020

@guolinke In my commit 4f3803d I
replaced cout type with split type according to the existing terminology:

/*!
* \brief Get model feature importance.
* \param handle Handle of booster
* \param num_iteration Number of iterations for which feature importance is calculated, <= 0 means use all
* \param importance_type Method of importance calculation:
* - 0 for split, result contains numbers of times the feature is used in a model;
* - 1 for gain, result contains total gains of splits which use the feature
* \param[out] out_results Result array with feature importance
* \return 0 when succeed, -1 when failure happens
*/

Log::Fatal("Unknown importance type: only support split=0 and gain=1");

added defines in C API for better user experience and changed in Python package param type from int to string for better usage and to unify with existing codebase:
importance_type : string, optional (default='split')
The type of feature importance to be filled into ``feature_importances_``.
If 'split', result contains numbers of times the feature is used in a model.
If 'gain', result contains total gains of splits which use the feature.

importance_type : string, optional (default="split")
How the importance is calculated.
If "split", result contains numbers of times the feature is used in a model.
If "gain", result contains total gains of splits which use the feature.

@StrikerRUS StrikerRUS force-pushed the feature-importance-type-cli branch from a275533 to 4f3803d Compare July 13, 2020 19:09
Comment on lines +537 to +538
// desc = **Note**: can be used only in CLI version
int saved_feature_importance_type = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guolinke I added CLI-only flag. Should it be marked as [no-save] as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, CLI-only is needed. As for [no-save], I think it is better not marked, as user may want to know which feature importance type is saved in the model file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Agree with you!

@StrikerRUS
Copy link
Collaborator

Fixed #3011.

@guolinke
Copy link
Collaborator Author

Thank you @StrikerRUS for the doc improvement, as always!

@StrikerRUS
Copy link
Collaborator

@jameslamb Is something left for R-package?

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R side looks good to me! It would be good to have an R test for using gain-based feature importance but I can do that in a later PR

@StrikerRUS StrikerRUS merged commit 87d4648 into master Jul 15, 2020
@StrikerRUS
Copy link
Collaborator

@jameslamb Thank you! I think it's better to have a separate PR for R tests.

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants