Fix number of features mismatching in continual training (fix #5156) #5157

Status: Open. Wants to merge 14 commits into base: master.
Changes from 1 commit
add suggestion for the warning
shiyu1994 committed May 30, 2022
commit d719d78d867532267e4c7f28fa7bb6ba6f273eb0
1 change: 1 addition & 0 deletions src/boosting/gbdt_model_text.cpp
@@ -678,6 +678,7 @@ std::vector<double> GBDT::FeatureImportance(int num_iteration, int importance_ty
if (warn_about_feature_number) {
Log::Warning("Only %d features found in dataset for continual training, but at least %d features found in initial model.",
Collaborator:
Is it possible to add some information here that can help users understand what they should do about this warning? Or is this warning just being raised for the purpose of the unit test?

Collaborator (Author):
I've added

Log::Warning("Please check the number of features used in continual training.");

in d719d78.

Do you think it is appropriate?

Collaborator:
Thanks! But I don't think "please check" will be any more informative for users, and I think that providing two separate WARN-level log messages for the same problem might be confusing.

Would you consider this?

Log::Warning(
    "Only %d features found in dataset, but at least %d features found in initial model. "
    "If this is intentional and the features in the dataset match those used to train the initial model, no action is required. "
    "If not, training continuation may fail or produce bad results. "
    "To suppress this warning, provide a dataset with at least %d features.",
    static_cast<int>(feature_importances.size()),
    max_feature_index_found + 1,
    max_feature_index_found + 1);

static_cast<int>(feature_importances.size()), max_feature_index_found + 1);
Log::Warning("Please check the number of features used in continual training.");
}
return feature_importances;
}
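The condition guarding this warning can be sketched in Python as follows. This is a minimal illustration, not LightGBM code: only `max_feature_index_found` and the message text come from the diff, while the function name and standalone structure are assumptions.

```python
from typing import Optional

def check_feature_count(num_dataset_features: int,
                        max_feature_index_found: int) -> Optional[str]:
    """Return the warning message if the dataset has fewer features than the
    highest feature index referenced by the initial model, else None.

    Hypothetical sketch mirroring the check in GBDT::FeatureImportance.
    """
    # A model that references feature index k needs at least k + 1 features.
    if num_dataset_features < max_feature_index_found + 1:
        return ("Only %d features found in dataset for continual training, "
                "but at least %d features found in initial model."
                % (num_dataset_features, max_feature_index_found + 1))
    return None
```

For example, continuing training on a dataset with 5 columns from an initial model whose trees split on feature index 9 would trigger the warning, while a dataset with 10 or more columns would not.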