[tabular] Add memory check prior to merging out-of-fold predictions / fitting WE. #4350
Open
Description
On large multiclass datasets such as dionis with `C=355` classes, out-of-memory errors can occur in the following cases:
- Fitting stacker models with N base models.
- Fitting weighted ensembles with N base models.
In both cases this results in `X` gaining an extra `N*C` features, i.e. an extra `N*C*4` bytes for each of the `S` samples (4 bytes because `float32` is used). This can lead to cases where AutoGluon works fine at lower time limits, but at higher time limits it fits more base models and then hits an out-of-memory error due to too many features in `X` in the stack layers.
For example, given `S=1,000,000` rows of data and `C=355` classes, each base model would increase the memory usage of `X` by 1.42 GB.
On a system with 32 GB of memory, if 25 base models were used, this would require 35.5 GB of memory, leading to an out-of-memory error.
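To make the arithmetic concrete, here is a minimal sketch of the expected-usage calculation described above (the function name and arguments are illustrative, not existing AutoGluon code):

```python
FLOAT32_BYTES = 4  # out-of-fold predictions are stored as float32

def oof_feature_memory_bytes(n_rows: int, n_classes: int, n_base_models: int = 1) -> int:
    """Extra memory added to X by the OOF predictions of n_base_models (N*S*C*4 bytes)."""
    return n_rows * n_classes * FLOAT32_BYTES * n_base_models

# The dionis-sized example from above: S=1,000,000 rows, C=355 classes
print(oof_feature_memory_bytes(1_000_000, 355) / 1e9)      # 1.42 GB per base model
print(oof_feature_memory_bytes(1_000_000, 355, 25) / 1e9)  # 35.5 GB for 25 base models
```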
- Logic should be added that computes this expected memory usage via `S*C*4` bytes.
- Next, the logic would define `N`, the number of base models allowed for memory safety (the check could be something like `N*S*C*4 + X_og_mem_usage < 25% of total memory`).
- Then, base models can be sorted by validation score, and the top `N` are used, with the rest discarded.
- The logic could mimic the existing logic in `AbstractTrainer` when `infer_limit` is specified (see the sketch after this list).
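A rough sketch of what this check could look like, assuming the 25% budget proposed above. Apart from `psutil.virtual_memory()` (psutil is already used by AutoGluon for resource checks), every name here — `select_memory_safe_base_models`, `val_scores`, `x_og_mem_usage` — is hypothetical rather than an existing AutoGluon internal:

```python
import psutil

FLOAT32_BYTES = 4
MEMORY_BUDGET_FRAC = 0.25  # N*S*C*4 + X_og_mem_usage must stay under 25% of memory

def select_memory_safe_base_models(models, val_scores, n_rows, n_classes, x_og_mem_usage):
    """Keep the top-scoring base models whose combined OOF-prediction
    features fit within the memory budget; discard the rest.

    models: list of model names (hypothetical input)
    val_scores: dict mapping model name -> validation score (hypothetical input)
    x_og_mem_usage: memory footprint of X before OOF features are added, in bytes
    """
    total_memory = psutil.virtual_memory().total
    per_model_bytes = n_rows * n_classes * FLOAT32_BYTES  # S*C*4 per base model

    # Largest N satisfying N*S*C*4 + X_og_mem_usage < 25% of total memory
    budget = MEMORY_BUDGET_FRAC * total_memory - x_og_mem_usage
    n_allowed = max(int(budget // per_model_bytes), 0)

    # Sort by validation score so the strongest base models are retained
    ranked = sorted(models, key=lambda name: val_scores[name], reverse=True)
    return ranked[:n_allowed]
```

Truncating a score-sorted list keeps the best base models when not all of them fit, matching the selection behavior proposed above.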