Background: I recently observed large RAM usage with a model and opened a thread on the PyMC Discourse, where Tomás Capretto responded:
"On the other hand, Bambi relies on formulae to generate design matrices, which turns out to generate a regular matrix for group-specific effects. In this case, it’s a very large matrix with almost all zeros, so a sparse matrix would have been better. So I think this explains the large memory comsumption. It is something we still need to improve on our end. However, this matrix is not directly used in the PyMC model, we use slicing to select only non-zero values."
Today this issue hit me again, and this time the RAM requirements were absurd:
```python
my_model = bmb.Model(
    "deprived.of.education ~ (sex|country) + (sex|cluster.id.unique)"
    " + per.cent.muslim.in.country*sex*gdp.log"
    " + per.cent.hindu.in.country*sex*gdp.log"
    " + per.cent.muslim.in.cluster*sex*wealth.at.cluster.level"
    " + per.cent.hindu.in.cluster*sex*wealth.at.cluster.level"
    " + wealth * sex * religion + urbrur",
    df,
    family="bernoulli",
    dropna=True,
)
```
```
Automatically removing 739811/2772473 rows from the dataset.
Unexpected error while trying to evaluate a Variable. <class 'numpy.core._exceptions._ArrayMemoryError'>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\hanse\miniconda3\lib\site-packages\bambi\models.py", line 147, in __init__
    self._design = design_matrices(formula, data, na_action, 1, extra_namespace)
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\matrices.py", line 523, in design_matrices
    design = DesignMatrices(description, data, env)
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\matrices.py", line 54, in __init__
    self.model.eval(data, env)
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\terms\terms.py", line 1261, in eval
    term.set_data(encoding)
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\terms\terms.py", line 665, in set_data
    self.factor.set_data(True) # Factor is a categorical term that always spans the intercept
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\terms\terms.py", line 468, in set_data
    component.set_data(spans_intercept_)
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\terms\variable.py", line 109, in set_data
    self.eval_categoric(self._intermediate_data, spans_intercept)
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\terms\variable.py", line 166, in eval_categoric
    value = self.contrast_matrix.matrix[x.codes]
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 857. GiB for an array with shape (2032662, 113158) and data type int32
```
There are two random-slope terms in the model: the first grouping factor (country) has 88 levels and the second (cluster.id.unique) has 145,861 levels, so the model itself is not very big. I am happy to provide the data if it would be useful.
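For reference, the 857 GiB figure in the error is exactly what a dense int32 array of that shape requires:

```python
# 2032662 rows x 113158 columns x 4 bytes (int32), in GiB
print(2_032_662 * 113_158 * 4 / 2**30)  # ~856.9, matching the error message
```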