Use a sparse matrix for group specific effects #545

Open
hans-ekbrand opened this issue Jul 12, 2022 · 0 comments

hans-ekbrand commented Jul 12, 2022

Background: I recently observed very large RAM usage when fitting a model and opened an issue on the PyMC Discourse. In that thread Tomás Capretto responded:

"On the other hand, Bambi relies on formulae to generate design matrices, which turns out to generate a regular matrix for group-specific effects. In this case, it’s a very large matrix with almost all zeros, so a sparse matrix would have been better. So I think this explains the large memory comsumption. It is something we still need to improve on our end. However, this matrix is not directly used in the PyMC model, we use slicing to select only non-zero values."

Today this issue hit me again, and this time the RAM requirements were absurd:

```python
my_model = bmb.Model(
    "deprived.of.education ~ (sex|country) + (sex|cluster.id.unique)"
    " + per.cent.muslim.in.country*sex*gdp.log"
    " + per.cent.hindu.in.country*sex*gdp.log"
    " + per.cent.muslim.in.cluster*sex*wealth.at.cluster.level"
    " + per.cent.hindu.in.cluster*sex*wealth.at.cluster.level"
    " + wealth * sex * religion + urbrur",
    df,
    family="bernoulli",
    dropna=True,
)
```

```
Automatically removing 739811/2772473 rows from the dataset.
Unexpected error while trying to evaluate a Variable. <class 'numpy.core._exceptions._ArrayMemoryError'>

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\hanse\miniconda3\lib\site-packages\bambi\models.py", line 147, in __init__
    self._design = design_matrices(formula, data, na_action, 1, extra_namespace)
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\matrices.py", line 523, in design_matrices
    design = DesignMatrices(description, data, env)
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\matrices.py", line 54, in __init__
    self.model.eval(data, env)
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\terms\terms.py", line 1261, in eval
    term.set_data(encoding)
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\terms\terms.py", line 665, in set_data
    self.factor.set_data(True)  # Factor is a categorical term that always spans the intercept
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\terms\terms.py", line 468, in set_data
    component.set_data(spans_intercept_)
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\terms\variable.py", line 109, in set_data
    self.eval_categoric(self._intermediate_data, spans_intercept)
  File "C:\Users\hanse\miniconda3\lib\site-packages\formulae\terms\variable.py", line 166, in eval_categoric
    value = self.contrast_matrix.matrix[x.codes]
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 857. GiB for an array with shape (2032662, 113158) and data type int32
```

There are two random slopes in the model: the first grouping factor has 88 levels and the second has 145,861 levels, so the model itself is not very big. I am happy to provide the data if that would be useful.
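Just to sanity-check the numbers: the shape reported in the error message fully accounts for the 857 GiB on its own, while a sparse layout of the same indicator structure would be several orders of magnitude smaller. The back-of-the-envelope below is only a sketch, and the assumed number of non-zeros per row is a guess rather than something measured from the data:

```python
# Shapes taken from the error message above; the sparse estimate assumes
# only a handful of non-zero entries per row (hypothetical, but typical
# for a one-hot-style group encoding).
n_rows, n_cols = 2_032_662, 113_158
itemsize = 4  # int32

dense_gib = n_rows * n_cols * itemsize / 1024**3
print(f"dense allocation: {dense_gib:,.0f} GiB")  # ~857 GiB, matching the error

nnz_per_row = 4  # assumed: a few indicator/slope columns active per row
sparse_mib = n_rows * nnz_per_row * (itemsize + 4) / 1024**2  # value + column index
print(f"sparse (CSR-ish): {sparse_mib:,.0f} MiB")  # ~62 MiB under these assumptions
```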
