Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added model_builder as directed in PR #6023 on pymc #64

Merged
merged 39 commits into from
Sep 13, 2022

Conversation

5hv5hvnk
Copy link
Member

@5hv5hvnk 5hv5hvnk commented Aug 9, 2022

No description provided.

@twiecki
Copy link
Member

twiecki commented Aug 13, 2022

Also, I wonder if we should have the data be passed to fit() rather than __init__() to mimick sklearn's API.

@cluhmann
Copy link
Member

If that's an option, then you might consider passing sampler_config to fit() instead of __init__() as well.

pymc_experimental/model_builder.py Show resolved Hide resolved
pymc_experimental/model_builder.py Outdated Show resolved Hide resolved
pymc_experimental/model_builder.py Outdated Show resolved Hide resolved
Copy link
Member

@michaelosthege michaelosthege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several good ideas in here! With more generaizability I could see some of my models/pipelines be compatible.

I particularly like the ID part.
I'm sceptical about pickling though.

Adding tests that show some of the typical use cases would help tremendously. Maybe take some examples from the documentation and implement them using the class?

pymc_experimental/model_builder.py Show resolved Hide resolved
pymc_experimental/model_builder.py Show resolved Hide resolved
@michaelosthege
Copy link
Member

Also, I wonder if we should have the data be passed to fit() rather than __init__() to mimick sklearn's API.

Maybe this distinction could be made for immutable VS. mutable data? All immutable things go to the constructor, while mutable data are passed to the fit and predict methods..

@cluhmann
Copy link
Member

That would certainly make sense, but I think it comes down to whether we want to adhere to some existing standard (e.g. sklearn) or do what would make sense in a vacuum. My recollection is that sklearn model constructors don't really do much except build you an empty shell of a model (so model-building arguments are passed at that point). Expecting different types of data (which I think are already confusing to some users) to be passed at different points of the pipeline seems like it could be non-intuitive to many users. Obviously pymc models are different from sklearn models in many ways, but just something to keep in mind.

@twiecki
Copy link
Member

twiecki commented Aug 14, 2022

Sklearn passes global parameters to init, I think with the configuration that's implemented here it's the same thing (and then we don't need a dict but can just turn that into kwargs).

Also, it feels a bit more consistent to pass data to fit() and predict(), rather than to init and predict().

@tomicapretto
Copy link

There are several good ideas in here! With more generaizability I could see some of my models/pipelines be compatible.

I particularly like the ID part. I'm sceptical about pickling though.

Adding tests that show some of the typical use cases would help tremendously. Maybe take some examples from the documentation and implement them using the class?

I'm curious about pickling. I'm aware pickling is not the best (or even a good?) alternative to permanently store PyMC models. But I don't know of a suitable alternative. Do you have something in mind?

@twiecki
Copy link
Member

twiecki commented Aug 15, 2022

Pickling is more of a secondary use-case. The primary use-case is: user has access to model class and just loads the netcdf file. In that case, no pickling is required. However, if, for some reason, someone also wants to ship the model class, pickling is supported for that. No access to the model class is required for that. We can turn it off by default.

@twiecki
Copy link
Member

twiecki commented Aug 15, 2022

@tomicapretto I'm curious about Bambi which shares similarities (intentionally). Do you think this would be useful for it?

@michaelosthege
Copy link
Member

I'm curious about pickling. I'm aware pickling is not the best (or even a good?) alternative to permanently store PyMC models. But I don't know of a suitable alternative. Do you have something in mind?

I liked the way how Keras stored models in HDF5 and I wonder if a similar thing could be done for an Aesara graph. IIRC they encoded not the code, but the import route of custom layers, thereby even supporting custom neural net layers as long as they can be re-imported.
The Aesara equivalent would be the Ops from which the apply nodes and variables are created.

This is of course not easy at all, and a deep understanding of Aesara is needed.

PyMC-only attributes of the model would have to be included in addition to the Aesara graph. But that's almost trivial compared to serializing arbitrary Aesara graphs to HDF5 or a similar format.

@michaelosthege
Copy link
Member

@5hv5hvnk plesae make sure to run the pre-commit locally (pre-commit run --all) and use numpydoc style for the docstrings

@michaelosthege michaelosthege changed the title Added model_builder ad directed in PR #6023 on pymc Added model_builder as directed in PR #6023 on pymc Aug 31, 2022
Example:

def _data_setter(self, data : pd.DataFrame):
with self.model:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing indent

@twiecki
Copy link
Member

twiecki commented Sep 6, 2022

Saving attrs does not seem to work currently: arviz-devs/arviz#2109

@twiecki
Copy link
Member

twiecki commented Sep 9, 2022

Why aren't the tests being run? CC @ricardoV94 @michaelosthege

@5hv5hvnk
Copy link
Member Author

Also fixes #81

pymc_experimental/model_builder.py Show resolved Hide resolved
pymc_experimental/model_builder.py Show resolved Hide resolved
pymc_experimental/model_builder.py Outdated Show resolved Hide resolved
pymc_experimental/model_builder.py Outdated Show resolved Hide resolved
@twiecki twiecki merged commit 549b7fb into pymc-devs:main Sep 13, 2022
@twiecki
Copy link
Member

twiecki commented Sep 13, 2022

Congrats @5hv5hvnk!!! This was a big one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants