-
-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added model_builder as directed in PR #6023 on pymc #64
Conversation
Also, I wonder if we should have the data be passed to |
If that's an option, then you might consider passing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several good ideas in here! With more generaizability I could see some of my models/pipelines be compatible.
I particularly like the ID part.
I'm sceptical about pickling though.
Adding tests that show some of the typical use cases would help tremendously. Maybe take some examples from the documentation and implement them using the class?
Maybe this distinction could be made for immutable VS. mutable data? All immutable things go to the constructor, while mutable data are passed to the |
That would certainly make sense, but I think it comes down to whether we want to adhere to some existing standard (e.g. sklearn) or do what would make sense in a vacuum. My recollection is that sklearn model constructors don't really do much except build you an empty shell of a model (so model-building arguments are passed at that point). Expecting different types of data (which I think are already confusing to some users) to be passed at different points of the pipeline seems like it could be non-intuitive to many users. Obviously pymc models are different from sklearn models in many ways, but just something to keep in mind. |
Sklearn passes global parameters to init, I think with the configuration that's implemented here it's the same thing (and then we don't need a dict but can just turn that into kwargs). Also, it feels a bit more consistent to pass data to fit() and predict(), rather than to init and predict(). |
I'm curious about pickling. I'm aware pickling is not the best (or even a good?) alternative to permanently store PyMC models. But I don't know of a suitable alternative. Do you have something in mind? |
Pickling is more of a secondary use-case. The primary use-case is: user has access to model class and just loads the netcdf file. In that case, no pickling is required. However, if, for some reason, someone also wants to ship the model class, pickling is supported for that. No access to the model class is required for that. We can turn it off by default. |
@tomicapretto I'm curious about Bambi which shares similarities (intentionally). Do you think this would be useful for it? |
I liked the way how Keras stored models in HDF5 and I wonder if a similar thing could be done for an Aesara graph. IIRC they encoded not the code, but the import route of custom layers, thereby even supporting custom neural net layers as long as they can be re-imported. This is of course not easy at all, and a deep understanding of Aesara is needed. PyMC-only attributes of the model would have to be included in addition to the Aesara graph. But that's almost trivial compared to serializing arbitrary Aesara graphs to HDF5 or a similar format. |
@5hv5hvnk plesae make sure to run the |
pymc_experimental/model_builder.py
Outdated
Example: | ||
|
||
def _data_setter(self, data : pd.DataFrame): | ||
with self.model: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing indent
Saving attrs does not seem to work currently: arviz-devs/arviz#2109 |
Why aren't the tests being run? CC @ricardoV94 @michaelosthege |
Also fixes #81 |
Congrats @5hv5hvnk!!! This was a big one. |
No description provided.