-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] inspectable set-valued domains for distributions #292
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot!
I have now had time to review and think about this design.
I think it's great, of course still work in progress.
I want to add three things to think about - these are not change requests but thinking points.
- How do we use this from
BaseDistribution
? I was thinking about two attributes -domain
andsupport
. Thesupport
is the locus of non-zero mass/density, domain is the full range. For now,domain
is perhaps redundant, but it will get useful for categorical rv.- a technical point, how would this look like? I would use a property and return a clone each time.
- an open question that I have no good answer for - how do we deal with the a dataframe/array case of distribution? Suppose we have the
Uniform
distribution, it has lower and upper interval. But this can be different for each index. So what issupport
in this case? Some kind of array set? You haveProduct
, but I was wondering whether there should be a more direct support of the array case - building it into the core API rather than by composition, like withBaseDistribution
. - I was also thinking about the Lebesgue decomposition, i.e., continuous part plus discrete part. We need to express both domains, somehow, and make them queriable.
- finally, this is more of a crazy idea: should we decorate all functions with their range and domain? You could think of
getattr(my_distr, "pdf")
returning not a method but an object, likeloc
andiloc
inpandas
orBaseDistribution
. This could have attributes, so doing sth likemy_dist.pdf.domain
andmy_dist.pdf.range
will return a set object. What do you think of it? Maybe it is overengineered, I would not do it right away anyway.
I think using
I don't have an answer for that. What can I suggest is that we start to introducing the domain of a distribution for easy distribution (Normal for example) and then we progress with more challenging one. In this way we can understand what we have to change to serve at best our final scope.
It is an interesting Idea. I'm thinking, probability density function, logarithmic probability function and cumulative density function have aways the same domain right? Therefore, this decorator it is not helpful to define domains for function associate to distributions (perhaps I'm wrong and there is some case where it can be used for that). On the other side, this could be a nice API for the final user :-) Should I finalise the class and then we can work on a branch where we add the domain/support for distribution already coded? What do you think? |
Hm, ok. I suppose that works, incremental design is better than overdesign.
For the full domain, yes (the reals), but supports for the discrete part would be most interesting here, since that is not directly inspectable right now. I agree though that starting simple would be best.
Agreed, let's see whether this works or whether we hit limitations. |
Hey @fkiraly I worked a little on the PR request and I think it is ready for a review. Let me know what do you think. Some remarks:
|
Yes, I agree with developing this off I would suggest, let's make a branch on Also, it would be nice to have a short write-up of your plans for the next steps, perhaps in the issue #244. I'm happy to halp with that or set up a meeting. |
done, there is now a branch called |
Perfect! Should I close this PR and checkout the new branch? I will take sometime to write in #244 what are the plans for the future, such that we can discuss ;-) |
I would say, let's leave this branch and PR as is, as it "freezes" the current state. Perhaps turn it into a draft. Further, I think the tests are failing due to lack of |
ok perfect. This PR is draft now and I will open another PR into the other branch to fix the tests. The tests were passing locally. I should give a look on how you implement the tests in the other modules. |
What were you trying for this? Have you tried |
I implemented just unit tests. I was just running them using pytest directly. Should I use the machinery developed for the |
It's automatically applied by the CI, and you can mimick that by using |
Sorry i really don't get 100% how it works. Do you have an example? I was trying to understand the tests for distribuitions but with not so much success. |
Sure! here is some explanation on the test classes: https://www.sktime.net/en/latest/developer_guide/testing_framework.html#an-illustrative-example Instances of objects are constructed from the parameter settings returned in |
Is it mandatory to use this topology of testing if we are extending from |
Yes, in the sense that |
Last question: there is a way to trigger the tests in the exact same way they are triggered in the workflow? Such that i can check that everything works before opening a Pr |
Yes, running Installing |
Reference Issues/PRs
First version fro implementation of #244
What does this implement/fix? Explain your changes.
The PR implements a first version for set-valued domains for distributions.
What should a reviewer concentrate their feedback on?
Did you add any tests for the change?
There are no tests for the changes. This is a point in the PR checklist. For instance, this is just a. draft PR to discuss a first approach distributions domains.
Any other comments?
As said, this is a draft PR to start a discussion about domains for distributions.
For instance, the most important domains for distributions that
skpro
needs areInterval
)Finite
)Product
)Each of these class implements the methods:
__contains__
which can be used to elegantly check if an element is in the domain of an intervalboundary
which can be used for integrationI found no other method that can be useful, for now. Perhaps, a couple of example could clarify the situation.
About Infinite sets of point. They can be of two kinds:
Which of them do we need? There are examples of distributions which needs these sets?
Finally, I did not implement tags, because of two reasons:
PR checklist
How to: add yourself to the all-contributors file in the
skpro
root directory (not theCONTRIBUTORS.md
). Common badges:code
- fixing a bug, or adding code logic.doc
- writing or improving documentation or docstrings.bug
- reporting or diagnosing a bug (get this pluscode
if you also fixed the bug in the PR).maintenance
- CI, test framework, release.See here for full badge reference
_tags