-
-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Labeling restructure #1201
Labeling restructure #1201
Conversation
I'm not sure about this. I kind of want to use 0-based indexing with pystan, and just have a summary with "correct" index. Would this force me to use 1-based indexing? |
Do you only want to see 1 based indexing in summary? or in plots too? Under the proposal above the coordinate values would be 1-base indexes, to use 0-based indexing you'd have to use |
Yeah, I forget isel. Yeah let's go forward with this |
Codecov Report
@@ Coverage Diff @@
## main #1201 +/- ##
==========================================
- Coverage 90.25% 90.00% -0.25%
==========================================
Files 105 107 +2
Lines 11419 11528 +109
==========================================
+ Hits 10306 10376 +70
- Misses 1113 1152 +39
Continue to review full report at Codecov.
|
d6a85ff
to
9175645
Compare
There are still some Tasks that will be left pending for other PRs are:
|
@OriolAbril Can we create issue tickets for the remaining things and merge? Would hate for you to get stuck with a merge conflict as so many files have changed |
I could be merged, but this introduces quite some changes in labeling and index_origin usage that affect most of the library. I think it would be better to wait until it has been properly reviewed. Most of the changes are reorganizing imports, so rebasing should be automatic or not too difficult, I don't mind a couple rebases until it gets some reviews |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, added some comments.
Let's discuss how can we handle those comments
@@ -1045,6 +1040,11 @@ def summary( | |||
Dimensions specification for the variables to be used if the ``fmt`` is ``'xarray'``. | |||
credible_interval: float, optional | |||
deprecated: Please see hdi_prob | |||
order | |||
deprecated: order is now ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is order removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or is there a way to select order for unwrapped values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that having informative labels, order is not really necessary (also it is not present in any plot and nobody has complained). High dimension and confusing cases should probably better use dim: coord_value
format or something like this.
One of the goals here is to make summary and plots use the same labeling and between removing order from summary or adding order to plots I went for removing order from summary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is fine.
At some point, we probably want to have some way to sort the results (plots/summary etc)
For summary basic pandas works, so I think we don't need to implement anything, but for plots maybe we do need
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weren't we talking about returning xarray's of axes (or other backend entities) as a way to do this? TBH, I don't know if it's possible to move around the subplots in matplotlib -- if this can be done, then the xarray solution would work.
Hi, are there things we can do to help get this one merged? Looks like a lot of work |
I think the current code is mostly ok and is an improvement over the actual one, not sure it's the way to go though, it still has some clear limitations and it's hard to customize the label formatting. |
this can wait, right |
Yes |
I have rebased the PR and continued the work started here. I proposed to use a Labeller class. The latest commit adds the base class and some subcasses I think will be interesting. I have added some screenshots below, please review. BaseLabeller (used by default)DimCoordLabellerNoRepeatLabellerMapLabeller |
Updated several plots to see how labeling behaved in different ranges, and while I was at it I added a legend argument to The rest of the plots work more or less like summary above but with plots, note that the screenshots have the wrong default though, we are now using |
What kinds of things do you think might need to be updated on the ArviZ.jl side? |
There are two changes that I think will affect the Julia side, on the user side it should be backwards compatible in the sense that the noticeable changes will be fixing issues.
From here on, users can choose what to use when labeling, coords, indexes, nothing at all... and the default is set to always use coord values. index_origin is then enforced at idata creation. Therefore, stan users will set index_origin=1 and will get coordinate values starting from 1. They can then subset their data, chains, draws... with To keep showing 1 based indexing in plots, the julia converters should be updated to enforce index_origin at creation.
I don't know if these are used at all on the julia side, but if they are everything will break :/ If you point me to the relevant pieces of code I can help with updating |
Nice! Currently ArviZ.jl handles this by setting the
No, we don't use these on the Julia side, though ArviZPlots currently does. Should be easy to update though; I'll follow how you do that for arviz's plots implementations. |
The rcParam is kept so users can set the default there and get the right coord values without having to set index_origin every single time they call the converter.
They are mostly called to then loop over them so I have been updating the |
Ready to merge as soon as tests pass! |
Adds Labeller base class and some useful subclasses Updated code in summary to use as example of Labeller class capabilities
* Base index_origin and make_label changes * Labeller proposal commit Adds Labeller base class and some useful subclasses Updated code in summary to use as example of Labeller class capabilities * update ess and autocorr plots * update tests * update density and bpv plots * update mpl forestplot * update more plots * update more plots and add some docs * finish rough update of everything * finetuning * fix some tests * more fixes * last fixes? well... until docs * more test fixes * fix summary test * add some documentation * black * update changelog * doc tweaks * update changelog
Description
Enforcing and maintaining
index_origin
at label display has proved to be plenty of work and to not give the best of results. I propose to useindex_origin
at InferenceData creation and then always display coordinate values. I think this will simplify the code significantly without many drawbacks. Moreover, this will make all representations of the data coherent, from xarray html repr to ArviZ plots.Therefore, the aims of this PR are:
az.summary
, fixes Use coordinate names in posterior summary() #1091 and fixes arviz.summary() indexes parameters from 0, instead of taking the index described in coordinates #1499make_label
andxarray_var_iter
to prevent cyclic importsindex_origin
from stats and plotting functions. I basically affects 2 functions.az.summary
utils.flatten_inference_data_to_dict
andto_cds
-> do in another PR, probably better to wait until we have sound converters for dict and dataframe? @percygautam @ahartikainenareProposal belowvar_name: coord_val
,var_name [coord_val]
,var_name [dim: coord_val]
and the like. Maybe alabel_str.format(var_name=var_name, dim=dim, coord_val=coord_val)
is enough?Proposal: Labeler class
I propose to use a labeler class with a handful of methods to cover the different labeling tasks. The base task is done with 2 functions, 1st apply the dim+coord labeling function looping over all present dims and 2nd combining the result of this function with the variable name. This could be a single
make_label
function, but I think a class will be better for extensibility, moreover, a class can also handle labeling when multiple models are present (i.e. plot_forest) and the different needs between plot labels and summary labels.Checklist