Labeling restructure #1201

OriolAbril · 2020-05-22T23:25:42Z

Description

Enforcing and maintaining index_origin at label display has proved to be plenty of work and to not give the best of results. I propose to use index_origin at InferenceData creation and then always display coordinate values. I think this will simplify the code significantly without many drawbacks. Moreover, this will make all representations of the data coherent, from xarray html repr to ArviZ plots.

Therefore, the aims of this PR are:

Use coordinate values in az.summary, fixes Use coordinate names in posterior summary() #1091 and fixes arviz.summary() indexes parameters from 0, instead of taking the index described in coordinates #1499
- Required restructuring make_label and xarray_var_iter to prevent cyclic imports

Remove index_origin from stats and plotting functions. I basically affects 2 functions.
- az.summary
- utils.flatten_inference_data_to_dict and to_cds -> do in another PR, probably better to wait until we have sound converters for dict and dataframe? @percygautam @ahartikainen
Use index_origin in default coord generation. Closes integrate index_origin rcParam #984
Think (maybe implement) ways for users to customize generated labels options that seem interesting to have ~~are var_name: coord_val, var_name [coord_val], var_name [dim: coord_val] and the like. Maybe a label_str.format(var_name=var_name, dim=dim, coord_val=coord_val) is enough?~~ Proposal below

Proposal: Labeler class

I propose to use a labeler class with a handful of methods to cover the different labeling tasks. The base task is done with 2 functions, 1st apply the dim+coord labeling function looping over all present dims and 2nd combining the result of this function with the variable name. This could be a single make_label function, but I think a class will be better for extensibility, moreover, a class can also handle labeling when multiple models are present (i.e. plot_forest) and the different needs between plot labels and summary labels.

Checklist

Follows official PR format
New features are properly documented (with an example if appropriate)?
Includes new or updated tests to cover the new feature
Code style correct (follows pylint and black guidelines)
Changes are listed in changelog

ahartikainen · 2020-05-22T23:32:11Z

I'm not sure about this.

I kind of want to use 0-based indexing with pystan, and just have a summary with "correct" index.

Would this force me to use 1-based indexing?

OriolAbril · 2020-05-22T23:59:12Z

Do you only want to see 1 based indexing in summary? or in plots too?

Under the proposal above the coordinate values would be 1-base indexes, to use 0-based indexing you'd have to use isel instead of sel. Otherwise, to me it feels very weird to not be able to use the labels shown in summary to index which is what currently happens with index_origin=1.

ahartikainen · 2020-05-23T05:33:32Z

Yeah, I forget isel. Yeah let's go forward with this

codecov · 2020-05-26T23:41:19Z

Codecov Report

Merging #1201 (423d8a1) into main (29b35ce) will decrease coverage by 0.24%.
The diff coverage is 90.93%.

@@            Coverage Diff             @@
##             main    #1201      +/-   ##
==========================================
- Coverage   90.25%   90.00%   -0.25%     
==========================================
  Files         105      107       +2     
  Lines       11419    11528     +109     
==========================================
+ Hits        10306    10376      +70     
- Misses       1113     1152      +39

Impacted Files	Coverage Δ
arviz/plots/backends/bokeh/compareplot.py	`90.56% <ø> (-0.51%)`	⬇️
arviz/plots/backends/bokeh/distcomparisonplot.py	`100.00% <ø> (ø)`
arviz/plots/backends/matplotlib/compareplot.py	`86.04% <ø> (-0.92%)`	⬇️
arviz/plots/khatplot.py	`94.87% <ø> (ø)`
arviz/plots/plot_utils.py	`86.59% <ø> (-2.78%)`	⬇️
arviz/plots/separationplot.py	`73.68% <ø> (ø)`
arviz/utils.py	`89.25% <ø> (-0.04%)`	⬇️
arviz/plots/backends/matplotlib/loopitplot.py	`86.36% <36.36%> (-12.41%)`	⬇️
arviz/labels.py	`70.37% <70.37%> (ø)`
arviz/sel_utils.py	`88.23% <88.23%> (ø)`
... and 52 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 29b35ce...d64a5fd. Read the comment docs.

OriolAbril · 2020-06-03T21:56:35Z

There are still some from_xyz functions to update, but I think the bulk of the PR is done. I think it is particularly relevant for users of index_origin which would mostly be @ahartikainen and @sethaxen. I think this will force ArviZ.jl to do a couple updates (I can help) and I do not expect too many things to break but it would be great to double check and more importantly to make sure this approach will help better integrate index_origin argument.

Tasks that will be left pending for other PRs are:

update utils.flatten_inference_data_to_dict and to_cds to use coord labels (if possible)
add function to customize label format
testing?
- Not sure if a couple tests on generate_dims_coords with index origin would be enough or if from_xyz should also be tested. any preference/idea @canyon289 ?

canyon289 · 2020-06-07T13:08:46Z

@OriolAbril Can we create issue tickets for the remaining things and merge? Would hate for you to get stuck with a merge conflict as so many files have changed

OriolAbril · 2020-06-07T13:16:51Z

I could be merged, but this introduces quite some changes in labeling and index_origin usage that affect most of the library. I think it would be better to wait until it has been properly reviewed.

Most of the changes are reorganizing imports, so rebasing should be automatic or not too difficult, I don't mind a couple rebases until it gets some reviews

ahartikainen

Hi, added some comments.

Let's discuss how can we handle those comments

arviz/data/base.py

arviz/sel_utils.py

ahartikainen · 2020-06-08T07:47:53Z

arviz/stats/stats.py

@@ -1045,6 +1040,11 @@ def summary(
        Dimensions specification for the variables to be used if the ``fmt`` is ``'xarray'``.
    credible_interval: float, optional
        deprecated: Please see hdi_prob
+    order
+        deprecated: order is now ignored.


Why is order removed?

or is there a way to select order for unwrapped values?

I think that having informative labels, order is not really necessary (also it is not present in any plot and nobody has complained). High dimension and confusing cases should probably better use dim: coord_value format or something like this.

One of the goals here is to make summary and plots use the same labeling and between removing order from summary or adding order to plots I went for removing order from summary.

Ok, this is fine.

At some point, we probably want to have some way to sort the results (plots/summary etc)
For summary basic pandas works, so I think we don't need to implement anything, but for plots maybe we do need

Weren't we talking about returning xarray's of axes (or other backend entities) as a way to do this? TBH, I don't know if it's possible to move around the subplots in matplotlib -- if this can be done, then the xarray solution would work.

canyon289 · 2020-07-29T05:15:50Z

Hi, are there things we can do to help get this one merged? Looks like a lot of work

OriolAbril · 2020-07-29T22:50:19Z

I think the current code is mostly ok and is an improvement over the actual one, not sure it's the way to go though, it still has some clear limitations and it's hard to customize the label formatting.

ahartikainen · 2020-09-23T13:19:31Z

this can wait, right

OriolAbril · 2020-09-23T14:01:58Z

Yes

OriolAbril · 2021-02-18T00:20:47Z

I have rebased the PR and continued the work started here. I proposed to use a Labeller class. The latest commit adds the base class and some subcasses I think will be interesting. I have added some screenshots below, please review.

BaseLabeller (used by default)

DimCoordLabeller

NoRepeatLabeller

MapLabeller

arviz/labels.py

arviz/sel_utils.py

arviz/plots/bpvplot.py

arviz/plots/backends/bokeh/bpvplot.py

OriolAbril · 2021-02-19T00:44:11Z

Updated several plots to see how labeling behaved in different ranges, and while I was at it I added a legend argument to plot_forest. I have set it to true by default, which combined with #1527 leaves this as default:

The rest of the plots work more or less like summary above but with plots, note that the screenshots have the wrong default though, we are now using var_name[coord], var_name[dim: coord] or var_name[dim#idx].

sethaxen · 2021-02-19T21:24:55Z

I think it is particularly relevant for users of index_origin which would mostly be @ahartikainen and @sethaxen. I think this will force ArviZ.jl to do a couple updates (I can help) and I do not expect too many things to break but it would be great to double check and more importantly to make sure this approach will help better integrate index_origin argument.

What kinds of things do you think might need to be updated on the ArviZ.jl side?

OriolAbril · 2021-02-19T21:46:23Z

What kinds of things do you think might need to be updated on the ArviZ.jl side?

There are two changes that I think will affect the Julia side, on the user side it should be backwards compatible in the sense that the noticeable changes will be fixing issues.

index_origin things. Up until this PR, when no coord values were provided, the converters generated automatic integer values starting from 0. Then, plots and stats had an index_origin parameter that added 1 to interger coord values so that even though the coord values started at 0, plot labels would start at 1.

From here on, users can choose what to use when labeling, coords, indexes, nothing at all... and the default is set to always use coord values. index_origin is then enforced at idata creation. Therefore, stan users will set index_origin=1 and will get coordinate values starting from 1. They can then subset their data, chains, draws... with .sel using 1-based indexing or with .isel using 0-based indexing.

To keep showing 1 based indexing in plots, the julia converters should be updated to enforce index_origin at creation.

is about index based labeling. Up until now, the info was simply not available to plotting functions, now it is, which means that xarray_var_iter and xarray_sel_iter return 4 elements now instead of 3.

I don't know if these are used at all on the julia side, but if they are everything will break :/

If you point me to the relevant pieces of code I can help with updating

sethaxen · 2021-02-19T22:15:18Z

index_origin things. Up until this PR, when no coord values were provided, the converters generated automatic integer values starting from 0. Then, plots and stats had an index_origin parameter that added 1 to interger coord values so that even though the coord values started at 0, plot labels would start at 1.

From here on, users can choose what to use when labeling, coords, indexes, nothing at all... and the default is set to always use coord values. index_origin is then enforced at idata creation. Therefore, stan users will set index_origin=1 and will get coordinate values starting from 1. They can then subset their data, chains, draws... with .sel using 1-based indexing or with .isel using 0-based indexing.

To keep showing 1 based indexing in plots, the julia converters should be updated to enforce index_origin at creation.

Nice! Currently ArviZ.jl handles this by setting the data.index_origin rcParam upon loading arviz:
https://github.com/arviz-devs/ArviZ.jl/blob/v0.4.11/src/setup.jl#L55
If the user doesn't supply index_origin upon creation, does it default to the rcParam, or is the rcParam going away?

is about index based labeling. Up until now, the info was simply not available to plotting functions, now it is, which means that xarray_var_iter and xarray_sel_iter return 4 elements now instead of 3.

I don't know if these are used at all on the julia side, but if they are everything will break :/

No, we don't use these on the Julia side, though ArviZPlots currently does. Should be easy to update though; I'll follow how you do that for arviz's plots implementations.

OriolAbril · 2021-02-19T22:32:28Z

If the user doesn't supply index_origin upon creation, does it default to the rcParam, or is the rcParam going away?

The rcParam is kept so users can set the default there and get the right coord values without having to set index_origin every single time they call the converter.

No, we don't use these on the Julia side, though ArviZPlots currently does. Should be easy to update though; I'll follow how you do that for arviz's plots implementations.

They are mostly called to then loop over them so I have been updating the for var_name, sel, values in xarray_... to for var_name, sel, isel, values in xarray_.... Then var_names, sel, isel are passed to the labeller class so it has everything available for labeling.

OriolAbril · 2021-02-20T07:25:58Z

Ready to merge as soon as tests pass!

Adds Labeller base class and some useful subclasses Updated code in summary to use as example of Labeller class capabilities

* Base index_origin and make_label changes * Labeller proposal commit Adds Labeller base class and some useful subclasses Updated code in summary to use as example of Labeller class capabilities * update ess and autocorr plots * update tests * update density and bpv plots * update mpl forestplot * update more plots * update more plots and add some docs * finish rough update of everything * finetuning * fix some tests * more fixes * last fixes? well... until docs * more test fixes * fix summary test * add some documentation * black * update changelog * doc tweaks * update changelog

OriolAbril force-pushed the labels branch from 390fc06 to 6ac1faa Compare May 26, 2020 23:26

OriolAbril force-pushed the labels branch 2 times, most recently from d6a85ff to 9175645 Compare June 3, 2020 21:06

ahartikainen reviewed Jun 8, 2020

View reviewed changes

OriolAbril mentioned this pull request Jul 19, 2020

Add custom methods for InferenceData #1300

Merged

5 tasks

aloctavodia mentioned this pull request Dec 2, 2020

Refactor how variables are added to the model object bambinos/bambi#277

Merged

5 tasks

OriolAbril mentioned this pull request Jan 19, 2021

arviz.summary() indexes parameters from 0, instead of taking the index described in coordinates #1499

Closed

Base automatically changed from master to main January 26, 2021 19:44

OriolAbril force-pushed the labels branch from 0d81701 to 480d99a Compare February 16, 2021 22:06

OriolAbril commented Feb 18, 2021

View reviewed changes

arviz/labels.py Outdated Show resolved Hide resolved

arviz/sel_utils.py Outdated Show resolved Hide resolved

OriolAbril force-pushed the labels branch from bf05f4e to 04acadb Compare February 18, 2021 16:10

OriolAbril commented Feb 18, 2021

View reviewed changes

arviz/plots/bpvplot.py Outdated Show resolved Hide resolved

arviz/plots/backends/bokeh/bpvplot.py Outdated Show resolved Hide resolved

OriolAbril force-pushed the labels branch from 5a20a8e to ae4585a Compare February 19, 2021 00:39

This was referenced Feb 20, 2021

Add tests to arviz.labels module #1574

Closed

Add docstrings to arviz.labels #1575

Open

OriolAbril force-pushed the labels branch from 31c8095 to 2ab531c Compare February 20, 2021 07:25

OriolAbril mentioned this pull request Feb 21, 2021

Add interactive legend to bokeh forestplot #1572

Closed

OriolAbril added 20 commits February 21, 2021 21:21

Base index_origin and make_label changes

bee2c6c

Labeller proposal commit

b7e9b7c

Adds Labeller base class and some useful subclasses Updated code in summary to use as example of Labeller class capabilities

update ess and autocorr plots

e9307f8

update tests

fe28ed5

update density and bpv plots

ebfeba5

update mpl forestplot

2db06a4

update more plots

4ab0018

update more plots and add some docs

57cff40

finish rough update of everything

c5ebc74

finetuning

d44b7ca

fix some tests

f13bb05

more fixes

c1f7bd4

last fixes? well... until docs

aef27fe

more test fixes

0ba50bb

fix summary test

7c36c2a

add some documentation

07eeea0

black

2e54c79

update changelog

feb56f3

doc tweaks

c1dbab5

update changelog

d64a5fd

OriolAbril force-pushed the labels branch from 2ab531c to d64a5fd Compare February 21, 2021 19:25

OriolAbril merged commit 226d141 into main Feb 21, 2021

OriolAbril deleted the labels branch February 21, 2021 22:59

OriolAbril mentioned this pull request Jun 22, 2021

az.summary ignores the parameter coords #1732

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Labeling restructure #1201

Labeling restructure #1201

OriolAbril commented May 22, 2020 •

edited

Loading

ahartikainen commented May 22, 2020

OriolAbril commented May 22, 2020

ahartikainen commented May 23, 2020

codecov bot commented May 26, 2020 •

edited

Loading

OriolAbril commented Jun 3, 2020

canyon289 commented Jun 7, 2020

OriolAbril commented Jun 7, 2020

ahartikainen left a comment

ahartikainen Jun 8, 2020

ahartikainen Jun 8, 2020

OriolAbril Jun 8, 2020

ahartikainen Jun 10, 2020

rpgoldman Jun 14, 2020

canyon289 commented Jul 29, 2020

OriolAbril commented Jul 29, 2020

ahartikainen commented Sep 23, 2020

OriolAbril commented Sep 23, 2020

OriolAbril commented Feb 18, 2021

OriolAbril commented Feb 19, 2021

sethaxen commented Feb 19, 2021

OriolAbril commented Feb 19, 2021

sethaxen commented Feb 19, 2021

OriolAbril commented Feb 19, 2021 •

edited

Loading

OriolAbril commented Feb 20, 2021

Labeling restructure #1201

Labeling restructure #1201

Conversation

OriolAbril commented May 22, 2020 • edited Loading

Description

Proposal: Labeler class

Checklist

ahartikainen commented May 22, 2020

OriolAbril commented May 22, 2020

ahartikainen commented May 23, 2020

codecov bot commented May 26, 2020 • edited Loading

Codecov Report

OriolAbril commented Jun 3, 2020

canyon289 commented Jun 7, 2020

OriolAbril commented Jun 7, 2020

ahartikainen left a comment

Choose a reason for hiding this comment

ahartikainen Jun 8, 2020

Choose a reason for hiding this comment

ahartikainen Jun 8, 2020

Choose a reason for hiding this comment

OriolAbril Jun 8, 2020

Choose a reason for hiding this comment

ahartikainen Jun 10, 2020

Choose a reason for hiding this comment

rpgoldman Jun 14, 2020

Choose a reason for hiding this comment

canyon289 commented Jul 29, 2020

OriolAbril commented Jul 29, 2020

ahartikainen commented Sep 23, 2020

OriolAbril commented Sep 23, 2020

OriolAbril commented Feb 18, 2021

BaseLabeller (used by default)

DimCoordLabeller

NoRepeatLabeller

MapLabeller

OriolAbril commented Feb 19, 2021

sethaxen commented Feb 19, 2021

OriolAbril commented Feb 19, 2021

sethaxen commented Feb 19, 2021

OriolAbril commented Feb 19, 2021 • edited Loading

OriolAbril commented Feb 20, 2021

OriolAbril commented May 22, 2020 •

edited

Loading

codecov bot commented May 26, 2020 •

edited

Loading

OriolAbril commented Feb 19, 2021 •

edited

Loading