NetEmd with large locations 

If the locations have very large values, calling the functions in different orders can give different answers. For example, in theory all of the following should be equivalent (assuming the same distribution):

min_emd of two variance 1 histograms (i.e. calling normalise_hist_variance) using optimise
net_emd of two variance 1 histograms (i.e. calling normalise_hist_variance) using optimise

min_emd of two variance 1 histograms (i.e. calling normalise_hist_variance) using optimiseRonly
net_emd of two variance 1 histograms (i.e. calling normalise_hist_variance) using optimiseRonly

min_emd of two mean 0 variance 1 histograms (i.e. calling normalise_hist_variance) using optimise
net_emd of two mean 0 variance 1 histograms (i.e. calling normalise_hist_variance) using optimise

min_emd of two mean 0 variance 1 histograms (i.e. calling normalise_hist_variance) using optimiseRonly
net_emd of two mean 0 variance 1 histograms (i.e. calling normalise_hist_variance) using optimiseRonly

However, because of floating point problems in the variance normalisation, these combinations can give different results in practice. Importantly, it is not always obvious which of the methods will fail.
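To illustrate the kind of floating point problem that large locations can cause, here is a minimal Python sketch. It is not the package's code: `variance_one_pass` and `variance_centred` are hypothetical helpers, and the one-pass formula is only one plausible way precision can be lost, assumed here for illustration. Both functions compute the same mass-weighted variance on paper, but the one-pass form subtracts two numbers of order 1e18 and loses the answer to cancellation.

```python
def weighted_mean(locations, masses):
    """Mass-weighted mean of the bin locations."""
    total = sum(masses)
    return sum(m * x for x, m in zip(locations, masses)) / total


def variance_one_pass(locations, masses):
    """Variance as E[x^2] - E[x]^2 (hypothetical helper).

    Algebraically correct, but for large locations both terms are huge
    and nearly equal, so their difference is dominated by rounding error.
    """
    total = sum(masses)
    mean = weighted_mean(locations, masses)
    mean_sq = sum(m * x * x for x, m in zip(locations, masses)) / total
    return mean_sq - mean * mean


def variance_centred(locations, masses):
    """Variance as E[(x - mean)^2] (hypothetical helper).

    Subtracting the mean first keeps every squared term small.
    """
    total = sum(masses)
    mean = weighted_mean(locations, masses)
    return sum(m * (x - mean) ** 2 for x, m in zip(locations, masses)) / total


masses = [1.0, 1.0, 1.0]
small = [0.0, 1.0, 2.0]
large = [x + 1e9 for x in small]  # same shape, shifted by a large constant

# The exact variance is 2/3 in both cases, since a shift does not change it.
print("small, one pass:", variance_one_pass(small, masses))
print("small, centred: ", variance_centred(small, masses))
print("large, one pass:", variance_one_pass(large, masses))
print("large, centred: ", variance_centred(large, masses))
```

Any normalisation that divides by the square root of a variance computed like the one-pass version inherits this error, which would be consistent with the order-dependent results described above.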

There are three solutions:

1.) Leave it as is. In theory the user should not be normalising histograms, and this is an edge case caused by big numbers.

2.) Mean centre when normalising by variance. It is quite arbitrary that we fix the mean to the same value anyway.

3.) Change the dhist object to have a mean offset so that we can get the best of both worlds, although this will require a non-trivial adjustment.
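As a rough sketch of what option 3 might look like, the Python below stores a histogram as a large `offset` plus small `residuals`, so the variance normalisation only ever touches the small numbers while the original mean is still recoverable. The class `OffsetHist` and its methods are hypothetical names chosen for illustration. They are not the dhist object or any existing function, and a real change would need to thread the offset through the EMD code as well.

```python
import math
from dataclasses import dataclass


@dataclass
class OffsetHist:
    """Hypothetical histogram: true location i is offset + residuals[i]."""
    offset: float
    residuals: list
    masses: list

    @classmethod
    def from_locations(cls, locations, masses):
        # Use the mass-weighted mean as the offset so residuals are small.
        total = sum(masses)
        mean = sum(m * x for x, m in zip(locations, masses)) / total
        return cls(mean, [x - mean for x in locations], list(masses))

    def variance(self):
        # Computed on residuals only; the large offset never enters.
        total = sum(self.masses)
        r_mean = sum(m * r for r, m in zip(self.residuals, self.masses)) / total
        return sum(
            m * (r - r_mean) ** 2 for r, m in zip(self.residuals, self.masses)
        ) / total

    def normalise_variance(self):
        # Rescale residuals about their own mean to unit variance,
        # leaving the offset (and hence the overall mean) unchanged.
        var = self.variance()
        if var == 0.0:
            raise ValueError("cannot normalise a zero-variance histogram")
        sd = math.sqrt(var)
        total = sum(self.masses)
        r_mean = sum(m * r for r, m in zip(self.residuals, self.masses)) / total
        scaled = [r_mean + (r - r_mean) / sd for r in self.residuals]
        return OffsetHist(self.offset, scaled, list(self.masses))


hist = OffsetHist.from_locations([1e9, 1e9 + 1.0, 1e9 + 2.0], [1.0, 1.0, 1.0])
unit = hist.normalise_variance()
print(unit.offset, unit.residuals, unit.variance())
```

Setting `offset` to zero after normalising would correspond to option 2 (mean centring), so the two options differ mainly in whether the original mean is kept.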



